The latest in AI research: this video introduces TransformerFAM (Feedback Attention Memory), presented by https://www.youtube.com/channel/UCK8sQmJBp8GCxrOtXWBpyEA, a novel architecture designed to enhance Transformers by incorporating a feedback mechanism that emulates working memory.
It also introduces TransformerBSWA (Block Sliding Window Attention).
Based on Ring Attention, covered by https://www.youtube.com/channel/UCwbsWIWfcOL2FiUZ2hKNJHQ
This design allows the Transformer to maintain awareness of its own latent representations across different blocks of data, improving its ability to process indefinitely long sequences without additional computational overhead. Unlike traditional Transformers that suffer from quadratic complexity with sequence length, TransformerFAM operates with linear complexity, making it better suited for handling extensive data sequences efficiently.
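To make the linear-complexity point concrete, here is a rough NumPy sketch of block sliding window attention: each block of queries attends only to its own block plus a fixed number of preceding blocks, so compute grows linearly with sequence length. This is my own toy illustration, not the authors' code; the names softmax, bswa, block_size and mem_blocks are illustrative choices.

# Minimal single-head sketch of block sliding window attention (BSWA-style).
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def bswa(q, k, v, block_size=4, mem_blocks=1):
    # Each block of queries attends to its own block plus `mem_blocks`
    # preceding blocks, so the cost scales linearly with sequence length.
    L, d = q.shape
    out = np.zeros_like(v)
    for start in range(0, L, block_size):
        end = min(start + block_size, L)
        ctx_start = max(0, start - mem_blocks * block_size)
        scores = q[start:end] @ k[ctx_start:end].T / np.sqrt(d)
        out[start:end] = softmax(scores) @ v[ctx_start:end]
    return out

# Toy usage: 16 tokens with 8-dimensional embeddings.
rng = np.random.default_rng(0)
q = rng.normal(size=(16, 8)); k = rng.normal(size=(16, 8)); v = rng.normal(size=(16, 8))
print(bswa(q, k, v).shape)  # (16, 8)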
TransformerFAM integrates seamlessly with existing pretrained models and does not introduce new weights; it retains and compresses past information within a feedback loop that runs across sequence blocks. This lets the model manage long-term dependencies effectively, improving performance on tasks that require extensive context awareness. The architecture's feedback loop mimics mechanisms found in biological neural networks, proposing a scalable answer to the limitations of current Transformers when processing long sequences.
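And a hedged sketch of the feedback idea itself: a few FAM vectors are appended to each block's attention context and then refreshed by attending over that block, so past information is compressed and carried forward. Again, this is a simplified toy version under my own assumptions; names like fam_block_attention and n_fam are illustrative, not from the paper.

# Toy feedback-memory loop: `fam` acts as working memory carried across blocks.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attend(q, k, v):
    d = q.shape[-1]
    return softmax(q @ k.T / np.sqrt(d)) @ v

def fam_block_attention(x, block_size=4, n_fam=2):
    # Process x block by block; the FAM vectors are appended to each block's
    # key/value context and then updated by attending over that same context.
    L, d = x.shape
    fam = np.zeros((n_fam, d))  # feedback memory; no new weights in this toy version
    out = np.zeros_like(x)
    for start in range(0, L, block_size):
        blk = x[start:start + block_size]
        ctx = np.concatenate([blk, fam], axis=0)       # block tokens + memory
        out[start:start + block_size] = attend(blk, ctx, ctx)
        fam = attend(fam, ctx, ctx)                    # compress block into new memory
    return out

rng = np.random.default_rng(0)
x = rng.normal(size=(16, 8))
print(fam_block_attention(x).shape)  # (16, 8)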
00:00 3 videos on infinity context length
01:19 Visualization of the new TransformerFAM
02:58 Pseudocode for the two new Transformers
04:14 Basics of Attention calculations
07:00 TransformerBSWA - Block Sliding Window Attention
12:15 TransformerFAM - Feedback Attention Memory
14:47 Symmetries in operational feedback code
20:09 Time series visualization of new FAM and BSWA
23:24 Outlook on Reasoning w/ TransformerFAM
All rights w/ the authors:
https://arxiv.org/pdf/2404.09173.pdf
TransformerFAM: Feedback attention is working memory
#airesearch
#ai