Efficient Infinite Context Transformers #ai #machinelearning #research #llms #science

Elvis Saravia · Beginner · 🧠 Large Language Models · 2y ago

Skills: LLM Foundations 90% · Prompt Craft 60%

Key Takeaways

The video summarizes a Google research paper on efficient infinite-context Transformers. The paper proposes Infini-attention, an attention technique that adds a compressive memory module to a vanilla dot-product attention layer so that Transformer LLMs can process infinitely long inputs with a bounded memory footprint and computation.

Full Transcript

Hi everyone. I have a new paper here, and this is a very exciting paper by Google that integrates compressive memory into a vanilla dot-product attention layer. The goal of this approach is to enable Transformer large language models to effectively process infinitely long inputs with bounded memory footprint and computation.

They propose a new attention technique called Infini-attention, which incorporates a compressive memory module into a vanilla attention mechanism. It builds both masked local attention and long-term linear attention into a single Transformer block. This allows the Infini-Transformer model to efficiently handle both long- and short-range contextual dependencies.

This approach outperforms baseline models on long-context language modeling with a 114x compression ratio of memory. They also show that a 1 billion parameter large language model can naturally scale to 1 million sequence length, and an 8 billion parameter model achieves a new SOTA result on a 500K-length book summarization task.

So, given how important long-context large language models are becoming today, having an effective memory system could unlock powerful reasoning, planning, continual adaptation, and capabilities not seen before in large language models. Feel free to like and comment if you want to see more of these short summaries. See you in the next one.

Original Description

Very exciting paper by Google that integrates compressive memory into a vanilla dot-product attention layer. The goal is to enable Transformer LLMs to effectively process infinitely long inputs with bounded memory footprint and computation. They propose a new attention technique called Infini-attention which incorporates a compressive memory module into a vanilla attention mechanism... Paper: https://arxiv.org/abs/2404.07143 #chatgpt #ai #llms #tutorial #programming

The video discusses a new attention technique called Infini-attention, which enables Transformer LLMs to process infinitely long inputs with a bounded memory footprint and computation. It incorporates a compressive memory module into a vanilla attention mechanism and combines masked local attention with long-term linear attention in a single Transformer block, so the model can handle both long- and short-range contextual dependencies efficiently. The presenter suggests that an effective memory system of this kind could unlock powerful reasoning, planning, and continual adaptation capabilities in large language models.
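To make the description above concrete, here is a minimal NumPy sketch of one attention head that processes a long input segment by segment. It is a sketch under stated assumptions, not the paper's reference implementation: the function names, the ELU+1 feature map, the scalar gate `beta`, the tensor sizes, and the random inputs are all illustrative choices. A real model would produce queries, keys, and values from learned projections and would learn the gate.

```python
import numpy as np

def elu_plus_one(x):
    # Positive feature map (ELU(x) + 1), assumed here for the linear-attention memory.
    return np.where(x > 0, x + 1.0, np.exp(np.minimum(x, 0.0)))

def softmax(x):
    # Row-wise softmax; masked (-inf) entries become exactly zero.
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

def infini_attention_segment(q, k, v, memory, z, beta):
    """Process one segment for a single head (illustrative sketch).

    q, k: (n, d_key); v: (n, d_value)
    memory: (d_key, d_value) compressive memory carried across segments
    z: (d_key,) normalization term carried across segments
    beta: scalar gate parameter (learned in a real model, fixed here)
    """
    n, d_key = q.shape

    # 1) Long-range path: read from the compressive memory with linear attention.
    sq = elu_plus_one(q)
    a_mem = (sq @ memory) / (sq @ z + 1e-8)[:, None]

    # 2) Short-range path: causal masked dot-product attention within the segment.
    scores = q @ k.T / np.sqrt(d_key)
    future = np.triu(np.ones((n, n), dtype=bool), k=1)
    scores = np.where(future, -np.inf, scores)
    a_dot = softmax(scores) @ v

    # 3) Mix the two paths with a sigmoid gate.
    g = 1.0 / (1.0 + np.exp(-beta))
    out = g * a_mem + (1.0 - g) * a_dot

    # 4) Fold this segment's keys and values into the fixed-size memory.
    sk = elu_plus_one(k)
    memory = memory + sk.T @ v
    z = z + sk.sum(axis=0)
    return out, memory, z

# Usage: stream three segments; the memory shape never changes.
rng = np.random.default_rng(0)
d_key, d_value, seg_len = 8, 8, 4
memory = np.zeros((d_key, d_value))
z = np.zeros(d_key)
for _ in range(3):
    q = rng.normal(size=(seg_len, d_key))
    k = rng.normal(size=(seg_len, d_key))
    v = rng.normal(size=(seg_len, d_value))
    out, memory, z = infini_attention_segment(q, k, v, memory, z, beta=0.0)
```

The point of the design is visible in the loop: however many segments are streamed, the state carried forward is only `memory` and `z`, whose sizes depend on the head dimensions and not on the input length.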

Key Takeaways

Read the Google research paper on Efficient Infinite Context Transformers
Understand the Infini-attention technique and its components
Implement Infini-attention in a Transformer LLM
Evaluate the performance of Infini-attention on long-context language modeling tasks
Optimize the memory footprint and computation of the LLM using Infini-attention
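As a rough illustration of what "bounded memory footprint" means in the last item, the sketch below counts stored values per attention head for a standard key-value cache versus a fixed-size compressive memory plus one local segment. The helper names, the head dimension of 128, and the segment length of 2048 are hypothetical values chosen for the arithmetic; they are not taken from the paper, and the result is not a reproduction of the compression ratio reported there.

```python
def kv_cache_entries(seq_len, d_key, d_value):
    # Standard attention keeps one key and one value vector per token,
    # so the count grows linearly with sequence length.
    return seq_len * (d_key + d_value)

def infini_state_entries(segment_len, d_key, d_value):
    # Assumed state: a (d_key x d_value) memory matrix, a d_key normalization
    # vector, and the keys/values of the current local segment only.
    compressive = d_key * d_value + d_key
    local = segment_len * (d_key + d_value)
    return compressive + local

d_key = d_value = 128   # hypothetical head dimensions
segment_len = 2048      # hypothetical local segment length

for seq_len in (100_000, 1_000_000):
    full = kv_cache_entries(seq_len, d_key, d_value)
    bounded = infini_state_entries(segment_len, d_key, d_value)
    print(f"seq_len={seq_len}: kv_cache={full}, bounded_state={bounded}")
```

Under these assumptions the bounded state stays at 540,800 values per head while the cache grows from 25.6 million to 256 million values, which is the sense in which the footprint does not depend on input length.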

💡 Infini-attention could substantially advance long-context language modeling by enabling Transformer LLMs to process infinitely long inputs with a bounded memory footprint and computation.
