LangChain v0.1.0 Launch: Streaming

LangChain · Beginner ·🧠 Large Language Models ·2y ago

Key Takeaways

LangChain v0.1.0 supports streaming responses from language models, which improves UX in generative AI applications, and provides tools for streaming the intermediate results and actions of chains and agents. Streaming is built on LangChain Expression Language; the lesson demonstrates it with retrieval augmented generation (RAG) and agent executors, including filtering and logging of intermediate steps.
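
To make the idea of filtering logged intermediate steps concrete, here is a minimal, library-free sketch. It does not call LangChain: the log entries are hand-written stand-ins shaped like JSON-Patch-style operations with a `path` field. The path layout (`/logs/<run name>/...`, `/streamed_output/-`, `/final_output`), the function name `filter_ops`, and the sample values are all assumptions made for illustration. It only shows the selection rule the lesson describes: keep entries from named runs, and always keep the final output.

```python
from typing import Any, Dict, Iterable, List, Set

# Top-level paths that are always kept (assumed layout, for illustration only).
ALWAYS_KEPT = {"streamed_output", "final_output"}


def filter_ops(ops: Iterable[Dict[str, Any]], include_names: Set[str]) -> List[Dict[str, Any]]:
    """Keep ops that belong to a named run, plus the chain's own output."""
    kept: List[Dict[str, Any]] = []
    for op in ops:
        parts = op["path"].strip("/").split("/")
        if parts[0] in ALWAYS_KEPT:
            kept.append(op)
        elif parts[0] == "logs" and len(parts) > 1 and parts[1] in include_names:
            kept.append(op)
    return kept


# Hand-written stand-ins for log entries; not real library output.
simulated_ops = [
    {"op": "add", "path": "/logs/Docs", "value": "retriever started"},
    {"op": "add", "path": "/logs/ChatPromptTemplate", "value": "prompt formatted"},
    {"op": "add", "path": "/logs/Docs/final_output", "value": "(fetched documents)"},
    {"op": "add", "path": "/streamed_output/-", "value": "(first answer token)"},
    {"op": "replace", "path": "/final_output", "value": "(answer so far)"},
]

kept_ops = filter_ops(simulated_ops, include_names={"Docs"})
kept_paths = [op["path"] for op in kept_ops]
for op in kept_ops:
    print(op["path"], ":", op["value"])
```

With `include_names={"Docs"}`, the prompt-template entry is dropped while the retriever entries and the output entries pass through, which mirrors the behaviour described in the lesson.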

Full Transcript

One of the most common and most important UX paradigms that we see with generative AI applications is streaming. Oftentimes the calls to the language models can take a while, so having some response that streams to the user as it's coming out is really important to let them know that something is actually happening. This becomes even more important when you have chains, sequences, and agents that make a bunch of calls, some of them to APIs, some of them to language models. Being able to show that intermediate work, again in a streaming manner, is really important.

So as we've built LangChain 0.1, we've really focused on making streaming a big component of it. A lot of that comes from LangChain Expression Language, which we covered in an earlier video. When you create objects with LangChain Expression Language, they expose a common interface, and that interface has a few methods that are really important for streaming. One is the stream method, which streams back tokens. Another is the async stream method, for when you want to use it in an async setting. A third is the async streaming of intermediate steps, which is really useful for complex chains, agents, and the bigger things that you're building. We've really focused on making sure that whatever chain, whatever DAG you create with LangChain Expression Language will have streaming at the end, including if you're doing things like parsing outputs into a specific format, which can be a little bit tricky.

There are a few different things that I want to show off, so I've created a notebook to walk through a few of them. Here's just basic streaming. If we create a really simple chain that's just a prompt into a model into an output parser, we can stream back responses using the stream method, and we can see that it gets printed out.

Another thing that we've focused on is streaming when you're doing things in parallel. Here we can run two chains in parallel: one is telling a joke, one is writing a poem, both about the same topic. We'll create those individual chains, then we'll create this parallel chain that runs them in parallel, and we can stream back the response. We can see that we've got some tokens from poem and some tokens from joke, and they're intermingled. The parallelism happens naturally through LangChain Expression Language, but we've also focused on making sure that, in the streaming experience, you can get back those results and do things with them. Here I have some really simple logic that looks at which key is getting returned and builds up a dictionary. We can see that over time we start to build up this dictionary of poem and joke and the different responses they have. If we wanted to display this in a UI somewhere, that would be really easy to do.

The next thing that I want to highlight is the stream log method. The stream log method is really useful when you want to return some of the intermediate steps, particularly when those steps are interesting and potentially take a while or cause the chain to take a while. An example of this is RAG. With RAG, or retrieval augmented generation, you have a question, you look up relevant documents, and then you pass those documents and the question into an LLM and get back a final response. Here it's often useful to stream, or return, the intermediate steps, namely the documents that you fetch, so that you can show them to the user and show that some work is being done.

We'll cover retrieval augmented generation in another video, but here we'll create a really simple RAG chain where we have a retriever and a prompt. We then have this chain, which requires context and question, and then we create this other chain that wraps it and just adds in the context. First let's take a look at what streaming looks like. Here we can ask "what is LangSmith" and stream back a result. It takes a while to even get started, and that's because of the search call to the retriever that's happening.

If we want to see more of the information as it gets streamed back, we can use stream log. If we do this we start getting back a lot of information, and that's because it's logging all the steps that happen. Some of these steps provide really valuable information. Here we have this Docs entry: we gave the retriever the run name Docs so that we can easily identify it. We have this output, and these are the documents that it's fetching from the search engine that we're using. That's really handy, but there's also other information that's not as handy. One thing we can do is use include names to filter things based on their name, so here we only stream things back from Docs or from final output. Docs will give us the intermediate results; we get those because we specified them with include names, and we want those because they are the documents that we fetched. We always get the final output results, with the path final output, and we want those because they are the tokens from the language model.

There are a lot more resources on streaming with RAG in particular under the use case section. If you go to Q&A with RAG, you'll notice that we have a few different pages for doing things like streaming sources, adding chat history (which also involves some streaming), and other things like that.

The last thing I want to highlight with streaming is around agents. With agents there are a few complicating factors. First, agents call actions, and it's unknown how many actions they will call: they could call one, they could call zero, maybe they'll call five. That places a lot more emphasis on the importance of knowing what those actions are, and oftentimes agents take a while as well. We'll cover agents in a separate video, so if you don't understand the exact specifics of what I'm talking about, there'll be a separate video going into what an agent executor is. But basically, we've made it so that when you stream the agent executor, it returns the actions that are taken, not the tokens.

Let's take a look at what that looks like. We can create a simple agent here, and if we stream it, you can see that we first get back this action, which is a call to Tavily search results JSON with "what's the weather in San Francisco". We then get back this result from the agent that has this step; this is the result from Tavily. We then call "what's the weather in Los Angeles", we get back the result of that, and then we get back the final output.

By the way, if we wanted to see what this looks like in LangSmith, we easily could. This is what it would look like. You've got the call to the language model first. You can click on it and see that you get back this function call, weather in San Francisco. You can then see the result of this function call. You can then see the next call to the LLM: it has this function call in it, but it realizes it needs to make another one, what's the weather in LA, so it calls that. And then you can see the call to the language model at the end, and it generates an output down here. So that's how you can stream back the intermediate results of an agent, which is really important so that you can show the user what steps are being taken.

You can also stream the tokens. Here we set streaming equal to true in the LLM. This is important; we have to do this. Then we can call stream log on the agent executor. We can filter to things that are named OpenAI, because this LLM is named OpenAI, and then we can start printing things out. Here we have the streamed output when it's first occurring, and we can start to see that it's slowly building up this function call, where we get to "weather in San Francisco", which is the query. If we scroll to the end, we can see that it's streaming out the final response that it gives. We can see it starts here: "I'm sorry, but I couldn't find the current weather in San Francisco". So by using this astream log method, we can get back the results at a token level for agents. You'll notice that it also combines the search results as well, so we get back both.

We have more information on this as well. If you go to Agents and then the how-to guides, we have a streaming guide, and we cover this more heavily there. Streaming is a really important UX for a lot of LLM applications, and we've put a lot of emphasis on making sure that LangChain is really good at streaming. Let us know if you run into any issues.
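
As a rough illustration of consuming the agent stream described in the transcript, the sketch below turns each streamed chunk into a line a UI could show. It is library-free: the chunks are hand-written dictionaries standing in for what the agent executor yields. The keys `actions`, `steps`, and `output`, the field names inside them, the helper `describe_chunk`, the tool name `tavily_search_results_json` (reconstructed from the spoken transcript), and the placeholder result strings are assumptions for illustration, not the notebook's code.

```python
from typing import Any, Dict, List


def describe_chunk(chunk: Dict[str, Any]) -> List[str]:
    """Turn one streamed agent chunk into human-readable progress lines."""
    lines: List[str] = []
    # A chunk announcing tool calls the agent has decided to make.
    for action in chunk.get("actions", []):
        lines.append(f"Calling {action['tool']} with input: {action['tool_input']}")
    # A chunk carrying the result of a tool call.
    for step in chunk.get("steps", []):
        lines.append(f"Tool result: {step['observation']}")
    # The last chunk carries the agent's final answer.
    if "output" in chunk:
        lines.append(f"Final answer: {chunk['output']}")
    return lines


# Hand-written stand-ins following the sequence shown in the video:
# action, result, action, result, final output. Values are placeholders.
simulated_chunks = [
    {"actions": [{"tool": "tavily_search_results_json", "tool_input": "weather in San Francisco"}]},
    {"steps": [{"observation": "(search results for San Francisco)"}]},
    {"actions": [{"tool": "tavily_search_results_json", "tool_input": "weather in Los Angeles"}]},
    {"steps": [{"observation": "(search results for Los Angeles)"}]},
    {"output": "(final answer text)"},
]

progress: List[str] = []
for chunk in simulated_chunks:
    progress.extend(describe_chunk(chunk))

for line in progress:
    print(line)
```

Because the number of actions is not known in advance, the loop makes no assumption about how many action and result chunks arrive before the final answer.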

Original Description

Streaming is an important UX consideration for LLM applications. We've put a lot of work into making sure streaming works for your chains and agents.

Jupyter Notebook (to follow along): https://github.com/hwchase17/langchain-0.1-guides/blob/master/streaming.ipynb
JavaScript Notebook: https://github.com/bracesproul/langchainjs-0.1-guides/blob/main/streaming.ipynb

Links:
Streaming with LCEL: https://python.langchain.com/docs/expression_language/interface#stream
Streaming with RAG: https://python.langchain.com/docs/use_cases/question_answering/streaming
Streaming with Agents: https://python.langchain.com/docs/modules/agents/how_to/streaming

LangChain v0.1.0 makes streaming a core feature for LLM applications: chains built with LangChain Expression Language can stream tokens, and RAG chains and agent executors can stream and log their intermediate results to improve UX. Following the steps in this lesson, you can create a simple chain, stream intermediate results, and filter the streamed log by run name.

Key Takeaways

Create a simple chain with a prompt, model, and output parser, and stream back responses using the stream method
Run two chains in parallel (a joke and a poem on the same topic) and stream back their intermingled tokens
Use simple logic that checks which key each chunk belongs to and builds up a dictionary of responses
Use the stream log method to return intermediate steps, such as the documents fetched in a RAG process
Create a simple RAG chain and stream its result; the first token is delayed by the retriever's search call
Filter the stream log by run name using include names
Create an agent and stream it with the agent executor, which returns the actions taken rather than tokens
Set streaming to true on the LLM and use stream log on the agent executor to stream an agent's tokens
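
The dictionary-building step from the takeaways above can be sketched in a few lines. This is a minimal, library-free illustration: the chunks are hand-written stand-ins for what a parallel chain might stream, and the chunk shape (one dictionary per chunk, mapping a branch key such as `joke` or `poem` to a text piece), the function name `accumulate_by_key`, and the sample text are assumptions, not the notebook's code.

```python
from typing import Dict, Iterable


def accumulate_by_key(chunks: Iterable[Dict[str, str]]) -> Dict[str, str]:
    """Merge interleaved {key: text_piece} chunks into one string per key."""
    output: Dict[str, str] = {}
    for chunk in chunks:
        for key, piece in chunk.items():
            # Append the new piece to whatever has arrived for this key so far.
            output[key] = output.get(key, "") + piece
    return output


# Hand-written stand-ins: joke and poem pieces arrive intermingled.
simulated_chunks = [
    {"joke": "Why did "},
    {"poem": "Roses "},
    {"joke": "the bear "},
    {"poem": "are red"},
    {"joke": "cross the road?"},
]

result = accumulate_by_key(simulated_chunks)
print(result)
```

In a UI, the same loop could re-render the partial dictionary after every chunk, so both responses appear to grow side by side.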

💡 LangChain's streaming capabilities enable users to see the steps taken by the agent and language model, providing valuable insights into the decision-making process and facilitating improved UX and debugging.
