LangChain v0.1.0 Launch: Streaming

LangChain · Beginner ·🧠 Large Language Models ·2y ago

Key Takeaways

LangChain v0.1.0 supports streaming responses from language models, which improves the UX of generative AI applications, and provides tools for streaming the intermediate results and actions of chains and agents. Streaming is exposed through the common interface of LangChain Expression Language (LCEL). The video demonstrates it on a simple chain, a parallel chain, a retrieval augmented generation (RAG) chain, and an agent executor, including filtering the logged intermediate steps by name.

Full Transcript

One of the most common and most important UX paradigms that we see with generative AI applications is streaming. Oftentimes the calls to the language models can take a while, so having some response to the user that streams as it's coming out is really important to let them know that stuff is actually happening. This becomes even more important when you have chains and sequences and agents that take a bunch of calls, some of them to APIs, some of them to language models, so being able to show that intermediate work, again in a streaming manner, is really important.

So as we've built LangChain 0.1, we've really focused on making streaming a big component of it. A lot of that comes from LangChain Expression Language, which we covered in an earlier video. When you create objects with LangChain Expression Language, it exposes a common interface, and that interface has a few methods that are really important for streaming. One is the stream method; this streams back tokens. Another is the async stream method, if you want to use it in an async setting. And a third is the async streaming of intermediate steps, which is really useful for complex chains, agents, and the bigger things that you're building. We've really focused on making sure that whatever chain, whatever DAG you create with LangChain Expression Language will have streaming at the end, including if you're doing things like parsing outputs into a specific format, which can be a little bit tricky.

There are a few different things that I want to show off, so I've created a notebook to walk through a few of them. Here's just basic streaming. If we create a really simple chain that's just a prompt into a model into an output parser, we can stream back responses using the stream method, and we can see that it gets printed out.

Another thing that we've focused on is streaming when you're doing things in parallel. Here we can run two chains in parallel: one is telling a joke, one is writing a poem, both about the same topic. We'll create those individual chains, then we'll create this parallel chain that runs them in parallel, and we can stream back the response. We can see that we've got some tokens from poem and some tokens from joke, and they're intermingled. Again, the parallelism happens through LangChain Expression Language naturally, but we're focused on making sure that, with the streaming experience, you can get back those results and do things with them. So here I have some really simple logic that basically looks at what key is getting returned and builds up a dictionary. We can see that over time we start to build up this dictionary of poem and joke and the different responses that it has. If we wanted to display this in a UI somewhere, that would be really easy to do.

The next thing that I want to highlight is the stream log method. The stream log method is really useful when you want to return some of the intermediate steps, and this matters most when you have intermediate steps that are interesting and potentially take a while, or cause the chain to take a while. An example of this is with RAG. With RAG, or retrieval augmented generation, you have a question, you then look up relevant documents, and then you pass those documents and the question into an LLM and get back a final response. Here it's often useful to stream, or return, the intermediate steps, namely the documents that you fetch, so that you can show them to the user and show the user that some work is being done.

So here we can create a really simple RAG chain. We'll cover retrieval augmented generation in another video, but here we'll create a really simple RAG where we have a retriever, we have a prompt, and we then have this chain, and this chain requires context and question. Then we create this other chain that wraps this chain and just adds in the context. So we'll create this RAG example, and first let's take a look at what streaming looks like. Here we can ask "what is LangSmith" and stream back a result. It takes a little bit, and it takes a while to even get started; that's because there's the search call to the retriever that's happening.

If we want to see more of the information as it gets streamed back, we can use stream log. If we do this, we start getting back a lot of information, and that's because it's logging all the steps that happen. Some of these steps provide really valuable information. Here we have this Docs entry: we named this retriever Docs, we gave it the run name Docs so that we can easily identify it. We have this output, and this is the documents that it's fetching from the search engine that we're using. So that's really handy, but there's also this other information that's not as handy. One thing that we can do is use include_names to filter things based on their name, so here we only stream things back from Docs or from the final output. Docs will give us the intermediate results; we get those because we specified it with include_names, and we want those because those are the documents that we fetched. We always get the final output results, and those we want because those are the tokens from the language model; we'll always get those with the path final output.

There are a lot more resources on streaming with RAG in particular under the use case section. If you go to Q&A with RAG, you'll notice that we have a few different resources for doing things like streaming sources, adding chat history, which also involves some streaming, and other things like that.

The last thing I want to highlight with streaming is around agents. With agents, there are a few complicating factors. First, agents call actions, and it's unknown how many actions they will call: they could call one, they could call zero, maybe they'll call five. That places a lot more emphasis on the importance of knowing what those actions are, and oftentimes agents take a while as well. So one thing that we've done concerns what's returned by the agent executor. We'll cover agents in a separate video, so if you don't understand the exact specifics of what I'm talking about, there'll be a separate video going into what an agent executor is. But basically we've made it so that when you stream the agent executor, it returns the actions that are taken, not the tokens.

Let's take a look at what that looks like. We can create a simple agent here, and then if we stream it, you can see that we get back first this action, which is a call to tavily_search_results_json with "what's the weather in San Francisco". We then get back this result from the agent that has this step; this is the result from Tavily. We then call "what's the weather in Los Angeles", we then get back the result of that, and then we get back the final output.

By the way, if we wanted to see what this looks like in LangSmith, we easily could. This is what it would look like: you've got the call to the language model first; you can click on it and see that you get back this function call, weather in San Francisco. You can then see the result of this function call. You can then see the call to the LLM; it has this function call in it, but it realizes it needs to make another one, what's the weather in LA, so it calls that. And then you can see the call to the language model at the end, and it generates an output down here. So that's how you can stream back the intermediate results of an agent, which is really important so that you can show the user what steps are being taken.

You can also stream the tokens. Here, what we can do is set streaming equal to true in the LLM, and this is important, we have to do this. Then we can call stream log on the agent executor. We can then filter to things that are named OpenAI, and that's because this LLM is named OpenAI, and then we can start printing things out. Here we have the streamed output when it's first occurring, and we can start to see that it's slowly building up the function call, where we get to "weather in San Francisco"; this is the query. Then if we scroll to the end, we can see that it's streaming out the final response that it gives. We can see here it starts: "I'm sorry, but I couldn't find the current weather in San Francisco". So by using this astream_log method, we can get back the results at a token level for agents. You'll notice that it also combines, again, the search results as well, so we get back both. We have more information on this as well: if you go to agents, then in the how-to guides we have streaming, and we cover this more heavily there.

Streaming is a really important UX for a lot of LLM applications. We've put a lot of emphasis on making sure that LangChain is really good at streaming. Let us know if you run into any issues.
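
As a rough illustration of the agent part of the transcript, the sketch below walks over a stream of chunks and turns each one into a line a UI could show. It does not call LangChain: `fake_agent_stream` is a hypothetical stand-in for streaming an agent executor, the plain dictionaries stand in for LangChain's action and step objects, and the chunk keys `actions`, `steps`, and `output` are an assumption about the chunk shape based on what the video shows. All result strings are placeholders.

```python
from typing import Any, Dict, List


def describe(chunk: Dict[str, Any]) -> List[str]:
    """Turn one streamed agent chunk into human-readable progress lines."""
    lines: List[str] = []
    if "actions" in chunk:
        # The agent decided to call one or more tools.
        for action in chunk["actions"]:
            lines.append(f"Calling {action['tool']} with {action['tool_input']!r}")
    elif "steps" in chunk:
        # A tool call finished and returned an observation.
        for step in chunk["steps"]:
            lines.append(f"Got result: {step['observation']}")
    elif "output" in chunk:
        # The agent produced its final answer.
        lines.append(f"Final answer: {chunk['output']}")
    return lines


# Hypothetical stand-in for the chunks an agent executor might stream back.
fake_agent_stream = [
    {"actions": [{"tool": "tavily_search_results_json",
                  "tool_input": "weather in San Francisco"}]},
    {"steps": [{"observation": "placeholder search result for San Francisco"}]},
    {"actions": [{"tool": "tavily_search_results_json",
                  "tool_input": "weather in Los Angeles"}]},
    {"steps": [{"observation": "placeholder search result for Los Angeles"}]},
    {"output": "placeholder final answer"},
]

progress = [line for chunk in fake_agent_stream for line in describe(chunk)]
for line in progress:
    print(line)
```

Because the number of actions is not known in advance, the loop handles each chunk as it arrives instead of expecting a fixed sequence.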

Original Description

Streaming is an important UX consideration for LLM applications. We've put a lot of work into making sure streaming works for your chains and agents.

Jupyter Notebook (to follow along): https://github.com/hwchase17/langchain-0.1-guides/blob/master/streaming.ipynb
JavaScript Notebook: https://github.com/bracesproul/langchainjs-0.1-guides/blob/main/streaming.ipynb

Links:
Streaming with LCEL: https://python.langchain.com/docs/expression_language/interface#stream
Streaming with RAG: https://python.langchain.com/docs/use_cases/question_answering/streaming
Streaming with Agents: https://python.langchain.com/docs/modules/agents/how_to/streaming


LangChain v0.1.0 supports building streaming-enabled LLM applications, including retrieval augmented generation chains and agent executors, so that users see output and intermediate results as they are produced. The lesson shows how to create a simple chain, stream intermediate results, and filter the streamed log by run name.
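
Filtering the streamed log by name, as described in the lesson, can be pictured with the following sketch. It does not use LangChain. It assumes that each log entry is a patch-style operation with a `path`, that per-run entries live under `/logs/<run name>/`, and that root-level output paths are always kept; the sample paths and values are made up for illustration, and `keep_op` is a hypothetical helper, not a library function.

```python
from typing import Any, Dict, List

Op = Dict[str, Any]


def keep_op(op: Op, include_names: List[str]) -> bool:
    """Keep root-level output ops, plus per-run ops whose run name is included."""
    parts = op["path"].strip("/").split("/")
    if parts[0] != "logs":
        # Not a per-run entry: treated as final/streamed output, always kept.
        return True
    # Per-run entry: the second path segment is assumed to be the run name.
    return len(parts) > 1 and parts[1] in include_names


# Made-up operations in the assumed shape.
sample_ops: List[Op] = [
    {"op": "add", "path": "/logs/Docs/final_output", "value": {"documents": ["placeholder doc"]}},
    {"op": "add", "path": "/logs/ChatOpenAI/streamed_output_str/-", "value": "tok"},
    {"op": "add", "path": "/streamed_output/-", "value": "tok"},
    {"op": "replace", "path": "/final_output", "value": "placeholder answer"},
]

kept_paths = [op["path"] for op in sample_ops if keep_op(op, include_names=["Docs"])]
print(kept_paths)
```

With only `Docs` included, the retrieved documents and the final output survive while the other per-run entries are dropped, which matches the behaviour the transcript describes.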

Key Takeaways
  1. Create a simple chain (prompt, model, output parser) and stream back its response with the stream method
  2. Run two chains in parallel (a joke and a poem on the same topic) and stream back their intermingled tokens
  3. Build up a dictionary of responses by checking which key each streamed chunk belongs to
  4. Use the stream log method to return intermediate steps, such as the documents fetched in a RAG chain
  5. Create a simple RAG chain and stream its result as it becomes available
  6. Filter the streamed log by run name using include_names
  7. Create an agent and stream it through the agent executor, which returns the actions taken rather than tokens
  8. To stream an agent's tokens as well, set streaming to true on the LLM and use the stream log method on the agent executor
💡 LangChain's streaming capabilities enable users to see the steps taken by the agent and language model, providing valuable insights into the decision-making process and facilitating improved UX and debugging.
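
The dictionary-building logic from the parallel-chains takeaway can be sketched in a few lines. This is a minimal illustration, not the notebook's code: `fake_parallel_stream` is a hypothetical stand-in for streaming a parallel chain, and it assumes each streamed chunk is a small dictionary keyed by the branch (`joke` or `poem`) that produced it. The text fragments are invented.

```python
from typing import Dict, Iterable, Iterator


def fake_parallel_stream() -> Iterator[Dict[str, str]]:
    """Stand-in for a parallel chain's stream: interleaved one-key chunks."""
    yield from [
        {"joke": "Why did"},
        {"poem": "Soft paws"},
        {"joke": " the bear"},
        {"poem": " in snow"},
        {"joke": " nap?"},
    ]


def accumulate(stream: Iterable[Dict[str, str]]) -> Dict[str, str]:
    """Merge interleaved chunks into one growing dictionary, keyed by branch."""
    output: Dict[str, str] = {}
    for chunk in stream:
        for key, text in chunk.items():
            # Append the new fragment to whatever this branch has produced so far.
            output[key] = output.get(key, "") + text
        # A UI would re-render `output` at this point to show progress.
    return output


result = accumulate(fake_parallel_stream())
print(result)
```

Keeping one entry per key means the two branches can arrive in any order and still be displayed side by side.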
