World Model RAG: Generative Semantic Workspaces
Key Takeaways
The video discusses an ultra-modern RAG system with inherent world model creation and a structured, spatiotemporal memory. It uses GPT-4o as the underlying model: an operator performs a deep semantic analysis on each text chunk, and a reconciler integrates the resulting local graphs into a global memory.
Full Transcript
Hello community. So great that you are back. Yes, we have a new RAG system, and we have a beautiful, a powerful one. So welcome to my channel Discover AI, where we have a look at the latest AI research papers. And here we have it, published November 11th, 2025: "Beyond Fact Retrieval: Episodic Memory for RAG with Generative Semantic Workspaces" from the University of California, Los Angeles.
They had a beautiful idea. They said, "Hey, let's look at the human brain, you know, at the brain regions: the temporal lobe, the amygdala, the different connectivities. And let's go with the idea that the hippocampus is more or less responsible for indexing, the neocortex does some cultural validation, and we do have a predictive function." And now we map all of this into a new AI system. So this was the primary idea.
Let's start. They say: we found we need an operator, and the job of the operator is to perform a deep semantic analysis, as a RAG system, on a small local chunk of text. You know, when we go out and we have access to external data streams, we chunk them up so that we have a chunk of text that we can analyze. The operator's job is to transform this unstructured language into a structured, machine-readable snapshot of an event. So the output of the operator, after it has ingested one, two, three sentences, is a local semantic graph, or if you want, you can also have it as an index card. The content of the index card is defined by the authors. You have all the actors, the roles, the states, the actions and, this is beautiful, the spatiotemporal information. So which actors are where, at exactly what time? How are they interacting? What are their roles? What a nice idea.
Okay, let's have a closer look. Let's build examples. You know, I build examples on my channel because I think it's easier if you see it in action. So text chunk number one is: hey, on May 10th at the Geneva Convention Center, Dr. Somebody nervously approached the podium.
The presentation was about to begin. Now comes the beautiful output: a local semantic graph, the index card, and this is the local index card. You can immediately build a graph structure, of course, because look what we have. We have identified in text chunk one multiple actors: Dr. Aristor, the Geneva Convention Center, and even a particular date. And you see immediately that the definition of an actor is not a classical definition; I will show you the real definition later in the LLM prompt. For this actor we have the role: the role is the speaker. The state is nervous. For the convention center, the role is the location, and for the date, guess what? The role is the date. And then we have an action, "approaches the podium", and the actor is Dr. Aristor. And this is it. This is text chunk number one.
And guess what? Now we have text chunk number two. 10 minutes later, he began his talk on quantum entanglement. Oh, he was a theoretical physicist. What a coincidence. As he spoke, his initial nervousness was replaced with visible confidence. So now our new system says: okay, my output is now a second local semantic graph, limited only to these two sentences. And our index card runs like this: the actor is "he", because we don't know who "he" is. We start all over again. The state is confident. The next actor is, of course, some term, some object we found, like quantum entanglement; its role is supposed by the AI to be the topic. And the action is "begins his talk"; the actor is "he".
So you see what we build. We build hundreds and thousands of these tiny local semantic graphs. And if we have thousands of them, we need an AI system that sees where they fit together, because "he" is, as you know from the last chunk, a specific person. This is now the job of the reconciler.
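The two index cards from the example can be written down as a small data structure. This is only a minimal sketch: the class and field names (`Actor`, `Action`, `IndexCard`, `to_edges`) are assumptions made for illustration, not the schema used in the paper, and the speaker's name is spelled as the transcript gives it.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Actor:
    name: str                    # surface form found in the chunk, e.g. "he"
    role: str                    # e.g. "speaker", "location", "date", "topic"
    state: Optional[str] = None  # e.g. "nervous"; None if no state is given


@dataclass
class Action:
    verb_phrase: str  # e.g. "approaches the podium"
    actor: str        # name of the actor performing the action


@dataclass
class IndexCard:
    chunk_id: int
    actors: List[Actor] = field(default_factory=list)
    actions: List[Action] = field(default_factory=list)


def to_edges(card: IndexCard) -> List[Tuple[str, str, str]]:
    """Flatten an index card into (subject, relation, object) graph edges."""
    edges = []
    for actor in card.actors:
        edges.append((actor.name, "has_role", actor.role))
        if actor.state is not None:
            edges.append((actor.name, "has_state", actor.state))
    for action in card.actions:
        edges.append((action.actor, "performs", action.verb_phrase))
    return edges


# Text chunk one: the speaker approaches the podium.
card_1 = IndexCard(
    chunk_id=1,
    actors=[
        Actor("Dr. Aristor", role="speaker", state="nervous"),
        Actor("Geneva Convention Center", role="location"),
        Actor("May 10th", role="date"),
    ],
    actions=[Action("approaches the podium", actor="Dr. Aristor")],
)

# Text chunk two: "he" is still unresolved at this local level.
card_2 = IndexCard(
    chunk_id=2,
    actors=[
        Actor("he", role="presenter", state="confident"),  # role is an assumption
        Actor("quantum entanglement", role="topic"),
    ],
    actions=[Action("begins his talk", actor="he")],
)
```

Calling `to_edges(card_2)` shows why a second step is needed: the edges still point at the pronoun "he", not at a named person.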
The reconciler now analyzes the context of all the index cards, hundreds and thousands of them, and builds a global, coherent, stateful memory representation: a structured memory. And this is the way it looks. The memory entry after these two chunks is this one. We have a profile for Dr. Aristor. The known roles are speaker and presenter. The event timeline is: presentation on quantum entanglement. The location is the Geneva Convention Center. The date is given, and the state progression is: initially nervous, finally confident. And the key actions that we have from our two chunks are "approaches the podium" and "begins his talk". So you see what is happening. The reconciler tries to make sense of all the little chunks, the little semantic graphs, and builds them into a coherent story. This is now the RAG system.
The reconciler analyzes the context and performs the integration task. First, we have the entity resolution: the reconciler determines that "he" from graph two is the same identity as Dr. Aristor from graph one. Then we update the state: the reconciler now has two states for a single entity, nervous and confident. It places them on a specific timeline, recognizing a chronological transition. So in our RAG memory we suddenly have a timeline established. And then the integration: all the information now belongs to the entity Dr. Aristor. So you see exactly how this is going to be executed.
And I told you, hey, let's have a look at the LLM prompt for the operator extraction. How are these index cards built? Here is the complete prompt for the operator. Let's have a closer look. This is the first half. We are simply telling the model, and this is a GPT-4o system: hey, you are required to perform the operator extraction, so you shall do the following steps. Task one: actor identification. An actor can be a person, an organization, a place, a creative work, a temporal entity, a physical object, an item, an abstract entity, a ground actor, whatever.
Just extract those. And then the second part: task two is role assignment and state identification, then explicit verb phrase identification, implicit action phrase inference, prototypical semantic role question generation, and task six is answer mapping and actor connection. So you see, we place a lot of burden on the semantic understanding of GPT-4o, because our little AI has to do all of this given that we have just one or two sentences of text as the input. So the system is not just feeding new words or new sentences into the AI system; it tries to make sense of them already at the extraction level. With the first two or three sentences we are already trying to build a mini story. Who is the person? What is the person doing? What are they talking about? What is the time? What is happening in this little mini frame, if you want, this mini time frame? And then we go on and on and on. We build more and more, and then we build a coherent story.
We have here the prompt; I show you the one for the space-time coupling. This is all there is. This is the task you give to GPT-4o, and you hope that it will finally come up with the correct timeline in the memory structure.
Here is now, if you want, a visualization by the authors themselves. They say, okay, situation summary: we have operator example number one and number two. And you see, those are the sentences. We have the first sentence and the second sentence. And exactly as I just went through with you, the system tries to make sense. It tries to identify who the subject is, what the subject is doing, where the subject is acting, what it is acting on, whether it is communicating with somebody else, what the location is, what the specific action is, and what we have about the timeline development. So you see, the system already tries to extract as much as possible before building the coherent stories.
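The three reconciler steps described earlier (entity resolution, state update, integration) can be sketched as a simple fold over index cards. This is a minimal sketch under loud assumptions: in the paper the reconciler is an LLM-driven step, whereas here entity resolution is replaced by a naive "a pronoun refers to the last person seen" rule. The names `EntityProfile`, `resolve`, `reconcile`, and the `kind` field are hypothetical, and the "presenter" role for chunk two is assumed.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

PRONOUNS = {"he", "she", "they", "it"}


@dataclass
class EntityProfile:
    name: str
    roles: List[str] = field(default_factory=list)
    states: List[str] = field(default_factory=list)   # kept in chronological order
    actions: List[str] = field(default_factory=list)  # kept in chronological order


def resolve(name: str, last_person: Optional[str]) -> str:
    """Naive entity resolution: map a pronoun to the last person seen."""
    if name.lower() in PRONOUNS and last_person is not None:
        return last_person
    return name


def reconcile(workspace: Dict[str, EntityProfile], card: dict,
              last_person: Optional[str] = None) -> Optional[str]:
    """Merge one index card into the global workspace (mutated in place).

    Returns the most recently seen person so the next call can resolve pronouns.
    """
    for actor in card["actors"]:
        name = resolve(actor["name"], last_person)
        profile = workspace.setdefault(name, EntityProfile(name))
        if actor["role"] not in profile.roles:
            profile.roles.append(actor["role"])
        state = actor.get("state")
        # Append a state only when it differs from the latest one (a transition).
        if state and state not in profile.states[-1:]:
            profile.states.append(state)
        if actor.get("kind") == "person":
            last_person = name
    for action in card["actions"]:
        name = resolve(action["actor"], last_person)
        workspace.setdefault(name, EntityProfile(name)).actions.append(action["verb_phrase"])
    return last_person


cards = [
    {"actors": [{"name": "Dr. Aristor", "role": "speaker", "state": "nervous", "kind": "person"},
                {"name": "Geneva Convention Center", "role": "location", "kind": "place"},
                {"name": "May 10th", "role": "date", "kind": "date"}],
     "actions": [{"verb_phrase": "approaches the podium", "actor": "Dr. Aristor"}]},
    {"actors": [{"name": "he", "role": "presenter", "state": "confident", "kind": "person"},
                {"name": "quantum entanglement", "role": "topic", "kind": "concept"}],
     "actions": [{"verb_phrase": "begins his talk", "actor": "he"}]},
]

workspace: Dict[str, EntityProfile] = {}
last_person = None
for card in cards:  # each update depends only on the workspace so far and the new card
    last_person = reconcile(workspace, card, last_person)
```

After both cards are processed, the profile for the speaker holds both roles, the state progression from nervous to confident, and both actions, and there is no separate entry for "he".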
And then here you have the reconciler, which takes those two elements I just showed you and builds a complete timeline. This is the complete, if you want, world understanding. Yes, of course, we are also trying to build world models that we can store in a structured memory entity for RAG, for our AI system. But this is not that trivial. It is simple if you just have somebody attending a conference. But if you have domain-specific expertise like physics or mathematics or finance or medicine, it is not as easy as saying, hey, identify the subject, identify the role, and identify the timeline. Look at the internal complexity of a sentence in medicine or in theoretical physics: first, we have many more technical terms, so you have to train your vocabulary on this, and then the interconnection between those technical terms is not easy at all. So you have to train your AI system first on this topic, on this domain knowledge, and then you hope that for additional external data integration via RAG, the extraction will be performed on the external data given the understanding of your internal data.
And here you have the final result, beautifully managed. You have the sentence, and then you have a complete understanding: if you want, a knowledge graph with a timeline that develops for a particular person at a particular location communicating with other persons. This is the main idea of this new RAG.
Now, as I told you, you can see this from different perspectives. You can change your point of view and you have a different framing. I can put it, for example, a little bit more scientifically, now that we went through the example and you have a feeling for what is happening here. Your operator model functions as a semantic parser.
It processes the text chunks from the external data to extract a local graph of actors, their assigned roles, and their evolving states. These states act as contextual modulators on the action distribution defined by the roles. The reconciler integrates these local graphs into a persistent global workspace, and the process is modeled as a state-space transition, iteratively updating the posterior of the workspace given the context sequence, effectively performing an information fusion to maintain spatiotemporal and logical coherence. If you prefer this more scientific way of talking, maybe this helps you if you want to go to a conference.
But the beauty of framing is: hey, we have a brain, so let's use it. Let's do the same thing now in the formal language of mathematics. Here you have a screenshot from the paper. Of course, they did it first in mathematics. They said, okay, take a simple text input, and as you can see, we can run through this. It's rather simple, but you can formulate it in a mathematical formalism. Then you just explain it and say the transition model uses a Markovian assumption (I have a particular video on this) to produce the updated workspace instance by reconciling existing workspace semantic maps with new semantic information. Sounds good.
Since we are sounding good, let's do another thing. What about a semiotic representation? You know, semiotic: relating to signs and symbols. When I first saw this in the paper, I said, what is happening here? Documents, chunk, operator, reconciler. I did not understand those signs together, and then, reading the paper, I understood exactly what is happening. Large-scale text is segmented into semantically coherent chunks. Careful: if you have a long medical sentence, let's say 20, 30, maybe 50 words in a single sentence, it is really sensitive where you chunk up the pieces.
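To make the chunking caveat concrete, here is a minimal sketch of a chunker that never cuts inside a sentence. It is a naive stand-in, not the segmentation method of the paper (which the video does not describe): sentences are split with a regular expression, a small hand-picked abbreviation list keeps titles like "Dr." from being treated as sentence ends, and the function names and the `max_sentences` parameter are assumptions.

```python
import re
from typing import List

# Hand-picked abbreviations that end in a period but do not end a sentence.
ABBREVIATIONS = ("Dr.", "Mr.", "Ms.", "Prof.")


def split_sentences(text: str) -> List[str]:
    """Split on whitespace that follows ., ! or ?, then repair abbreviation splits."""
    sentences: List[str] = []
    for piece in re.split(r"(?<=[.!?])\s+", text.strip()):
        if not piece:
            continue
        if sentences and sentences[-1].endswith(ABBREVIATIONS):
            sentences[-1] += " " + piece  # "Dr." was not a sentence end
        else:
            sentences.append(piece)
    return sentences


def chunk_by_sentences(text: str, max_sentences: int = 2) -> List[str]:
    """Group whole sentences into chunks of at most max_sentences each."""
    sentences = split_sentences(text)
    return [
        " ".join(sentences[i:i + max_sentences])
        for i in range(0, len(sentences), max_sentences)
    ]


text = (
    "On May 10th, Dr. Aristor nervously approached the podium. "
    "The presentation was about to begin. "
    "10 minutes later, he began his talk on quantum entanglement. "
    "As he spoke, his initial nervousness was replaced with visible confidence."
)
chunks = chunk_by_sentences(text, max_sentences=2)
```

With `max_sentences=2` this yields the two text chunks used in the example. A fixed character-count chunker would be free to cut the second chunk in the middle of "quantum entanglement", which is exactly the sensitivity mentioned above.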
Each chunk is then processed by the operator model to generate a local workspace instance, represented as a local semantic graph or, as I showed you, as an index card that you can transform back and forth to a graph structure. Those instances are then incrementally integrated by the reconciler, resulting in a global, unified memory structure for the RAG system. And then, for question answering, the system retrieves the relevant portions of the memory by matching, in the simplest case, named entities in the query to identifiers in the semantic network, and for each match it reconstructs the episodic summaries. So you don't simply have facts, you have mini stories, complete closed stories where you understand who the acting person is, what he or she is doing, why they are doing it, what the environment is, and what the time frame is. You simply have a much denser information stream: not just a single word or a single sentence, but a mini world model. And you see, we then want to accumulate all the world models in a world graph representation in our knowledge graph.
So, operational performance data. They experimented, they ran it, and they compared it to other RAG systems. And now the main question is: should you switch? Is this a good RAG system? Unfortunately, they give results only on EpBench, a 200-chapter book benchmark. If we take this benchmark and look at precision, the vanilla LLM already has a performance of 84%, so we are already saturated, close to 100%. Maybe this is not the best benchmark, but it is the only benchmark available. Then we have the embedding RAG, GraphRAG, HippoRAG, and LightRAG. For each of those I have a particular video. And now we have the latest, the GSW RAG system. Since we are already close to 95%, you see it is not giving us a lot of information.
But in general, since the bold figures show the best-performing system, you can see we have quite some bold figures here. Now if we look at recall, another parameter that we can benchmark, you see it again: almost all of our numbers are in bold. So we have top performance, but you can really go and compare. With LightRAG, for example, you have 71.6%, and with this new methodology you have 86.3%. So there is something happening that is really interesting. And if we go down to the F1 score, you have another complete set of numerical benchmark data, and as you can see, all the numerical indicators are in bold, so it outperforms every other RAG model that we are familiar with: vanilla LLM, embedding RAG, GraphRAG, HippoRAG, and LightRAG. So it looks like quite a powerful RAG system.
If you look closely, you see that this new GSW RAG achieves the highest overall F1 score, precision score, and recall score on a particular metric that was chosen by the authors. Of course, it is the best scenario that is presented. But in general, it improves the metric by more than 10% over the next best model, which is GraphRAG. 10% in RAG is quite a lot. This is quite a jump.
And this is not the only benefit. If you look at the token usage, this new GSW RAG achieves a remarkable 50% reduction in token usage compared to GraphRAG. This means a lot less compute, a lot less thinking, a lot less of whatever you have to pay for. And if you compare with the classical embedding RAG or HippoRAG or whatever, you can have a token reduction of almost up to 60%. So really nice, because it is not only that you have to pay less, it is also faster. And if you remember the video where I showed you that we now have AI agents for real-time intelligence, we need fast AIs. If we have a fast RAG system, guess what?
This is the perfect combination. So what a beautiful new RAG system. I wanted to show you this RAG system because I really think it is outstanding. Here you have the summary, the quote from the authors, where they tell you: yes, we are just beautiful, we construct a persistent, structured memory sequence for our AI system.
Now you might say, okay, this was interesting, but this was just the RAG system. So we activated level one, but what about the other levels? Is there something more that we can learn? If we understood that this is a RAG optimization, can we also do an AI optimization? Well, what a coincidence that you ask, because in my very next video we are going to talk about a new training methodology for AI. First, we will look at reinforcement learning with verifiable rewards. We have a new publication, new insight into how the learning process is structured internally, and we will build some beautiful low-rank subspaces to understand the learning process more deeply. And by understanding the learning process, we will find a completely new solution for how you can train your AI.
I hope you enjoyed the video. I hope you had a little bit of fun and learned a little bit. If you want, subscribe, or even become a member of my channel. This would be great. And I hope to see you in my next video.
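The question-answering step described in the transcript (match named entities in the query to identifiers in the memory, then reconstruct an episodic summary for each match) can be sketched as follows. This is a minimal sketch, not the paper's implementation: matching is a plain case-insensitive substring test, the memory layout and the function names `match_entities`, `episodic_summary`, and `retrieve` are assumptions, and the memory content is the entry from the worked example.

```python
from typing import Dict, List

# Structured memory as it looks after reconciling the two example chunks.
MEMORY: Dict[str, dict] = {
    "Dr. Aristor": {
        "roles": ["speaker", "presenter"],
        "location": "Geneva Convention Center",
        "date": "May 10th",
        "topic": "quantum entanglement",
        "states": ["nervous", "confident"],
        "actions": ["approaches the podium", "begins his talk"],
    },
}


def match_entities(query: str, memory: Dict[str, dict]) -> List[str]:
    """Simplest case: an entity matches if its identifier appears in the query."""
    lowered = query.lower()
    return [name for name in memory if name.lower() in lowered]


def episodic_summary(name: str, entry: dict) -> str:
    """Rebuild a short narrative from the structured entry instead of raw chunks."""
    return (
        f"{name} ({', '.join(entry['roles'])}) at {entry['location']} "
        f"on {entry['date']}, topic: {entry['topic']}. "
        f"Actions: {'; '.join(entry['actions'])}. "
        f"State: {' -> '.join(entry['states'])}."
    )


def retrieve(query: str, memory: Dict[str, dict]) -> List[str]:
    """Return one episodic summary per entity mentioned in the query."""
    return [episodic_summary(name, memory[name]) for name in match_entities(query, memory)]


summaries = retrieve("How did Dr. Aristor feel during the talk?", MEMORY)
```

The summary handed to the LLM is one compact sentence group per entity rather than a bag of retrieved chunks, which is one plausible reading of where the reported token savings come from.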
Original Description
An ultra-modern RAG system with inherent world model creation and a structured, spatiotemporal memory.
We've all seen LLMs fail on long-form narratives, their reasoning collapsing under the weight of "context rot" as standard RAG systems feed them a fragmented "bag of chunks." But what if, instead of just retrieving facts, an AI could construct a persistent, episodic memory?
The Generative Semantic Workspace (GSW) paper proposes a groundbreaking, neuro-inspired framework that does precisely that. It moves beyond fact retrieval to build a dynamic internal world model, using an Operator to witness events and a Reconciler to weave them into a coherent spatiotemporal timeline.
This architecture allows the model to track evolving actor states and relationships, generating concise narrative summaries from its own structured memory.
This isn't just a better RAG; it's a blueprint for an AI that truly remembers, and its state-of-the-art performance on episodic benchmarks suggests the future of long-context reasoning is finally here.
All rights w/ authors:
Beyond Fact Retrieval: Episodic Memory for RAG with Generative Semantic Workspaces
Shreyas Rajesh, Pavan Holur, Chenda Duan, David Chong, Vwani Roychowdhury
from
University of California, Los Angeles
arXiv:2511.07587