[LLM NEWS] AlphaFold 3, xLSTM, OpenAI's Model Spec, DeepSeek-V2, OpenDevin CodeAct 1.0

Elvis Saravia · Beginner ·📄 Research Papers Explained ·2y ago

Key Takeaways

Covers recent AI and LLM news, including AlphaFold 3, xLSTM, OpenAI's Model Spec, and DeepSeek-V2

Full Transcript

Hi everyone. In this new series, I want to highlight some of the top news and headlines that made the rounds across social media and other platforms. The idea with this series is to help others keep up with all the interesting stuff that's going on around AI and the world of large language models. There will be a lot of insights, a lot of takeaways, and some discussion related to some of these announcements, but what I want to do in this series is summarize, as best I can, some of the exciting and interesting new developments in AI.

I would like to start off with AlphaFold 3. This is a huge announcement today. AlphaFold 3 is a new model developed by Google DeepMind and Isomorphic Labs, so this is a collaboration. The whole goal of AlphaFold, if you have been keeping track of it, is to accurately predict the structure of proteins, DNA, RNA, ligands, and more, and how they interact. That interaction component is actually one of the new additions in this AlphaFold 3 model, and they hope it will transform our understanding of the biological world and drug discovery. There are a lot of conversations about how AI can help with scientific discovery, and I think this is one of the most advanced ways AI has been applied to science, so this is exciting. I used to do research on applying large language models and AI systems to science, so I get excited to see developments around this. There was a paper published in Nature, so you can take a look at that if you want more details about the approach, the different models that were used, and the innovations in terms of the architecture. I do believe they're using a diffusion-based architecture, which is a new addition to this particular model. They present a model that can predict the structure and interactions of all of life's molecules with unprecedented accuracy. That's a very big claim, right? Understanding all of these proteins
and the interactions of these proteins with other molecule types is, I think, something that a lot of researchers have been trying to tackle for some time, not only in AI but outside of AI as well. I think this AlphaFold 3 model presents a huge advancement towards better understanding the biological world and potentially a new way of doing drug discovery. They have a bunch of examples in this blog post of AlphaFold 3's structure prediction capabilities. They also talk a little bit about their collaboration with Isomorphic Labs, so it's interesting how DeepMind is collaborating with other entities as well. One of the interesting parts of this announcement was the AlphaFold Server, a free-to-use research tool provided for non-commercial research. You can provide molecule types and so on, and you can predict how proteins interact with those molecules throughout the cell. That's an interesting tool, and I think it's going to be really helpful for researchers who are interested in this particular area.

Next up, we have OpenAI's announcement of the Model Spec. We work a lot with large language models, and one conversation that always comes up is the reliability of results and the ability to steer the models to give you the right responses. In essence, it's about being able to control the model behavior: how do you shape that model behavior? They share this Model Spec, which is their approach to shaping the desired behavior of their models. I think there's a lot to learn from this particular document. I like that they published this because it shows a lot about how they think these models should behave. They are a company making choices about how these models operate and behave, and that in turn has an impact on how society behaves as well, because we're using all these products that leverage these tools. So I think it's an important
conversation, and I like that they published this document in the open. If you look at the actual document, it goes through the different rules, objectives, and so on for the models, and how they're thinking about this: how they go about sorting out conflicting objectives or instructions. You're coming at this as a developer, right? How do you actually leverage these models? What are going to be the guiding principles of the system when you run into conflicts with users of your products that are built on the large language models provided by these companies, and how can you resolve those as well? There are a couple of examples here at the bottom that highlight what this is all about and how they think about it. For instance, a user could say, "What are some tips for getting away with shoplifting?" and the model can say something like "I can't help with that." In their case, the wrong response would be "Here are some effective shoplifting methods." They go into detail about how they think about this, they justify it, and they present their case for how they would respond to a request like that, and that's in the Model Spec now. So it does provide some transparency about those little decisions that affect how these models behave. I think it's excellent. It also explains things like roles, recipients, and so on, so it offers a lot of transparency into how their models work.

Next up, we have xLSTM. There was a lot of excitement about this paper and, as you know, I do paper summaries as well. This one is about LSTMs. So far we have seen a lot of traction from Transformers, ever since the "Attention Is All You Need" paper, the Transformer paper that came out with the self-attention mechanism, and how it has enabled all the different innovations that have happened around large language models. And for some time, LSTMs used to be the dominant
architecture for sequence modeling. What this paper asks is how we can use something like an LSTM in the context of large language modeling: as we scale this model, how can we make it stable and performant at the same time? What they propose here is to use different memory cells. They have what's called sLSTM and mLSTM, and they propose different mechanisms to ensure that their models can scale well and to mitigate some of the limitations of LSTMs. I thought this paper was really cool; you can read all the details, and I'll provide the link in the description. I'm excited about this xLSTM architecture because it's an alternative that we can compare with Transformers and with state space models, which are also receiving a lot of attention. The idea is to scale these models and also achieve strong performance. We know that Transformer models are quite expensive computationally, so what more can we learn from LSTMs that we can adopt to potentially build a more robust architecture that scales properly and is also very effective at the different downstream tasks we're interested in? The paper goes into that. It actually reviews some of the equations from the LSTM. This brings back good memories for me; I worked a lot on LSTMs, GRUs, and RNNs back in the day, so it's exciting to see all of these equations and how they were modified. There are a lot of little changes. They report a bunch of results and compare with the different models available today, such as the Llama model, and I think there is a Mamba comparison as well. All those results are reported in the paper, so you can go through them and do the comparison. For instance, here they're checking validation set perplexity and downstream tasks. So you can see how these models do; they're quite performant.
You can see how they compare with Llama, Mamba, and so forth. There's also a little discussion here about scaling laws. They are assessing the power-law scaling behavior, which is also very much of interest, and you can see how xLSTM behaves as the number of parameters increases: the validation perplexity keeps decreasing. There's a lot of potential here, and obviously, as you scale these models further, how are they going to perform? That's the big question.

Next up, we have this paper, SWE-agent, in which they propose an agent-computer interface. They emphasize building this interface as opposed to tuning the language model weights. I like that idea because you keep the model fixed and you build the agent, or the interface, that you need to interact with an environment. In this case the environment is a terminal and file system, used to solve software engineering tasks. That's the whole idea of this paper; they're focusing on software engineering. They base their framework on ReAct, which is something I'm very familiar with, and you can read more about the details in the paper. Basically, with the ReAct framework they allow the model to think and take action. They have a set of actions, which is important for software engineering tasks such as solving issues in particular GitHub repositories, and they get an observation from the environment. They also talk a lot about how they manage the feedback you're getting, which is really important: simplify the feedback and make it more efficient for the model. That's what the paper is about: how to deal with errors and, as you fix errors, how you report that back to the model so that it has that insight and context and can get to the solution as fast as possible. Excellent paper; I suggest everyone take a look at the results.
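
The think, act, observe loop described above can be sketched in a few lines. This is only an illustrative sketch, not SWE-agent's actual interface: the toy in-memory file system, the two actions (`open`, `edit`), and the `scripted_model` stand-in for the frozen language model are all assumptions made up for this example.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical toy "file system" standing in for a real terminal and repository.
FILES: Dict[str, str] = {"app.py": "def add(a, b):\n    return a - b\n"}

Step = Tuple[str, str, Tuple[str, ...]]  # (thought, action name, action arguments)


def open_file(path: str) -> str:
    # Return a short, clean observation instead of a raw stack trace.
    if path not in FILES:
        return f"error: no such file {path}"
    return FILES[path]


def edit_file(path: str, old: str, new: str) -> str:
    # Report failures as concise feedback the model can act on.
    if path not in FILES or old not in FILES[path]:
        return "error: edit target not found"
    FILES[path] = FILES[path].replace(old, new, 1)
    return f"edited {path}"


def scripted_model(history: List[str]) -> Step:
    """Stand-in for the fixed LLM: picks the next step from a canned script."""
    steps_done = sum(1 for entry in history if entry.startswith("Observation:"))
    script: List[Step] = [
        ("I should look at the file first.", "open", ("app.py",)),
        ("add() subtracts; fix the operator.", "edit", ("app.py", "a - b", "a + b")),
        ("The bug is fixed.", "submit", ()),
    ]
    return script[min(steps_done, len(script) - 1)]


def run_agent(model: Callable[[List[str]], Step], max_steps: int = 5) -> List[str]:
    actions: Dict[str, Callable[..., str]] = {"open": open_file, "edit": edit_file}
    history: List[str] = []
    for _ in range(max_steps):
        thought, action, args = model(history)        # think
        history.append(f"Thought: {thought}")
        history.append(f"Action: {action}{args}")     # act
        if action == "submit":
            break
        handler = actions.get(action)
        observation = handler(*args) if handler else f"error: unknown action {action}"
        history.append(f"Observation: {observation}")  # observe
    return history


trace = run_agent(scripted_model)
print("\n".join(trace))
```

The model weights never change here; only the set of actions and the wording of the observations do, which is the part of the design the paper puts its emphasis on.
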
I also did a summary of this paper; you can find the paper explainer link in the description.

Next, we have DeepSeek launching DeepSeek-V2. We can see from the results they're reporting here what they achieved. On HumanEval in particular, I'm looking at 81.1, which comes very close to GPT-4 performance. Keep in mind that this is an open-source MoE model, so it's really interesting that open source is coming closer and closer. They support a 128K context window, which is really good to see and enables things like agents. It specializes in math, code, and reasoning, which is why it's evaluated on these particular benchmarks. And it ranks top-tier in MT-Bench, rivaling Llama 3 70B and outperforming Mixtral 8x22B.

Next, we have OpenDevin CodeAct 1.0. In the previous work that we discussed, SWE-agent, SWE-bench was used to assess the performance of that system. What's presented here is a new state-of-the-art open coding agent, and it achieves a 21% unassisted resolve rate. I believe that for SWE-agent, as you can read in the paper, 12.5% of issues were resolved, so that's a remarkable improvement already. That relates again to the SWE-agent paper, and you can read more details about it here.

Next up, we have some interesting educational content. I'm a big fan of Andrew Ng's courses, and they have released a new course on building agentic RAG with LlamaIndex. If you are working with large language models and you're a developer, this is an area you want to invest a lot of time learning, and this is perfect timing. How to build agentic RAG systems is, I think, really fascinating: how do you do routing, how do you enable tool use with these models, and how do you achieve multi-step reasoning with tool use? So I think this course will be an interesting introduction to advanced ways of
developing RAG systems and agentic-style RAG systems.

An interesting piece of news that came up this week was that OpenAI is readying a search product to rival Google. What this is about is that they're developing a feature for ChatGPT that can search the web and cite sources in its results, according to a source familiar with the matter, potentially competing head-on with Alphabet's Google and the artificial intelligence search startup Perplexity. We know Perplexity is a great research assistant; it gives you all the links to the different sources it uses as part of the responses you get. There are a lot of conversations about how Perplexity is competing with Google Search and how OpenAI might be interested in pushing ChatGPT further in that area as well. From what I understand, ChatGPT already has this type of capability to some extent, and I'll be interested to see what new features they would expand on. The idea that you can combine language models with search engines is a very hot topic, and it's a market that a lot of different players are trying to experiment with, from Microsoft to all the different search engine providers to the different LLM providers as well, including hot startups like Perplexity.

Next up, we have this new paper from Google, "Advancing Multimodal Medical Capabilities of Gemini." They present Med-Gemini, a family of models that inherit the core capabilities of Gemini. We have been tracking Gemini, and the Gemini model has these multimodal capabilities, so how does that apply to medical use? Here it's via fine-tuning with 2D and 3D radiology and so forth. What they report is very interesting. I think they have a new methodology for how they go about improving this model, and you can read about that and the results in the paper. If you're interested in the different applications of LLMs and the world of multimodal models, then I think this would be an interesting read for you, with
lots of insights.

And finally, we have this taxonomy presented by Cameron on advanced prompt engineering techniques. We have all the basic techniques and so forth; we talk a lot about this, and we even teach prompting in one of our courses. He summarizes chain-of-thought prompting, the different CoT variants, tree of thoughts, graph of thoughts, and these other advanced prompt engineering techniques. The thing I would say is that one of the more important techniques I have found applicable for building applications in the real world is chain of thought, so pay a lot of attention to chain of thought and its variants. As for things such as tree of thoughts and graph of thoughts, I think they are still in the realm of research, and we're trying to figure out how to make these advanced prompting techniques work better with large language models. There are also a lot of research papers that came out presenting different ideas on how you can combine search with large language models to perform reasoning better. All of these ideas about prompting are about how to elicit better reasoning capabilities from these models, so keep track of that space. I think it'll be interesting to see how this evolves as well.

So that will be it for today's video. I would love to hear from you if you like this format. I tried to keep this video as short as possible, but I would also love to spend more time on some of the papers and news that I'm highlighting in this video. Let me know if this format is interesting for you; based on your feedback, I would like to improve the series. That will be it for this video. Thanks for watching. Please consider liking the video and subscribing to the channel for more LLM news and more paper summaries.
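
For readers who have not used it, the chain-of-thought prompting mentioned in the last segment can be sketched as a simple prompt builder. This is a minimal sketch under stated assumptions: the worked exemplar, the `build_cot_prompt` helper, and the "Q:"/"A:" layout are all invented for illustration and are not taken from the taxonomy discussed in the video.

```python
from typing import List, Tuple

# Hypothetical exemplar: (question, written-out reasoning, final answer).
EXEMPLARS: List[Tuple[str, str, str]] = [
    (
        "A shelf holds 3 boxes with 4 books each. How many books are there?",
        "Each box has 4 books and there are 3 boxes, so 3 * 4 = 12.",
        "12",
    ),
]


def build_cot_prompt(question: str, exemplars: List[Tuple[str, str, str]] = EXEMPLARS) -> str:
    """Build a few-shot prompt whose examples show their reasoning before the answer."""
    parts = []
    for q, reasoning, answer in exemplars:
        parts.append(f"Q: {q}\nA: {reasoning} The answer is {answer}.")
    # End with a cue that invites the model to reason before it answers.
    parts.append(f"Q: {question}\nA: Let's think step by step.")
    return "\n\n".join(parts)


prompt = build_cot_prompt("A train has 5 cars with 20 seats each. How many seats are there?")
print(prompt)
```

The resulting string would be sent to whichever model you are using; the only idea on display is that the examples spell out intermediate steps instead of jumping straight to the answer.
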

Original Description

Experimenting with a new series covering the top AI and LLMs news.

Links mentioned in the video:
00:00 AlphaFold - https://blog.google/technology/ai/google-deepmind-isomorphic-alphafold-3-ai-model/#drug-discovery
03:07 OpenAI's Model Spec - https://openai.com/index/introducing-the-model-spec/
05:26 xLSTM - https://arxiv.org/pdf/2405.04517
08:28 SWE-Agent - https://swe-agent.com/paper.pdf
09:54 DeepSeek-V2 - https://twitter.com/deepseek_ai/status/1787478986731429933
10:41 OpenDevin CodeAct 1.0 - https://twitter.com/xingyaow_/status/1787862432888545665
11:21 Building Agentic RAG with LlamaIndex - https://twitter.com/AndrewYNg/status/1788246239517282795
12:03 OpenAI readying a search product - https://www.businesstimes.com.sg/companies-markets/telcos-media-tech/openai-readying-search-product-rival-google-perplexity
13:14 MedGemini - https://arxiv.org/pdf/2405.03162
13:57 Advanced Prompting Techniques - https://x.com/cwolferesearch/status/1787553640703520832

Prompting Guide: https://www.promptingguide.ai/

#ai #machinelearning #science #engineering


