ARC PRIZE - Win $1Million to Beat the ARC-AGI benchmark

Harshit Tyagi · Beginner ·📄 Research Papers Explained ·2y ago

Key Takeaways

The ARC-AGI benchmark is a visual reasoning benchmark data set that requires learning patterns and solving test puzzles, with a $1Million prize for solving the benchmark. The benchmark is designed to test for general intelligence by assessing not just the skill but also skill acquisition, and is evaluated on the percentage of correct predictions for each task.

Full Transcript

"Suppose that it's the case that in a year a multimodal model can solve ARC, let's say get 80%, whatever the average human would get. Then AGI?" "Quite possibly, yes."

Hey everyone. So ARC-AGI has been hyped over the last week as the benchmark that LLMs cannot solve. In this video I'll break down what ARC is, what the hype is about, and how you can start solving this challenge, and we'll also cover a promising solution approach.

So what is this challenge? What is ARC? ARC is the Abstraction and Reasoning Corpus. ARC for AGI, or ARC-AGI, is the benchmark that actually measures the skill-acquisition ability of a particular algorithm or model. ARC Prize is a $1.1 million public competition to beat, and open-source the solution to, the ARC-AGI benchmark data set. It is hosted by François Chollet, who is the creator of Keras and a staff software engineer at Google, and Mike Knoop, who is the co-founder of Zapier.

Before we go any further in this video, it's very important for all of us to first understand what AGI is, so we'll all converge on one definition of AGI, because that is the basis of this entire challenge. Most people think that AGI is a system that can automate a large chunk of economically valuable work, but that isn't the right way to look at it, because it primarily focuses on replacing humans in the system. The correct definition is: AGI is a system that can efficiently acquire new skills and solve open-ended problems. This definition focuses on augmenting human intelligence, to further invent and discover alongside humans with minimal input from us.

So what's different about this benchmark data set? Up until now, we know that LLMs have been trained on unimaginably vast amounts of data, and yet they have been unable to solve or adapt to new problems that were not part of their training data set. So they fail at learning new skills or solving open-ended problems. That's why François says that AGI progress has stalled and we need new ideas. And if you need new ideas, you should first define what the problem is all about, and that's where we go back to the correct definition of AGI, focusing on the skill to learn new skills and solve open-ended problems without any explicit instruction. So here comes a new benchmark data set, which is designed to measure the ability of an AI system to learn new skills and solve tasks without any explicit instruction.

Most of the AI benchmarks today measure skill, but skill alone is not intelligence. The ability to learn a new skill is what is called intelligence, and that's what humans are good at with our general ability. Look at most of the AI benchmarks. HellaSwag, for example, evaluates LLMs on common-sense natural language inference. MMLU, which measures massive multitask language understanding, tests an LLM's understanding across diverse subjects. HumanEval assesses an LLM's ability to write functional code based on the instructions. But general intelligence is the ability to efficiently acquire new skills, and ARC-AGI is the only formal benchmark for AGI that tests for general intelligence by assessing not just the skill but also skill acquisition.

So ARC-AGI is a visual reasoning benchmark data set that requires us to learn the pattern and then solve the test puzzle, and they have provided a playground page to help you understand what these tasks are like. The complete data set consists of unique training and evaluation tasks. If you look at the playground page over here, each task consists of these input-output pairs, which are basically puzzles. I need to learn from these examples and then fill in my test output grid, and I also need to pick the right dimensions for the output grid.

So let's solve this one in order to understand what this puzzle is all about. From the examples I need to learn the pattern. If you look at this input over here, and this is my 6x6 output, the pattern is that this 2x2 grid is placed over here, and we continue to place these grids, and if you see, the last column over here continues in the next row. Let's see if this holds up in the second example: I place the complete grid in my 6x6 output grid, and then this column in my 2x2 grid continues in my second row as well.

First of all, if you want, you can copy from the input; you can do that as well. From this input I need all of this over here, and you can resize the grid, so I need to pick the right output grid. Right now my output grid should be 6x6, so I'll resize it over here and continue to fill my output grid based on this. Let's complete all the greens first, then all the orange, and finally I will pick my blue, and the last one is the red; all the remaining ones are red. So this is going to be my output, based on the pattern that I have learned. To check whether I've done it right or not, I will submit my solution over here, and here you go: correct, try the next puzzle. So this is how my data set is structured; it is filled with such puzzles.

Now, the complete source-of-truth data set is present in this GitHub repository, ARC-AGI, from François Chollet, and here you can learn about the complete structure of the data. You will have to clone this repository in order to play around with it and continue working on top of it. The data directory consists of two subdirectories: training, which contains the task files for training, 400 tasks, and similarly evaluation, which also contains 400 such tasks. Each task contains three to five examples, where each example is an input-output pair, and you will typically have one test output that you'll have to create. So if you go over this data folder, you have evaluation and training, and in evaluation all of these puzzles have been provided to you in a JSON format. If you look at train, I have an input and then an output as a list over here. A matrix is nothing but a list of lists, and that's what we have over here; the output looks something like this. And similarly I have this train and test directly over here. This is one single task. If you count all of these, there are 400 tasks in my evaluation data set, and similarly the training data set also has 400 such tasks.

Now, to start solving this challenge, first of all I would like to clone this repository. Copy the URL, write git clone with it, and add a dot at the end. I've created a dedicated folder, and if I look, I have all these files over here. Now I'll open this folder, ARC-AGI. When you clone this, you have the data folder, where you have the training data set and the evaluation data set. If you open it, all the files are going to be over here. You can format a document and then go through the training inputs and outputs as well as the test input and output puzzles. All of these have been structured in a JSON format and provided to you. You also have this apps folder, and in it I have testing_interface.html. I can test the interface on my local system as well, which will help me build my solution and test it right here. So I'll run an HTTP server: python -m http.server. Now I'll go to this link, and here is the directory listing for all of these files, my complete folder. Go to apps and then open testing_interface.html, and here you can pick any random task from the JSON files. So here you have the task demonstration, and the same interface that you looked at on the playground page you can test directly over here. So you can build your solution within this folder and keep testing it.

The challenge is being hosted on Kaggle, so the complete overview of the challenge, the evaluation process, the submission file and the format of the submission file, all of those details have been explained over here. For evaluation, the competition evaluates submissions on the percentage of correct predictions, and for each task you have to predict exactly two outputs for every test input grid contained in the task. You have to keep in mind that there is this training and evaluation data set, and at the same time there is a private held-out data set as well, on which the team is going to assess your algorithm's capability and performance, and combined together, the average is going to be your final score. Your ultimate goal over here should be to score 85% on this challenge in order to win the $500,000 grand prize.

Now, after learning all this, you must be wondering what your starting point should be, and which solution approach you should explore first. To save you some time, they have also shared a bunch of approaches that have worked well and have led to the current state of the art. First of all, they mention discrete program search, which has worked really well for them and which turned up in the ARCathon, the hackathon that was conducted by Lab42 back in 2020. This involves searching through a massive program space in a discrete, step-by-step manner. Similarly, they have ensemble solutions, direct LLM prompting, and domain-specific language program synthesis. Besides this, there has been another submission, from Ryan Greenblatt, who has already scored 50%, state of the art, on ARC-AGI with GPT-4o.

So let's look at what Ryan has done. First of all, you have to provide the problem details to the LLM that you're using, and Ryan has used GPT-4o to solve this task. So present the ARC-AGI problem with both an image and a detailed text representation of each grid. Then you have to guide GPT-4o to understand the necessary transformation: you have to learn the transformations needed, and then you have to code for them. So he's used detailed few-shot prompts with step-by-step reasoning examples to help GPT-4o. Once you have the different prompts, which are for different grid sizes, and the few-shot prompts that you have crafted, they both go into an ensemble of prompts: combine the output from multiple pairs of few-shot prompts to enhance the accuracy. Once these code programs are generated, he samples approximately 5,000 completions per problem from GPT-4o. So he generates many completions, and he selects and fixes those completions: you choose the top 12 completions and ask GPT-4o to revise them based on the actual outputs. For these fix attempts, the few-shot prompts for revision again include the text representation. Then you have the final submission selection, where you select the three final submissions based on a majority vote over the correct programs. And finally there are heuristics, if you need them; he created them, but they're not actually required to select the three submissions.

All right, so all in all, a very complex and at the same time very, very interesting problem to solve. François believes that this can get us to AGI, which makes it all the more interesting and hype-worthy, I would say. If you want to participate, you can participate alone or with a team; anybody can participate and submit their solution. If you are interested, if this got you pumped, I wish you nothing but the best. They have a Discord channel, so do join that and try to learn from other participants; you can interact with them. And check out the Kaggle code notebooks, the EDA that people have done on the data set; that would be really helpful. So for all those who are participating, all the very best, and I'll catch you guys in the next one.
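
The task format described in the transcript (a JSON file with train and test lists of input/output pairs, each grid a list of lists) can be inspected with a few lines of Python. This is only a minimal sketch: the helper names `load_task`, `grid_shape`, and `describe_task` are hypothetical, not part of the repository.

```python
import json


def load_task(path):
    """Read one ARC task file (JSON) and return it as a dict."""
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)


def grid_shape(grid):
    """Return (rows, columns) for a grid stored as a list of lists."""
    rows = len(grid)
    cols = len(grid[0]) if rows else 0
    return rows, cols


def describe_task(task):
    """Summarize grid shapes for the demonstration pairs and the test inputs."""
    summary = {"train": [], "test": []}
    for pair in task["train"]:
        # Each demonstration pair has an "input" grid and an "output" grid.
        summary["train"].append((grid_shape(pair["input"]), grid_shape(pair["output"])))
    for pair in task["test"]:
        # Only the test input shape is recorded; the output is what a solver must produce.
        summary["test"].append(grid_shape(pair["input"]))
    return summary
```

Assuming the repository was cloned into the current directory, calling `describe_task(load_task(path))` on any file under data/training would show, for example, whether a task maps small inputs to larger outputs, as in the playground puzzle.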

Original Description

In 2019, François Chollet - creator of Keras, an open-source deep learning library adopted by over 2.5M developers, and Software Engineer & AI Researcher at Google - published the influential paper "On the Measure of Intelligence" where he introduced a benchmark to measure the efficiency of AI skill-acquisition on unknown tasks.

Abstraction and Reasoning Corpus (ARC-AGI)

Dwarkesh's episode with Francois: https://www.youtube.com/watch?v=UakqL6Pj9xo&t=1978s
ARC Prize website: https://arcprize.org/
GitHub: https://github.com/fchollet/ARC-AGI
Kaggle: https://www.kaggle.com/competitions/arc-prize-2024/overview
Link to Ryan's approach: https://redwoodresearch.substack.com/p/getting-50-sota-on-arc-agi-with-gpt

## Subscribe to my Newsletter to stay up to date on such updates in the world of AI
- High Signal AI Newsletter: https://highsignalai.substack.com/
- High Signal AI Instagram: https://www.instagram.com/highsignal_ai/

## AI Engineer Roadmap
- Roadmap video: https://youtu.be/br8u4JwXMBU
- Roadmap GitHub (don't forget to leave a star): https://github.com/dswh/ai-engineer-roadmap

## Social Media & Discord Server Invitation
Follow me for more AI Engineering resources, tutorials, and reviews:
- LinkedIn: https://www.linkedin.com/in/tyagiharshit/
- X / Twitter: https://twitter.com/dswharshit
- Join the Discord community for ideas, discussion, reviews and more: https://discord.gg/rssxJV2Xkz

## Chapters:
0:00 Francois on Dwarkesh's episode
00:29 🌟 Introduction to ARC AGI and its significance
02:18 🧩 Structure and purpose of the ARC AGI benchmark
04:20 🎲 Understanding the ARC AGI dataset through examples
06:36 📂 Accessing and using the ARC AGI dataset
08:27 🖥️ Setup for ARC AGI
10:33 🏆 Solution approaches for succeeding in ARC AGI
13:16 🚀 Participation and community engagement in ARC AGI

The ARC-AGI benchmark is a visual reasoning benchmark data set that requires learning patterns from example puzzles and then solving test puzzles. It is designed to test for general intelligence by assessing not just skill but also skill acquisition. Each task requires predicting exactly two outputs for every test input grid, submissions are scored on the percentage of correct predictions, and, according to the video, performance on a private held-out data set also counts toward the final score.
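
The scoring idea summarized above can be sketched in a few lines of Python. This is an illustration, not the official competition metric: it assumes a test output counts as solved when either of the two attempts matches the expected grid exactly, and the function names are hypothetical.

```python
def is_solved(attempts, expected):
    """True if any attempt equals the expected grid exactly (same size, same cells)."""
    return any(attempt == expected for attempt in attempts)


def percent_correct(predictions, solutions):
    """Percentage of test outputs solved.

    predictions: one entry per test output, each a list of two attempted grids.
    solutions:   the expected grids, in the same order.
    """
    if len(predictions) != len(solutions):
        raise ValueError("predictions and solutions must have the same length")
    if not solutions:
        return 0.0
    solved = sum(is_solved(p, s) for p, s in zip(predictions, solutions))
    return 100.0 * solved / len(solutions)
```

For example, with two test outputs where only the first is matched by one of its two attempts, `percent_correct` returns 50.0.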

Key Takeaways
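  1. Clone the ARC-AGI repository from GitHub
  2. Locate the data directory, which holds the training and evaluation subdirectories
  3. Open a task file from the data folder
  4. Format the JSON document so the train and test pairs are readable
  5. Run a local HTTP server and open the testing interface in the apps folder
  6. Provide the problem details to the model as both an image and a text representation of each grid
  7. Guide the model to work out the necessary transformations
  8. Have the model write code that implements the transformation
  9. Use detailed few-shot prompts with step-by-step reasoning examples
  10. Ensemble the outputs of multiple prompts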
💡 The ARC-AGI benchmark is a challenging task that requires visual reasoning and the ability to acquire new skills. In the approach covered in the video, few-shot prompting, sampling many candidate programs, and revising the most promising ones are used to improve accuracy.
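
The final selection step described in the video (keep the generated programs that reproduce the demonstration pairs, then take a majority vote over their test outputs) can be sketched as follows. This is a simplified illustration, not Ryan Greenblatt's actual code: candidate programs are modeled as plain Python functions rather than LLM completions, and `passes_examples` and `select_submissions` are hypothetical names.

```python
from collections import Counter


def passes_examples(program, train_pairs):
    """True if the program reproduces every demonstration output exactly."""
    try:
        return all(program(p["input"]) == p["output"] for p in train_pairs)
    except Exception:
        # A candidate that crashes on an example is treated as incorrect.
        return False


def select_submissions(programs, train_pairs, test_input, k=3):
    """Majority vote over the test outputs of programs that fit the examples."""
    votes = Counter()
    for program in programs:
        if not passes_examples(program, train_pairs):
            continue
        try:
            output = program(test_input)
        except Exception:
            continue
        # Grids are lists of lists, so convert to tuples to make them hashable.
        votes[tuple(map(tuple, output))] += 1
    # Return up to k distinct grids, most-voted first, converted back to lists.
    return [[list(row) for row in grid] for grid, _ in votes.most_common(k)]
```

Voting only among programs that already fit the demonstrations means a wrong but popular output from programs that fail the examples never reaches the shortlist.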
