ARC PRIZE - Win $1Million to Beat the ARC-AGI benchmark

Harshit Tyagi · Beginner ·📄 Research Papers Explained ·2y ago

Key Takeaways

ARC-AGI is a visual reasoning benchmark data set in which each task requires learning a pattern from a few example puzzles and then solving a test puzzle. ARC Prize is a public competition offering over $1 million in prizes for beating it. The benchmark is designed to test general intelligence by assessing skill acquisition rather than skill alone, and submissions are scored on the percentage of correct predictions.

Full Transcript

[Clip from Dwarkesh's episode with François] "Suppose that it's the case that in a year a multimodal model can solve ARC, let's say get 80%, whatever the average human would get. Then AGI?" "Quite possibly, yes."

Hey everyone. ARC-AGI has been hyped over the last week as the benchmark that LLMs cannot solve. In this video I'll break down what ARC is, what the hype is about, and how you can start solving this challenge, and we'll also cover a promising solution approach.

So what is this challenge? What is ARC? ARC is the Abstraction and Reasoning Corpus. ARC for AGI, or ARC-AGI, is the benchmark that measures the skill-acquisition ability of a particular algorithm or model. ARC Prize is a $1.1 million public competition to beat, and open-source the solution to, the ARC-AGI benchmark data set. It is hosted by François Chollet, who is the creator of Keras and a staff software engineer at Google, and Mike Knoop, who is the co-founder of Zapier.

Before we go any further in this video, it's very important for all of us to first understand what AGI is, so that we all converge on one definition of AGI, because that is the basis of this entire challenge. Most people think that AGI is a system that can automate a large chunk of economically valuable work, but that isn't the right way to look at it, because it primarily focuses on replacing humans in the system. The correct definition is: AGI is a system that can efficiently acquire new skills and solve open-ended problems. This definition focuses on augmenting human intelligence, to further invent and discover alongside humans with minimal input from us.

So what's different about this benchmark data set? Up until now, LLMs have been trained on unimaginably vast amounts of data, and yet they have been unable to solve or adapt to new problems that were not part of their training data set. So they fail at learning new skills or solving open-ended problems. That's why François says that AGI progress has stalled and we need new ideas. And if you need new ideas, you should first define what the problem is all about, and that's where we go back to the correct definition of AGI, focusing on the skill to learn new skills and solve open-ended problems without any explicit instruction.

So here comes a new benchmark data set, designed to measure the ability of an AI system to learn new skills and solve tasks without any explicit instruction. Most AI benchmarks today measure skill, but skill alone is not intelligence. The ability to learn a new skill is what is called intelligence, and that's what humans are good at with our general ability. Look at most AI benchmarks: HellaSwag evaluates LLMs on commonsense natural language inference; MMLU, Measuring Massive Multitask Language Understanding, tests LLMs' understanding across diverse subjects; HumanEval assesses LLMs' ability to write functional code based on instructions. But general intelligence is the ability to efficiently acquire new skills, and ARC-AGI is the only formal benchmark for AGI that tests for general intelligence by assessing not just skill but also skill acquisition.

ARC-AGI is a visual reasoning benchmark data set that requires us to learn the pattern and then solve the test puzzle, and they have provided a playground page to help you understand what these tasks are like. The complete data set consists of unique training and evaluation tasks. If you look at the playground page, each task consists of these input-output pairs, which are basically puzzles. I need to learn from these examples and then fill in my test output grid, and I also need to pick the right dimensions for the output grid.

So let's solve this one to understand what the puzzle is all about. From the examples I need to learn the pattern. If you look at this input over here, and this is my 6x6 output, the pattern is that this 2x2 grid is placed over here, and we continue to place these grids, and, if you see, the last column over here continues in the next row. Let's see if this holds up in the second example: I place the complete grid in my 6x6 output grid, and then this column in my 2x2 grid continues in my second row as well.

First of all, if you want, you can use "copy from input"; from this input I need all of this over here. You can also resize the grid, since I need to pick the right output grid. Right now my output grid should be 6x6, so I'll resize it over here and continue to fill my output grid based on this. Let's complete all the greens first, then all the orange, then I'll pick my blue, and the last one is red; all the remaining ones are red. So this is going to be my output based on the pattern that I have learned. To check whether I've done it right, I'll submit my solution over here, and here you go: correct, try the next puzzle. So this is how my data set is structured; it is filled with such puzzles.

The complete source-of-truth data set is in this GitHub repository, ARC-AGI, from François Chollet, and here you can learn about the complete structure of the data. You will have to clone this repository in order to play around with it and continue working on top of it. The data directory consists of two subdirectories: training, which contains the task files for training, 400 tasks, and similarly evaluation, which also contains 400 such tasks. Each task contains three to five examples, where each example is an input-output pair, and similarly you will typically have one test output that you'll have to create.

If you go into this data folder, you have evaluation and training, and in evaluation all of these puzzles have been provided to you in JSON format. If you look at train, I have the input and then the output as a list over here. A matrix is nothing but a list of lists, and that's what we have here; the output looks something like this. Similarly, I have this train and test directly over here, and this is one single task. If you count all of these, there are 400 tasks in my evaluation data set, and similarly the training data set also has 400 such tasks.

To start solving this challenge, first of all I'd like to clone this repository. Copy the URL, write git clone, and add a dot at the end, since I've created a dedicated folder; if I look, I have all these files over here. Now I'll open this folder, ARC-AGI. When you clone it you have the data folder, where you have the training data set and the evaluation data set. If you open it, all the files are there. You can format the document and then go through the training inputs and outputs as well as the test input and output puzzles. All of these have been structured in JSON format and provided to you.

You also have this apps folder, and in it I have testing_interface.html. I can run the interface on my local system as well, which will help me build my solution and test it right here. So I'll run an HTTP server: python -m http.server. Now I'll go to this link, and here is the directory listing for all of these files, my complete folder. Go to apps and then open testing_interface.html. Here you can click on any random task, picking any task from the JSON files. You get the task demonstration and the same interface that you saw on the playground page, so you can test directly here. You can build your solution within this folder and keep testing it.

The challenge is being hosted on Kaggle. The complete overview of the challenge, the evaluation process, and the format of the submission file have all been explained over here. For evaluation, the competition scores submissions on the percentage of correct predictions, and for each task you have to predict exactly two outputs for every test input grid contained in the task. You have to keep in mind that there is this training and evaluation data set, and at the same time there is a private held-out data set as well, on which the team is going to assess your algorithm's capability and performance, and, combined together, the average is going to be your final score. Your ultimate goal should be to score 85% on this challenge in order to win the $500,000, the big prize.

Now, after learning all this, you must be wondering what your starting point should be and which solution approach you should explore first. To save you some time, they have also shared a bunch of approaches that have worked well and have led to the current state of the art. First, they mention discrete program search, which has worked really well for them and which turned up in the ARCathon, the hackathon conducted by Lab42 back in 2020. This involves searching through a massive program space in a discrete, step-by-step manner. Similarly, they list ensemble solutions, direct LLM prompting, and domain-specific language program synthesis. Besides this, there has been another submission: Ryan Greenblatt has already scored 50%, state of the art, on ARC-AGI with GPT-4o. So let's look at what Ryan has done.

First of all, you have to provide the problem details to the model, the LLM, that you're using, and Ryan has used GPT-4o to solve this task. So present the ARC-AGI problem with both an image and a detailed text representation of each grid. Then you have to guide GPT-4o to understand the necessary transformation: you have to learn the transformations needed, and then you have to code for them. He has used detailed few-shot prompts with step-by-step reasoning examples to help GPT-4o. Then, once you have the different prompts, which are for different grid sizes, and the few-shot prompts that you have crafted, they both go into the ensemble of prompts: combine the outputs from multiple pairs of few-shot prompts to enhance accuracy. Once these code programs are generated, he samples approximately 5,000 completions per problem from GPT-4o. So he generates many completions, and then he selects and fixes those completions: you choose the top 12 completions and ask GPT-4o to revise them based on the actual outputs. The few-shot prompts for revision again include the text representation. Then you have the final submission selection, where you select the three final submissions based on a majority vote over the correct programs. Finally, there are heuristics if you need them; he created them, but they're not actually required. So you select the three submissions from these.

All in all, a very complex and at the same time very interesting problem to solve, and François believes that this can get us to AGI, which makes it all the more interesting and hype-worthy, I would say. If you want to participate, you can participate alone or with a team; anybody can participate and submit their solution. If you are interested, if this got you pumped, I wish you nothing but the best. They have a Discord channel, so do join that and try to learn from other participants; you can interact with them. And check out the Kaggle code notebooks and the EDA that people have done on the data set; that would be really helpful. For all those who are participating, all the very best, and I'll catch you guys in the next one.
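
The task layout described in the transcript (a "train" list and a "test" list, each holding input and output grids stored as lists of lists) can be explored with a few lines of Python. This is a minimal sketch, not code from the video or the repository: the task below is made up for illustration, and the helper names grid_shape and summarize are hypothetical.

```python
import json

# A made-up task in the layout described in the transcript: "train" and "test"
# lists whose items each hold an "input" and an "output" grid (list of lists).
TASK_JSON = """
{
  "train": [
    {"input": [[1, 2], [3, 4]],
     "output": [[1, 2, 1, 2], [3, 4, 3, 4], [1, 2, 1, 2], [3, 4, 3, 4]]},
    {"input": [[5, 0], [0, 5]],
     "output": [[5, 0, 5, 0], [0, 5, 0, 5], [5, 0, 5, 0], [0, 5, 0, 5]]}
  ],
  "test": [
    {"input": [[7, 8], [8, 7]],
     "output": [[7, 8, 7, 8], [8, 7, 8, 7], [7, 8, 7, 8], [8, 7, 8, 7]]}
  ]
}
"""


def grid_shape(grid):
    """Return (rows, columns) for a grid stored as a list of lists."""
    rows = len(grid)
    cols = len(grid[0]) if rows else 0
    return rows, cols


def summarize(task):
    """List (split, input shape, output shape) for every pair in a task."""
    summary = []
    for split in ("train", "test"):
        for pair in task[split]:
            summary.append(
                (split, grid_shape(pair["input"]), grid_shape(pair["output"]))
            )
    return summary


task = json.loads(TASK_JSON)
for split, in_shape, out_shape in summarize(task):
    print(f"{split}: input {in_shape} -> output {out_shape}")
```

To inspect a real task, the same summarize call should work on a dictionary loaded with json.load from one of the files under the data/training or data/evaluation folders of the cloned repository.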

Original Description

In 2019, François Chollet, creator of Keras (an open-source deep learning library adopted by over 2.5M developers) and Software Engineer & AI Researcher at Google, published the influential paper "On the Measure of Intelligence", where he introduced a benchmark to measure the efficiency of AI skill-acquisition on unknown tasks: the Abstraction and Reasoning Corpus (ARC-AGI).

- Dwarkesh's episode with Francois: https://www.youtube.com/watch?v=UakqL6Pj9xo&t=1978s
- ARC Prize website: https://arcprize.org/
- GitHub: https://github.com/fchollet/ARC-AGI
- Kaggle: https://www.kaggle.com/competitions/arc-prize-2024/overview
- Link to Ryan's approach: https://redwoodresearch.substack.com/p/getting-50-sota-on-arc-agi-with-gpt

## Subscribe to my Newsletter to stay up to date on such updates in the world of AI

- High Signal AI Newsletter: https://highsignalai.substack.com/
- High Signal AI Instagram: https://www.instagram.com/highsignal_ai/

## AI Engineer Roadmap

- Roadmap video: https://youtu.be/br8u4JwXMBU
- Roadmap GitHub (don't forget to leave a star): https://github.com/dswh/ai-engineer-roadmap

## Social Media & Discord Server Invitation

Follow me for more AI Engineering resources, tutorials, and reviews:

- LinkedIn: https://www.linkedin.com/in/tyagiharshit/
- X / Twitter: https://twitter.com/dswharshit
- Join the Discord community for ideas, discussion, reviews and more: https://discord.gg/rssxJV2Xkz

## Chapters:

0:00 Francois on Dwarkesh's episode
00:29 🌟 Introduction to ARC AGI and its significance
02:18 🧩 Structure and purpose of the ARC AGI benchmark
04:20 🎲 Understanding the ARC AGI dataset through examples
06:36 📂 Accessing and using the ARC AGI dataset
08:27 🖥️ Setup for ARC AGI
10:33 🏆 Solution approaches for succeeding in ARC AGI
13:16 🚀 Participation and community engagement in ARC AGI

ARC-AGI is a visual reasoning benchmark data set that requires learning patterns from example puzzles and solving test puzzles. It is designed to test for general intelligence by assessing not just skill but also skill acquisition. The challenge requires predicting exactly two outputs for every test input grid in a task. Submissions are scored on the percentage of correct predictions, and, as described in the video, the final score also takes into account a private held-out data set alongside the public training and evaluation sets.
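
The scoring rule summarized above (two predicted outputs per test input, scored on the percentage of correct predictions) can be sketched as follows. This is a rough illustration under stated assumptions, not the official Kaggle scorer or submission format: the function name score_attempts and the dictionary layout are hypothetical, and a test output is assumed to count as solved when either of the two attempts matches the expected grid exactly.

```python
def score_attempts(attempts, solutions):
    """Fraction of test outputs matched exactly by one of up to two attempts.

    attempts:  {task_id: [[guess_1, guess_2], ...]}  one list per test input
    solutions: {task_id: [expected_grid, ...]}       one grid per test input
    (Both layouts are assumptions made for this sketch.)
    """
    total = 0
    correct = 0
    for task_id, expected_outputs in solutions.items():
        task_attempts = attempts.get(task_id, [])
        for index, expected in enumerate(expected_outputs):
            total += 1
            # Only the first two guesses for each test input are considered.
            guesses = task_attempts[index][:2] if index < len(task_attempts) else []
            if any(guess == expected for guess in guesses):
                correct += 1
    return correct / total if total else 0.0


# Made-up example: task "a" has one test input, task "b" has two.
solutions = {"a": [[[1]]], "b": [[[2, 2]], [[3]]]}
attempts = {
    "a": [[[[0]], [[1]]]],                      # second guess is right
    "b": [[[[2, 2]], [[9]]], [[[0]], [[4]]]],   # first input right, second wrong
}
score = score_attempts(attempts, solutions)
print(f"score: {score:.2%}")
```

Grids are compared with plain list equality, so a prediction with the wrong dimensions can never match, which mirrors the point that picking the right output grid size is part of the puzzle.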

Key Takeaways

Clone the repository
Download the data directory
Open the data folder
Format the document
Run an HTTP server
Provide problem details to the model
Guide the model to understand necessary transformations
Code for the task
Use detailed prompts with step-by-step reasoning examples
Ensemble prompts
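
The later steps in the list above (write code for the transformation, then combine many attempts) can be sketched as a filter-and-vote loop: keep only the candidate programs that reproduce every training output, run the survivors on the test input, and take a majority vote. This is a toy sketch and not Ryan's implementation. The candidate programs here are hand-written stand-ins for LLM-generated code, and every name (CANDIDATES, passes_training, select_outputs) is hypothetical.

```python
import json
from collections import Counter


def identity(grid):
    return [list(row) for row in grid]


def flip_horizontal(grid):
    return [list(reversed(row)) for row in grid]


def mirror_rows(grid):
    # Deliberately equivalent to flip_horizontal, so the vote has two agreeing programs.
    return [row[::-1] for row in grid]


def flip_vertical(grid):
    return [list(row) for row in reversed(grid)]


def transpose(grid):
    return [list(column) for column in zip(*grid)]


# Stand-ins for sampled programs; in the approach described in the video these
# would be many code completions generated by the model.
CANDIDATES = [identity, flip_horizontal, mirror_rows, flip_vertical, transpose]


def passes_training(program, train_pairs):
    """True if the program reproduces every training output exactly."""
    return all(program(pair["input"]) == pair["output"] for pair in train_pairs)


def select_outputs(candidates, train_pairs, test_input, k=2):
    """Majority vote over test predictions of programs that fit the training pairs."""
    votes = Counter()
    for program in candidates:
        try:
            if passes_training(program, train_pairs):
                # Grids are serialized so they can be used as Counter keys.
                votes[json.dumps(program(test_input))] += 1
        except Exception:
            continue  # a crashing candidate is simply discarded
    return [json.loads(key) for key, _ in votes.most_common(k)]


train_pairs = [
    {"input": [[1, 2], [3, 4]], "output": [[2, 1], [4, 3]]},
    {"input": [[5, 5], [0, 7]], "output": [[5, 5], [7, 0]]},
]
test_input = [[1, 0, 0], [0, 2, 0]]
selected = select_outputs(CANDIDATES, train_pairs, test_input, k=2)
print(selected)
```

The parameter k is set to 2 here only to line up with the two outputs per test input that the competition asks for; the video describes selecting three final submissions.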

💡 The ARC-AGI benchmark is a challenging task that requires visual reasoning and the ability to acquire new skills. The approaches covered in the video are discrete program search, ensemble solutions, direct LLM prompting, and domain-specific language program synthesis.
