Multi-armed bandit algorithms - ETC Explore then Commit

Sophia Yang · Intermediate ·🧠 Large Language Models ·4y ago

Skills: ML Maths Basics 80% · LLM Foundations 70%

Key Takeaways

The video discusses the Explore-Then-Commit (ETC) algorithm for solving multi-armed bandit problems, with a focus on determining the optimal strategy for maximizing rewards in a row of slot machines with different reward distributions.

Full Transcript

In a multi-armed bandit problem, we have a row of slot machines, where each machine provides a random reward from a probability distribution. In real life, we can consider each machine as a different ad, A/B testing variant, or something else. Now we need to figure out which strategy can give us the best reward. The simplest algorithm is the ETC (Explore-Then-Commit) algorithm.

Let's assume that we have two arms. Depending on your use case, the reward distribution may follow any kind of distribution. Here we assume that the reward distributions are 1-subgaussian: arm 1 has a mean of 0.5 and arm 2 has a mean of 0.8.

The first step of the algorithm is explore, where the arms are played one after another for a number of times. In the first round, arm 1 returns a reward of 0.6. In the second round, arm 2 returns 0.7. Then arm 1 returns 0.5, and then arm 2 returns 0.9. If we just look at these four rounds, we can calculate the empirical mean of the rewards for each arm. We can see that the empirical mean for arm 1 is 0.55 and the empirical mean for arm 2 is 0.8. Arm 2 wins.

Here is the mathematical equation of the empirical mean calculation, where the empirical mean mu of arm i at round t equals one over the sum of the indicator function, times the sum of the indicator function times the reward at each step. Again, after four rounds, the empirical mean of arm 2 is greater than that of arm 1. If we only consider these four rounds, we can say that arm 2 is better, and thus in the commit phase we only use arm 2.
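The spoken equation above can be written out as follows. This is a sketch in the notation of the Lattimore and Szepesvári book linked in the description; the symbols A_s (the arm played in round s), X_s (the reward received in round s), and T_i(t) (the number of times arm i has been played up to round t) are assumptions, since the transcript does not name them. The indicator is 1 in rounds where arm i was played and 0 otherwise, so the sum only picks up the rewards that came from arm i.

```latex
\hat{\mu}_i(t) = \frac{1}{T_i(t)} \sum_{s=1}^{t} \mathbb{I}\{A_s = i\}\, X_s,
\qquad
T_i(t) = \sum_{s=1}^{t} \mathbb{I}\{A_s = i\}

% Worked example from the four exploration rounds in the transcript:
\hat{\mu}_1(4) = \frac{0.6 + 0.5}{2} = 0.55,
\qquad
\hat{\mu}_2(4) = \frac{0.7 + 0.9}{2} = 0.8
```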
The overall algorithm can be summarized as follows. In round t, we choose the action A_t, where it equals (t mod k) + 1 if t is less than or equal to mk, meaning that we explore each of the k arms m times in the exploration phase. If t is greater than mk, we choose the action A_t that is the argmax of the empirical mean of the rewards of arm i during exploration.

If we only have two arms, so k equals 2, then it is best to choose m as the max of 1 and another term, which is a function of delta, the mean difference between the two arms, and n, the total number of rounds. So that is the ETC algorithm for the multi-armed bandit problem.
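The selection rule described above can be sketched in Python. This is a minimal illustration, not the author's implementation: the function names `etc_action` and `run_etc` are hypothetical, rewards are assumed to be unit-variance Gaussians (which are 1-subgaussian), arms are 0-indexed, and `(t - 1) % k` is used so that the first arm is played first as in the worked example. The transcript's `(t mod k) + 1` gives a rotated order, but each arm is still played m times. The values `m=50` and `n=1000` are arbitrary.

```python
import random


def etc_action(t, k, m, empirical_means):
    """Pick a 0-indexed arm for round t (t starts at 1) under ETC."""
    if t <= m * k:
        # Exploration: cycle through the arms so each one is played m times.
        return (t - 1) % k
    # Commit: play the arm with the highest empirical mean from exploration.
    return max(range(k), key=lambda i: empirical_means[i])


def run_etc(true_means, m, n, seed=0):
    """Simulate ETC for n rounds; Gaussian rewards are an assumption."""
    rng = random.Random(seed)
    k = len(true_means)
    counts = [0] * k      # number of times each arm has been played
    sums = [0.0] * k      # running total of rewards per arm
    frozen_means = None   # empirical means fixed when exploration ends
    actions = []
    for t in range(1, n + 1):
        if t == m * k + 1:
            # Exploration just finished: compute the means used to commit.
            frozen_means = [sums[i] / counts[i] for i in range(k)]
        arm = etc_action(t, k, m, frozen_means)
        reward = rng.gauss(true_means[arm], 1.0)
        counts[arm] += 1
        sums[arm] += reward
        actions.append(arm)
    return actions, counts


# Two arms with the means from the transcript (0.5 and 0.8).
actions, counts = run_etc(true_means=[0.5, 0.8], m=50, n=1000)
print(actions[:4])  # exploration alternates between the arms: [0, 1, 0, 1]
print(counts)       # the arm that was not committed to has exactly m plays
```

Freezing the empirical means at the end of exploration keeps the commit decision fixed for the remaining rounds, which is what distinguishes ETC from strategies that keep updating their estimates.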

Original Description

Hi, I plan to make a series of videos on the multi-armed bandit algorithms. Here is the first one: ETC (Explore then Commit) :)

📖 Ref:
https://tor-lattimore.com/downloads/book/book.pdf
https://web.mit.edu/6.246/www/lectures/L13-2021sp.pdf

⭐ Stay in touch:
Medium: https://sophiamyang.medium.com/
Twitter: https://twitter.com/sophiamyang
Linkedin: https://www.linkedin.com/in/sophiamyang/

The video teaches the ETC algorithm for solving multi-armed bandit problems, including how to calculate empirical means and determine the optimal strategy for maximizing rewards. It provides a mathematical equation for the algorithm and discusses the importance of exploring and committing to the best arm. The key insight is that the ETC algorithm can be used to balance exploration and exploitation in bandit problems.

Key Takeaways

Define the multi-armed bandit problem and identify the reward distributions
Calculate the empirical mean of rewards for each arm
Determine the optimal strategy using the ETC algorithm
Choose the action with the highest empirical mean in the commit phase

💡 The ETC algorithm can be used to balance exploration and exploitation in bandit problems by first exploring each arm and then committing to the best arm.
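How long to explore is what controls this balance: too few exploration rounds risks committing to the wrong arm, too many wastes plays on the worse arm. The transcript only says that for two arms m is the max of 1 and a term depending on delta and n. The sketch below fills that term in with the formula as I recall it from the Lattimore and Szepesvári book linked in the description, so treat the exact expression as an assumption to check against the book. The function name `exploration_length` is hypothetical, and the formula requires knowing both delta and n in advance.

```python
import math


def exploration_length(n, delta):
    """Per-arm exploration rounds m for two 1-subgaussian arms.

    Assumed formula (from the referenced book, not stated in the transcript):
        m = max(1, ceil((4 / delta^2) * log(n * delta^2 / 4)))
    where delta is the gap between the two arm means and n is the horizon.
    """
    if n <= 0 or delta <= 0:
        raise ValueError("n and delta must be positive")
    return max(1, math.ceil(4 / delta**2 * math.log(n * delta**2 / 4)))


# Gap from the transcript's example: 0.8 - 0.5 = 0.3.
print(exploration_length(n=1000, delta=0.3))  # 139
print(exploration_length(n=10, delta=0.3))    # 1 (the log term is negative)
```

With a short horizon the logarithm goes negative, so the `max` with 1 ensures each arm is still tried at least once.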
