Multi-armed bandit algorithms - Epsilon greedy algorithm

Sophia Yang · Intermediate ·🎮 Reinforcement Learning ·4y ago

Skills: RL Foundations 90%

Key Takeaways

Explains the epsilon greedy algorithm for multi-armed bandit problems in reinforcement learning

Full Transcript

Previously we talked about the ETC (Explore-Then-Commit) algorithm for the multi-armed bandit problem. The epsilon-greedy algorithm is a randomized relative of ETC. The algorithm is described as follows: first it chooses each arm once, and then, in each subsequent round t, it chooses the empirically best arm with probability 1 minus epsilon_t; otherwise it chooses an arm uniformly at random.

Let's take a look at an example with two arms. Let's assume the rewards for these two arms are 1-subgaussian with means of 0.9 and 0.6. We first play each arm once. At the third round, we know that with probability 1 minus epsilon we choose the empirically best arm, and with probability epsilon we choose an arm at random. In this example, let's assume epsilon_3 equals 1, which means that we have a 100 percent chance of choosing an arm at random. Here we randomly choose arm 2.

Now assume that at round 4, epsilon_4 is 0.9, which means that with 90 percent probability we choose an arm at random and with 10 percent probability we choose the empirically best arm. How do we choose? Let's get a random number from a random number generator. Here we get 0.48. We see that the random number is smaller than epsilon_4, so we choose an arm at random again. This time we choose arm 1.
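The explore-or-exploit coin flip used in rounds 3 and 4 can be sketched in a few lines of Python. This is only an illustration, not code from the video: the function name `choose_mode` and the option to pass the random draw `u` explicitly are assumptions made so the example is reproducible, and the draw of 0.3 for round 3 is made up (the video does not give one).

```python
import random

def choose_mode(epsilon, u=None, rng=random):
    """Return "explore" if the uniform draw falls below epsilon, else "exploit".

    `u` is a hypothetical override for the random draw, so the numbers
    from the worked example can be replayed exactly.
    """
    if u is None:
        u = rng.random()  # uniform draw in [0, 1)
    return "explore" if u < epsilon else "exploit"

# Round 3: epsilon_3 = 1, so any draw leads to a random arm (0.3 is made up).
print(choose_mode(1.0, u=0.3))   # explore
# Round 4: epsilon_4 = 0.9 and the draw is 0.48, which is below 0.9.
print(choose_mode(0.9, u=0.48))  # explore
```

Comparing one uniform draw against epsilon is a simple way to get an event with probability epsilon, which is why the example reads a single number from the random number generator.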
At round 5, assume epsilon_5 is 0.5, which means that half of the time we should choose the empirically best arm and half of the time we should select an arm at random. Again we use the random number generator and get 0.8, which is greater than 0.5, so this time we need to choose the empirically best arm. How do we find the best arm? Let's assume that in the previous four rounds arm 1 gave us a reward of 0.9, arm 2 gave us 0.5, then arm 2 gave us 0.3, and arm 1 gave us 0.7. The empirical mean estimate for arm 1 is (0.9 + 0.7) / 2 = 0.8 and for arm 2 is (0.5 + 0.3) / 2 = 0.4, so we play arm 1. Then we just repeat this whole process over and over again.

To write this algorithm formally, the action A_t at time t is expressed as follows: it is the argmax of the empirical mean estimates with probability 1 minus epsilon_t, and a uniformly random arm with probability epsilon_t. Note that we need to calculate epsilon at each round: epsilon_t is a function of C, k, t, and Delta_min. C is a constant, k is the number of arms, t is the number of the current round, and Delta_min is the minimum difference in mean rewards among the arms. In our example with two arms, Delta_min is simply the difference between the mean rewards of our two arms, 0.9 - 0.6 = 0.3.

So that is the epsilon-greedy algorithm for the multi-armed bandit problem.
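The whole procedure can be sketched as a short Python simulation. Several things here are assumptions rather than anything stated in the video: the transcript only says that epsilon depends on C, k, t, and the gap, so the concrete schedule epsilon_t = min(1, C * k / (t * Delta_min^2)) is one form found in the bandit literature and is used purely for illustration. The function names, the default C = 1, the seed, and the choice of unit-variance Gaussian rewards (which are 1-subgaussian) are also illustrative.

```python
import random

def empirical_means(history, k):
    """Average observed reward per arm; history is a list of (arm, reward)."""
    totals = [0.0] * k
    counts = [0] * k
    for arm, reward in history:
        totals[arm] += reward
        counts[arm] += 1
    # Arms that were never played get -inf so they are never the argmax.
    return [totals[i] / counts[i] if counts[i] else float("-inf")
            for i in range(k)]

def epsilon_schedule(t, k, delta_min, c=1.0):
    """Assumed schedule: epsilon_t = min(1, C * k / (t * delta_min**2))."""
    return min(1.0, c * k / (t * delta_min ** 2))

def epsilon_greedy(means, horizon, delta_min, c=1.0, seed=0):
    """Simulate epsilon-greedy on arms with the given true mean rewards."""
    rng = random.Random(seed)
    k = len(means)
    history = []
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # first k rounds: play each arm once
        elif rng.random() < epsilon_schedule(t, k, delta_min, c):
            arm = rng.randrange(k)  # explore: uniformly random arm
        else:
            est = empirical_means(history, k)
            arm = max(range(k), key=lambda i: est[i])  # exploit: argmax
        reward = rng.gauss(means[arm], 1.0)  # unit-variance Gaussian reward
        history.append((arm, reward))
    return history

# Replaying the four rewards from the worked example (arms are 0-indexed here).
example = [(0, 0.9), (1, 0.5), (1, 0.3), (0, 0.7)]
print([round(m, 2) for m in empirical_means(example, 2)])  # [0.8, 0.4]

history = epsilon_greedy(means=[0.9, 0.6], horizon=1000, delta_min=0.3)
print(sum(1 for arm, _ in history if arm == 0), "pulls of arm 1 out of 1000")
```

With this assumed schedule and C = 1, epsilon stays at 1 for the first couple of dozen rounds in the two-arm example and then decays roughly like 1/t, so the epsilon values 1, 0.9, and 0.5 in the walkthrough above are best read as illustrative numbers rather than outputs of this formula.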

Original Description

Hi, I plan to make a series of videos on the multi-armed bandit algorithms. Here is the second one: Epsilon greedy algorithm :)

Previous video on Explore-Then-Commit: https://www.youtube.com/watch?v=r5oz7by90-Y

📖 Ref:
https://tor-lattimore.com/downloads/b...
https://web.mit.edu/6.246/www/lecture...

⭐ Stay in touch:
Medium: https://sophiamyang.medium.com/
Twitter: https://twitter.com/sophiamyang
Linkedin: https://www.linkedin.com/in/sophiamyang/



