Jeremy Bernstein - Depths of First Order Optimization

Cohere · Advanced · 📐 ML Fundamentals · 1y ago


Deep learning optimizers are often motivated through a mix of convex and approximate second-order theory. In this talk, I will argue that to build faster and more scalable training methods, we need to develop a deeper understanding of basic first-order optimization. I will begin by surveying popular theoretical approaches to optimization such as natural gradient descent, mirror descent and the Gauss-Newton method, with a focus on the assumptions and limitations of each approach. Next, I will argue that norm-based steepest descent, a first-order theory, overcomes many of these limitations. For the right choice of norm, I will show that we can directly obtain the benefits of two successful but poorly understood methods called Shampoo and muP. These ideas contributed to the proposal and development of the Muon optimizer, which has set speed records for training NanoGPT. I will conclude by introducing the modular norm, a means of systematically assigning a norm to any neural network as a function of the network architecture, and by discussing opportunities for further progress.

Jeremy Bernstein is a postdoc in CSAIL at MIT advised by Phillip Isola. His goal is to uncover the computational and statistical laws of natural and artificial intelligence, and thereby design learning systems that are more efficient, more automatic and more useful in practice.

This session is brought to you by the Cohere Labs Open Science Community, a space where ML researchers, engineers, linguists, social scientists, and lifelong learners connect and collaborate with each other. We'd like to extend a special thank you to Anier Velasco Sotomayor, Thang Chu, and Andrej Jovanović, Leads of our ML Theory group, for their dedication in organizing this event.

If you’re interested in sharing your work, we welcome you to join us! Simply fill out the form at https://forms.gle/ALND9i6KouEEpCnz6 to express your interest in becoming a speaker.

Join the Cohere Labs Open Science Community to
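To make the idea of norm-based steepest descent in the abstract more concrete, here is a minimal sketch in plain Python. The abstract does not say which norm is "the right choice"; this sketch assumes the spectral norm on a weight matrix, under which the steepest descent direction for a gradient with SVD G = U S V^T is -U V^T. The function names, the toy gradient, and the use of the classic cubic Newton-Schulz iteration to approximate U V^T are all illustrative assumptions, not necessarily the method presented in the talk or the exact variant used in Muon.

```python
def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]


def orthogonalize(G, steps=30):
    """Approximate U V^T from the SVD G = U S V^T via cubic Newton-Schulz.

    Assumes G has full column rank; convergence is slow when some
    singular values are tiny relative to the Frobenius norm.
    """
    # Scale by the Frobenius norm so every singular value lies in (0, 1].
    fro = sum(x * x for row in G for x in row) ** 0.5
    X = [[x / fro for x in row] for row in G]
    for _ in range(steps):
        # X <- 1.5 X - 0.5 X X^T X pushes each singular value toward 1.
        XXtX = matmul(X, matmul(transpose(X), X))
        X = [[1.5 * x - 0.5 * y for x, y in zip(rx, ry)] for rx, ry in zip(X, XXtX)]
    return X


def spectral_descent_step(W, G, lr=0.1):
    """Move W along -U V^T; the exact step scale is folded into lr here."""
    D = orthogonalize(G)
    return [[w - lr * d for w, d in zip(rw, rd)] for rw, rd in zip(W, D)]


G = [[3.0, 1.0], [1.0, 2.0], [0.0, 1.0]]  # toy gradient, made up for illustration
D = orthogonalize(G)
gram = matmul(transpose(D), D)  # close to the 2x2 identity if D has orthonormal columns
```

Checking that `gram` is close to the identity confirms that every singular value of the update direction has been pushed to 1, which is what distinguishes this update from a plain gradient step, where the update inherits the gradient's singular values.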
