Intro to KeyBERT - BERTopic for Topic Modeling

Cohere · Advanced ·📄 Research Papers Explained ·3y ago

Key Takeaways

The video introduces KeyBERT and BERTopic for topic modeling, demonstrating how to extract keywords and create document embeddings using language models and cosine similarity.

Full Transcript

ah give it give it as a fun one um let me just quickly go to this one it's easier for those who want to visit the uh at the package uh keyboard is is super simple really really really basic um as a default essentially what we're doing is we want to extract keywords and why we're doing that is we create a documented branding from a document and then from each word in that document we use the same language model and create embeddings from those we do cosine similarity and pick the ones that are most similar that's it that's all we're doing um and the reason that that works so well is for two reasons uh language models tend to have um the same dimensional space with respect to both words and documents that work quite well especially when you then do cosine similarity or another distance measure the second thing is is the same with with your topic it's so intuitive there's not much going on there and then on top of that we can do some fine tuning so there's an algorithm called maximum marginal relevance where we can say okay I'm gonna throw out words that are very similar to each other I'm going to intro introduce some diversity of the keywords that I have and we can do things with the count vectorizer to only again use nouns or or verbs or whatever is interesting to your specific use case but the way it works like your topics really straightforward sometimes things don't have to be so complex

Original Description

BERTopic for Topic Modeling - Talking Language AI full episode: https://www.youtube.com/watch?v=uZxQz87lb84 Topic modeling allows us to explore large text archives with software. This is commonly called "topic modeling". Go in-depth into BERTopic (the popular python topic modeling library) with its creator, Maarten Grootendorst. We explore three important pillars of the package, modularity, variations, and visualizations. Each of the pillars demonstrates how BERTopic gives control back to the developer allowing for a one-stop-shop of topic modeling. This video also demonstrates BERTopic's basic capabilities and some advanced tricks that new and advanced users of BERTopic may enjoy. Maarten is Open Source Developer and Maintainer (BERTopic, PolyFuzz, KeyBERT), Data Scientist, Psychologist. === Join the Cohere Discord: https://discord.gg/co-mmunity Discussion thread for this episode (feel free to ask questions): https://discord.com/channels/95442198... Maarten on Twitter: https://twitter.com/MaartenGr BERTopic: https://maartengr.github.io/BERTopic/ BERTopic on Github: https://github.com/MaartenGr/BERTopic BERTopic paper: https://arxiv.org/abs/2203.05794
Watch on YouTube ↗ (saves to browser)
Sign in to unlock AI tutor explanation · ⚡30

Playlist

Uploads from Cohere · Cohere · 17 of 60

1 Andreas Madsen on Independent Research and Interpretability
Andreas Madsen on Independent Research and Interpretability
Cohere
2 Plex: Towards Reliability using Pretrained Large Model Extensions
Plex: Towards Reliability using Pretrained Large Model Extensions
Cohere
3 Independent Research Panel Discussion
Independent Research Panel Discussion
Cohere
4 The Future of ML Ops: Open Challenges and Opportunities
The Future of ML Ops: Open Challenges and Opportunities
Cohere
5 C4AI Special - Grad School Applications
C4AI Special - Grad School Applications
Cohere
6 Cohere For AI Fireside Chat: Samy Bengio
Cohere For AI Fireside Chat: Samy Bengio
Cohere
7 Cohere For AI - Scholars Program Information Session
Cohere For AI - Scholars Program Information Session
Cohere
8 Modular and Composable Transfer Learning with Jonas Pfeiffer
Modular and Composable Transfer Learning with Jonas Pfeiffer
Cohere
9 Jay Alammar Presents Large Language Models for Real World Applications
Jay Alammar Presents Large Language Models for Real World Applications
Cohere
10 Catherine Olsson - Mechanistic Interpretability: Getting Started
Catherine Olsson - Mechanistic Interpretability: Getting Started
Cohere
11 How To Prompt Engineer a Tech Interview App | TOHacks 2022 Winners
How To Prompt Engineer a Tech Interview App | TOHacks 2022 Winners
Cohere
12 C4AI Sparks: Samy Bengio
C4AI Sparks: Samy Bengio
Cohere
13 BERTopic for Topic Modeling - Maarten Grootendorst - Talking Language AI Ep#1
BERTopic for Topic Modeling - Maarten Grootendorst - Talking Language AI Ep#1
Cohere
14 Exploring News Headlines With Text Clustering | Jay Alammar
Exploring News Headlines With Text Clustering | Jay Alammar
Cohere
15 Scale TransformX | Fireside Chat: Aidan Gomez and Alexandr Wang
Scale TransformX | Fireside Chat: Aidan Gomez and Alexandr Wang
Cohere
16 Making Large Language Models Accessible | Scale AI Fireside chat with Bill MacCartney
Making Large Language Models Accessible | Scale AI Fireside chat with Bill MacCartney
Cohere
Intro to KeyBERT - BERTopic for Topic Modeling
Intro to KeyBERT - BERTopic for Topic Modeling
Cohere
18 Intro to PolyFuzz - BERTopic for Topic Modeling
Intro to PolyFuzz - BERTopic for Topic Modeling
Cohere
19 API Design Philosophy - BERTopic for Topic Modeling
API Design Philosophy - BERTopic for Topic Modeling
Cohere
20 Code demo of BERTopic - BERTopic for Topic Modeling
Code demo of BERTopic - BERTopic for Topic Modeling
Cohere
21 Short texts vs long texts in BERTopic- BERTopic for Topic Modeling
Short texts vs long texts in BERTopic- BERTopic for Topic Modeling
Cohere
22 How People can help BERTopic - BERTopic for Topic Modeling
How People can help BERTopic - BERTopic for Topic Modeling
Cohere
23 Cohere For AI: Training Sensorimotor Agency in Cellular Automata with Bert Chan
Cohere For AI: Training Sensorimotor Agency in Cellular Automata with Bert Chan
Cohere
24 Cohere API Community Demos | October 2022
Cohere API Community Demos | October 2022
Cohere
25 Perfect Prompt Demo By Arjun Patel
Perfect Prompt Demo By Arjun Patel
Cohere
26 Project Idea Generator Demo By Tobechukwu Okamkpa
Project Idea Generator Demo By Tobechukwu Okamkpa
Cohere
27 SuperTransformer Demo By Amir Nagri and Team Megatron
SuperTransformer Demo By Amir Nagri and Team Megatron
Cohere
28 Cohere For AI Fireside Chat: Pablo Samuel Castro
Cohere For AI Fireside Chat: Pablo Samuel Castro
Cohere
29 How Startups Can Use NLP to Build a Competitive Moat
How Startups Can Use NLP to Build a Competitive Moat
Cohere
30 Build Chatbots Faster with Large Language Models
Build Chatbots Faster with Large Language Models
Cohere
31 Tools to Improve Training Data - Vincent Warmerdam - Talking Language AI Ep#2
Tools to Improve Training Data - Vincent Warmerdam - Talking Language AI Ep#2
Cohere
32 Utku Evci - Sparsity and Beyond Static Network Architectures
Utku Evci - Sparsity and Beyond Static Network Architectures
Cohere
33 Adding human intelligence to ML models with human-learn #shorts #machinelearning #nlp
Adding human intelligence to ML models with human-learn #shorts #machinelearning #nlp
Cohere
34 Iterating on your data with doubtlab - Tools to Improve Training Data
Iterating on your data with doubtlab - Tools to Improve Training Data
Cohere
35 Adding Human Intelligence to ML models with Human learn - Tools to Improve Training Data
Adding Human Intelligence to ML models with Human learn - Tools to Improve Training Data
Cohere
36 Scikt Learn embeddings helpers with Embetter - Tools to Improve Training Data
Scikt Learn embeddings helpers with Embetter - Tools to Improve Training Data
Cohere
37 Building Cohere API Demo App With Streamlit | Adrien Morisot
Building Cohere API Demo App With Streamlit | Adrien Morisot
Cohere
38 Rosanne Liu - career creation for non-standard candidates
Rosanne Liu - career creation for non-standard candidates
Cohere
39 Giving computers many human languages with Cohere's multilingual embeddings
Giving computers many human languages with Cohere's multilingual embeddings
Cohere
40 Learning by Distilling Context with Charlie Snell
Learning by Distilling Context with Charlie Snell
Cohere
41 Sentence Transformers and Embedding Evaluation - Nils Reimers - Talking Language AI Ep#3
Sentence Transformers and Embedding Evaluation - Nils Reimers - Talking Language AI Ep#3
Cohere
42 Reflecting on for.ai...
Reflecting on for.ai...
Cohere
43 Create a Custom Language Model with Surge AI and Cohere
Create a Custom Language Model with Surge AI and Cohere
Cohere
44 Cohere API Community Demos | November 2022
Cohere API Community Demos | November 2022
Cohere
45 Cohere API Community Demos | December 2022
Cohere API Community Demos | December 2022
Cohere
46 Cohere For AI Presents: Colin Raffel
Cohere For AI Presents: Colin Raffel
Cohere
47 Lucas Beyer - FlexiViT: One Model for All Patch Sizes
Lucas Beyer - FlexiViT: One Model for All Patch Sizes
Cohere
48 What is Neural Search? Nils Reimers - Sentence Transformers and Embedding Evaluation
What is Neural Search? Nils Reimers - Sentence Transformers and Embedding Evaluation
Cohere
49 Evaluating Information Retrieval with BEIR
Evaluating Information Retrieval with BEIR
Cohere
50 Evaluating Embeddings with MTEB Massive text embeddings benchmark - Nils Reimers
Evaluating Embeddings with MTEB Massive text embeddings benchmark - Nils Reimers
Cohere
51 High quality text classification with few training examples with SetFit
High quality text classification with few training examples with SetFit
Cohere
52 Multilingual and cross lingual embeddings - Nils Reimers
Multilingual and cross lingual embeddings - Nils Reimers
Cohere
53 Developing open-source software: lessons, benefits, and challenges - Nils Reimers
Developing open-source software: lessons, benefits, and challenges - Nils Reimers
Cohere
54 Ask Me Anything with Ed Grefenstette, Head of Machine Learning at Cohere
Ask Me Anything with Ed Grefenstette, Head of Machine Learning at Cohere
Cohere
55 HyperWrite Powers Its Generative AI Service with Cohere
HyperWrite Powers Its Generative AI Service with Cohere
Cohere
56 EMNLP 2022 Conference Special Edition - Talking Language AI #4
EMNLP 2022 Conference Special Edition - Talking Language AI #4
Cohere
57 Cohere API Community Demos | January 2023
Cohere API Community Demos | January 2023
Cohere
58 C4AI Sparks: Rosanne Liu on Career Creation for Non-Standard Candidates
C4AI Sparks: Rosanne Liu on Career Creation for Non-Standard Candidates
Cohere
59 Michael Tschannen -  Image-and-Language Understanding from Pixels Only
Michael Tschannen - Image-and-Language Understanding from Pixels Only
Cohere
60 How to Add AI to your App
How to Add AI to your App
Cohere

This video teaches how to use KeyBERT and BERTopic for topic modeling, allowing users to explore large text archives and extract keywords from documents. The technique uses language models and cosine similarity to create document embeddings and pick the most similar words.

Key Takeaways
  1. Install KeyBERT and BERTopic libraries
  2. Load a document or text archive
  3. Create document embeddings using a language model
  4. Apply cosine similarity to extract keywords
  5. Use maximum marginal relevance to introduce diversity in keywords
  6. Fine-tune the algorithm using count vectorizer
💡 Language models can be used to create document embeddings and extract keywords using cosine similarity, making topic modeling more efficient and effective.

Related AI Lessons

I Spent Weeks Looking for a Research Gap Before I Realized I Was Searching the Wrong Way
Learn how to effectively find research gaps by changing your approach, a crucial skill for AI researchers and academics
Medium · AI
ICMI 2026 Reviews [D]
Learn how to interpret ICMI 2026 reviews and improve your paper's acceptance chances
Reddit r/MachineLearning
Workshop submission for main conference paper under review [D]
Learn how to navigate submitting a paper to a non-archival workshop before the final decision of a main conference like ECCV
Reddit r/MachineLearning
Kept context-switching between arxiv, OpenReview, GitHub, and HuggingFace for every paper, so I built this. Chrome extension + website with everything inline, plus citation graph + SPECTER2 neighbors. 3M papers, free, feedback welcome [P]
Streamline your research with a new Chrome extension and website that integrates 3M papers from arxiv, OpenReview, GitHub, and HuggingFace, including citation graphs and SPECTER2 neighbors, and provide feedback to improve it
Reddit r/MachineLearning
Up next
How to Open HSD Files (Husqvarna Viking Designer Embroidery)
File Extension Geeks
Watch →