How-to Use The Reddit API in Python
Key Takeaways
The video demonstrates how to use the Reddit API in Python, covering app setup, OAuth authorization, and data retrieval from subreddits using the requests library and pandas.
Full Transcript
Hi, and welcome to this video on how to use the Reddit API in Python. I'm going to keep this really short, and we'll get straight into the code in just a moment, but first I want to describe what we're going to cover. The first thing we need to do is get access to the API, so I'll take you through how to do that, and then I'll explain how we authenticate ourselves when accessing the API. After that, I'll take you through some of the most common uses of the API, the ones I think most of you are going to be interested in: getting the most popular threads from a subreddit, or a steady stream of all the threads being posted to a subreddit. So let's get straight into it.
Okay, so the first thing we need to do is head over to this page here, which is reddit.com/prefs/apps. Scroll down and find this "create another app" or "create an app" button and click on it. Then give it a name; it doesn't really matter what you call it, just something that you recognize. We are using this as a script for personal use; if you are using the API for something else, tick whichever of the other options is relevant. You can give it a quick description, and then you need to give it a redirect URI. For me, I'm just going to enter my Twitter address, because you can put anything you want in here; it's there so that when people want to find out something about your API, they are directed to whatever you put in this box. So if someone wants to find out about my API, they'll come here and know that they can ask me about it.
Okay, and then here is our secret key, which we are going to need later, so make sure you keep a note of it, and of this personal use script as well. I'm going to copy those across into JupyterLab. I'll call the first one the client ID, so the identifier; this is the public key. And here we have our secret key as well, which you need to keep secret. Obviously I'm showing you mine, but this API won't exist by the time I upload the video.
Now that we have those, the next step is to request a temporary OAuth token from Reddit. The first thing we need to do is import the requests library. Then we set up our authorization, and here we enter our client ID and secret key. Once you've done that, we need to actually log in. To do that, we first initialize a dictionary where we specify that we are going to be logging in with a password, and then we pass in our username and password as well. For my password, I'm going to read it in from this text file. If this is just a simple script, you can enter your password here directly; that's not recommended, and it's better to read it from elsewhere, but it's completely up to you how you deal with this. This is how you can read it in from a text file; just make sure you put r there, for read, instead of w. So that is the dictionary that we will pass along to Reddit in just a moment.
We also need to identify the version of our API. For this you can literally put anything you want, but we'll put something that is at least slightly descriptive: we'll call it MyAPI, and this is the version number. Now all we need to do is send a request for our OAuth token. We send this request to this address; we are accessing API version 1 and the access token endpoint. In there we need to include our auth from earlier, our login data, and the headers, and this will hopefully return everything we need. Okay, and here we can see our access token, so we access that and store it in a variable.
This token is something that we need to add to our headers whenever we're using the API. To do that, we just write this: we add it under authorization, and the value needs to be a string that contains the word bearer, a space, and then the token itself. If we print out headers, this is what we get. So now we can access every endpoint within the Reddit API.
Beforehand, if we had tried to access this endpoint, oauth.reddit.com/api/v1/me, we would not have been allowed. Let's say we pass headers containing just the user agent that we had before: we get a 401 response. So let's copy this and try again, but this time use headers that include our authorization bearer token. Now we get a 200, which means everything is okay, and we can add json on to the end here and get all of this information. So that's great: we now have access to anything, and we can start accessing what I think is probably the more relevant, important information.
The first of those I want to focus on is retrieving the most popular posts on a subreddit. If I head over to the Reddit API documentation, we can see this GET /r/subreddit/hot, which returns all of the hot posts on that subreddit. In our case, let's go with the hot threads in the Python subreddit. To do that, we send a GET request to, like you can see here, this r/subreddit/hot. We can copy that across, and we start the request with oauth.reddit.com, then we have our r, subreddit (get rid of this end bracket), and hot. The subreddit that we want to look at is python, and then we add our headers in here. So, this should be requests, not reddit. Then we can see what is in there using the json method, and here we get all of these layers.
This is obviously not very clean at the moment, so let's clean it up and put it into a pandas DataFrame so it's a bit more readable. First, let's figure out how to access each post within the response. Let's open this again. Within this JSON, all of our posts are contained within this data key, so let's add data. Once we get into data, we have a few different options. We have this modhash, which is nothing we need to care about. We have dist, which is just 27; that's not the posts that we want. And then we have this one here, children. You'll see that this is a list, and within this list we have all the information about all of the hot posts within the Python subreddit, so that is where we want to extract data from. Let's do that; let's print each post.
Okay, now we are getting somewhere. You can see there's quite a lot of data in each one of these, so let's clean this up a little more. You can see here the next entry in the list. What we probably want to do is extract the data within the post, which gives us this other dictionary containing all the relevant information, and it is from here that we are going to extract different pieces of information into our DataFrame. Just as an example, we have the title. Okay, and here we can see the titles of the numerous popular threads in the subreddit. This is essentially the syntax that we're going to use to populate our DataFrame.
So first let's import pandas, and maybe install it. Then we need to initialize our pandas DataFrame, which we do like so, and that just gives us an empty DataFrame. Then we're going to use the for loop, like we did before, to loop through each of the posts and add it as a row to this DataFrame. So we do df equals df.append, and within this we create a dictionary that contains everything we would like to include. At the end we also need to remember to set ignore_index, otherwise we'll end up with a load of errors, and we want to avoid that.
First let's include the subreddit, just so we know where this data is coming from. Just like before, we take the post data and access subreddit. Let's have a look at what we have. Okay, perfect: as expected, we're getting all of these entries through. But we're probably going to want more than just the subreddit, so let's add a few more items. We have the title, like before, and another pretty important one in my opinion is selftext, which contains the actual text content of the thread. That one is pretty important if you want to extract information about, well, anything from Reddit. Okay, this is starting to look a bit better. Let's see what we have. It looks good.
Maybe we also want to include a few other items: the number of upvotes, the downvotes, and the score of the posts. We can do a few different things here. We have the upvote ratio, which is of course the number of upvotes the post is getting in comparison to downvotes. Maybe we'd also like to include the actual number of upvotes and downvotes, and again it's pretty straightforward: we include ups, and we can include downs like so. Finally, we can also include the score of the post. That gives us quite a lot of information to go ahead with. If there are other things that you're interested in adding, you can do this to see which keys you can include: access the data and then keys, and this returns a list of everything in there.
Now, this is pretty useful for finding the most relevant or most popular posts, but a lot of the time what you might want to do is stream the newest posts, so you essentially get a real-time update of what is going on. I would say this is probably what most people are going to want to use the API for, so let's take a quick look at that as well. We can find it just over here: we have this r/subreddit/new. Essentially all we need to do is adjust our old call so that, instead of reaching out to the hot endpoint, we reach out to the new endpoint. So let's modify our code to do that. Up here where we have hot, we change that to new. Okay, seems to work. Then we do the same thing again, so we rerun this code. Great. And then we do this, and we get all of the latest posts on our subreddit, which of course is pretty useful.
Now, this is returning around 27, or this time 25, posts at once, and you're probably going to want a few more than that. What we can do is add a limit parameter. We add it like so: add params, and then in here we add limit, and we can go up to 100 items. If we take a look at what we had before, in this JSON we had dist equal to 25, which means that we returned 25 items. Now if we run that, we see 100, so now we're returning 100 items. We're getting more data back, and we can essentially keep running this again and again and extracting as much data as we would like. If we rerun this, you can see we go up to 24 here; rerun that, and we go up to 99.
Now, there's one more thing that is pretty important to understand, and that is how we can extract the IDs of posts from the Reddit API. If we go into a post here, we have two different items. We have kind, which I think is up here, so we have this t3. Reddit posts have these different types, or kinds; it's essentially a code that says whether it's a thread or some other type of post, which I think is something like ads or videos, or something along those lines. Generally we're almost always going to be working with t3, which are threads, but if you are working on something else, that may change. As well as that, we also have the id, which is here, and we can put both of these together to create the Reddit post ID. We join them with an underscore in the middle, and that is the unique ID; it is unique for every post on the subreddit. In the API documentation you will see this referred to as the fullname.
What we can do with this is essentially loop back in time with the API. One of the things we can do is only request threads that are further back in time than a given post, identified by its fullname, which would be this t3 underscore mix of letters. If we would like to do that, let's take this final one we have here, and all we do is add it as another parameter, after, like this. This will only take the 100 threads that come after this post in the listing. So we can do that, and then, rather than initializing a new DataFrame, we can loop through and add all of these new posts to our existing DataFrame, and we end up with even more data. And here we go.
So that's how we can walk through and keep extracting more and more data from the Reddit API. At some point it will stop allowing you to do this; you can only go so far back in time, and how far depends on the volume of requests that you're making and the volume of threads on the specific subreddit. But that is essentially all you need to do. Like I said at the start, the Reddit API is incredibly powerful and, unlike most other APIs on social networks, it's free to use, so it's definitely something to take advantage of and see how you can implement in your own projects. I hope you've enjoyed the video. Thank you for watching, see you next time. Bye.
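The data-retrieval part of the transcript (reading `data` and `children`, building the fullname from `kind` and `id`, and paging with `limit` and `after`) can be sketched roughly as follows. This is a minimal sketch, not the code shown in the video: the function names `listing_to_rows`, `collect_posts` and `make_fetcher` are hypothetical, the `FIELDS` tuple simply mirrors the keys mentioned in the transcript, and the page fetcher is passed in as a function so the paging logic can be tried without network access.

```python
# Keys mentioned in the transcript; extend this after inspecting post["data"].keys().
FIELDS = ("subreddit", "title", "selftext", "upvote_ratio", "ups", "downs", "score")


def listing_to_rows(listing):
    """Flatten a listing response into one dict per post."""
    rows = []
    for post in listing["data"]["children"]:
        data = post["data"]
        row = {field: data.get(field) for field in FIELDS}
        # The fullname is the kind and the id joined by an underscore (t3_<id>).
        row["fullname"] = f"{post['kind']}_{data['id']}"
        rows.append(row)
    return rows


def collect_posts(fetch_page, pages=3, limit=100):
    """Walk back through a listing, passing the last fullname as `after`."""
    rows, after = [], None
    for _ in range(pages):
        params = {"limit": limit}
        if after is not None:
            params["after"] = after
        page = listing_to_rows(fetch_page(params))
        if not page:  # nothing further back is available
            break
        rows.extend(page)
        after = page[-1]["fullname"]
    return rows


def make_fetcher(headers, subreddit="python", sort="new"):
    """Build a fetch_page function that calls the live API (assumed URL layout)."""
    import requests  # third-party; imported here so the helpers above need no install

    url = f"https://oauth.reddit.com/r/{subreddit}/{sort}"

    def fetch_page(params):
        res = requests.get(url, headers=headers, params=params, timeout=30)
        res.raise_for_status()
        return res.json()

    return fetch_page
```

With authorized `headers` in hand, `pd.DataFrame(collect_posts(make_fetcher(headers)))` would give one row per post. Building the DataFrame once from a list of dicts is used here in place of the `df.append` call described in the video, because `DataFrame.append` was removed in pandas 2.0.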
Original Description
Learn how to use the Reddit API in Python, including setup, authorization, and pulling data from subreddits.
Reddit API docs:
https://www.reddit.com/dev/api/
🤖 70% Discount on the NLP With Transformers in Python course:
https://bit.ly/3DFvvY5
📙 Medium article:
https://towardsdatascience.com/how-to-use-the-reddit-api-in-python-5e05ddfd1e5c
📖 Free link:
https://towardsdatascience.com/how-to-use-the-reddit-api-in-python-5e05ddfd1e5c?sk=0295f297c1365bee7cc7a32bdff21b61
Extract from article:
"Reddit is a huge ecosystem brimming with data that is readily available at our very fingertips. As a data-minded person, I wanted to take advantage of this and perform some analysis using this vast repository of open-source data.
Initially, it turned out that getting to grip with Reddit’s API wasn’t as clear-cut as expected — despite being a straightforward process; it can be a little confusing at first.
So, after figuring everything out, I wrote this article — which I hope will help a few of you to get familiar with using the Reddit API in Python. We will cover:
Getting Access
Making Requests
- Reading the Data
- Streaming New Posts
Parameters
Getting Access
First, we need access. Unlike most popular services, the Reddit API was somewhat difficult to figure out initially. There are several steps:
1. Go to App Preferences and click create another app… at the bottom.
2. Fill out the required details, make sure to select script — and click create app.
3. Make a note of the personal use script and secret tokens.
4. Request a temporary OAuth token from Reddit. We need our username and password for this.
5. Add headers=headers to every request. The OAuth token will expire after ~2 hours, and a new one will need to be requested.
"
And so on, check it out if you're interested in reading (rather than watching).
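Steps 4 and 5 of the extract above might look roughly like this in Python. It is a hedged sketch rather than the article's own code: the helper names `login_payload`, `with_bearer` and `request_auth_headers` are hypothetical, the `MyAPI/0.0.1` user agent is only a placeholder, and the token URL is assumed to be Reddit's standard `access_token` endpoint.

```python
TOKEN_URL = "https://www.reddit.com/api/v1/access_token"


def login_payload(username, password):
    """Form data telling Reddit that we are logging in with a password."""
    return {"grant_type": "password", "username": username, "password": password}


def with_bearer(headers, token):
    """Return a copy of headers carrying the token as 'bearer <token>'."""
    return {**headers, "Authorization": f"bearer {token}"}


def request_auth_headers(client_id, secret, username, password,
                         user_agent="MyAPI/0.0.1"):
    """Request a temporary OAuth token and return headers for later calls."""
    import requests  # third-party; imported here so the helpers above need no install

    headers = {"User-Agent": user_agent}
    res = requests.post(
        TOKEN_URL,
        auth=(client_id, secret),  # HTTP basic auth: personal use script + secret
        data=login_payload(username, password),
        headers=headers,
        timeout=30,
    )
    res.raise_for_status()
    return with_bearer(headers, res.json()["access_token"])
```

The returned dictionary is what gets passed as `headers=headers` on every later request, for example `requests.get("https://oauth.reddit.com/api/v1/me", headers=headers)`. Since the token expires after roughly two hours, a long-running script would need to call `request_auth_headers` again when requests start failing with a 401.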
🕹️ Free AI-Powered Code Refactoring with Sourcery:
https://sourcery.ai/?utm_source=YouTub&utm_campaign=JBriggs&utm_medium=aff
Playlist
Uploads from James Briggs · James Briggs · 17 of 60