Cracking the Cold Start Problem

Data Skeptic · Intermediate ·🛡️ AI Safety & Ethics ·6mo ago

Key Takeaways

The video discusses the cold start problem in recommender systems, exploring techniques such as collaborative filtering, bandit learning, and data reduction to address this challenge. It also touches on the importance of considering algorithmic bias and fairness in the design of recommender systems.

Full Transcript

Welcome to Data Skeptic, a podcast exploring the methods, use cases, and consequences of recommender systems.

Welcome to another installment of Data Skeptic: Recommender Systems. Today we get into the real nuts and bolts of what it is to build and design a recommender system. You know, in machine learning, when you look at classification, everything is very standardized. Get your data into a nice tabular format with one objective column, pick an algorithm, probably XGBoost, and see if the model can learn to predict the objective from the features available. But you can't just graft that model onto recommender systems. In fact, most pathways to success involve a hybrid model. Today we talk about an approach that's a hybrid method combining several components. Collaborative filtering, the classic recommender system algorithm, which effectively does dimensionality reduction. Then we talk about the use of embeddings, how you can represent your users and their attributes and things like that in a latent space. And last but not least, bandit learning. Unlike some classification problem where you have presumably a well-sampled data set, you never know how a user is going to react to a recommender system change. When you deploy a new idea, you've got to explore it. And when it works, you've got to exploit it. Anyway, that's just a quick overview of some of the interesting topics we're going to get into right now.

>> Hello, I'm Buoya. I'm from Virginia Tech. I'm an assistant professor of marketing in the Pamplin College of Business. Before joining Virginia Tech, I did my PhD in marketing at Duke, and I actually graduated last year. Before entering the PhD program in marketing, I spent two years studying economics as a master's student, also at Duke. Before that, I did my undergraduate degree in statistics in China.

>> Could you talk a little bit about the transition? I mean, you didn't leave academia, but you went from being a student to being in the current position you're in now.
Could you talk a little bit about that transition?

>> So, I think the biggest difference I can tell from this transition is your role. When you were a student, you just focused on your projects, and at Duke, PhD students generally don't need to teach, because the business school doesn't have an undergraduate program for PhD students to teach in. But now that you have formally become an assistant professor, you need to strike a balance between teaching and research. So you definitely have to be a good time planner. Also, I'm not only working on my own projects. I'm also collaborating with students, trying to help them figure out their research interests and the interests we have in common, so that we can develop some projects together, and I provide some advice to the students. So that's something different. But in general, the research routine doesn't change that much, because I still work on some projects that I started during the PhD but hadn't finished before graduation.

>> And can you share some details on your research interests and the things you're currently looking into?

>> My research interest is mainly the digital economy, especially digital platforms. I focus both on e-commerce platforms and on how technology such as the recommendation system could help consumers, and also how it would affect the supply side, that is, the service or content provider. In addition to e-commerce platforms, another stream of my research focuses on social media. I care about content strategy on social media, such as the dynamics of influencers' production of controversial content, and also how news circulated on social media affects offline business outcomes, such as how news about Impossible Meat affects local restaurants' adoption of this new food technology.
So that is the overview of my research, and to handle all the questions I just mentioned, I usually use a combination of tools, including econometrics, causal inference, and machine learning.

>> When you study e-commerce specifically, as far as I know, pretty much all e-commerce stores are closed private businesses and they're not necessarily obliged to share any data. Are there challenges, or maybe on the positive side, what data sets do you have available to study e-commerce?

>> Yeah, that's a really good question, because you have actually identified the main challenge that many researchers, especially in universities, are facing. It's pretty hard to conduct a field experiment with companies because they have so many other priorities for their KPIs. Also, if you want to collaborate with them on something like a field experiment, you usually need to work with multiple departments, because even for the development of recommendation systems, the departments in the firm that could be involved include the search engineering team, the product manager, and the team for A/B testing. So it's usually pretty hard unless you have some internal contact. Otherwise, we usually rely on public data sets, such as the archived data shared on Kaggle, to understand historical consumer behavior. Another way is to establish your own platform for research. For example, the research platform used in my paper on recommendation systems was established by a group of behavioral scholars from Duke. They mimic a real grocery shopping scenario, and we can recruit participants to join our experiment.

>> And at what point did you first take an interest in recommender systems?

>> The main reason I started this research is my own experience, because, you know, e-commerce is almost everywhere.
So for me, even if some purchases still happen offline, many of them actually happen online, right? And every time you browse an online store, you realize the store recommends something to you. Sometimes the recommendations are good, sometimes they are bad. Another thing is that I think my entire shopping journey was affected by the recommendations, because they affect which products I get exposed to. My next search is usually based on the previous search or the previous browsing, right? But the previous browsing was sometimes recommended by the website itself. So that is one scenario. The other scenario is not limited to the e-commerce setup; it is more about content consumption. For example, when you browse TikTok, Facebook, or Instagram, you can always see some recommended content. Some of it is organic content, which means it's just for entertainment, but some of it is actually ads, especially on Instagram. I find that the recommendations there really understand what I would like. Many times I was pretty attracted by the clothes or furniture recommended on Instagram, and then I was directed to the e-commerce platform through the recommendation on social media. So you can tell how big an influence it has, at least on my own life, and it's the same for my friends' lives. So I can tell that recommendation systems are actually affecting people's lives in so many ways.

>> Definitely, it's become rather ubiquitous. I think everyone has personal experiences, both good and bad, of recommender systems that worked great or sort of failed in maybe an embarrassing way. But those are kind of our personal anecdotes. For someone in your position, where you get to see maybe the bigger picture when you consider all users and all products or all possible recommendations, what sort of challenges emerge as you pursue it from the scalability point of view?
>> Yeah, I think that when you talk about scalability, we should think about it from two views. One is the user and item level, which means that there are so many users and items online, right? The items here are not limited to the physical products we usually see on an e-commerce platform. They can also include content products. You browse many websites, and sometimes you search for a target website and browse that, but sometimes you just randomly click something given your exposure at that moment. So that is one aspect, just the user and item level. But if we dive into the users and items themselves, that brings us to the second level. One user can be described by so many tags, because we leave a lot of footprints online through frequent visits to websites, and even some offline activities can be recorded and in some way integrated with our online behaviors. We also have our basic demographics, like where you are from, your gender, and your education background. There is so much information available about one person or a group of persons, right? So for each user, scalability is about the dimensionality of the features used to describe this person. The same applies to items. If we think about a physical product, we can use price, size, ingredients to describe a food, or color, or producer location. There is so much information, and it is not limited to physical products. If you look at content products such as videos or audio, they themselves are unstructured data.
So when we talk about unstructured data, the dimensionality is never that small, right? If you look at a video, it is about visual elements, color contrast, or some content design. Even when we talk about a podcast, an audio product, we have the tone, the emotion, or the pitch, right? There are so many dimensions to describe such a content product. So that is the second level of scalability. It means that the demographic space of a person or the attribute space of an item is usually high dimensional. So when we propose an algorithm to match users and content, we need to consider both levels of scalability: the user and item level, and their attribute level.

>> What are the traditional approaches people used when they first went to tackle recommender systems? Like, what are the classic algorithms?

>> It's pretty hard to define a classical algorithm because, you know, for high technology like this, especially digital technology, the technique itself is always moving forward. It is already 2025, and we know that ChatGPT, this large language model, has become so popular. So if you ask about the classic method, there will be a lot of options and maybe some advanced ones, but if you had asked me this question three or four years ago, my answer would have been different. So I would not name a typical one, but I can tell you the intrinsic logic of dealing with high-dimensional data, which is to do data reduction. In general, no matter whether it's deep learning or collaborative filtering, the one discussed in my paper, the core idea is the same: we're going to figure out a latent space. This latent space is a low-dimensional one, but it can still summarize the core information that distinguishes items or users in their original high-dimensional space of features, demographics, and attributes.
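To make the data-reduction idea concrete, here is a minimal sketch of matrix factorization trained by stochastic gradient descent. It is an illustration, not the method from the guest's paper: the toy rating matrix, the function names `factorize`, `predict`, and `rmse`, and the hyperparameters are all assumptions chosen for the example. Each user and each item is compressed into a 2-dimensional latent vector, and a missing rating is estimated as the dot product of the two vectors.

```python
import math
import random


def predict(users, items, u, i):
    """Predicted rating = dot product of the user's and the item's latent vectors."""
    return sum(a * b for a, b in zip(users[u], items[i]))


def factorize(ratings, n_users, n_items, k=2, lr=0.05, reg=0.02, epochs=500, seed=0):
    """Learn k-dimensional factors from (user, item, rating) triples by SGD.

    All hyperparameters are illustrative defaults, not tuned values.
    """
    rng = random.Random(seed)
    users = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_users)]
    items = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - predict(users, items, u, i)
            for f in range(k):
                uf, vf = users[u][f], items[i][f]
                # Gradient step on squared error with L2 regularization.
                users[u][f] += lr * (err * vf - reg * uf)
                items[i][f] += lr * (err * uf - reg * vf)
    return users, items


def rmse(ratings, users, items):
    """Root mean squared error over the given (user, item, rating) triples."""
    sq = [(r - predict(users, items, u, i)) ** 2 for u, i, r in ratings]
    return math.sqrt(sum(sq) / len(sq))


# Made-up ratings: users 0-1 like items 0-1, users 2-3 like items 2-3.
FULL = [
    [5, 5, 1, 1],
    [5, 5, 1, 1],
    [1, 1, 5, 5],
    [1, 1, 5, 5],
]
HELD_OUT = {(0, 1), (0, 3)}  # pretend user 0 never rated items 1 and 3
OBSERVED = [
    (u, i, FULL[u][i])
    for u in range(4)
    for i in range(4)
    if (u, i) not in HELD_OUT
]

if __name__ == "__main__":
    user_vecs, item_vecs = factorize(OBSERVED, n_users=4, n_items=4)
    print("training RMSE:", round(rmse(OBSERVED, user_vecs, item_vecs), 3))
    print("user 0 on unseen item 1:", round(predict(user_vecs, item_vecs, 0, 1), 2))
    print("user 0 on unseen item 3:", round(predict(user_vecs, item_vecs, 0, 3), 2))
```

The point of the sketch is that two numbers per user and per item are enough to separate the two taste groups in this toy data, so the model can rank an unseen item that similar users liked above one they disliked.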
It means that we find a low-dimensional space to summarize the important information in the original high-dimensional space. So that is the logic. If we are talking about deep learning, the terminology we usually use for the low-dimensional vector in the low-dimensional space is embeddings. Even for unstructured data, people can transform it into embeddings, that is, a low-dimensional vector of numbers, right? It is the same if we talk about collaborative filtering or matrix factorization. There the terminology is usually factors, the low-dimensional factors that can summarize the information in the high-dimensional original vectors.

>> I think that's an excellent perspective, that whether it's collaborative filtering or deep learning or some technique that's going to come out next year, we need a process to put data into a fixed vector in that latent space, and maybe your mileage will vary on which is the right approach for you. Once I have that latent space where I have, I guess, a feature vector on every item, can that help me with the cold start problem?

>> Yeah, the cold start problem is pretty salient, especially when a website doesn't have any information about a new guest, right? Because, as I mentioned, people can randomly browse many online pages every day. But think about the low-dimensional space I mentioned. When you have such a space, you hope that it includes information predictive of people's preferences. But how could it be predictive? How we can figure out such a predictive low-dimensional space depends on how much we know about the focal users or the focal items. But if I don't know about him at all, maybe it's the person's first time visiting, how can I generate such an informative low-dimensional space? That is the challenge of cold start.
Cold start means that I don't have enough information to help me generate such an informative low-dimensional latent space.

>> Yeah, it makes sense that, okay, if I've been on the site for a long time and I've given you a lot of feedback, I should have an expectation that you've maybe learned me a bit better. Could you describe that process? How do we go from zero to some model of the user?

>> So that is related to how we design our method. The point is that a user potentially has a group of counterparts who share similar preferences, right? If we know something about that group of counterparts, the people who are similar to the focal new user, maybe we can use other people's behaviors to inform the potential preference of this new user. This initialization would not be accurate, but at least it could be informative. So that is the logic. We don't use the focal new user's behavior to generate the latent space, because we actually don't have this information. But it's rare for us to have no information at all, because at least you can have the person's location, right? That usually comes together with the cookie, and there are so many ways for a website to get some basic information, that is, the basic demographics. Then we can use this basic information to link the new user with another group of users who would be similar to the focal new user, and we can use those other people's preferences to initialize the recommendation to the new user. It means that we estimate a prior, a prior space of the low-dimensional representation for the new user, to handle or mitigate the cold start problem.

>> Makes sense. Yeah, I like that you describe it as a prior as well, because now I'm picturing some Bayesian process or something like that where, as information trickles in, you can update it. Do you have a sense of how long something like that takes? I can't imagine that if I rate one movie, suddenly you know everything about my movie taste.
What does it take to get bootstrapped?

>> You're asking how long I need to really understand a new person's preference? Maybe around several rounds of interaction with him. The longer case would be, let's say, five rounds, and the shorter one would be just one round, as you mentioned, when you only interact with one movie and I, as an algorithm, can totally understand you. So I think how long it takes depends on how different the focal user is from the existing users that I can match the new user with, the people I already know. From this view, you can categorize new users into two groups. One is that the user is just a normal one, meaning a member of the majority, let's say.

>> Mhm.

>> The person shares the common interests of maybe 90% of the population here. But the other case is that the person is pretty niche. She has a lot of niche preferences, and we don't have a good representation of her type in our training data or even in the real world. For the first type, the majority one, learning the person's preference can in general be pretty fast because our priors can be very accurate, more accurate than the priors for a niche user. But for a niche user, I think how long we need to spend learning the person's preference well depends on the learning process that we designed in the recommendation system. That is the reason why our method also incorporates bandit learning. That is an effective way of learning by doing.

>> Yeah. I'm thinking of cases like YouTube where, I don't know the statistic, but they're getting some absurd amount of videos uploaded per day. And some of those new videos are going to be very popular, but maybe a lot of them are almost garbage. They probably shouldn't have been uploaded. You know, the average video probably isn't a good video.
How do you balance the process of exploring, trying to find a good video, versus saying, okay, we know enough now, this is either a good one or a bad one to be promoting?

>> That's also related to the prior, or the starting point of your exploration, right? Suppose that you start from a random place. It means that you don't have much prior knowledge about what kind of video the new user would like, so you just select randomly. Suppose that you have a hundred videos so far. The very naive way is just to randomly pick several, or maybe one, show it to the person, and see his feedback, right? Then you use his feedback to update your algorithm for the future rounds. But because it's a random pick, it is very likely that the person would not like it, and you usually have a pretty long stage of learning. But think about the case where you are the user. If you cannot find a movie you like on the platform for a while, what are you going to do? You're going to leave this platform and switch to another platform. So consumer churn would happen. So it's always important for the firm to learn about the user's preference in a short time, to increase the retention rate and keep the customers on the platform. So the solution, the high-level intuition, is that we need to have a good prior. Even if the prior is not accurate enough to capture the real preference, it can help us narrow down the target areas where we should search for more feedback and improve our algorithm. If you start from something wrong, it will take longer for you to get to the destination, or maybe you can never arrive at the destination, right? But if you start with something right, or the area you explore is just around the true preference of the person, then it will definitely take a shorter time to get to the point.

>> Yeah.
I guess if you did a very naive implementation of multi-armed bandit on YouTube, I would end up much of the time seeing a video in a language I don't understand.

>> Yeah. Yeah.

>> Which most of the time would be a skip. So of course it would be good to have another solution like collaborative filtering to kind of pick up the balance there. But how do you get these two married together?

>> In our algorithm design, the first step is to recognize the low-dimensional space by linking the new user with some existing group of consumers, given the similarity between their demographics. And we do the same thing for the item. It means that for a new item, we recognize a group of existing items that share similar attributes. Given this prior knowledge from both the user and item side, we can generate a relatively informative low-dimensional space to represent the user's preference on items. But again, because this is the initialized low-dimensional space, it is not accurate, but it can be a good starting point for the exploration. The next step is to do the bandit learning. Given the user preference represented by the low-dimensional space, we can choose the item that so far has potentially good feedback from a user but still has some uncertainty. In general, the objective function of the bandit learning is not just maximizing the feedback. It is actually maximizing the feedback plus a weighted uncertainty. It means that we value the uncertainty, and the goal is to reduce it. During exploration, we're going to recommend something that we're not so sure the new user would like, but we know that if we can collect the user's feedback on this item, even if it's negative feedback, it will greatly help us reduce our uncertainty in understanding the user's preference.
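The two steps just described can be sketched in a few lines. This is a deliberately simplified illustration, not the guest's algorithm: it keeps an independent Gaussian belief per item instead of a shared low-dimensional space, and the function names (`prior_from_neighbors`, `choose`, `update`), the grocery items, the demographic vectors, and the values of `beta` and `noise_var` are all hypothetical. The prior comes from the feedback of the demographically nearest existing users, selection maximizes expected feedback plus weighted uncertainty (an upper-confidence-bound rule), and each observation triggers a conjugate Bayesian update.

```python
import math


def prior_from_neighbors(new_demo, existing, k=2, prior_var=1.0):
    """Initialize per-item beliefs (mean, variance) for a new user.

    existing: list of (demographic_vector, {item: feedback in [0, 1]}).
    The prior mean for an item is the average feedback of the k nearest
    existing users; a real system would scale demographic features first.
    """
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    nearest = sorted(existing, key=lambda rec: dist(new_demo, rec[0]))[:k]
    items = sorted({item for _, fb in nearest for item in fb})
    beliefs = {}
    for item in items:
        vals = [fb[item] for _, fb in nearest if item in fb]
        beliefs[item] = (sum(vals) / len(vals), prior_var)
    return beliefs


def choose(beliefs, beta=1.0):
    """Pick the item maximizing expected feedback + beta * uncertainty (std dev)."""
    return max(sorted(beliefs), key=lambda i: beliefs[i][0] + beta * math.sqrt(beliefs[i][1]))


def update(beliefs, item, feedback, noise_var=0.25):
    """Conjugate Gaussian update: combine the prior with one noisy observation."""
    mean, var = beliefs[item]
    post_var = 1.0 / (1.0 / var + 1.0 / noise_var)
    post_mean = post_var * (mean / var + feedback / noise_var)
    beliefs[item] = (post_mean, post_var)


# Hypothetical existing shoppers: (age, vegetarian flag) and past feedback.
EXISTING = [
    ((25, 1), {"tofu": 1.0, "steak": 0.0, "kale": 0.8}),
    ((27, 1), {"tofu": 0.8, "steak": 0.2, "kale": 0.6}),
    ((60, 0), {"tofu": 0.0, "steak": 1.0, "kale": 0.1}),
]

if __name__ == "__main__":
    beliefs = prior_from_neighbors((26, 1), EXISTING)
    first = choose(beliefs)
    update(beliefs, first, feedback=0.0)  # the new user disliked it
    print("first pick:", first, "| next pick:", choose(beliefs))
```

Even a negative response is useful here: the variance of the chosen item shrinks after the update, so the next round moves on to an item whose appeal is still uncertain instead of repeating the same guess.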
It means that getting the user's feedback on the recommended item is informative. After recommending the item, we observe the user's feedback and include it in the algorithm to do a Bayesian update, and then we have the posterior low-dimensional space. In the end, our algorithm's estimate of the user preference converges to a stable one, and it is usually a good, accurate estimate of the user preference.

>> So I know bandit is typically an online learning technique, and of course you could take something like MovieLens and simulate something online, I suppose. But could you talk a little bit about the experimentation, or your approach to either testing on a platform or simulating something like that?

>> In our paper, as you mentioned, we do have some empirical testing on different data sets. The first is the standard, archived one. But as you mentioned, bandit learning is an online technique, so if we use offline data to evaluate it, bias is inevitable. We also have some synthetic data, but it's still offline. In the end, we conduct our online experiments in an online grocery setup. It means that we ask the participants, real human beings, to join our experiment, and they do the shopping as they would on Amazon or Instacart. Given their feedback on the recommended products we list on the product page, we update our algorithm, and then we have multiple rounds of the experiment. It means that in the second session, the second round, the same group of participants is invited to do a second shopping trip, and the recommended items for each user change. The change depends on their feedback in the previous round. So we keep the algorithm updating in a live mode. It's the same as how you interact with Amazon or Instacart.

>> And can you share a few details on the results?
>> In the experiment, we benchmark our method against three other algorithms, each of which lacks one element compared to our method. Remember that our method has collaborative filtering, the basic one that is usually conducted on the users' feedback on items. In addition to this component, we have collaborative filtering on the demographics and the attributes, these two side matrices, to inform the low-dimensional space when we don't have user feedback, to solve the cold start. And the third component is the bandit, right? So we benchmark this method against three others, each of which is missing one of the components, and overall our method outperforms these three alternatives on average. That is the average result. Second, we try to figure out the source of the benefits of our method by comparing it with each benchmark. For a category of products such as protein, people's preferences rely heavily on their demographics, which means that the priors themselves can be very informative, so the bandit learning is not that important. In this case, we find that our method doesn't have a significant difference in performance compared to the method that doesn't have the bandit learning. However, in some other categories such as produce, I think people's preferences are for sure correlated with their demographics, but they do not rely heavily on them. It means that the demographics can only provide some information, but not all the information, about their preferences. So in this case bandit learning matters a lot, because bandit learning can help us collect more information, built on the initial low-dimensional space. So it can include a lot of valuable information that the demographics and the item attributes don't have at the very beginning. In this case, we find that our method can significantly improve performance compared to the method without the bandit learning.
And overall, we find that no matter which scenario it is, the data reduction is pretty important. That gets back to your question at the very beginning about scalability. We know that learning from the original features, which are high dimensional, is slow. That is the first thing. Second, the information integration is not good. The raw features cannot provide information predictive of people's preferences, because you usually need the synergies across multiple pieces of information. It's not an additive form of each small piece. So you need a model that can integrate all of them well. That is the informative data reduction, and that is the second takeaway. The third takeaway is that we compare the benefits of our method across two user groups. One is the majority and the other is the minority. The minority users are those people who have niche preferences. It means that they like something unusual; they deviate from the mainstream, the major population's common interests. For this group of people, we find that our method, because of the bandit learning, can do better than the method that doesn't have the bandit learning. The reason is that collaborative filtering is a good dimension reduction method, but the majority usually has a lot of data in the training set, right? So they are well represented. For them, learning about their preferences can be very fast, and the priors for them can be very accurate even at the very beginning. So they always benefit from such information integration. But for niche users, the platform's or the algorithm's understanding of their preferences is so uncertain because their preferences cannot be well represented by the existing users. So the learning helps a lot. Because of the learning, the difference between the benefits for the majority and the minority can be mitigated. It means that our algorithm looks more fair to both groups.
So that is the three main takeaways about our experimental results. >> Well, I consider myself to be one of those niche interest people in that small group. So I'm appreciative of any algorithm that can service me as I guess a minority in that case. I'm wondering if we could put ourselves in the shoes of that new user then. So they come to whatever platform that let's say is powered by your approach and uh they've got to give a little bit of feedback that will then place them in that latent space. So you can kind of get a similarity if I understand correctly. But that's an unsupervised process. So it's not like you can tell me exactly what it is without using the word IGEN vector, right? It's sort of just a mathematical answer, but I as the user, I'm going to experience it and maybe I'll say, "Oh, it learned the genre of music I liked or it learned the time period I like or it learned something about me." Do you have any sense of interpretability between the vector and uh the user experience? the interpretability of the user experience and the latent space or how the algorithm match the observed demographics or item attributes into the low space is very context dependent. So far we we use our method in a grocery shopping and uh we we can find that some lowdimensional vector because it's just a vector it's actually a combination of multiple observed demographics and items could be different for each product categories. For example, that for some vector in a low dimensional space, we can name it as the nutritionoriented one because that lowdimensional space is highly relevant or correlate with people's nutrition related diet or their age. But even if we applied the same algorithm into another product category maybe the beverage and then you would have different interpretability of the low dimensional space and their link with the observed user demographics and item attributes. 
So there is no specific answer about the interpretability, because it's completely context dependent.

>> I've touched a little bit on the topic of fairness in some other interviews. I guess, could you give us the high level of how it would plug in with your approach?

>> In my research so far, my understanding of algorithmic bias is more about whether the same algorithm performs differently for different groups of users. If I'm a niche user, why should I always spend a longer time interacting with a website to get what I like, compared to people belonging to the majority, right? That's a kind of bias, and that is what we mean by unfairness. So I think about the problem first from the bias in the training data that most algorithms are using. Even if we have unsupervised learning, in practice good performance usually comes from semi-supervised or supervised learning, so we need some guidance from the existing data, right? But what if the existing data is already biased? Different groups of people can be represented differently in the existing data, and the accuracy of this representation can also be different. My research so far tries to show how the common components in algorithm design leverage the existing data and, as a result, how the intrinsic bias in the existing data can be transferred to the algorithmic outcomes, so that people can feel it in their user experience. The question is whether the solution should be improving the data, or whether it should be more about improving the algorithm design to make better use of the existing data. The proportion of different groups of people is definitely not even, that's for sure, right? That's the reason why we always have a majority and we always have a minority. But given that the data is just like that, that is the fact, how can the algorithm design correct it, or avoid bad outcomes that violate fairness?
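One simple way to put a number on "the same algorithm performs differently for different groups" is to compare a per-group success rate. The sketch below is only an illustration: the guest does not name a metric here, so using hit rate (the share of recommendations a user accepted) is an assumption, and the group labels, the function names `hit_rate_by_group` and `fairness_gap`, and the log entries are all made up.

```python
def hit_rate_by_group(logs):
    """logs: iterable of (group_label, accepted) pairs, one per recommendation shown.

    Returns {group_label: fraction of recommendations that were accepted}.
    """
    totals, hits = {}, {}
    for group, accepted in logs:
        totals[group] = totals.get(group, 0) + 1
        hits[group] = hits.get(group, 0) + (1 if accepted else 0)
    return {group: hits[group] / totals[group] for group in totals}


def fairness_gap(logs, majority="majority", niche="niche"):
    """Difference in hit rate between the two groups; 0 means equal service."""
    rates = hit_rate_by_group(logs)
    return rates[majority] - rates[niche]


# Made-up interaction log, purely for illustration.
TOY_LOGS = (
    [("majority", True)] * 3
    + [("majority", False)]
    + [("niche", True)]
    + [("niche", False)]
)

if __name__ == "__main__":
    print(hit_rate_by_group(TOY_LOGS))
    print("gap:", fairness_gap(TOY_LOGS))
```

Tracking this gap across rounds of interaction would show whether learning from feedback narrows the difference between majority and niche users, which is the kind of outcome the guest describes.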
>> Well, I know this paper is a good milestone in and of itself, and it's just a slice of your overall research, but are there any next steps along these lines?

>> Yeah, along these lines, as I mentioned, this is an empirical paper so far, and we are trying to do a more solid exploration from a theoretical view, to show whether there is any closed-form solution for the effect of different algorithm designs on fairness outcomes, and what would be the appropriate metrics for evaluating fairness or algorithmic bias when we include multiple groups of users or multiple groups of items with different market shares or proportions in the market. That is more of a theoretical exploration along this line. Another line would be about the effect of the recommendation system on social media. The reason is that the recommendation doesn't only affect the user experience, I mean the demand side. It also affects the supply side. So suppose that you are a creator, or in some cases an influencer. How do you choose the topic of your video?

>> Mhm.

>> It depends on whether you want to get more views, which means exposure, or whether you want to get more positive feedback. Maybe we can use likes to quantify the positive feedback. For sure, the number of views or the number of likes depends on your video quality, but it's not limited to that. It depends on who is going to receive your content. If your content is recommended to someone who already has a positive bias toward it, they are more likely to like it, and they are also more likely to give you positive feedback. But what if your video is recommended to somebody for whom it is less relevant? Even if your video quality is high enough, you still have less opportunity to be liked. The problem is that you are not the only creator in this market.
There are so many other creators, and on the demand side the audience's attention is limited. Everybody only has 24 hours. So how do you compete for attention? It depends on what other content is recommended together with your content to a focal user. So the next step is to understand how this traffic allocation by the recommendation system manipulates or affects content suppliers' decisions about what topic to produce. You can choose to produce the popular topics, where you are more likely to get exposure. But it's just as easy for others who produce the popular topic to get exposure, so the market for that topic can be more competitive. How likely is it that a small creator gets more attention? But if you produce something very niche, the recommendation system may not be able to find a good, relevant audience to watch the video, and you still run the risk of getting very little positive feedback. So how do you balance that? In the end, you may find there is some equilibrium decision on topic choice. Maybe on one platform a lot of content focuses only on the popular topics, which means content centralization is severe, but if you use another recommendation algorithm or logic, the distribution of content on the platform can be very diverse. So polarized content versus diverse content can also be an outcome of the recommendation system. That would be the potential effect of the recommendation system on the supply side, so I will focus more on the supply side in my future research. >> Yeah, I think that's rather novel. I'm not aware of a lot of good work in the supply side space. I suspect a lot of creators are desperate to get algorithmic feedback over just, you know, things that might even be speculation that they're getting now on how to make the right video or whatever they're producing. >> Yeah. Because I chat with some influencer friends.
They told me that even for the same video, it's really hard to get popular on every platform, because different platforms have different logics for allocating traffic through their recommendation systems. Even if the video quality is the same, and even if the audience base is similar across platforms, from the creator's view the performance of their creation can still be very different. It all depends on how the content is recommended. >> Boya, thank you so much for taking the time to come on and share your work. >> Yeah, thank you so much for the interview. It's a good opportunity for my work to be known to the public. Thank you. >> Definitely. [music]

Original Description

In this episode of Data Skeptic, we dive deep into the technical foundations of building modern recommender systems. Unlike traditional machine learning classification problems where you can simply apply XGBoost to tabular data, recommender systems require sophisticated hybrid approaches that combine multiple techniques. Our guest, Boya Xu, an assistant professor of marketing at Virginia Tech, walks us through a cutting-edge method that integrates three key components: collaborative filtering for dimensionality reduction, embeddings to represent users and items in latent space, and bandit learning to balance exploration and exploitation when deploying new recommendations. Boya shares insights from her research on how recommender systems impact both consumers and content creators across e-commerce and social media platforms. We explore critical challenges like the cold start problem—how to make good recommendations for brand new users—and discuss how her approach uses demographic information to create informative priors that accelerate learning. The conversation also touches on algorithmic fairness, revealing how her method reduces bias between majority and minority (niche preference) users by incorporating active learning through bandit algorithms. Whether you're interested in the mathematics of recommendation engines or the broader implications for digital platforms, this episode offers a comprehensive look at the state-of-the-art in recommender system design.
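As a rough illustration of collaborative filtering as dimensionality reduction, the sketch below factorizes a tiny rating table into 2-dimensional user and item vectors with plain stochastic gradient descent. Everything here is an assumption made for illustration: the ratings, the hyperparameters, and the `factorize` and `predict` helpers are hypothetical and are not the method discussed in the episode.

```python
import random


def factorize(ratings, n_users, n_items, k=2, epochs=2000, lr=0.02, reg=0.02, seed=0):
    """Learn k-dimensional user and item vectors from (user, item, rating) triples."""
    rng = random.Random(seed)
    users = [[rng.uniform(0.1, 1.0) for _ in range(k)] for _ in range(n_users)]
    items = [[rng.uniform(0.1, 1.0) for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - sum(a * b for a, b in zip(users[u], items[i]))
            for f in range(k):
                pu, qi = users[u][f], items[i][f]
                # Gradient step on squared error with L2 regularization.
                users[u][f] += lr * (err * qi - reg * pu)
                items[i][f] += lr * (err * pu - reg * qi)
    return users, items


def predict(users, items, u, i):
    """Predicted rating is the dot product of the two latent vectors."""
    return sum(a * b for a, b in zip(users[u], items[i]))


# Hypothetical ratings: users 0-1 prefer items 0-1, users 2-3 prefer items 2-3.
ratings = [
    (0, 0, 5), (0, 1, 4), (0, 2, 1),
    (1, 0, 4), (1, 1, 5), (1, 3, 1),
    (2, 2, 5), (2, 3, 4), (2, 0, 1),
    (3, 2, 4), (3, 3, 5), (3, 1, 1),
]
users, items = factorize(ratings, n_users=4, n_items=4)
rmse = (sum((r - predict(users, items, u, i)) ** 2 for u, i, r in ratings) / len(ratings)) ** 0.5
print(f"training RMSE: {rmse:.3f}")
print(f"predicted rating for the unseen pair (user 0, item 3): {predict(users, items, 0, 3):.2f}")
```

Each user and item ends up as a point in the same 2-dimensional latent space, so an unseen pair can be scored with a dot product. A brand new user has no ratings and therefore no learned vector, which is the cold start gap that priors and bandit learning are meant to fill.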

The video discusses the cold start problem in recommender systems and explores techniques to address this challenge. It also highlights the importance of considering algorithmic bias and fairness in the design of recommender systems. By understanding these concepts, viewers can design and implement more effective and fair recommender systems.

Key Takeaways
  1. Use basic information such as demographics to link new users with similar users
  2. Estimate a prior space of the low-dimensional representation for the new user
  3. Apply bandit learning to learn by doing
  4. Balance exploration and exploitation by starting from a random place and updating based on user feedback
  5. Recognize the low-dimensional space by linking new users with existing groups of consumers and items with similar attributes
  6. Do bandit learning given the user preference represented by the low-dimensional space
  7. Recommend items with potential good feedback from users but with uncertainties
  8. Observe user feedback and include it into the algorithm for Bayesian updating
  9. Simulate online experimentation
  10. Conduct offline empirical testing
💡 The cold start problem can be addressed by combining collaborative filtering (which effectively performs dimensionality reduction), latent-space embeddings, and bandit learning. Considering algorithmic bias and fairness is also crucial when designing recommender systems, so that niche users are not served worse than the majority.
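The takeaways above can be sketched as a small Thompson sampling loop: a new user's prior is borrowed from existing users in the same demographic group, items are recommended by sampling from the current belief, and each piece of feedback triggers a Bayesian update. This is a deliberately simplified stand-in that keeps one Beta distribution per item instead of a low-dimensional preference vector. The group name, counts, and true like rates are invented for illustration.

```python
import random


def group_prior(history, group, items, strength=1.0):
    """Build a Beta(alpha, beta) prior per item from an existing group's likes and dislikes.

    `history` maps group -> item -> (likes, dislikes). Adding `strength` to both
    counts keeps the prior proper for items the group has never seen.
    """
    counts = history.get(group, {})
    prior = {}
    for item in items:
        likes, dislikes = counts.get(item, (0, 0))
        prior[item] = [likes + strength, dislikes + strength]
    return prior


def recommend(posterior, rng):
    """Thompson sampling: draw a plausible like rate per item and pick the best draw."""
    draws = {item: rng.betavariate(a, b) for item, (a, b) in posterior.items()}
    return max(draws, key=draws.get)


def update(posterior, item, liked):
    """Bayesian update of the Beta posterior after observing one like or dislike."""
    posterior[item][0 if liked else 1] += 1


rng = random.Random(0)
items = ["a", "b", "c"]
# Hypothetical feedback from existing users who share the new user's demographics.
history = {"students": {"a": (8, 2), "b": (3, 7), "c": (1, 1)}}
# Simulated new user with niche tastes that differ from the group's.
true_rates = {"a": 0.2, "b": 0.3, "c": 0.8}

posterior = group_prior(history, "students", items)
picks = []
for _ in range(500):
    item = recommend(posterior, rng)
    liked = rng.random() < true_rates[item]  # simulated online feedback
    update(posterior, item, liked)
    picks.append(item)

print({item: picks.count(item) for item in items})
```

The prior makes the first recommendations lean toward what the group liked, while the random draws still leave room to explore uncertain items, so the loop can recover when the new user turns out to be unlike the group.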
