Customer Clustering

Data Skeptic · Advanced ·👁️ Computer Vision ·4y ago

Key Takeaways

The video discusses using clustering techniques for actionable feature extraction from time-series single-feature data, with a focus on customer clustering methodologies and results.

Full Transcript

[Music] Welcome to Data Skeptic: K-Means Clustering, the podcast exploring the problem, the algorithms, enhancements, and use cases for k-means clustering. [Music]

If you heard last week's episode, towards the end I was cautioning Linh Da about blindly using k-means as the sole technique for partitioning your customers into groups. Done blindly, that doesn't make a lot of sense. Accomplishing such a goal in a meaningful way takes precision and an investigation of other metrics, algorithms, and approaches. In this episode, I speak with Ehsan Barkhordar about his research into clustering bank customers.

I'm Ehsan. I'm 26 years old and now a data scientist at marcotte.com. I acquired my master's degree in computer science from American University of Technology. My research interests are mainly in customer and market analytics, time series analysis, computer vision, NLP, and so on.

What are some of the types of questions that are interesting when you look into customer analytics?

It's a general question, so it doesn't have a definitive answer. Sometimes what comes to my mind is time analysis, when you are doing research on using data to predict something, and for some clustering purposes, maybe some NLP tasks.

What are some of the types of questions that are interesting when you look into customer analytics? For example, when a company wants to cluster their customers, what is the purpose? What do they want to do with the clusters?

If you're working in a bank or the financial industry, you are worrying about how to create a service for new customers or how to keep your customers satisfied with everything you have in your company. So it's really important to stay up to date and to have tools to extract information from the customers. For example, you want to know what types of customers you have, how they interact with each other, or how they move from one type or one group to another group, actually
transitioning. This information is really important when you are making decisions. For example, you're a manager and you want to invest in a new startup or new tools, or you want to make a big decision about creating a new service. It costs a lot, so you should be careful about your decisions. This is where the data and the clustering help you get a bird's-eye view of the customers and make efficient decisions about them. For example, one type of customer is really worried about their balance or their loan, or something like that, and you should decide for them, so you need some information about them: for example, the monetary value or the frequency of purchases. That's it.

So I think you're describing some of the feature engineering steps. Is that a good way to look at it?

Yeah.

What are some common types of features? And the paper is all about the banking segment, so maybe we should talk about banking customers. The last time I signed up for a bank, I gave them my name, my address, my social security number, and a deposit. I don't think I told them too much more, so it seems like they have limited data on me. But I still have that account after a year, so there must be something there. What can you featurize about me as a customer?

Actually, there are two types of information and features. The first type is the dynamic features of the customer: purchases, loan payments, or something like that, which change over time. The second group is the static features: your gender, sometimes your age (it changes only once a year, so it's not really dynamic), maybe your location, or something like that. So at the first stage it seems there is not plenty of information about you, but of course the financial information matters. It's really important because it relates to your social activities, your financial issues, and everything. Okay, so for example, when you
purchase, I don't know, a car, it means you have enough money to invest in something else. If you purchase cigarettes, it means you smoke. So there is no need for more information: the personal information and the financial transactions are really meaningful information.

And let's talk a little bit about the architecture you worked on. What did you guys build this with, algorithm-wise?

This paper has two parts. The first part is about the neural network, an LSTM-based encoder-decoder, and the second part is about dynamic time warping. I think many people these days are interested in the neural network approach because it's new and trendy, and in this paper we use it for selecting features, for extracting hidden information from the bank transactions. There are some steps: extracting information, then training on it, then validating, and at the end visualizing. It depends on the dimensions: it's really important to visualize in two or three dimensions, because it's not possible to visualize n dimensions or a hundred dimensions. Finally some insights come to you, and you decide based on them and say, "It's a good clustering, this is what I'm looking for."

So at the first stage I found the dataset. It's the Berka banking dataset; it was in a paper in 2000. This dataset is really good because it's realistic: there is a bank in some country, and they got some information with permission and converted that data into a dataset. After finding a proper dataset, I was able to use it to make a prediction. Actually, it's not prediction, it's feature extraction. If you are familiar with autoencoders: autoencoders are neural networks that have the same input as the output, so in the middle, in the latent space, the encoder output has a lower dimension than the input
case, so it means you can use this information to select the good features you are looking for in your customers.

And what is the role of dynamic time warping in this paper?

Dynamic time warping is a measure: it measures the dependency or similarity of two time series that may differ in timing. In this paper we are looking for a distance matrix between the purchases. It shows us the difference between the types of purchases, how similar they are to each other or how they differ from each other, and after analyzing these time series we have some features. So we have two algorithms in our paper: the first is the LSTM autoencoder and the second is dynamic time warping. We combine these features with some special methods, and after that we use PCA or some other algorithm to convert them to lower dimensions, use them for clustering first, and after that visualize the results.

What do you find when you visualize the results?

Sometimes it's really obvious. You're looking for something which is meaningful. Clustering is a technique to find relations or similarity between types of people or types of customers. At first, when I used visualization to draw conclusions, I found that the LSTM autoencoder and dynamic time warping were both very good, but they were not enough: neither gave a meaningful picture of the customers. So I thought it was a good idea to combine them, to use them in a hybrid approach, and it works. After combining the features, I found that the final result is significantly better than the individual ones, I mean dynamic time warping and the LSTM-based autoencoder. But it's not important to use different clustering methods. For example, there is no significant difference between k-means and k-medoids. Do you know why? Because if you have enough information about your customers, you can cluster them, but if you don't pick up some
proper features, you are not able to cluster them even with complex clustering algorithms. So I don't care much about the clustering method, because what's really important is selecting good features.

Thanks to our sponsor, ClearML. ClearML is an open source MLOps solution users love to customize. It helps you easily track, orchestrate, and automate ML workflows at scale. Machine learning is no fad; your organization is only going deeper on investment. Don't let this be a source of technical debt. ClearML can improve every step in your ML workflow, making it a tool loved by data engineers, ML engineers, DevOps people, and data scientists alike. ClearML is hands down the best collaborative ML tool, with full visibility and extensibility. As a contributor on a machine learning team, ClearML amplifies your efforts. As a manager, ClearML gives you the transparency and metrics to guide your team's efforts effectively when you can't get pulled into the weeds. Their solution logs your entire process. Your data and models are versioned, so there's no questions of provenance or audits that can't be done. Model repositories give you a clear view of available binaries, and you can deploy pipelines directly from code. Supercharge your entire development process. Go check out ClearML today. Visit clear.ml to get started.

Thanks to our sponsor, BBEdit from Bare Bones Software. If you're a Mac user and you have any need whatsoever for a text editor, which, let's face it, that's got to be all of us, you have got to check out BBEdit. BBEdit 14 provides Mac users the most powerful professional-grade text editor around. It's known as capable and rock solid and can process files that are large enough to bring other editors to their knees. This tool's been beloved for over 25 years, and I can understand why. If I'm just typing up some simple hello world, I guess any text editor will do. But when I really get into the weeds and need to do some data processing, maybe I've got a big messy file and I need to find all
the non-ASCII characters and remove them, BBEdit's there for me. It has best-of-breed multi-file search and replace and great scripting support for text transformations. There's a whole menu of text transformations that you'll find handy. If you're a serious data scientist or software engineer, you're going to bump into situations where you need a powerful text editor, so download and try BBEdit today. See how much your productivity can increase. Visit barebones.com and download BBEdit.

So when I think about LSTM, my context for it is always how it gets applied in natural language, where it makes very intuitive sense that I might mention something and then ramble on a bit, but you still have the context. When I try to map that to the transaction data that a bank has, I can definitely see where there would be patterns. Maybe it could learn my monthly spending pattern, or my spending pattern might be weekly and some people's monthly, so there are these time-relative features that could come out. I'm less clear on what dynamic time warping does in terms of feature engineering. Could you maybe go deeper into the method and how it reveals something useful for doing the clustering?

Actually, I want to add some information about LSTM. You said that LSTM networks, or other networks which are really similar to LSTM, are used in translation tasks or NLP subjects, and it's true. But there is a shared idea between translation and my work: I'm translating purchases. What does that mean? I'm trying to predict purchases. Let's take an example. You have ten purchases; you mask one of them, or you put them into the neural network and ask the network to predict, for example, the next element. In this way the network tries to predict based on its previous knowledge, and so it is trained on the purchases or transactions. It's the same as in the
translation of, I don't know, a sequence from English to French. So it's a seq2seq model, and it's an autoencoder as well: an autoencoder because the input and the output are the same, and sequence-to-sequence because we have a sequence of transactions. So it's really good to use LSTM. For future work I want to add, for example, transformers, because they are trendy these days and everybody uses them for translating text. I think it would work, but in this paper I don't use them; I use just LSTM.

About dynamic time warping: it's an old technique, it's not new, but it has a very important property: it relies on the differences between the time series. So we can infer some new information, get some new features from our data, and combine them with the previous features. It's a technique to make your features stronger and more meaningful, because they come from two separate algorithms. Both of them are good, but each one can focus on some aspects of the data.

And when you look at the results you're getting back, can you think of ways that the bank might make them actionable? What could this lead to? Is it for promotion or better customer service? How can they use the insights?

At first, it's really important to have enough data. I'm sure the banks have enough data about their customers, especially transactions. Then you need a strategy to use some simple insights in the business. What does that mean? For example, there are three types of people in my insights, so I should find the relations between these types of customers. In the next step you will use some other techniques. It's not enough to use just one clustering and say that's it, it's finished. No, you should use other techniques: for example, churn prediction, to predict when your customers are churning, or, I
don't know, cohort analysis, for other information about your customers. But at first you want to have an overview of your customers and make a decision based on the types of customers, so you need to cluster them. Clustering is really important because it's unsupervised learning: it doesn't require previous information or background on the customers, and there is no need for labels. It's really good for the initial steps, but after that you need new methods or new analytics to get better insights.

So if I understand correctly, your clustering effort was really on a latent space, or maybe the feature space produced. And if it's produced through things like PCA and deep neural networks, it's often the case that the latent space isn't very interpretable. I don't know what the numbers mean. Did you find that to be true here?

Actually, I think it's better to look at the clustering result. It's really hard to interpret the neural network model and say, okay, these are good features. There is no definitive answer, no clear boundary between good features and bad features. If you can use them in a clustering and the result is meaningful, they are good features. A special point I want to tell you is that a low error does not necessarily lead to high clustering performance. Sometimes you have a higher loss, but your clustering performance is much better than before. So I had to use clustering evaluation methods, like silhouette or some other methods, to evaluate my features with my proposed algorithm.

Could you expand upon those metrics? Tell me what you used and how you looked at them.

When you are solving unsupervised problems, you should consider some hyperparameters, for example the number of clusters, or some hyperparameters in the model. At first I really suggest
that people should care about them. It's really a test-and-train procedure: you should test it on your data. It depends on your types of data, or maybe your techniques. In my work I used k-means because it works well. I also tried k-medoids, but it wasn't really good because it uses the medoids, and in this work I considered the mean more important than the medoid.

Do you have any insight as to why that's the case? To me, the median, being the typical 50th-percentile customer, is often more resilient to outliers. When I think of a bank, probably most people have an average amount of money, and a very tiny number have a huge amount of money or a huge number of transactions. What about the mean? Is there any insight as to why the mean was useful?

You are right, because sometimes the technique is really sensitive to outliers. It gives different results when you have outliers in your data, so you should take that into consideration. But if I had to choose among clustering methods, I would choose density-based clustering methods. Do you know why? Because it's really important in the banking industry: the information in the banking industry is really dense. It's really hard to separate the data because the number of customers is really high and many customers are similar to one another. Every day they buy, for example, sandwiches, so all of them are like each other. Some of them buy tickets on Sunday, so they are similar to each other too, and there is no clear boundary. Sometimes when you use k-means or k-medoids, you choose the wrong way to cluster your customers, and it's better to use density-based algorithms. If I had had enough time to use other techniques, I think it would have been a good idea. I put them in the future work, for example using other techniques for clustering. I think it's really good, but not just k-means and k-medoids; they are really limited and they have similar results in most cases.

Makes sense. Well, is there anywhere people can follow you online?

I prefer email, or sometimes Twitter or LinkedIn, for example. I have no problem with any of them.

Thank you so much for coming on Data Skeptic.

[Music] That concludes another installment of Data Skeptic: K-Means. Thanks to our sponsors today, ClearML and BBEdit by Bare Bones Software. Myself, Claudia Armbruster, as associate producer; Vanessa Bly, guest coordinator; our show notes are written by David Obembe; and our host, Kyle Polich.
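
The dynamic time warping step discussed in the episode can be illustrated with a minimal sketch. This is not the code from the guest's paper: it is the textbook dynamic-programming formulation with an absolute-difference cost, and the function names `dtw_distance` and `distance_matrix`, as well as the toy spending series, are assumptions made purely for illustration.

```python
from math import inf


def dtw_distance(a, b):
    """Textbook O(len(a) * len(b)) dynamic time warping, absolute-difference cost."""
    n, m = len(a), len(b)
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: repeat b[j-1], repeat a[i-1], or advance both
            cost[i][j] = step + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def distance_matrix(series):
    """Symmetric matrix of pairwise DTW distances between several time series."""
    n = len(series)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            matrix[i][j] = matrix[j][i] = dtw_distance(series[i], series[j])
    return matrix


# Hypothetical weekly spending for three customers (made-up numbers).
early = [10, 10, 50, 10]  # spending peak in week 3
late = [10, 50, 10, 10]   # same peak, one week earlier
flat = [10, 10, 10, 10]   # no peak at all

# Comparing week by week penalizes the shifted peak twice.
pointwise = sum(abs(x - y) for x, y in zip(early, late))

print(pointwise)                  # 80
print(dtw_distance(early, late))  # 0.0: warping lines the two peaks up
print(dtw_distance(early, flat))  # 40.0: the peak has nothing to match
print(distance_matrix([early, late, flat]))
```

Each row of such a distance matrix can serve as a feature vector for one customer, which is one plausible reading of how DTW output could be combined with autoencoder features before clustering. How the paper actually combines them is not specified in the transcript.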

Original Description

Have you ever wondered how you can use clustering to extract meaningful insight from time-series, single-feature data? In today's episode, Ehsan speaks about his recent research on actionable feature extraction using clustering techniques. Want to find out more? Listen to discover the methodologies he used for his research and the results they produced.
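
One idea from the episode is that candidate features are better judged by the quality of the clusters they produce than by a model's training loss. The sketch below illustrates that idea under stated assumptions: it uses a plain k-means (Lloyd's algorithm) with a deliberately simple deterministic initialization and the standard mean silhouette coefficient as the quality score. The names `kmeans` and `silhouette` and both toy feature sets are hypothetical and are not taken from the research discussed.

```python
from math import dist  # Python 3.8+


def kmeans(points, k, iters=50):
    """Plain Lloyd's algorithm; the first k points seed the centroids (toy init)."""
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid
        labels = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
        # Update step: each centroid becomes the mean of its members
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels


def silhouette(points, labels):
    """Mean silhouette coefficient; points in singleton clusters score 0."""
    clusters = {}
    for i, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(i)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    scores = []
    for i, lab in enumerate(labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)
            continue
        # a: mean distance to the other members of the same cluster
        a = sum(dist(points[i], points[j]) for j in own if j != i) / (len(own) - 1)
        # b: mean distance to the closest other cluster
        b = min(
            sum(dist(points[i], points[j]) for j in other) / len(other)
            for other_lab, other in clusters.items()
            if other_lab != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Two hypothetical feature representations of the same six customers.
separated = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
smeared = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0), (5, 0)]

for name, feats in [("separated", separated), ("smeared", smeared)]:
    labels = kmeans(feats, k=2)
    print(name, labels, round(silhouette(feats, labels), 3))
```

The feature set that yields the higher silhouette would be preferred, regardless of which upstream model had the lower loss. In practice a library implementation with a better initialization (for example k-means++) would replace this toy `kmeans`.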

The video teaches how to use clustering techniques to extract meaningful insights from time-series single-feature data, with applications in customer clustering. It covers the methodologies and results of recent research in this area. By watching this video, viewers can learn how to apply clustering techniques to their own data analysis tasks.

Key Takeaways

Identify time-series single-feature data for analysis
Choose a suitable clustering technique
Apply the clustering technique to the data
Extract actionable features from the clustered data
Analyze the results and draw meaningful insights

💡 Clustering techniques can be used to extract meaningful insights from time-series single-feature data, enabling actionable feature extraction and informed decision-making.
