Kolmogorov-Arnold Networks: MLP vs KAN, Math, B-Splines, Universal Approximation Theorem

Umar Jamil · Beginner · 📐 ML Fundamentals · 2y ago
In this video, I will be explaining Kolmogorov-Arnold Networks, a new type of network that was presented in the paper "KAN: Kolmogorov-Arnold Networks" by Liu et al. I will start the video by reviewing Multilayer Perceptrons, to show how the typical Linear layer works in a neural network. I will then introduce the concept of data fitting, which is necessary to understand Bézier Curves and then B-Splines. Before introducing Kolmogorov-Arnold Networks, I will also explain what the Universal Approximation Theorem for Neural Networks is, along with its counterpart for Kolmogorov-Arnold Networks, the Kolmogorov-Arnold Representation Theorem. In the final part of the video, I will explain the structure of this new type of network by deriving it step by step from the formula of the Kolmogorov-Arnold Representation Theorem, while comparing it with Multilayer Perceptrons at the same time. We will also explore some properties of this type of network, for example its easy interpretability and the possibility of performing continual learning.

Paper: https://arxiv.org/abs/2404.19756
Slides PDF: https://github.com/hkproj/kan-notes

Chapters
00:00:00 - Introduction
00:01:10 - Multilayer Perceptron
00:11:08 - Introduction to data fitting
00:15:36 - Bézier Curves
00:28:12 - B-Splines
00:40:42 - Universal Approximation Theorem
00:45:10 - Kolmogorov-Arnold Representation Theorem
00:46:17 - Kolmogorov-Arnold Networks
00:51:55 - MLP vs KAN
00:55:20 - Learnable functions
00:58:06 - Parameters count
01:00:44 - Grid extension
01:03:37 - Interpretability
01:10:42 - Continual learning
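As a rough illustration of the MLP vs KAN comparison the video covers, here is a minimal NumPy sketch. It is not taken from the video, the slides, or the paper's code: the names `bspline_basis`, `kan_layer`, and `mlp_layer` are hypothetical, the `tanh` activation is an arbitrary choice, and the KAN layer is simplified to pure B-spline edge functions (the residual base-function term described in the paper is left out). The MLP layer applies a fixed activation after a learned linear map, while the KAN-style layer puts a learnable univariate function on every input-output edge and simply sums the results.

```python
import numpy as np

def bspline_basis(x, knots, degree):
    """Evaluate all B-spline basis functions of `degree` at scalar x
    using the Cox-de Boor recursion. Returns len(knots) - 1 - degree values."""
    # Degree 0: indicator of the half-open knot interval containing x.
    B = np.array([1.0 if knots[i] <= x < knots[i + 1] else 0.0
                  for i in range(len(knots) - 1)])
    for k in range(1, degree + 1):
        new = np.zeros(len(knots) - 1 - k)
        for i in range(len(new)):
            left_den = knots[i + k] - knots[i]
            right_den = knots[i + k + 1] - knots[i + 1]
            # Skip terms with a zero denominator (repeated knots).
            left = (x - knots[i]) / left_den * B[i] if left_den > 0 else 0.0
            right = (knots[i + k + 1] - x) / right_den * B[i + 1] if right_den > 0 else 0.0
            new[i] = left + right
        B = new
    return B

def mlp_layer(x, W, b):
    """MLP layer: learned linear map followed by a fixed nonlinearity."""
    return np.tanh(W @ x + b)

def kan_layer(x, coeffs, knots, degree=3):
    """Simplified KAN-style layer (assumption: spline term only).
    coeffs has shape (n_out, n_in, n_basis); edge function (o, i) is
    phi_{o,i}(t) = sum_k coeffs[o, i, k] * B_k(t), and output o is
    sum_i phi_{o,i}(x_i)."""
    basis = np.stack([bspline_basis(xi, knots, degree) for xi in x])  # (n_in, n_basis)
    return np.einsum("oik,ik->o", coeffs, basis)

# Example usage with made-up shapes: 3 inputs, 2 outputs, cubic splines.
knots = np.arange(-3.0, 9.0)           # 12 uniform knots -> 8 cubic basis functions
rng = np.random.default_rng(0)
x = np.array([0.5, 2.5, 4.2])          # inputs inside the valid range [0, 5)
print(mlp_layer(x, rng.normal(size=(2, 3)), np.zeros(2)))
print(kan_layer(x, rng.normal(size=(2, 3, 8)), knots))
```

In this sketch the learnable parameters of the MLP layer are the entries of `W` and `b`, while those of the KAN-style layer are the spline coefficients in `coeffs`, one vector per edge.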