AI Research Weekly Update August 18th, 2019
Key Takeaways
The video discusses recent advancements in AI research, including the development of a large transformer language model with 8.3 billion parameters, improvements in natural language processing, and breakthroughs in speech recognition and speaker diarization, as covered in blog posts from NVIDIA, Google, Facebook, and Berkeley.
Full Transcript
[Music] Thanks for watching the first episode of AI Research Weekly Update from Henry AI Labs. Following this introduction is the list of blogs sourced to make this video. This includes Google's AI research blog, Facebook AI's blog, and lab blogs like Berkeley's AI Research (BAIR) blog. Please leave suggestions in the comments for blogs you think should be covered. This weekly update series is organized around each week's blog posts, so this first episode covers posts from August 11th to August 18th. For example, it covers NVIDIA's Project Megatron, Facebook's physical reasoning environment, and Google's Project Euphonia announcement. I hope this video series can be useful to AI researchers, data scientists, software engineers, graduate students, and anyone else interested in keeping up with artificial intelligence, and particularly deep learning research.
This is the list of AI research lab blogs covered in this first weekly update video. The ordering of these blogs is not a ranking; which ones are presented first and last has no meaning. Again, a blog's posts are only included if they were published in the window from August 11th to August 18th. [Music]
The first blog post covered in this series is NVIDIA's Project Megatron. This is a really interesting research study that trains the largest-ever transformer language model, with 8.3 billion parameters, by using a clever multi-GPU model parallelism scheme. Language models like BERT and GPT-2 already use enormous numbers of parameters, GPT-2 especially, but this model is 24 times the size of BERT and 5.6 times the size of GPT-2. Model parallelism is the idea of splitting the layers of a deep neural network across different GPUs. This is fairly straightforward for networks that are very sequential, but with the transformer's query-key-value attention mechanism it is trickier to figure out how to parallelize the model. At the other end from model parallelism is data parallelism, where you distribute different subsets of the training data across different GPUs. Their multi-GPU training scheme trains this 8.3-billion-parameter GPT-2-style model, and they show plots of how their technique scales and improves training efficiency as more GPUs are added. Simply adding more GPUs to train a neural network is not guaranteed to improve training; you need algorithms like these to really utilize the multiple GPUs. Interestingly, they also plot WebText validation perplexity for the different model sizes, and they note that they used early stopping with the 8.3-billion-parameter model, attributing this to overfitting. In the end, they show how the trained models, from 345 million up to 8.3 billion parameters, perform on different natural language processing tasks. Overall, this is a really interesting blog post on model parallelism and how it can be used to train an 8.3-billion-parameter transformer language model.
This week, Facebook's AI Research blog announced the PHYRE benchmark for physical reasoning. PHYRE is an environment for reinforcement learning agents in which the agent has to select where to place a red ball so that the green ball ends up touching the blue ball. In this puzzle, for example, the agent is given an initial state of the world, with the green ball placed here, and the optimal move is to place the red ball right here so that it drops onto this object and launches the green ball over to touch the blue ball.
So the PHYRE benchmark consists of 50 of these kinds of situations, and it's really interesting because it requires the reinforcement learning agent to have physical understanding and reasoning, in contrast to other popular reinforcement learning environments like Go, StarCraft, and Dota. It's especially interesting in the context of building robots that can learn about physics, and specifically learn it quickly: the post emphasizes that the benchmark rewards agents that solve the puzzles in as few attempts as possible.
Facebook additionally published "New advances in natural language processing to better connect people." This article covers their recent advances, such as their results in the WMT machine translation competition and the RoBERTa model from Facebook, which stands for a Robustly Optimized BERT Pretraining Approach. They discuss how successful their techniques have been on the WMT English-to-German, German-to-English, English-to-Russian, and Russian-to-English tasks. They describe how they improved their translations by incorporating a structured scoring function that combines a forward translation score, a backward translation score, and a fluency score. The article also describes recent progress on NLP benchmarks for general language understanding (GLUE) and reading comprehension from examinations (RACE). They also talk about the need for new datasets, challenges, and metrics, such as moving from the GLUE benchmark to SuperGLUE. For example, instead of just asking trivia questions like whether jellyfish have a brain, they want natural language processing systems to give in-depth answers to questions like how jellyfish function without a brain. Overall, this blog post is a great way to catch up on natural language processing; it almost reads like a survey of new advances, and it has many links to new datasets and challenges in natural language processing and understanding.
Our first blog post from the Google AI blog is on joint speech recognition and speaker diarization via sequence transduction. Speaker diarization is a really important problem in understanding medical conversations: it is the task of determining who said what, for example whether the doctor or the patient said something, when transcribing speech to text. They present their model, the recurrent neural network transducer (RNN-T), which reduces the word diarization error rate, the rate of misattributing who said what, from 20 percent to 2 percent. In the blog post they describe how the conventional system works, with separate steps for recognizing the words and classifying them by speaker label, and then they describe how their recurrent neural network transducer combines speech recognition with the prediction of who is speaking in the sequence. They show a graph of how their RNN transducer system outperforms the conventional system on different kinds of speaker diarization data.
Google's Project Euphonia is about personalized speech recognition for non-standard speech. They discuss how the speech of people with speech impairments or heavy accents can be difficult for automatic speech recognition systems to interpret and transcribe. To address this, they collected a dataset of 36 hours of audio from 67 speakers who have ALS, and they fine-tuned state-of-the-art speech-to-text models, such as the Listen, Attend and Spell (LAS) model and the RNN transducer, on it. The dataset consists of audio such as this. They show how the standard speech model interprets these different speech commands, and then how fine-tuning on this dataset improves each of the two models over its baseline once it is customized to the speaker. Overall, they give insights into how fine-tuning can overcome this problem and make automatic speech recognition systems more accessible to people with speech impairments or heavy accents.
The next blog post is from Berkeley's AI Research lab, and it is about evaluating and testing unintended memorization in neural networks. This phenomenon is best described by this cartoon: a person at a laptop types "long live the revolution. our next meeting will be at", and the autocomplete reveals exactly this kind of private information. They want to avoid language models that give away credit card numbers or other sensitive information as a side effect of how they are trained, since language models learn to predict the next token in a sequence. What they present in this blog post is a technique for quantifying memorization that compares the likelihood a generative model places on the sensitive information with the likelihood it places on other possible values. They also show an interesting experiment where they insert the sentence "the random number is 281265017" into the Penn Treebank dataset; after training, when they prompt the model with "the random number is 2812", the model happily predicts the remaining digits, 65017.
The final blog post covered in this first episode of AI Research Weekly Update is The Batch from deeplearning.ai, a weekly newsletter that covers different news stories and recommended reading. The first recommended reading is a tutorial on parameter optimization that covers things like how the Adam optimizer works. Second, they discuss the greening of AI and the report from the Allen Institute asking researchers to report efficiency in their papers, in terms of floating-point operations and the overall cost of training their deep neural network models. Next is "Shifted Data, No Problem," a paper that proposes a new replacement layer that alleviates the problem of classification predictions changing when images are shifted left or right. Next is "The Edison Machine," a discussion of whether an AI that uses neural network knowledge-domain models to form new associations and come up with new ideas can patent the technologies it invents. Next is "Neural Nets," a paper on using GANs and CNNs to automate the process of generating instructions for knitting fabric. The last item, on rigorous recommendation research, covers a study on the reproducibility of recommender systems research showing that, of 18 recent neural network models for top-n recommendation, only 7 were reproducible, and 6 of those underperformed traditional approaches that don't use deep learning.
Thanks for watching our first attempt at AI Research Weekly Update. Please leave a comment if you feel the research wasn't covered in enough detail, if not enough research was covered, or if there are labs that should be sourced that aren't on this list. Thank you so much for watching the first AI Research Weekly Update, and please subscribe to Henry AI Labs for more artificial intelligence and deep learning videos.
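The memorization measurement in the Berkeley segment can be made concrete with a short sketch. It assumes the "exposure" metric from the Secret Sharer work that the BAIR post summarizes: rank the inserted secret (the canary) among every candidate value by model likelihood, then compute exposure = log2(number of candidates) - log2(rank). The three scoring functions are hypothetical stand-ins for a trained language model, and a 4-digit candidate space replaces the post's 9-digit number so that every candidate can be enumerated cheaply.

```python
import math
from typing import Callable, Sequence


def exposure(canary: str, candidates: Sequence[str],
             log_likelihood: Callable[[str], float]) -> float:
    """Exposure = log2(|R|) - log2(rank), where rank 1 means the canary is the
    most likely candidate. Ties count against the canary (pessimistic rank)."""
    if canary not in candidates:
        raise ValueError("the canary must be one of the candidates")
    canary_score = log_likelihood(canary)
    # Rank = 1 + number of *other* candidates scored at least as high as the canary.
    rank = 1 + sum(1 for c in candidates
                   if c != canary and log_likelihood(c) >= canary_score)
    return math.log2(len(candidates)) - math.log2(rank)


# Hypothetical setup: every 4-digit string is a possible secret.
SECRET = "2812"
space = [f"{i:04d}" for i in range(10_000)]


def uniform_model(s: str) -> float:
    # Hypothetical model that has learned nothing about the secret.
    return -math.log(len(space))


def partial_model(s: str) -> float:
    # Hypothetical model that only "remembers" the secret is near 2800.
    return -abs(int(s) - 2800)


def memorizing_model(s: str) -> float:
    # Hypothetical model that has fully memorized the secret.
    return 0.0 if s == SECRET else -10.0


for name, model in [("uniform", uniform_model),
                    ("partial", partial_model),
                    ("memorizing", memorizing_model)]:
    print(f"{name:>10}: exposure = {exposure(SECRET, space, model):.2f} bits")
```

With these toy models the exposures come out to 0.00, about 8.64 (log2 400, since 25 candidates tie or beat the canary), and about 13.29 bits (log2 10000, the maximum for this space). Counting ties against the canary keeps an uninformative model at zero exposure instead of crediting it with perfect recall.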
Original Description
Thanks for watching this first episode of AI Research Weekly Update. Please comment what you liked and disliked about this report and how it can be improved!
Thanks for watching! Please Subscribe!
Uploads from Connor Shorten