📰 Dev.to · Rijul Rajesh
Articles from Dev.to · Rijul Rajesh · 94 articles · Updated every 3 hours

Dev.to · Rijul Rajesh
2d ago
Understanding Transformers Part 4: Introduction to Self-Attention
In the previous article, we learned how word embeddings and positional encoding are combined to...

Dev.to · Rijul Rajesh
3d ago
Understanding Transformers Part 3: How Transformers Combine Meaning and Position
In the previous article, we learned how positional encoding is generated using sine and cosine waves....

Dev.to · Rijul Rajesh
5d ago
Understanding Transformers Part 2: Positional Encoding with Sine and Cosine
In the previous article, we converted words into embeddings. Now let’s see how transformers add...

Dev.to · Rijul Rajesh
6d ago
Understanding Transformers Part 1: How Transformers Understand Word Order
In this article, we will explore transformers. We will work on the same problem as before:...

Dev.to · Rijul Rajesh
1w ago
Understanding Attention Mechanisms – Part 6: Final Step in Decoding
In the previous article, we obtained the initial output, but we didn’t receive the EOS token yet. To...

Dev.to · Rijul Rajesh
1w ago
Understanding Attention Mechanisms – Part 5: How Attention Produces the First Output
In the previous article, we stopped at using the softmax function to scale the scores. When we scale...

Dev.to · Rijul Rajesh
1w ago
Understanding Attention Mechanisms – Part 4: Turning Similarity Scores into Attention Weights
In the previous article, we explored the benefits of using the dot product instead of cosine...

Dev.to · Rijul Rajesh
1w ago
Cosine Similarity vs Dot Product in Attention Mechanisms
For comparing the hidden states between the encoder and decoder, we need a similarity score. Two...

Dev.to · Rijul Rajesh
2w ago
Understanding Attention Mechanisms – Part 3: From Cosine Similarity to Dot Product
In the previous article, we explored the comparison between encoder and decoder outputs. In this...

Dev.to · Rijul Rajesh
2w ago
Understanding Attention Mechanisms – Part 2: Comparing Encoder and Decoder Outputs
In the previous article, we explored the main idea of attention and the modifications it requires in...

Dev.to · Rijul Rajesh
2w ago
Understanding Attention Mechanisms – Part 1: Why Long Sentences Break Encoder–Decoders
In the previous articles, we understood Seq2Seq models. Now, on the path toward transformers, we need...

Dev.to · Rijul Rajesh
2w ago
Understanding Seq2Seq Neural Networks – Part 8: When Does the Decoder Stop?
In the previous article, we saw the translation being done. But there is an issue. The decoder does...

Dev.to · Rijul Rajesh
2w ago
Understanding Teacher Forcing in Seq2Seq Models
When we learn about seq2seq neural networks, there is a term we should know called Teacher...

Dev.to · Rijul Rajesh
3w ago
Understanding Seq2Seq Neural Networks – Part 7: Generating the Output with Softmax
In the previous article, we were passing the outputs to the fully connected layer. A fully...

Dev.to · Rijul Rajesh
3w ago
Understanding Seq2Seq Neural Networks – Part 6: Decoder Outputs and the Fully Connected Layer
In the previous article, we were looking at the embedding values in the encoder and the...

Dev.to · Rijul Rajesh
3w ago
Understanding Seq2Seq Neural Networks – Part 5: Decoding the Context Vector
In the previous article, we stopped at the concept of the context vector. In this article, we will...

Dev.to · Rijul Rajesh
3w ago
Understanding Seq2Seq Neural Networks – Part 4: The Encoder and the Context Vector
In the previous article, we stopped with the problem where we wanted to add more weights and biases...

Dev.to · Rijul Rajesh
3w ago
Understanding Seq2Seq Neural Networks – Part 3: Stacking LSTMs in the Encoder
In the previous article, we created an embedding layer for the input vocabulary. In this article, we...

Dev.to · Rijul Rajesh
4w ago
Understanding Seq2Seq Neural Networks – Part 2: Embeddings for Sequence Inputs
In the previous article, we began with the concept of the sequence-to-sequence problem, and...

Dev.to · Rijul Rajesh
4w ago
Understanding Seq2Seq Neural Networks – Part 1: The Seq2Seq Translation Problem
There will be problems where we have sequences of one type of thing that need to be translated into...

Dev.to · Rijul Rajesh
1mo ago
Understanding Word2Vec – Part 7: How Negative Sampling Speeds Up Word2Vec
In the previous article, we saw the huge number of weights and mentioned a technique called...

Dev.to · Rijul Rajesh
1mo ago
Understanding Word2Vec – Part 6: Two Ways Word2Vec Learns Context
In the previous article, we saw the word embeddings concept, and how training causes similar words to...

Dev.to · Rijul Rajesh
1mo ago
Understanding Word2Vec – Part 5: How Training Creates Word Embeddings
In the previous article, we visualized the vectors on a graph and saw how we can represent similarity...

Dev.to · Rijul Rajesh
1mo ago
Understanding Word2Vec – Part 4: Visualizing Word Vectors
In the previous article, we saw how the next-word prediction is done, and how lack of training is...