📰 Dev.to · Rijul Rajesh
Articles from Dev.to · Rijul Rajesh · 94 articles · Updated every 3 hours

Dev.to · Rijul Rajesh
2d ago
Understanding Transformers Part 4: Introduction to Self-Attention
In the previous article, we learned how word embeddings and positional encoding are combined to...

Dev.to · Rijul Rajesh
3d ago
Understanding Transformers Part 3: How Transformers Combine Meaning and Position
In the previous article, we learned how positional encoding is generated using sine and cosine waves....

Dev.to · Rijul Rajesh
5d ago
Understanding Transformers Part 2: Positional Encoding with Sine and Cosine
In the previous article, we converted words into embeddings. Now let’s see how transformers add...

Dev.to · Rijul Rajesh
6d ago
Understanding Transformers Part 1: How Transformers Understand Word Order
In this article, we will explore transformers. We will work on the same problem as before:...

Dev.to · Rijul Rajesh
1w ago
Understanding Attention Mechanisms – Part 6: Final Step in Decoding
In the previous article, we obtained the initial output, but we didn’t receive the EOS token yet. To...

Dev.to · Rijul Rajesh
1w ago
Understanding Attention Mechanisms – Part 5: How Attention Produces the First Output
In the previous article, we stopped at using the softmax function to scale the scores. When we scale...

Dev.to · Rijul Rajesh
1w ago
Understanding Attention Mechanisms – Part 4: Turning Similarity Scores into Attention Weights
In the previous article, we explored the benefits of using the dot product instead of cosine...

Dev.to · Rijul Rajesh
1w ago
Cosine Similarity vs Dot Product in Attention Mechanisms
For comparing the hidden states between the encoder and decoder, we need a similarity score. Two...

Dev.to · Rijul Rajesh
2w ago
Understanding Attention Mechanisms – Part 3: From Cosine Similarity to Dot Product
In the previous article, we explored the comparison between encoder and decoder outputs. In this...

Dev.to · Rijul Rajesh
2w ago
Understanding Attention Mechanisms – Part 2: Comparing Encoder and Decoder Outputs
In the previous article, we explored the main idea of attention and the modifications it requires in...

Dev.to · Rijul Rajesh
2w ago
Understanding Attention Mechanisms – Part 1: Why Long Sentences Break Encoder–Decoders
In the previous articles, we understood Seq2Seq models. Now, on the path toward transformers, we need...

Dev.to · Rijul Rajesh
2w ago
Understanding Seq2Seq Neural Networks – Part 8: When Does the Decoder Stop?
In the previous article, we saw the translation being done. But there is an issue. The decoder does...

Dev.to · Rijul Rajesh
2w ago
Understanding Teacher Forcing in Seq2Seq Models
When we learn about seq2seq neural networks, there is a term we should know called Teacher...

Dev.to · Rijul Rajesh
3w ago
Understanding Seq2Seq Neural Networks – Part 7: Generating the Output with Softmax
In the previous article, we were passing the outputs to the fully connected layer. A fully...

Dev.to · Rijul Rajesh
3w ago
Understanding Seq2Seq Neural Networks – Part 6: Decoder Outputs and the Fully Connected Layer
In the previous article, we were looking at the embedding values in the encoder and the...

Dev.to · Rijul Rajesh
3w ago
Understanding Seq2Seq Neural Networks – Part 5: Decoding the Context Vector
In the previous article, we stopped at the concept of the context vector. In this article, we will...

Dev.to · Rijul Rajesh
3w ago
Understanding Seq2Seq Neural Networks – Part 4: The Encoder and the Context Vector
In the previous article, we stopped with the problem where we wanted to add more weights and biases...

Dev.to · Rijul Rajesh
3w ago
Understanding Seq2Seq Neural Networks – Part 3: Stacking LSTMs in the Encoder
In the previous article, we created an embedding layer for the input vocabulary. In this article, we...

Dev.to · Rijul Rajesh
4w ago
Understanding Seq2Seq Neural Networks – Part 2: Embeddings for Sequence Inputs
In the previous article, we began with the concept of the sequence-to-sequence problem, and...

Dev.to · Rijul Rajesh
4w ago
Understanding Seq2Seq Neural Networks – Part 1: The Seq2Seq Translation Problem
There will be problems where we have sequences of one type of thing that need to be translated into...

Dev.to · Rijul Rajesh
1mo ago
Understanding Word2Vec – Part 7: How Negative Sampling Speeds Up Word2Vec
In the previous article, we saw the huge number of weights and mentioned a technique called...

Dev.to · Rijul Rajesh
1mo ago
Understanding Word2Vec – Part 6: Two Ways Word2Vec Learns Context
In the previous article, we saw the word embeddings concept, and how training causes similar words to...

Dev.to · Rijul Rajesh
1mo ago
Understanding Word2Vec – Part 5: How Training Creates Word Embeddings
In the previous article, we visualized the vectors on a graph and saw how we can represent similarity...

Dev.to · Rijul Rajesh
1mo ago
Understanding Word2Vec – Part 4: Visualizing Word Vectors
In the previous article, we saw how the next-word prediction is done, and how lack of training is...