📰 Dev.to · Gary Jackson
13 articles

Dev.to · Gary Jackson
1mo ago
Chapter 11: The Full GPT - Assembling the Model
Pull everything into a GptModel class, package Adam as a reusable optimiser, and run the real 10,000-step training loop end-to-end.

Dev.to · Gary Jackson
📐 ML Fundamentals
⚡ AI Lesson
1mo ago
Chapter 10: Multi-Head Attention and the MLP Block
Run several attention heads in parallel on embedding slices, add a two-layer MLP for per-position computation, and assemble a transformer block.

Dev.to · Gary Jackson
🧠 Large Language Models
⚡ AI Lesson
1mo ago
Chapter 9: Single-Head Attention - Tokens Looking at Each Other
Build causal self-attention with Q/K/V projections, scaled dot-product scoring, softmax weights, and a KV cache for sequential processing.

Dev.to · Gary Jackson
📐 ML Fundamentals
⚡ AI Lesson
1mo ago
Chapter 8: RMS Normalisation and Residual Connections
Add two stabilisation patterns deep networks need: RMSNorm to keep activations bounded, and residual connections to give gradients a highway.

Dev.to · Gary Jackson
📐 ML Fundamentals
⚡ AI Lesson
1mo ago
Chapter 7: The Training Loop and Adam Optimiser
Assemble a full training loop: forward, loss, backward, and Adam parameter updates with momentum, adaptive scaling, and learning rate decay.

Dev.to · Gary Jackson
1mo ago
Chapter 6: Embeddings, the Forward Pass, and the Loss Function
Give tokens and positions learned vector identities, assemble a minimal forward pass to logits, and compute cross-entropy loss.

Dev.to · Gary Jackson
1mo ago
Chapter 5: Linear Transformation and Softmax
Introduce Linear (matrix-vector multiply) and Softmax (logits to probabilities) - the two workhorse helpers used throughout the model.

Dev.to · Gary Jackson
📐 ML Fundamentals
⚡ AI Lesson
1mo ago
Chapter 4: The Bigram Model - Simplest Possible Language Model
Implement a counting-based bigram model to pin down the next-token prediction task and establish a loss baseline before neural networks enter.

Dev.to · Gary Jackson
1mo ago
Chapter 3: The Tokenizer - Text to Numbers and Back
Build a character-level tokenizer with a BOS delimiter that converts between characters and integer IDs.

Dev.to · Gary Jackson
1mo ago
Chapter 2: Backward - Automatic Gradient Computation
Implement Backward() on Value: topologically sort the graph, then walk it in reverse using the chain rule to fill every .Grad.

Dev.to · Gary Jackson
1mo ago
Chapter 1: The Value Class - Recording the Forward Pass
Build a Value wrapper around double that records every operation, so the backward pass in the next chapter can compute gradients automatically.

Dev.to · Gary Jackson
1mo ago
Chapter 0: Project Setup
Create the .NET console project, download the training data, and scaffold the dispatcher and source files the rest of the course expects.

Dev.to · Gary Jackson
1mo ago
Building a GPT From Scratch in C# - Introduction
Why this course exists and what you'll build - a progressive, from-scratch GPT tutorial in C#.