📰 Dev.to · Gary Jackson

13 articles · Updated every 3 hours

Chapter 11: The Full GPT - Assembling the Model

Dev.to · Gary Jackson · 1mo ago

Pull everything into a GptModel class, package Adam as a reusable optimiser, and run the real 10,000-step training loop end-to-end.

Chapter 10: Multi-Head Attention and the MLP Block

Dev.to · Gary Jackson · 📐 ML Fundamentals · ⚡ AI Lesson · 1mo ago

Run several attention heads in parallel on embedding slices, add a two-layer MLP for per-position computation, and assemble a transformer block.

Chapter 9: Single-Head Attention - Tokens Looking at Each Other

Dev.to · Gary Jackson · 🧠 Large Language Models · ⚡ AI Lesson · 1mo ago

Build causal self-attention with Q/K/V projections, scaled dot-product scoring, softmax weights, and a KV cache for sequential processing.

Chapter 8: RMS Normalisation and Residual Connections

Dev.to · Gary Jackson · 📐 ML Fundamentals · ⚡ AI Lesson · 1mo ago

Add two stabilisation patterns deep networks need: RMSNorm to keep activations bounded, and residual connections to give gradients a highway.

Chapter 7: The Training Loop and Adam Optimiser

Dev.to · Gary Jackson · 📐 ML Fundamentals · ⚡ AI Lesson · 1mo ago

Assemble a full training loop: forward, loss, backward, and Adam parameter updates with momentum, adaptive scaling, and learning rate decay.

Chapter 6: Embeddings, the Forward Pass, and the Loss Function

Dev.to · Gary Jackson · 1mo ago

Give tokens and positions learned vector identities, assemble a minimal forward pass to logits, and compute cross-entropy loss.

Chapter 5: Linear Transformation and Softmax

Dev.to · Gary Jackson · 1mo ago

Introduce Linear (matrix-vector multiply) and Softmax (logits to probabilities) - the two workhorse helpers used throughout the model.

Chapter 4: The Bigram Model - Simplest Possible Language Model

Dev.to · Gary Jackson · 📐 ML Fundamentals · ⚡ AI Lesson · 1mo ago

Implement a counting-based bigram model to pin down the next-token prediction task and establish a loss baseline before neural networks enter.

Chapter 3: The Tokenizer - Text to Numbers and Back

Dev.to · Gary Jackson · 1mo ago

Build a character-level tokenizer with a BOS delimiter that converts between characters and integer IDs.

Chapter 2: Backward - Automatic Gradient Computation

Dev.to · Gary Jackson · 1mo ago

Implement Backward() on Value: topologically sort the graph, then walk it in reverse using the chain rule to fill every .Grad.

Chapter 1: The Value Class - Recording the Forward Pass

Dev.to · Gary Jackson · 1mo ago

Build a Value wrapper around double that records every operation, so the backward pass in the next chapter can compute gradients automatically.

Chapter 0: Project Setup

Dev.to · Gary Jackson · 1mo ago

Create the .NET console project, download the training data, and scaffold the dispatcher and source files the rest of the course expects.

Building a GPT From Scratch in C# - Introduction

Dev.to · Gary Jackson · 1mo ago

Why this course exists and what you'll build - a progressive, from-scratch GPT tutorial in C#.