Archive
Components of A Coding Agent
How coding agents use tools, memory, and repo context to make LLMs work better in practice
Apr 4 • Sebastian Raschka, PhD
March 2026
A Visual Guide to Attention Variants in Modern LLMs
From MHA and GQA to MLA, sparse attention, and hybrid architectures
Mar 22 • Sebastian Raschka, PhD
February 2026
A Dream of Spring for Open-Weight LLMs: 10 Architectures from Jan-Feb 2026
A Round Up And Comparison of 10 Open-Weight LLM Releases in Spring 2026
Feb 25 • Sebastian Raschka, PhD
January 2026
Categories of Inference-Time Scaling for Improved LLM Reasoning
And an Overview of Recent Inference-Scaling Papers
Jan 24 • Sebastian Raschka, PhD
December 2025
The State Of LLMs 2025: Progress, Problems, and Predictions
A 2025 review of large language models, from DeepSeek R1 and RLVR to inference-time scaling, benchmarks, architectures, and predictions for 2026.
Dec 30, 2025 • Sebastian Raschka, PhD
LLM Research Papers: The 2025 List (July to December)
In June, I shared a bonus article with my curated and bookmarked research paper lists to the paid subscribers who make this Substack possible.
Dec 30, 2025 • Sebastian Raschka, PhD
From DeepSeek V3 to V3.2: Architecture, Sparse Attention, and RL Updates
Understanding How DeepSeek's Flagship Open-Weight Models Evolved
Dec 3, 2025 • Sebastian Raschka, PhD
November 2025
Beyond Standard LLMs
Linear Attention Hybrids, Text Diffusion, Code World Models, and Small Recursive Transformers
Nov 4, 2025 • Sebastian Raschka, PhD
October 2025
Understanding the 4 Main Approaches to LLM Evaluation (From Scratch)
Multiple-Choice Benchmarks, Verifiers, Leaderboards, and LLM Judges with Code Examples
Oct 5, 2025 • Sebastian Raschka, PhD
September 2025
Understanding and Implementing Qwen3 From Scratch
A Detailed Look at One of the Leading Open-Source LLMs
Sep 6, 2025 • Sebastian Raschka, PhD
August 2025
From GPT-2 to gpt-oss: Analyzing the Architectural Advances
And How They Stack Up Against Qwen3
Aug 9, 2025 • Sebastian Raschka, PhD
July 2025
The Big LLM Architecture Comparison
From DeepSeek-V3 to Kimi K2: A Look At Modern LLM Architecture Design
Jul 19, 2025 • Sebastian Raschka, PhD