The main purpose of this page is to list all the resources I’ve collected about Machine Learning (ML), Artificial Intelligence (AI), and Large Language Models (LLMs).
ML Basics
For an intro to ML, I recommend the following resources:
Courses:
Videos:
- 3Blue1Brown’s intro to neural networks
- The YouTube channel also has a lot of good videos about math (linear algebra, calculus, etc.), physics, and many other science topics.
- Andrej Karpathy’s Neural Networks: Zero to Hero
There are many more excellent introductory materials.
LLM
The two best courses for LLMs are:
- Stanford CS224n (winter 2025)
- A very good course that builds a lot of intuition for LLMs, with YouTube videos
- Stanford CS336 (spring 2025)
- The most hard-core LLM course, with YouTube videos
Videos:
- Chinese: GPT-1 to GPT-3 paper readings by Mu Li
- A famous researcher and YouTuber who has good videos about the Transformer, GPT-1 through GPT-4, InstructGPT, CLIP, GAN, Whisper, etc.
- Google DeepMind: How to Scale Your Model
Training Infra
- Introduction to LLM Parallelism
- A good introduction to parallelism in LLM training, written by me.
- The Ultra-Scale Playbook: Training LLMs on GPU Clusters
- Hugging Face’s playbook for training LLMs on GPU clusters. Very comprehensive and detailed.
Reinforcement Learning
- The RL book by Sutton and Barto (2nd ed., 2018)
- YouTube videos from EZ.Encoder
- Excellent videos about DeepSeek, RL, etc.
Interpretability
- A Mathematical Framework for Transformer Circuits (Dec 2021)
- Anthropic’s paper conceptualizing the operation of transformers in a new but mathematically equivalent way, making sense of small models and gaining significant understanding of how they operate internally.
- In-Context Learning and Induction Heads (Mar 2022)
- Anthropic’s paper arguing that induction heads may be the mechanistic source of general in-context learning in transformer models of any size.
- Towards Monosemanticity: Decomposing Language Models with Dictionary Learning (Oct 2023)
- Anthropic’s paper using a sparse autoencoder to extract a large number of interpretable features from a one-layer transformer.
- Sparse Autoencoders Find Highly Interpretable Model Directions (Oct 2023)
- Using sparse autoencoders to find meaningful directions within a model’s activations.
- Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet (May 2024)
- Applying and scaling these ideas to larger, more capable models like Claude 3 Sonnet.
- Scaling and evaluating sparse autoencoders (Jun 2024)
- Exploring the practical aspects and effectiveness of sparse autoencoders at scale.
- Transcoders find interpretable LLM feature circuits (Jun 2024)
- Focusing on finding interpretable circuits of features within LLMs.
- On the Biology of a Large Language Model (Mar 2025)
- Anthropic’s most recent paper on LLM interpretability.
Just like you can infinitely scroll on TikTok, you can infinitely scroll through the papers.
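The sparse-autoencoder idea that runs through several of the papers above can be sketched in a few lines (a toy numpy sketch, assuming a single-layer ReLU encoder, a linear decoder, and an L1 sparsity penalty; all names and sizes here are illustrative, not any paper's actual implementation):

```python
import numpy as np

rng = np.random.default_rng(0)

d_model, d_feat = 16, 64                 # model hidden size, overcomplete dictionary size
W_enc = rng.normal(0, 0.1, (d_model, d_feat))
b_enc = np.zeros(d_feat)
W_dec = rng.normal(0, 0.1, (d_feat, d_model))

def sae_forward(x, l1_coef=1e-3):
    """Encode activations into sparse features, then reconstruct them."""
    f = np.maximum(x @ W_enc + b_enc, 0.0)   # ReLU -> sparse feature activations
    x_hat = f @ W_dec                        # linear reconstruction from the dictionary
    # Training would minimize reconstruction error + L1 sparsity penalty:
    loss = np.mean((x - x_hat) ** 2) + l1_coef * np.mean(np.abs(f))
    return f, x_hat, loss

x = rng.normal(size=(8, d_model))            # a batch of (fake) model activations
f, x_hat, loss = sae_forward(x)
print(f.shape, x_hat.shape)                  # (8, 64) (8, 16)
```

The interpretability claim is that, after training, each of the `d_feat` dictionary directions tends to fire on one human-recognizable concept, unlike the dense, polysemantic raw activations.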
Agent
- UCB CS294/194-196 Large Language Model Agents
- A good course from UC Berkeley with YouTube videos; it invited many frontier researchers to give lectures.
- List of topics
- Inference-Time Techniques & Reasoning (CoT, ReAct, RAG, Planning, etc.)
- Coding Agents
- Multimodal Autonomous AI Agents
- AlphaProof, Science Discovery
- Reinforcement Learning
- Safety & Vulnerability
- etc.
- I’m planning to write a summary for this course.
All About Transformer Inference
Recent LLM Papers (that I read and liked)
Note that many of the courses and resources above already include a lot of good papers.
- Speculative decoding papers
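The core accept/reject step shared by the speculative decoding papers can be sketched over explicit probability tables (a toy sketch, not any specific paper's implementation; the draft model proposes a token, and the target model accepts it with probability min(1, p/q) or resamples from the renormalized residual, which preserves the target distribution exactly):

```python
import numpy as np

rng = np.random.default_rng(0)

def speculative_step(p_target, q_draft, token):
    """Accept a draft token with prob min(1, p/q); otherwise resample
    from the residual distribution max(p - q, 0), renormalized."""
    p, q = p_target[token], q_draft[token]
    if rng.random() < min(1.0, p / q):
        return token                          # draft token accepted
    residual = np.maximum(p_target - q_draft, 0.0)
    residual /= residual.sum()
    return rng.choice(len(p_target), p=residual)

# Tiny 3-token vocabulary: target and draft disagree slightly.
p_target = np.array([0.6, 0.3, 0.1])
q_draft  = np.array([0.3, 0.5, 0.2])

draft_token = rng.choice(3, p=q_draft)        # cheap draft model proposes a token
final_token = speculative_step(p_target, q_draft, draft_token)
print(final_token)
```

The speedup in the real algorithm comes from letting the draft model propose several tokens ahead and verifying them all with a single forward pass of the large model.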
Vision
Intro:
- Image Generation 101: An Introduction
- A good introduction to image generation, written by me.
Courses:
- Stanford CS231n
- There seem to be videos only from 2017
Papers
- The intro listed a lot of important papers.
- Large Language Models are Zero-Shot Reasoners (May 2022)
- The famous “Let’s think step by step” paper.
News & Blogs
- Lil’Log
- Lilian Weng’s blog, ex VP of Research at OpenAI.
- Chinese: I usually listen to 大飞
- Chinese: A good one is aidaily.win
- The Bitter Lesson, Mar. 2019, Richard Sutton.
- The biggest lesson that can be read from 70 years of AI research is that general methods that leverage computation are ultimately the most effective, and by a large margin.
- The Second Half, Apr. 2025, Shunyu Yao.
- Welcome to the Era of Experience, Apr. 2025, David Silver & Richard Sutton, DeepMind.
- Consciousness: I strongly recommend listening to Geoffrey Hinton’s talk in 2024.
- CBMM10 Panel: Research on Intelligence in the Age of AI
- Panel discussion with Geoffrey Hinton, Demis Hassabis, and Ilya Sutskever.
Terms
- Attention
- Chain of Thought
- Flash Attention
- ReAct
- Transformer
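For the Attention, Flash Attention, and Transformer entries above, the quantity they all compute is standard scaled dot-product attention, softmax(QKᵀ/√d_k)V (a minimal numpy sketch with made-up shapes; Flash Attention computes the same result in a tiled, memory-efficient way rather than materializing the full score matrix):

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)   # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V"""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)           # (n_q, n_k) similarity scores
    return softmax(scores, axis=-1) @ V       # weighted average of value rows

rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(4, 8)) for _ in range(3))
out = attention(Q, K, V)
print(out.shape)                              # (4, 8)
```

Each output row is a convex combination of the rows of V, with weights given by how similar the query is to each key.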
Old
- SIFT features: Scale-Invariant Feature Transform, an old computer vision method, now outdated.
People
Geoffrey Hinton, Demis Hassabis, Ilya Sutskever,
Yoshua Bengio, Ian Goodfellow, Yann LeCun (Meta), Richard Sutton,
Sam Altman, Dario Amodei, Fei-Fei Li, Andrew Ng, David Silver,
Andrej Karpathy, Jared Kaplan (Anthropic co-founder and CSO).