The main purpose of this page is to list all the resources I’ve collected about Machine Learning (ML), Artificial Intelligence (AI), and Large Language Models (LLMs).
ML Basics
For an intro to ML, I recommend the following resources:
Courses:
Videos:
- 3Blue1Brown’s intro to neural networks
- The YouTube channel also has a lot of good videos about math (linear algebra, calculus, etc.), physics, and many other science topics.
- Andrej Karpathy’s Neural Networks: Zero to Hero
There are many more excellent introductory materials.
LLM
The two best courses for LLMs are:
- Stanford CS224n (winter 2025)
- A very good course that builds a lot of intuition for LLMs, with YouTube videos
- Stanford CS336 (spring 2025)
- The most hard-core LLM course, with YouTube videos
Videos:
- Chinese: GPT-1 to GPT-3 paper readings by Mu Li
- A famous researcher and YouTuber who has good videos about the Transformer, GPT-1 through GPT-4, InstructGPT, CLIP, GAN, Whisper, etc.
- Google DeepMind: How to Scale Your Model
Training Infra
- Introduction to LLM Parallelism
- A good introduction to parallelism in LLM training, written by me.
- The Ultra-Scale Playbook: Training LLMs on GPU Clusters
- Hugging Face’s playbook for training LLMs on GPU clusters. Very comprehensive and detailed.
Reinforcement Learning
- The RL book by Sutton and Barto (2nd ed., 2018)
- YouTube videos from EZ.Encoder
- Excellent videos about DeepSeek, RL, etc.
Interpretability
- A Mathematical Framework for Transformer Circuits (Dec 2021)
- Anthropic’s paper conceptualizing the operation of transformers in a new but mathematically equivalent way, making sense of small models and gaining significant understanding of how they operate internally.
- In-Context Learning and Induction Heads (Mar 2022)
- Anthropic’s paper arguing that induction heads may be the mechanistic source of general in-context learning in transformer models of any size.
- Towards Monosemanticity: Decomposing Language Models with Dictionary Learning (Oct 2023)
- Anthropic’s paper using a sparse autoencoder to extract a large number of interpretable features from a one-layer transformer.
- Sparse Autoencoders Find Highly Interpretable Model Directions (Oct 2023)
- Using sparse autoencoders to find meaningful directions within a model’s activations.
- Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet (May 2024)
- Applying and scaling these ideas to larger, more capable models like Claude 3 Sonnet.
- Scaling and evaluating sparse autoencoders (Jun 2024)
- Exploring the practical aspects and effectiveness of sparse autoencoders at scale.
- Transcoders find interpretable LLM feature circuits (Jun 2024)
- Focusing on finding interpretable circuits of features within LLMs.
- On the Biology of a Large Language Model (Mar 2025)
- Anthropic’s most recent paper on LLM interpretability.
Just like you can infinitely scroll on TikTok, you can infinitely scroll through the papers.
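The sparse-autoencoder idea that runs through several of the papers above can be sketched in a few lines (a toy numpy sketch, assuming a single-layer ReLU encoder, a linear decoder, and an L1 sparsity penalty; all names and sizes here are illustrative, not any paper's actual implementation):

```python
import numpy as np

rng = np.random.default_rng(0)

d_model, d_feat = 16, 64                 # model hidden size, overcomplete dictionary size
W_enc = rng.normal(0, 0.1, (d_model, d_feat))
b_enc = np.zeros(d_feat)
W_dec = rng.normal(0, 0.1, (d_feat, d_model))

def sae_forward(x, l1_coef=1e-3):
    """Encode activations into sparse features, then reconstruct them."""
    f = np.maximum(x @ W_enc + b_enc, 0.0)   # ReLU -> sparse feature activations
    x_hat = f @ W_dec                        # linear reconstruction from the dictionary
    # Training would minimize reconstruction error + L1 sparsity penalty:
    loss = np.mean((x - x_hat) ** 2) + l1_coef * np.mean(np.abs(f))
    return f, x_hat, loss

x = rng.normal(size=(8, d_model))            # a batch of (fake) model activations
f, x_hat, loss = sae_forward(x)
print(f.shape, x_hat.shape)                  # (8, 64) (8, 16)
```

The interpretability claim is that, after training, each of the `d_feat` dictionary directions tends to fire on one human-recognizable concept, unlike the dense, polysemantic raw activations.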
Agent
- UCB CS294/194-196 Large Language Model Agents
- A good course from UC Berkeley with YouTube videos; it invited many frontier researchers to give lectures.
- List of topics
- Inference-Time Techniques & Reasoning (CoT, ReAct, RAG, Planning, etc.)
- Coding Agents
- Multimodal Autonomous AI Agents
- AlphaProof, Science Discovery
- Reinforcement Learning
- Safety & Vulnerability
- etc.
- I’m planning to write a summary for this course.
All About Transformer Inference
Recent LLM Papers (that I read and liked)
Note that many of the courses and resources above already include a lot of good papers.
- Speculative decoding papers
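The core accept/reject step shared by the speculative decoding papers can be sketched over explicit probability tables (a toy sketch, not any specific paper's implementation; the draft model proposes a token, and the target model accepts it with probability min(1, p/q) or resamples from the renormalized residual, which preserves the target distribution exactly):

```python
import numpy as np

rng = np.random.default_rng(0)

def speculative_step(p_target, q_draft, token):
    """Accept a draft token with prob min(1, p/q); otherwise resample
    from the residual distribution max(p - q, 0), renormalized."""
    p, q = p_target[token], q_draft[token]
    if rng.random() < min(1.0, p / q):
        return token                          # draft token accepted
    residual = np.maximum(p_target - q_draft, 0.0)
    residual /= residual.sum()
    return rng.choice(len(p_target), p=residual)

# Tiny 3-token vocabulary: target and draft disagree slightly.
p_target = np.array([0.6, 0.3, 0.1])
q_draft  = np.array([0.3, 0.5, 0.2])

draft_token = rng.choice(3, p=q_draft)        # cheap draft model proposes a token
final_token = speculative_step(p_target, q_draft, draft_token)
print(final_token)
```

The speedup in the real algorithm comes from letting the draft model propose several tokens ahead and verifying them all with a single forward pass of the large model.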
Vision
Intro:
- Image Generation 101: An Introduction
- A good introduction to image generation, written by me.
Courses:
- Stanford CS231n
- There seem to be videos only from 2017
Papers
- The intro listed a lot of important papers.
- Large Language Models are Zero-Shot Reasoners (May 2022)
- The famous “Let’s think step by step” paper.
News & Blogs
- Lil’Log
- Lilian Weng’s blog, ex VP of Research at OpenAI.
- Chinese: I usually listen to 大飞
- Chinese: A good one is aidaily.win
- The Bitter Lesson, Mar. 2019, Richard Sutton.
- The biggest lesson that can be read from 70 years of AI research is that general methods that leverage computation are ultimately the most effective, and by a large margin.
- The Second Half, Apr. 2025, Shunyu Yao.
- Welcome to the Era of Experience, Apr. 2025, David Silver & Richard Sutton, DeepMind.
- Consciousness: I strongly recommend listening to Geoffrey Hinton’s talk in 2024.
- CBMM10 Panel: Research on Intelligence in the Age of AI
- Panel discussion with Geoffrey Hinton, Demis Hassabis, and Ilya Sutskever.
Terms
- Attention
- Chain of Thought
- Flash Attention
- ReAct
- Transformer
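For the Attention, Flash Attention, and Transformer entries above, the quantity they all compute is standard scaled dot-product attention, softmax(QKᵀ/√d_k)V (a minimal numpy sketch with made-up shapes; Flash Attention computes the same result in a tiled, memory-efficient way rather than materializing the full score matrix):

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)   # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V"""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)           # (n_q, n_k) similarity scores
    return softmax(scores, axis=-1) @ V       # weighted average of value rows

rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(4, 8)) for _ in range(3))
out = attention(Q, K, V)
print(out.shape)                              # (4, 8)
```

Each output row is a convex combination of the rows of V, with weights given by how similar the query is to each key.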
Old
- SIFT features: Scale-Invariant Feature Transform, an old computer vision method, now outdated.
People
Geoffrey Hinton, Demis Hassabis, Ilya Sutskever,
Yoshua Bengio, Ian Goodfellow, Yann LeCun (Meta), Richard Sutton,
Sam Altman, Dario Amodei, Fei-Fei Li, Andrew Ng, David Silver,
Andrej Karpathy, Jared Kaplan (Anthropic co-founder and CSO).