I recently learned more about how LLMs are trained, and I began to notice parallels with my own learning process.

Here are some of the most interesting insights I found:

  • Pretraining is like schooling. It is foundational. We should keep schooling or “pretraining” ourselves, regardless of age.
  • Supervised finetuning (SFT) is like working. In a job, we slightly adjust ourselves to prefer certain outputs (behaviors, deliverables) over others.
  • Reinforcement learning (RL) happens all the time. We are constantly being “RLed by the world.” Rewards like dopamine hits from TikTok doomscrolling keep us hooked, while the lack of immediate rewards for saving money or exercising makes them harder to sustain.

So, what are some practical implications for our own lives?

  • Embrace Lifelong Pretraining. Keep exposing ourselves to new information, ideas, and perspectives, regardless of age.
  • Output, don’t just read/watch/listen (encode). Teach, write, explain — that’s how knowledge sticks.
  • Finetune intentionally. Seek out environments and mentors that provide the right supervision for the person we aspire to be.
  • Be conscious of our reward model. If we don’t set it, the world sets it for us. And the world’s incentives (likes, clicks, quick dopamine) are rarely aligned with our deepest goals.
  • Learn continually and on-demand. Shift from “learn first, then do” to “learn while doing.” Treat every task as a chance to update our model weights.
  • Think before talk, plan before act, and take good notes for retrieval.

A Few New Thoughts

Some time ago, I ran a thought experiment:

If humans were LLMs…

- Inventing language → Tokenization
- Taking notes → RAG & context retrieval
- Going to school → Knowledge distillation
- Empathy → Reading hidden states / latent embeddings
- Dividing subjects → Mixture of Experts (MoE)
- Creating new subjects → New tasks, loss functions, and evaluation metrics

There are many more analogies from different perspectives—an interesting lens to explore!

Recently, I’ve been extending this analogy further:

  • Pretraining is like schooling.
  • Working is like SFT.
  • Everyday life is constant RL.
  • Hallucination is like lying or dreaming.
  • MoE is more like brain regions than school subjects.
  • A model looping the same output is like a person getting high and repeating themselves endlessly.

This lens is not perfect, but it makes us reflect: if we are like LLMs, how can we train ourselves better?

Pretraining: The Foundation of Lifelong Schooling

Most of school life is about learning and taking exams — more or less like predicting or selecting the next token.

At first, it feels like rote memorization. But over time, patterns emerge and knowledge crystallizes, allowing us to master a subject and generalize beyond it.

AI researcher Denny Zhou, known for work on reasoning, has argued that reasoning ability already emerges during pretraining (link). If it’s not in the pretrained base, it’s hard to acquire later. Humans are the same: without a broad foundation, it is difficult to pick up deep reasoning skills later in life.

That’s why pretraining is so important. In society, we even see a bias toward certain schools — similar to choosing certain base models to build on.

It is also important to point out that in many places the schooling system becomes exam-saturated, with education optimized almost entirely for test performance. This is similar to overfitting a model on a benchmark: the benchmark score rises, but generalization suffers. Real understanding requires a broader distribution of experiences, not just drilling for one metric.
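
If we want to see the overfitting analogy concretely, here is a tiny numpy sketch; the synthetic data, noise level, and polynomial degrees are invented for illustration and have nothing to do with any real benchmark. Cranking up the degree typically drives training error toward zero while error on held-out points grows.

```python
import numpy as np

rng = np.random.default_rng(0)

# A tiny synthetic "subject": noisy samples of a smooth underlying function.
x_train = np.linspace(0, 1, 10)
y_train = np.sin(2 * np.pi * x_train) + rng.normal(0, 0.2, size=x_train.shape)
x_test = np.linspace(0, 1, 50)
y_test = np.sin(2 * np.pi * x_test)  # the "real world": new points, no noise

for degree in (1, 3, 9):
    coeffs = np.polyfit(x_train, y_train, degree)  # "drill for the exam"
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(f"degree={degree}  exam_mse={train_mse:.3f}  real_world_mse={test_mse:.3f}")
```

The highest-degree fit aces the questions it has already seen and stumbles on everything else, which is roughly what exam-saturated schooling does to understanding (exact numbers depend on the random seed).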

Supervised Fine-Tuning (SFT): Adapting in the Workplace

If pretraining is our foundation, then fine-tuning is how society shapes us later. The work environment is very much like SFT.

Even if our base models (educational backgrounds) differ, the process of being shaped by professional expectations can change us dramatically. A workplace rewards certain responses over others — just like supervised finetuning. Over time, we learn to produce the outputs that are expected, valued, and rewarded.

This also suggests we should finetune intentionally: choose environments, mentors, and projects that push us in the direction we actually want to grow. In practice, that means seeking projects that stretch us, finding mentors who provide the right ‘supervision,’ and consciously adopting the communication styles of people we admire.

Reinforcement Learning (RL): Navigating the World’s Feedback

Every day, we are shaped by reinforcement learning. The world provides rewards and penalties for our actions, often in the form of immediate feedback loops.

The most critical component of RL is the reward function. Modern life is filled with dopamine-driven reward loops that can lead us astray. We get instant rewards (dopamine hits) from watching TikTok, so we keep doomscrolling. Conversely, long-term goals like saving money or exercising offer delayed rewards, making them harder to stick with.

  • Pavlov famously conditioned dogs by pairing a cue with food rewards.
  • Charlie Munger (Buffett’s longtime partner) put it bluntly:

    “Show me the incentive, and I will show you the outcome.”

  • Or in his words: “Never, ever, think about something else when you should be thinking about the power of incentives.”

Reward matters.

If our personal “reward model” is misaligned, we optimize for short-term gratification at the expense of long-term flourishing. This is reward hacking, human-style.
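
To make “reward hacking, human-style” concrete, here is a minimal sketch; the two actions, their reward streams, and the discount factors are numbers I made up. An agent that heavily discounts the future keeps picking the instant hit, while one with a longer horizon prefers the delayed payoff.

```python
# Two "actions" with invented reward streams over five time steps.
REWARDS = {
    "doomscroll": [1.0, 0.0, 0.0, 0.0, 0.0],   # dopamine now, nothing later
    "exercise":   [-0.2, 0.0, 0.0, 0.0, 3.0],  # small effort now, payoff later
}

def discounted_return(rewards, gamma):
    """Sum of gamma**t * r_t, the quantity an RL agent actually optimizes."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))

for gamma in (0.3, 0.9):  # short horizon vs. long horizon
    values = {a: round(discounted_return(r, gamma), 2) for a, r in REWARDS.items()}
    best = max(values, key=values.get)
    print(f"gamma={gamma}: {values} -> picks '{best}'")
```

The short-horizon agent is the misaligned reward model described above; stretching our own “discount factor” is one way to stop hacking our own rewards.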

So how do we fix it?

  • Gamify learning. I’ve been experimenting with “additive learning”: giving myself points and badges for completing tasks, then trading those points for rewards (like phone time); a toy ledger sketch follows this list. It’s simple, but it works.
  • Identity as reinforcement. If we see ourselves as a “lifelong learner,” then learning feels less like effort and more like alignment with our identity.
  • On-demand learning. Richard Sutton, one of the fathers of RL, proposed in his recent “Oak” talk that we should train directly for what is being used, instead of separating pretraining, SFT, and RL. This applies even more to humans: we learn best when the reward is tied directly to real-world use, which is what Andrej Karpathy, who coined the term “vibe coding,” calls “on-demand learning.”
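
As a toy version of the “additive learning” ledger mentioned above (the task names, point values, and prices are placeholders to tune to taste; this is a sketch, not a prescription):

```python
# Toy "additive learning" ledger: earn points for finishing tasks, spend them on rewards.
class Ledger:
    EARN = {"read one paper": 3, "write a summary": 5, "30 min exercise": 4}
    SPEND = {"30 min phone time": 5, "one episode of a show": 8}

    def __init__(self):
        self.balance = 0

    def complete(self, task):
        self.balance += self.EARN[task]
        print(f"+{self.EARN[task]} for '{task}' (balance: {self.balance})")

    def redeem(self, reward):
        cost = self.SPEND[reward]
        if self.balance < cost:
            print(f"not enough points for '{reward}' (balance: {self.balance})")
            return
        self.balance -= cost
        print(f"-{cost} for '{reward}' (balance: {self.balance})")

ledger = Ledger()
ledger.complete("read one paper")
ledger.complete("write a summary")
ledger.redeem("30 min phone time")
```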

Photo with Andrej Karpathy

Of course, we are not just LLMs. We have consciousness, emotions, and a rich inner world that AI lacks. This analogy is a map, not the territory. But like any good map, it can help us navigate the complex landscape of personal growth.

What Are the Implications for Ourselves?

Bringing it all together:

  • Embrace Lifelong Pretraining. The world is too complex to rely solely on our initial schooling. We must continuously engage in broad learning to update our foundational models. Read widely, explore new domains, and stay curious. This builds the robust base needed to adapt to any future “fine-tuning” task.
  • Output, don’t just read/watch/listen (encode). An encoder by itself is not enough. We can consume endless information (pretraining), but true understanding comes from generating output. This is the core of the Feynman Technique: to truly learn something, try to teach it to someone else. Writing, speaking, and creating are how we distill knowledge and expose the gaps in our thinking.
  • Finetune intentionally. Don’t just let work or culture shape us passively. Choose environments that reinforce what we want to become. Learn from the greatest.
  • Be conscious of our reward model. Don’t let the world’s default reward functions dictate our behavior. Consciously define our long-term goals and create immediate, tangible rewards for the actions that lead to them. Align our habits with our identity to make positive behaviors feel natural and effortless.
  • Learn continually and on-demand. Shift our mindset from a “learn-then-do” model to a “learn-while-doing” one. Treat every task as an opportunity to update our skills. When we encounter a problem, dive deep into the necessary knowledge right then and there. This makes learning relevant and immediately applicable, effectively updating our mental model in real-time.
  • Think before talk. Add a deliberate pause between stimulus and response. Take one breath, silently outline our point (goal → key claim → 1–2 supports → close), then speak. This tiny buffer reduces knee-jerk replies, improves signal-to-noise, and makes our “outputs” crisper, very much in the spirit of the conversation between Sergey Brin, co-founder of Google, and Demis Hassabis, co-founder of DeepMind.
  • Plan before act. Before acting, do a quick plan-and-execute loop: define the objective, break it into steps, note dependencies, set a stopping criterion (“what proves this step is done?”), then run the steps and reflect. If we hit ambiguity, re-plan. This mirrors agent patterns (ReAct etc.): Plan → Act → Observe → Reflect → Re-plan (a minimal sketch follows this list). It keeps us shipping while staying intentional.
  • Notes as personal RAG. Treat notes as an external memory we can “retrieve” from: capture atomic snippets (one idea per note), tag them well, link related ideas, and write short summaries after we learn something. When solving a problem, retrieve relevant notes first, then generate our answer (a toy retrieval example also follows this list). This dramatically cuts relearning time and improves the quality of our “outputs.”
  • Kindness is moderately independent of intelligence. We can train our mind to become sharper, but cultivating warmth, empathy, and generosity often follows a different path. Both matter, and together they form true wisdom. See Hinton’s talk in Shanghai.
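
For the “plan before act” item above, here is a minimal sketch of the Plan → Act → Observe → Reflect → Re-plan loop. The objective, steps, and “done” check are placeholders I invented; a real run would replace the print statements with actual work (or tool calls, for an agent).

```python
def make_plan(objective):
    # Plan: a real agent would ask a model for steps; here they are hard-coded placeholders.
    return ["gather notes", "draft outline", "write section"]

def act(step):
    print(f"act: {step}")
    return f"result of '{step}'"            # Observe: placeholder result

def reflect(results, steps):
    # Stopping criterion ("what proves this step is done?"): one result per planned step.
    return len(results) == len(steps)

def run(objective, max_rounds=3):
    steps = make_plan(objective)
    results = []
    for _ in range(max_rounds):
        for step in steps[len(results):]:
            results.append(act(step))       # Act + Observe
        if reflect(results, steps):         # Reflect
            return results
        steps = make_plan(objective)        # Re-plan when something is still ambiguous
    return results

run("write the RL section of this post")
```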
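
And for “notes as personal RAG,” a toy retrieval step. The notes, tags, and word-overlap scoring are all invented; real setups usually retrieve by embedding similarity, but the workflow is the same: pull the most relevant snippets first, then write the answer on top of them.

```python
# Toy personal RAG: atomic notes with tags, retrieved by simple word overlap.
NOTES = [
    {"tags": {"rl", "habits"},     "text": "Tie rewards to long-term goals, not clicks."},
    {"tags": {"writing"},          "text": "One idea per note; summarize after learning."},
    {"tags": {"rl", "incentives"}, "text": "Show me the incentive and I will show you the outcome."},
]

def retrieve(query, notes, k=2):
    words = set(query.lower().split())
    def score(note):
        return len(words & (note["tags"] | set(note["text"].lower().split())))
    return sorted(notes, key=score, reverse=True)[:k]

for note in retrieve("set my own incentives and rewards", NOTES):
    print("retrieved:", note["text"])
# ...then "generate": write the answer with these snippets in front of us.
```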

These ideas aren’t just theoretical for me. They’ve inspired me to build tools to help put them into practice, starting with an app I’m developing called “Human Pretraining,” which I’m excited to share soon.