Learning

While I’ve always felt that I had a pretty strong intuition for solving problems, the field I want to enter is riddled with complicated jargon and fundamental techniques that can’t be learned passively. This is a record of the papers and articles I’ve read that felt especially impactful, along with a few key takeaways (pending).

Tangent: An entry is a “Paper” if the link is on arXiv. Otherwise, it’s an “Article”. This doesn’t mean one kind is more formal than the other; if anything, I find myself learning more from articles because they’re written in a more approachable style.

The Ultra-Scale Playbook: Training LLMs on GPU Clusters

Present

Article: [https://huggingface.co/spaces/nanotron/ultrascale-playbook?section=our_journey_up_to_now]

Defeating Nondeterminism in LLM Inference

December 16th, 2025

Article: [https://thinkingmachines.ai/blog/defeating-nondeterminism-in-llm-inference/]

Neural Machine Translation By Jointly Learning to Align and Translate

October 27th, 2025

Paper: [https://arxiv.org/abs/1409.0473]

Adam: A Method For Stochastic Optimization

August 25th, 2025

Paper: [https://arxiv.org/abs/1412.6980]

🥐 David He
