how to read research papers, in 5 minutes.


From what I can tell, most people aren’t taught how to read research papers. Inside academia it’s treated as an assumed skill you’ll pick up on your own, and outside it, papers have a false reputation of being too technical or tedious to bother with [1].

I spent roughly three years treating paper-reading as a chore. Now it’s something I quite enjoy [2].

Looking back, here are the three most useful principles:

1. Intuition → empirics → details. In that order.

What’s the most efficient way to digest a paper? Nothing has worked better for me than the framework of: (1) grasp the big picture, (2) check the evidence, and (3) dive into details only if it’s worth the time.

  1. INTUITION. Can you summarize the paper in your own words in three sentences? If nothing else, you’ll want a rough sense of what the paper is doing and why.

  2. EMPIRICS. (Empirics = the evidence and data). How well does the method work, really?

    Because almost all papers are trying to advertise themselves, getting to the truth isn’t easy.

    1. A great place to start is the charts. How cleanly a paper tells its story through figures is a strong proxy for quality. (If you can, think of the few foundational, long-lived ML papers; clarity like that is rarely found in incremental “+1% on another benchmark” work.)

GPT-3 scaling law. As models get bigger and use more compute, their loss ("error") steadily goes down. A clear, convincing pattern: more compute = better performance.

Grokking. At first the model just memorizes the training examples (red line jumps up fast). Only much later does it figure out the underlying pattern and start getting new examples right (green line).

Chain-of-thought prompting. Beyond result graphs, strong papers often have helpful explanatory visuals. This figure shows how adding step-by-step reasoning to the model's answers makes a huge difference.
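The pattern in that scaling-law figure is a smooth power law. As a toy sketch (the constants below are made up for illustration, not GPT-3's actual fitted values):

```python
# Illustrative power-law scaling: loss falls smoothly as compute grows.
# The constant and exponent here are invented for illustration only.
def loss(compute, a=2.57, alpha=0.048):
    return a * compute ** -alpha

for c in [1e3, 1e6, 1e9]:
    print(f"compute={c:.0e}  loss={loss(c):.3f}")
```

The point of the figure is exactly this monotone trend: every order of magnitude of compute buys a predictable drop in loss.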

  3. DETAILS: Only if you decide it’s worth the time. People differ most when it comes to this:
    • Coming from an engineering background, writing some code to follow along with a paper made the theory tangible and, in turn, much easier to understand. For example: my Hopfield networks code.
    • A math PhD I worked with preferred diving straight into the definitions and methods, since “everything else was built on that.” This was always too slow for me.
    • In some cases, going through the derivations yourself and confirming them can be incredibly useful, not just for learning but also because they are not always right [3]!
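To make the first bullet concrete, here is the kind of minimal "code along" sketch I mean: a classical (1982-style) Hopfield network with Hebbian storage and sign-function updates. This is a generic toy, not the linked code:

```python
def train(patterns):
    """Hebbian learning: W[i][j] = sum over patterns of x_i * x_j, zero diagonal."""
    n = len(patterns[0])
    W = [[0.0] * n for _ in range(n)]
    for x in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    W[i][j] += x[i] * x[j]
    return W

def recall(W, state, steps=10):
    """Repeated sign updates: each unit flips to match the field from the others."""
    n = len(state)
    state = list(state)
    for _ in range(steps):
        for i in range(n):
            h = sum(W[i][j] * state[j] for j in range(n))
            state[i] = 1 if h >= 0 else -1
    return state

pattern = [1, -1, 1, -1, 1, -1, 1, -1]
W = train([pattern])
noisy = list(pattern)
noisy[0] = -noisy[0]        # corrupt one bit
print(recall(W, noisy))     # recovers the stored pattern
```

Ten minutes of writing this teaches you more about "energy minima" than rereading the equations ever did for me.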

At any step, you can stop. Often you’ll get the intuition behind 20 papers and only decide to invest further into the 2 most promising ones, which is the point!

Taking a step back, why does speed matter? Because you’ll want to:

2. Ruthlessly optimize: figure out what works for you.

Read a lot and fast [4], while trying to optimize for these three dimensions:

  1. WHAT: Developing research taste. ML research papers are especially vulnerable to Sturgeon’s Law (90% of everything is crap). If you take too long on each one, you’ll never see enough to build a sense for quality. There were many days I spent hours on a single paper; the problem is that no paper exists in a vacuum, and each idea’s significance usually comes from how it fits into the larger web of ideas around it (the other approaches, the benchmarks, the inspiration).

  2. HOW: Figure out your process for understanding. I tried a physical notepad and Jupyter notebooks, and ended up using a Notion database. Here’s my process:

    1. Paste the whole paper into a saved prompt that gives me three paragraphs: intuition, empirics, details. I find it’s a useful bird’s-eye view to layer questions onto afterward.

    2. Once finished, add an entry to my Notion database (now my bookshelf):

      • 3-sentence “TLDR”
      • 2-sentence “Two Cents” (subjective comment)
      • Appreciation and importance ratings (ranking papers is difficult but good practice)


  3. WHERE: Know where to find papers [5]. Some people curate their Twitter feeds, others use Deep Research surveys, others follow citations down rabbit holes. Define your purpose for reading in the first place: sometimes I wanted a survey of what’s been tried, other times I wanted to see old problems in entirely new framings [6]. That clarity helps you choose.
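For what it's worth, the saved summary prompt mentioned under HOW might look something like this. The wording below is a hypothetical reconstruction, not my actual prompt:

```python
# A hypothetical example of a saved "bird's-eye view" prompt.
# The exact wording is illustrative, not the real saved prompt.
SUMMARY_PROMPT = """You are helping me triage a research paper.
Read the paper below and reply with exactly three paragraphs:

1. INTUITION: what the paper does and why, in plain language.
2. EMPIRICS: what the evidence actually shows, and how convincing it is.
3. DETAILS: the key methods and derivations I would need to verify myself.

Paper:
{paper_text}
"""

print(SUMMARY_PROMPT.format(paper_text="<paste full paper here>"))
```

The structure mirrors the intuition → empirics → details framework above, so the model's answer slots directly into the triage process.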

This is all to say: just try shit. The worst thing you can do is copy someone else’s recipe (so don’t take my advice beyond using it as a starting point!). I think this is also why people aren’t taught how to read papers, or why the advice is conflicting (do you read the abstract first or last? I’ve been told to do both).

3. Use AI tools, but make thinking hard.

On one hand, there’s no better time to learn new things: ChatGPT and other AI tools can explain just about anything. On the other hand, I am very, very grateful I learned to code before those tools existed. When friction is stripped away, real learning gets harder. (Ironically, I’ve seen friends take so much longer to pick up programming with AI than they would have without it.)

What happens when that [tacit knowledge — things we know but cannot explain] gets replaced by a second kind: things we claim to know but never really did? It looks the same on the outside — in both cases we sound confident, in both cases we feel informed — but only one of them survives challenge.

- Joan Westenberg, Cognitive Hygiene: Why You Need to Make Thinking Hard Again

As such, I use AI to help me understand (I even craft and save prompts to paste), but when it comes to writing the 3-sentence summary or brief personal comments, I never allow AI. It’s surprisingly difficult, and that’s the point. You can find other ways to build friction back into your learning.

Lastly, hone the skill of asking questions. AI can give you answers, but the burden of asking the right questions is still on you. I remind myself that I am always in one of two buckets:

  • I understand this topic well enough to comment on it / judge it / use it, or
  • I don’t.

The only thing separating the latter from the former is a series of questions. In particular, know that many papers can be really hard to grasp if you don’t understand the “prerequisite ideas” [7].

So if you’re confused, you should search for the precise next question you need answered, and fill the gaps one by one. Be unapologetically curious.

Thank you to Dhruv Pai for teaching me most of the above [8], and to Dhruv, Matthew Noto, Tina Mai, Nathan Chen, Krish Parikh, William Zhao, and Vedant Khanna for their feedback across drafts.

Thanks for reading! If you enjoyed this, I'd love to add you to my mailing list for new essays:

Notes

In the interest of keeping the post short, I moved many notes and examples here. I think they're worth reading through!

[1]

If you’re totally new to reading papers (or to the field), go through the first few slowly.

Pick a few popular papers and grind through the ambiguity. For my first handful of papers in a new area, I go line-by-line and search up all the unfamiliar terms to build that missing context.

Once you’re past the initial slog, you can start applying the rest of the post (sampling, skimming, streamlining your approach). But I do think the first few should be deliberately hard and long.

[2]

Side note: why care about reading papers well?

  1. I believe people should understand how things actually work, internally. Too many people work only at the surface, and it’s for lack of trying, not lack of technical ability.
  2. Without reading sources, you defer judgment instead of making it. Recently I came across Neurode, a company building a headband that (supposedly) helps you focus just by wearing it for 20 minutes, “created with individuals with ADHD in mind” and “exclusively for adult use.” They cite 8 papers in their science section, but if you actually read the sources, none of them provide convincing evidence that tRNS helps at all. Certainly not for ADHD, which seems central to their marketing (and possibly their fundraising): for that they cite only two studies, both done on children, and both showing improvements only on parent-reported scores (plus harm to sleep!). The first one didn’t even have a control group, so the effect could be entirely due to the training itself or to expectation (placebo).
[3]

Even the most influential ML papers can contain mistakes. The Adam optimizer, introduced almost a decade ago and still the default way most neural networks are trained, has over 224,000 citations, making it one of the most cited computer science papers ever. Even so, its original convergence proof contained an error that was only corrected later.
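For context, the Adam update itself fits in a few lines. Here is a minimal scalar sketch of the update equations from the paper (moment estimates plus bias correction):

```python
import math

def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a single scalar parameter, per the original paper."""
    m = beta1 * m + (1 - beta1) * grad       # first-moment (mean) estimate
    v = beta2 * v + (1 - beta2) * grad ** 2  # second-moment (uncentered variance) estimate
    m_hat = m / (1 - beta1 ** t)             # bias correction for zero-initialized m
    v_hat = v / (1 - beta2 ** t)             # bias correction for zero-initialized v
    theta = theta - lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta, m, v

# Minimize f(x) = x^2 (gradient 2x) starting from x = 1.0
x, m, v = 1.0, 0.0, 0.0
for t in range(1, 2001):
    x, m, v = adam_step(x, 2 * x, m, v, t, lr=0.05)
# x ends up near 0
```

The bias-correction terms exist because m and v start at zero and would otherwise be biased toward zero early in training; the convergence proof is a separate (and, as it turned out, subtle) matter.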

Similarly, FlashAttention-2, a major efficiency paper, has a typo in its derivation (still unfixed in v3).

[4]

A friend pointed out they had the opposite problem of taking “read fast” too far; it’s important to also know when to drill deep on fundamental works and exciting results. I agree, but I’d still argue speed is the safer default, because you can always circle back. The process is iterative, and that’s the point anyway: optimize for what works for you.

[5]

Where do you find good papers?

  • Learning to source papers well matters: most of the time when you struggle with an ML paper, I feel it’s because the paper is poorly written, not because you aren’t knowledgeable enough (thanks to Sturgeon’s law again). And sometimes it’s a good paper written in a hard-to-read way (like Titans).
  • AlphaXiv is a decent place to see recently popular papers and a good place to start.
  • Pay attention to the authors and institutions that put out great work. DeepSeek papers are generally very high quality, and so are those out of Chris Re’s lab, Hazy Research. Authors on one outstanding paper are likely to have other good work, too.
  • It’s worth mentioning that I worried too much about this. It would’ve been so much better for me to ingest a few papers that aren’t “perfect” than to wait.
    • For that matter, for any new research there’s NO ground truth on which papers will be important and which will not. (Even PageRank, the idea that put Google in front, was rejected by SIGIR.)
[6]

I really love papers that do this kind of reframing. In some sense they stitch together threads from different corners of the field.

  • Universal Hopfield Networks: connects Hopfield networks (1982) and transformers by showing that the Hopfield update rule is mathematically equivalent to an attention mechanism. Under this lens, even sparse autoencoders can be expressed as UHNs.
  • Fast Weight Programmers: revisits Schmidhuber’s idea of networks that generate their own temporary weights (1991), connecting it to both modern attention mechanisms and state-space models.
  • Vector quantization is a sparse MLP: shows that the quantization-plus-embedding step used in models like VQ-VAE is mathematically equivalent to a sparse MLP layer.
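The correspondence in the first bullet is compact enough to sketch: the modern continuous Hopfield update is one step of softmax attention in which the stored patterns act as both keys and values. A toy sketch (beta here plays the role of the inverse temperature in that formulation):

```python
import math

def hopfield_attention_step(patterns, query, beta=4.0):
    """One modern-Hopfield update: query <- sum_i softmax(beta * <x_i, query>)_i * x_i.
    This is softmax attention with the stored patterns as both keys and values."""
    scores = [beta * sum(p_k * q_k for p_k, q_k in zip(p, query)) for p in patterns]
    mx = max(scores)                                  # stabilize the softmax
    weights = [math.exp(s - mx) for s in scores]
    total = sum(weights)
    weights = [w / total for w in weights]
    dim = len(query)
    return [sum(weights[i] * patterns[i][k] for i in range(len(patterns)))
            for k in range(dim)]

stored = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
noisy = [0.9, 0.2, 0.1]                  # closest to the first stored pattern
print(hopfield_attention_step(stored, noisy))
```

One update pulls the noisy query strongly toward the nearest stored pattern, which is exactly the "attention as associative retrieval" reading the paper formalizes.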
[7]

For example, I wasted a lot of time trying to understand FlashAttention (a GPU-optimized version of attention) before knowing how a GPU works.

I always remind myself: the tree is not that deep. Every concept sits on a branch of some larger tree of ideas, and by climbing upward to fill missing gaps, you realize there aren’t nearly as many as it first seems.

I like to think that this approach of backfilling as you go makes you ready to read any paper.

[8]

Find other people to learn from, whether it be by joining a reading group, club, or otherwise. Mentorship is one of the strongest accelerators for learning. And it makes it so much more fun!
