selected projects

ML · infra · interfaces

001

PiTorch: ML on Baremetal Raspberry Pis

How do you get from a $5 computer to a working language model?

We strip away every layer of abstraction and build up from scratch, running and training models on a cluster of Pi Zeros. No PyTorch, no OS, not even a standard library.

Figure: 4 Raspberry Pi Zeros wired together. Node labels: BCM2835 R0 (L 0–3), BCM2835 R1 (L 4–7), BCM2835 R2 (L 8–11), BCM2835 R3 (Emb, Head). Callouts: 61×, 185×, 210×.

002

Sparsity is Cool: Interpretability Insights into Sparse Attention

· 16 min read

A new wave of sparse attention methods promises faster, more expressive transformers. But why does sparsity help, and can we use that understanding to make it work even better?

We investigate the mechanisms behind sparse attention and propose improvements based on what we find.

Figs 01, 06.

003

Activault: Scalable, Efficient, and Fast Model Activation Storage

· 9 min read

Training interpreter models on frontier LLMs requires collecting billions of activations (the model's "mental state" at each step). At scale, storing these becomes prohibitively expensive.

I built and open-sourced Activault to dramatically reduce these costs.

Figs 01, 05.

004

Sieve: SAEs Beat Baselines on a Real-World Task (A Code Generation Case Study)

· 10 min read

Sparse autoencoders can decode what a language model is "thinking," but can that understanding actually be used to improve model behavior on real tasks?

We show SAE-based steering outperforms classical baselines on code generation, and release Sieve, a pipeline for applying SAEs for fine-grained control.

Figs 01, 02.

005

Tilde Stargazer

· 1 min watch

As part of our launch of Tilde Research, Tina and I built Stargazer, where you can explore the internals of a Llama model, powered by one of our interpreter models.

When you submit a prompt, each word the model outputs reveals a night sky of constellations, each star a feature the model activated while generating that word, exposing the concepts it drew on before choosing each token.

Stargazer's backend is no longer live.

Demo video. Posted on X