How do you get from a $5 computer to a working language model?
We strip away every layer of abstraction and build up from scratch, running and training models on a cluster of Pi Zeros. No PyTorch, no OS, not even a standard library.
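To give a flavor of what "from scratch" means, here is a toy next-token step written against nothing but Python builtins and `math`: a hand-rolled matrix-vector product and softmax standing in for the real bare-metal code. All weights below are made up for illustration; this is a sketch of the idea, not the actual implementation.

```python
import math

def matvec(W, x):
    """Plain matrix-vector product: one layer's worth of work, no libraries."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    s = sum(exps)
    return [e / s for e in exps]

# Toy "model": a 3-token vocabulary, 2-d embeddings, one output layer.
embed = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [0.5, 0.5]}   # hypothetical weights
W_out = [[0.2, -0.1], [0.4, 0.3], [-0.3, 0.5]]           # hidden -> vocab logits

x = embed[1]                                  # embed the current token
probs = softmax(matvec(W_out, x))             # distribution over next tokens
next_token = max(range(len(probs)), key=probs.__getitem__)
print(next_token)
```

Everything a transformer does reduces to compositions of operations like these; the hard part on a Pi Zero is doing them fast without an OS or allocator underneath.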
A new wave of sparse attention methods promises faster, more expressive transformers. But why does sparsity help, and can we use that understanding to make it work even better?
We investigate the mechanisms behind sparse attention and propose improvements based on what we find.
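One common flavor of sparse attention is easy to state: each query keeps only its top-k highest-scoring keys and masks the rest before the softmax. A minimal numpy sketch of that mechanism (shapes and the `top_k` parameter are illustrative, not taken from the post):

```python
import numpy as np

def sparse_attention(Q, K, V, top_k):
    """Top-k sparse attention: each query attends only to its top_k
    highest-scoring keys; all other scores are masked to -inf."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                  # (n_q, n_k) raw scores
    # Threshold each row at its top_k-th largest score.
    kth = np.sort(scores, axis=-1)[:, -top_k][:, None]
    masked = np.where(scores >= kth, scores, -np.inf)
    # Softmax over the surviving scores only.
    weights = np.exp(masked - masked.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

rng = np.random.default_rng(0)
Q, K, V = rng.normal(size=(4, 8)), rng.normal(size=(6, 8)), rng.normal(size=(6, 8))
out = sparse_attention(Q, K, V, top_k=2)
print(out.shape)  # (4, 8): each output row mixes at most 2 value vectors
```

The question the post digs into is why restricting attention this way can help rather than hurt.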
Training interpreter models on frontier LLMs requires collecting billions of activations (the model's "mental state" at each step). At scale, storing these becomes prohibitively expensive.
I built and open-sourced Activault to dramatically reduce these costs.
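The scale of the storage problem is easy to see, and one standard mitigation (shown here as a generic sketch, not necessarily how Activault itself works) is to downcast activations and serialize them in compressed shards:

```python
import io
import numpy as np

def pack_shard(activations: np.ndarray) -> bytes:
    """Downcast fp32 activations to fp16 and serialize as a compressed
    .npz blob: halves storage outright, and compression helps further
    on real (non-random) activations."""
    buf = io.BytesIO()
    np.savez_compressed(buf, acts=activations.astype(np.float16))
    return buf.getvalue()

# Fake "activations": 4096-d vectors for 256 tokens of one layer.
acts = np.random.default_rng(0).normal(size=(256, 4096)).astype(np.float32)
shard = pack_shard(acts)
print(f"raw fp32: {acts.nbytes} bytes, packed: {len(shard)} bytes")
```

At billions of activations, even a 2x reduction per shard is the difference between a feasible and an infeasible storage bill.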
Sparse autoencoders can decode what a language model is "thinking," but can that understanding actually be used to improve model behavior on real tasks?
We show SAE-based steering outperforms classical baselines on code generation, and release Sieve, a pipeline for applying SAEs for fine-grained control.
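Mechanically, SAE-based steering amounts to: encode an activation into sparse features, pick the feature you care about, and add its decoder direction back into the residual stream. A toy numpy sketch with made-up weights; the real Sieve pipeline is more involved, and all names here are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_sae = 16, 64

# Hypothetical SAE weights: encoder, decoder, encoder bias.
W_enc = rng.normal(0, 0.1, size=(d_model, d_sae))
W_dec = rng.normal(0, 0.1, size=(d_sae, d_model))
b_enc = np.zeros(d_sae)

def steer(activation, feature_idx, strength):
    """Encode to sparse features with a ReLU, then steer by adding the
    chosen feature's decoder direction, scaled, to the activation."""
    feats = np.maximum(activation @ W_enc + b_enc, 0.0)  # sparse feature vector
    delta = strength * W_dec[feature_idx]                # feature's write direction
    return activation + delta, feats

act = rng.normal(size=d_model)
steered, feats = steer(act, feature_idx=3, strength=5.0)
print(np.linalg.norm(steered - act))  # nonzero: moved along feature 3's direction
```

Because each decoder row is (ideally) a single interpretable direction, this gives much finer-grained control than coarse baselines like prompt edits or whole-layer activation additions.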
As part of our launch of Tilde Research, Tina and I built Stargazer, where you can explore the internals of a Llama model, powered by one of our interpreter models.
When you submit a prompt, each word the model outputs unfolds into a night sky of constellations: every star is a feature the model activated while generating that word, exposing the concepts it drew on before committing to the token.