Deep learning framework development with tinygrad - a minimal tensor library with autograd, JIT compilation, and multi-device support. Use when writing neural networks, training models, implementing tensor operations, working with UOps/PatternMatcher for graph transformations, or contributing to tinygrad internals. Triggers on tinygrad imports, Tensor operations, nn modules, optimizer usage, schedule/codegen work, or device backends.
Install:

```sh
npx skill4agent add av/skills tinygrad
```

## Quick start

```python
from tinygrad import Tensor, TinyJit, nn, dtypes, Device, GlobalCounters

# Tensor creation
x = Tensor([1, 2, 3])
x = Tensor.rand(2, 3)
x = Tensor.kaiming_uniform(128, 784)
# Operations are lazy until realized
y = (x + 1).relu().sum()
y.realize() # or y.numpy()
# Training context
with Tensor.train():
  loss = model(x).sparse_categorical_crossentropy(labels).backward()
  optim.step()
```
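Nothing runs on the device until something forces it: building expressions only records the graph. A quick way to see this, as a sketch using `GlobalCounters` from the import above (its `kernel_count` field is an assumption from `tinygrad.helpers`; check your version):

```python
from tinygrad import Tensor, GlobalCounters

a = Tensor.rand(1024, 1024)
b = (a @ a.T).relu().sum()                    # lazy: only builds the graph
before = GlobalCounters.kernel_count
b.realize()                                   # schedule, codegen, and execution happen here
print(GlobalCounters.kernel_count - before)   # kernels launched by this realize
```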
Key files:

- `tinygrad/tensor.py`: Tensor frontend and autograd
- `tinygrad/uop/ops.py`: UOps, patterns, graph rewrites
- `tinygrad/engine/schedule.py`: kernel scheduling
- `tinygrad/codegen/`: kernel code generation
- `tinygrad/runtime/`: device backends

## Training loop (MNIST)

```python
from tinygrad import Tensor, TinyJit, nn
from tinygrad.nn.datasets import mnist

X_train, Y_train, X_test, Y_test = mnist()
model = Model()
optim = nn.optim.Adam(nn.state.get_parameters(model))
@TinyJit
@Tensor.train()
def train_step():
  optim.zero_grad()
  samples = Tensor.randint(512, high=X_train.shape[0])
  loss = model(X_train[samples]).sparse_categorical_crossentropy(Y_train[samples]).backward()
  return loss.realize(*optim.schedule_step())

for i in range(100):
  loss = train_step()
```
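An evaluation step follows the same pattern. A sketch, assuming the `model`, `X_test`, and `Y_test` defined above (it mirrors tinygrad's `examples/beautiful_mnist.py`):

```python
# argmax over logits vs. labels; .item() forces execution and returns a float
acc = (model(X_test).argmax(axis=1) == Y_test).mean().item()
print(f"test accuracy: {acc * 100:.2f}%")
```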
## Model definition

Models are plain Python classes; the forward pass lives in `__call__`:

```python
class Model:
  def __init__(self):
    self.l1 = nn.Linear(784, 128)
    self.l2 = nn.Linear(128, 10)

  def __call__(self, x):
    return self.l1(x).relu().sequential([self.l2])
```

Layers: `Linear`, `Conv2d`, `BatchNorm`, `LayerNorm`, `RMSNorm`, `Embedding`, `GroupNorm`, `LSTMCell`
Optimizers: `SGD`, `Adam`, `AdamW`, `LARS`, `LAMB`, `Muon`
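These compose like any tinygrad module. A hypothetical small convnet wiring a few of the listed layers together (shapes and names are illustrative, not from the skill):

```python
from tinygrad import nn

class ConvNet:
  def __init__(self):
    self.c1 = nn.Conv2d(1, 16, 3)            # 28x28 input -> 26x26 feature maps
    self.bn = nn.BatchNorm(16)
    self.out = nn.Linear(16 * 26 * 26, 10)

  def __call__(self, x):
    return self.out(self.bn(self.c1(x)).relu().flatten(1))
```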
## Saving and loading state

```python
from tinygrad.nn.state import safe_save, safe_load, get_state_dict, load_state_dict, get_parameters

# Save/load safetensors
safe_save(get_state_dict(model), "model.safetensors")
load_state_dict(model, safe_load("model.safetensors"))
# Get all trainable params
params = get_parameters(model)
```
## TinyJit

```python
@TinyJit
def forward(x):
  return model(x).realize()

# Call 1 runs eagerly, call 2 captures the kernel graph, later calls replay it
out = forward(batch)
```
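Because replay reuses the captured kernels, jitted functions should take `Tensor` arguments with stable shapes. A sketch of the calling pattern (`jit_forward` is hypothetical, and the strict-shape behavior is an assumption worth verifying on your version):

```python
from tinygrad import Tensor, TinyJit

@TinyJit
def jit_forward(x: Tensor) -> Tensor:
  return model(x).realize()                # `model` as defined above

for _ in range(4):
  out = jit_forward(Tensor.rand(32, 784))  # keep input shapes constant across calls
```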
## Devices

```python
from tinygrad import Device

print(Device.DEFAULT)  # Auto-detected: METAL, CUDA, AMD, CPU, etc.

# Force device
x = Tensor.rand(10, device="CPU")
x = x.to("CUDA")
```
## Environment variables

| Variable | Values | Description |
|---|---|---|
| `DEBUG` | 1-7 | Increasing verbosity (4=code, 7=asm) |
| `VIZ` | 1 | Graph visualization |
| `BEAM` | # | Kernel beam search width |
| `NOOPT` | 1 | Disable optimizations |
| `SPEC` | 1-2 | UOp spec verification |
```sh
# Visualize computation graph
VIZ=1 python -c "from tinygrad import Tensor; Tensor.ones(10).sum().realize()"

# Show generated code
DEBUG=4 python script.py

# Run tests
python -m pytest test/test_tensor.py -xvs
```

## UOps and PatternMatcher

```python
from tinygrad.uop.ops import UOp, Ops
from tinygrad.uop.upat import UPat, PatternMatcher, graph_rewrite
# rewrite x + x into x * 2; a repeated name must bind the same UOp
pm = PatternMatcher([
  (UPat(Ops.ADD, src=(UPat.cvar("x"), UPat.cvar("x"))), lambda x: x * 2),
])
result = graph_rewrite(uop, pm)
```

A `UOp` has fields `op`, `dtype`, `src`, `arg`, and `tag`.
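To see the matcher fire end to end, a minimal run on a hand-built graph (a sketch: `UOp.const` and UOp operator overloading are assumed stable here, but these internals move between versions):

```python
from tinygrad import dtypes

c = UOp.const(dtypes.int, 3)          # UOps are interned, so c + c shares one source
doubled = graph_rewrite(c + c, pm)    # ADD(c, c) matches the pattern above, becomes c * 2
print(doubled)
```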
## Contributing

```sh
pre-commit run --all-files
python -m pytest test/test_tensor.py -xvs
python -m pytest test/unit/test_schedule_cache.py -x --timeout=60
SPEC=2 python -m pytest test/test_something.py  # With spec verification
```