rote · companion to IncPy (Guo & Engler, 2011)

01 — What rote does

Memoization without the decorators

Picture an analyze.py that reads a CSV, builds some features, fits a model, and saves a plot. A clean run takes a few minutes. You edit one line in the plotting code and run the file again. Plain Python re-executes everything from the top, including the parts that didn't change.

rote watches the first run. For each function call, it decides whether the work was slow enough to be worth caching and whether the function is safe to cache. When both are true, the return value is written to disk under .rote/. On the next run, the cached values come straight from disk; the only thing that actually executes is the function you edited (and anything downstream of it).

There are no decorators to add. rote run analyze.py rewrites the AST of your script in memory so each top-level function gets wrapped automatically, then runs the rewritten code. The file on disk doesn't change.
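
The in-memory rewrite can be pictured with the standard-library ast module. This is a minimal sketch, not rote's actual transformer: the names wrap_top_level_functions and _memo are hypothetical, and the wrapper only records calls instead of caching them.

```python
import ast


def wrap_top_level_functions(source, decorator_name="_memo"):
    """Parse source and add @decorator_name to every top-level def."""
    tree = ast.parse(source)
    for node in tree.body:  # direct children only, i.e. top-level statements
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            node.decorator_list.insert(
                0, ast.Name(id=decorator_name, ctx=ast.Load())
            )
    ast.fix_missing_locations(tree)  # new nodes need line/column info to compile
    return tree


calls = []


def _memo(func):
    # Hypothetical stand-in for a caching wrapper: it only logs the call.
    def wrapper(*args, **kwargs):
        calls.append(func.__name__)
        return func(*args, **kwargs)
    return wrapper


source = """
def double(x):
    return 2 * x

result = double(21)
"""

tree = wrap_top_level_functions(source)
namespace = {"_memo": _memo}
exec(compile(tree, "<rewritten>", "exec"), namespace)  # source text is untouched
print(namespace["result"], calls)
```

Only the parsed tree is modified, so the string in source (and, by analogy, the file on disk) stays exactly as written.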

02 — How it works

What rote does on every run

Each of the widgets further down the page corresponds to one of these four steps.

01 Trace

While your script runs, rote uses sys.monitoring (PEP 669) to see every function call. It also uses audit hooks (PEP 578) to log file opens, network access, subprocess starts, and any use of exec or eval. Those events feed the next step.
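
The audit-hook half of this step can be sketched with sys.addaudithook, which is part of the standard library. The watched-event set and the tracing flag below are assumptions made for illustration, and the sys.monitoring half is left out to keep the sketch short.

```python
import os
import sys
import tempfile

events = []      # (event_name, first_argument) pairs seen while tracing
tracing = False  # audit hooks cannot be removed, so gate them with a flag

# Real CPython audit event names; which ones rote watches is an assumption.
WATCHED = {"open", "socket.connect", "subprocess.Popen", "exec"}


def audit_hook(event, args):
    # Keep the hook cheap and never raise: it runs for every audit event.
    if tracing and event in WATCHED:
        events.append((event, args[0] if args else None))


sys.addaudithook(audit_hook)


def read_config(path):
    with open(path) as fh:
        return fh.read()


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "config.txt")
    with open(path, "w") as fh:  # not traced: the flag is still False
        fh.write("threshold=1\n")

    tracing = True
    text = read_config(path)     # the open() inside is reported to the hook
    tracing = False

print(text.strip(), [name for name, _ in events])
```
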
02 Judge

rote memoizes a call only if it passes four checks. The call has to have run for at least a second; no impure I/O can have happened during it; the arguments must hash to the same value at exit as at entry; and the source code (along with everything the function transitively depends on) has to match what was cached. Any single failing check skips the cache write.
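
A rough sketch of the four checks as one decision function. The CallRecord fields and the reason strings are hypothetical; they mirror the prose above rather than rote's internal data model.

```python
from dataclasses import dataclass, field


@dataclass
class CallRecord:
    # Hypothetical summary of one traced call.
    duration_s: float
    impure_events: list = field(default_factory=list)
    args_hash_entry: str = ""
    args_hash_exit: str = ""
    identity_matches: bool = True


def judge(record, min_duration_s=1.0):
    """Return (cacheable, reason); the first failing check decides."""
    if record.duration_s < min_duration_s:
        return False, "too fast to be worth caching"
    if record.impure_events:
        return False, f"impure I/O: {record.impure_events[0]}"
    if record.args_hash_entry != record.args_hash_exit:
        return False, "argument mutated during call"
    if not record.identity_matches:
        return False, "source or dependency changed"
    return True, "ok"


print(judge(CallRecord(duration_s=2.5, args_hash_entry="a", args_hash_exit="a")))
print(judge(CallRecord(duration_s=0.2)))
print(judge(CallRecord(duration_s=3.0, impure_events=["time.time"])))
```
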
03 Store

The return value is serialized using whichever library makes sense for its type: PyArrow IPC for DataFrames, numpy.save for ndarrays, safetensors for tensors, msgpack for primitives, cloudpickle for anything that doesn’t fit those buckets. The bytes go into a content-addressed SQLite cache. Files that the function read are content-hashed too, so a backdated touch -r can’t trick the cache into reusing a stale value.
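
To make "content-addressed" concrete, here is a small sketch using only sqlite3, hashlib, and pickle. The two-table schema and the put/get names are assumptions, pickle stands in for the type-specific serializers, and the blobs sit inside the database for brevity.

```python
import hashlib
import pickle
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE blobs (digest TEXT PRIMARY KEY, data BLOB NOT NULL)")
db.execute("CREATE TABLE entries (call_key TEXT PRIMARY KEY, digest TEXT NOT NULL)")


def put(call_key, value):
    """Store value under call_key; identical bytes are stored only once."""
    data = pickle.dumps(value, protocol=pickle.HIGHEST_PROTOCOL)
    digest = hashlib.sha256(data).hexdigest()  # the address is the content hash
    with db:  # one transaction for both rows
        db.execute("INSERT OR IGNORE INTO blobs VALUES (?, ?)", (digest, data))
        db.execute("INSERT OR REPLACE INTO entries VALUES (?, ?)", (call_key, digest))
    return digest


def get(call_key):
    """Return the cached value, or None on a miss."""
    row = db.execute(
        "SELECT data FROM entries JOIN blobs USING (digest) WHERE call_key = ?",
        (call_key,),
    ).fetchone()
    return None if row is None else pickle.loads(row[0])


d1 = put("parse:v1", [1, 2, 3])
d2 = put("aggregate:v1", [1, 2, 3])  # same bytes, so the same blob row is reused
blob_count = db.execute("SELECT COUNT(*) FROM blobs").fetchone()[0]
print(d1 == d2, blob_count, get("parse:v1"), get("missing"))
```

Using None as the miss sentinel is a simplification; a real cache would need to distinguish a miss from a cached None.
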
04 Replay

On the next run, rote checks the function’s identity along with its arguments and any tracked file inputs. If everything matches, the cached value comes back in microseconds. If anything has changed, the entry is invalidated and the function runs normally. The first run pays the write cost; every run after that reuses the result.

Functions that do impure work (network calls, anything that reads time.time(), and so on) are never cached. rote logs why each call was skipped in rote.stats()["invalidation_reasons"], so you can read entries like calls impure stdlib: time.time and tell whether the cache misses you're seeing match what you'd expect.

03 — Edit-rerun loop

Re-running a pipeline after one edit

A small pipeline with four stages: parse, aggregate, train, plot. Pick the stage you just edited. Plain Python re-runs all four. rote re-runs only the stages whose source or inputs changed. The numbers come from running each variant in a fresh Python process (the same shape as saving the script and re-typing the python command), where the plain rerun takes 1.83 s and the warm rote rerun takes 0.38 s.

Edited stage: plot

Stage	Plain Python	rote
parse	recompute	cached
aggregate	recompute	cached
train	recompute	cached
plot	edited	edited
Total	1.75 s (every stage re-runs)	353.4 ms (4.9× over plain)

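You edited plot. Stages parse, aggregate, and train are served from cache. Stage plot recomputes; it is the last stage, so nothing downstream has to re-run. Had you edited an earlier stage, every stage after it would re-run too, because its inputs changed.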

04 — Differences from the paper

Every place rote behaves differently, with the reason

Each row gives the paper's claim, rote's current measurement or behaviour, and one sentence on what made the change possible. Most rows credit a library or a PEP that shipped after 2011; that's the bulk of the gap. The full log lives in site/DISCREPANCIES.md.

Topic	Paper (2011)	rote (2026)	Why it differs
Edit-rerun speedup paper §4.2 · bench/results/cross_process_pipeline.json	~10× on real workflows (fresh interpreter each run).	4.8× cross-process on a paper-shaped pipeline (1.83 s → 0.38 s; joblib 0.19 s).	Roughly half the paper’s factor. Hardware has moved, and rote content-hashes file dependencies on every hit where the paper trusted (size, mtime). The validation costs cycles but closes a stale-result hole.
File-dependency identity paper §3.5 · tests/unit/test_file_hash_cache.py::test_size_preserving_mtime_backdated_edit_still_invalidates	Keyed on (size, mtime).	Indexed on (dev, ino); validated against (size, mtime_ns, ctime_ns) in a persistent SQLite table.	A `touch -r` rewinding mtime after a same-size overwrite would fool the paper’s scheme; ctime_ns moves anyway because the kernel updates it on every inode write and userspace can’t backdate it without root.
Source-change detection paper §3.2 · src/rote/identity.py enabled by: libcst (2019)	Coarse source-byte hashing — adding a comment busts the cache.	Canonical-AST hash via libcst (strips comments, docstrings, annotations; De Bruijn-renames bound variables).	libcst didn’t exist when the paper was written. The newer machinery means cosmetic edits no longer invalidate.
Serialization format paper Figure 6 · bench/results/serialize_microbench.json enabled by: PyArrow IPC (2016), safetensors (2023)	Pickle variants dominate the warm path (Figure 6).	Type-dispatched: PyArrow IPC → DataFrame, numpy.save → ndarray, safetensors → tensors, msgpack → primitives, cloudpickle as fallback.	PyArrow IPC, numpy’s zero-copy load path, and safetensors all post-date the paper. For DataFrames and ndarrays, the cases that matter most to modern research, they are competitive with pickle and often faster; the serializer table includes one float64 case where numpy.save is 1.26× slower. For huge homogeneous Python containers pickle still wins; documented openly.
Interpreter compatibility paper §1 · pyproject.toml enabled by: PEP 578 audit hooks (2018), PEP 669 sys.monitoring (2023)	Required a CPython 2.6.3 patch — a custom interpreter binary that stopped tracking upstream years ago.	Pure-Python library on stock CPython 3.12+.	PEP 578 (audit hooks) and PEP 669 (sys.monitoring) let user code observe events the 2011 prototype needed an interpreter fork to see.
Concurrency tests/correctness/test_concurrency.py	Single-process; no shared-cache IPC story.	Multi-process safe via SQLite WAL + atomic blob rename. 16-process hammer test in tests/correctness/.	A modern research workflow runs notebooks and CLI jobs against the same cache. SQLite’s WAL mode first shipped in version 3.7.0 (July 2010), only shortly before the paper, and the audit-hook scaffolding to keep file dependencies honest under concurrency post-dates the paper too.
Argument-mutation detection src/rote/purity.py · tests/unit/test_purity.py	Not modelled — static analysis assumes pure-looking functions are pure.	Copy-on-call fingerprinting: hash arguments at entry, re-hash at exit; any drift disqualifies the call.	A pure-by-inspection function can still mutate a list argument in place. Modern researchers passing DataFrames around hit this constantly.
Coverage of pure long-running calls paper §4.3 · tests/integration/test_realistic_coverage.py	Reported high coverage on the original five-script corpus (fraction of pure calls memoized).	100% of cold compute eliminated on the warm re-run across corpus/realistic/ (five multi-second scripts, ~26 s → 0 s).	Different denominator — work eliminated vs. pure-call fraction. Flagging the mismatch rather than asserting parity.

05 — Speedups

Two ways to measure how fast it is

The first reference point is the speedup the original paper reported in 2011. The second is joblib, which is the most common memoization library for Python research scripts today. Both sets of numbers come from bench/results/*.json.

Comparison	Paper (2011)	rote (2026)	Source
Edit-rerun on a multi-stage script, fresh interpreter each run	~10×	4.9× (1.75 s → 353.4 ms)	cross_process_pipeline.json · paper §4.2
Same pipeline, one interpreter, LRU pre-warmed	not measured separately	~48×	paper_pipeline.json

The cross-process row is the one that lines up with the paper's measurement. Landing at roughly half the paper's reported speedup is about what we'd expect after fifteen years of hardware progress, plus the cost of rote content-hashing every file dependency on every hit. The in-process number is the upper bound once interpreter startup is amortised; it's listed here as a second data point, not the headline.

06 — Serializers (paper Figure 6, updated)

Picking a serializer by what the function returns

The paper compared three pickle variants. rote uses different serializers depending on the return type. PyArrow IPC handles DataFrames, numpy.save handles ndarrays, safetensors handles tensors, msgpack handles primitives, and cloudpickle is the fallback for anything that doesn't fit those buckets. The table below also shows the workloads where pickle still wins (large homogeneous Python containers), since those are the cases where the dispatch decision matters most.

Payload	rote serializer	rote time	pickle time (HIGHEST_PROTOCOL)	rote vs pickle
numpy · 1 M float64	numpy	0.44 ms	0.35 ms	1.26× slower
numpy · 3 M float32	numpy	0.66 ms	1.1 ms	1.71× faster
arrow · 1 M-row table	arrow	2.8 ms	3.6 ms	1.31× faster
dict · 100K items	msgpack	47 ms	11 ms	4.27× slower
list · 1 M ints	msgpack	362 ms	11 ms	32.52× slower

Source: bench/results/serialize_microbench.json. Min of 5 trials per cell.
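
Type dispatch itself is a small amount of code. The sketch below uses only the standard library, so json stands in for msgpack and pickle for cloudpickle; the tag names and the encode/decode functions are hypothetical. Unlike the dispatch measured above, it deliberately sends containers to pickle, the choice the last two table rows argue for.

```python
import json
import pickle


def encode(value):
    """Return (tag, payload_bytes), choosing a codec from the value's type."""
    if isinstance(value, bytes):
        return "raw", value
    if value is None or isinstance(value, (bool, int, float, str)):
        return "json", json.dumps(value).encode("utf-8")
    # Containers and everything else fall through to pickle.
    return "pickle", pickle.dumps(value, protocol=pickle.HIGHEST_PROTOCOL)


DECODERS = {
    "raw": lambda payload: payload,
    "json": lambda payload: json.loads(payload.decode("utf-8")),
    "pickle": pickle.loads,
}


def decode(tag, payload):
    # The tag is stored next to the payload so decoding never has to guess.
    return DECODERS[tag](payload)


for sample in (b"\x00\x01", 3.5, "label", [1, 2, 3], {"a": {1, 2}}):
    tag, payload = encode(sample)
    print(tag, decode(tag, payload) == sample)
```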

07 — Call graph

What happens when you edit one node

Click any node to edit it. rote rehashes the function's canonical AST, sees that the new value doesn't match what was cached, and marks the node as missed. Anything further down the pipeline that depended on its output is now stale too. Anything earlier in the pipeline is unaffected. This is the propagation rule from §3.4 of the paper, drawn live so you can watch it.
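
The propagation rule amounts to "the edited node plus everything reachable from it". A minimal sketch, assuming the four-stage example forms a linear chain (the EDGES mapping is illustrative, not read from rote):

```python
from collections import deque

# Edges point from producer to consumer.
EDGES = {
    "parse": ["aggregate"],
    "aggregate": ["train"],
    "train": ["plot"],
    "plot": [],
}


def stale_after_edit(edges, edited):
    """Return the edited node plus every node downstream of it."""
    stale = {edited}
    queue = deque([edited])
    while queue:
        node = queue.popleft()
        for consumer in edges.get(node, []):
            if consumer not in stale:
                stale.add(consumer)
                queue.append(consumer)
    return stale


print(sorted(stale_after_edit(EDGES, "aggregate")))  # ['aggregate', 'plot', 'train']
print(sorted(stale_after_edit(EDGES, "plot")))       # ['plot']
```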


08 — Purity model

The four checks rote runs before caching anything

The paper has a purity model in §3.3 onward; this is a refresher on it. Each card shows what the paper required and which specific signal rote actually checks. If any single signal fails, the cache write is skipped and the reason gets logged in rote.stats()["invalidation_reasons"].

01 §3.3.2 perf guard

Long enough for the cache to be worth it

duration_ns ≥ Config.min_duration_s × 10⁹ (default 1 s)

Below the threshold, the cache write costs more than the recomputation would have. This is the perf guard from the paper. rote also tracks the per-call encode time in src/rote/purity.py, so a call whose 1 GB return value costs more to encode than the body cost to run gets blacklisted even if the body itself ran for long enough.
02 §3.3 + §3.3.1

No impure I/O during the call

no audit-hook event for network / exec / subprocess; any file opened in "w" mode is closed before the call returns

Using PEP 578 audit hooks (2018), rote treats the following as impure: network access, subprocesses, exec, file appends, and writes that are still open when the call returns. A with open(...) write that closes inside the call is what paper §3.3.1 calls a self-contained write: still pure, and tracked as a write-dependency.
03 beyond the paper

Arguments weren’t mutated in place

arg fingerprints at entry == arg fingerprints at exit

rote fingerprints mutable arguments when the call starts and again when it returns. If anything moved, the cache write is skipped. The paper assumed pure-looking functions actually were pure; this catches in-place mutation of lists, dicts, and DataFrames that static analysis would miss.
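
A minimal sketch of the entry/exit comparison. It fingerprints arguments by hashing their pickle bytes, which is an assumption for illustration: it works for lists of plain values but is not a stable fingerprint for every type, and rote's own fingerprinting may differ.

```python
import hashlib
import pickle


def fingerprint(args, kwargs):
    """Hash the pickled arguments (a simplification, see the note above)."""
    data = pickle.dumps((args, sorted(kwargs.items())),
                        protocol=pickle.HIGHEST_PROTOCOL)
    return hashlib.sha256(data).hexdigest()


def call_and_check(func, *args, **kwargs):
    """Run func and report whether its arguments were left untouched."""
    before = fingerprint(args, kwargs)
    result = func(*args, **kwargs)
    after = fingerprint(args, kwargs)
    return result, before == after


def total(xs):
    return sum(xs)


def total_and_clear(xs):
    s = sum(xs)
    xs.clear()  # in-place mutation that a look at the signature would not reveal
    return s


print(call_and_check(total, [1, 2, 3]))            # (6, True)
print(call_and_check(total_and_clear, [1, 2, 3]))  # (6, False)
```
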
04 §3.4 + §3.5

Source and dependencies still match

blake3(canonical_AST(func) ⊕ transitive_callee_ids ⊕ file_dep_hashes ⊕ global_dep_fingerprints) matches the cached key

The libcst canonical AST means cosmetic edits (a comment, a local-variable rename, a formatting change) don’t change the hash; the paper’s source-byte hash would have invalidated all of those. Transitive callee ids cover the §3.4 case. The file_dep_hashes term extends the paper’s §3.5 (size, mtime) with ctime_ns and a stream-hashed content digest, which is how rote catches backdated edits.
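
One way to combine the four components into a single key is sketched below. hashlib.blake2b stands in for blake3, which is a third-party package, and the length-prefixed framing is an assumption about how the ⊕ combination could be made unambiguous, not a description of rote's encoding.

```python
import hashlib


def _feed(hasher, items):
    # Prefix the item count and each item's length so that moving a value
    # from one group to another always changes the byte stream.
    hasher.update(len(items).to_bytes(4, "big"))
    for item in items:
        raw = item.encode("utf-8")
        hasher.update(len(raw).to_bytes(4, "big"))
        hasher.update(raw)


def cache_key(canonical_ast, callee_ids=(), file_dep_hashes=(),
              global_fingerprints=()):
    """Combine the four identity components into one hex digest."""
    hasher = hashlib.blake2b(digest_size=32)
    _feed(hasher, [canonical_ast])
    for group in (callee_ids, file_dep_hashes, global_fingerprints):
        _feed(hasher, sorted(group))  # order within a group is irrelevant
    return hasher.hexdigest()


key = cache_key("def f(): ...", ["g", "h"], ["abc123"])
print(key == cache_key("def f(): ...", ["h", "g"], ["abc123"]))  # True
print(key == cache_key("def f(): ...", ["g", "h"], ["abc124"]))  # False
```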

09 — Live editor

The hash, live as you type

Edit the function below. Cosmetic edits like adding a comment or renaming a local variable don't change the hash, because the canonicalisation strips them out before hashing. A semantic edit (a literal value, an operator) does change it. The paper hashed raw source bytes (§3.2), which would have invalidated any edit at all; the canonical AST form is what draws the distinction, and that's what libcst gives us.
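
The distinction can be reproduced in a few lines with the standard-library ast module. This is a simplified stand-in for the libcst pipeline: the parser already discards comments and formatting, the sketch strips docstrings, and it does not rename local variables or strip annotations. The canonical_hash name is hypothetical.

```python
import ast
import hashlib


def canonical_hash(source):
    """Hash the AST with docstrings removed; comments never reach the AST."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, (ast.Module, ast.ClassDef,
                             ast.FunctionDef, ast.AsyncFunctionDef)):
            body = node.body
            if (body and isinstance(body[0], ast.Expr)
                    and isinstance(body[0].value, ast.Constant)
                    and isinstance(body[0].value.value, str)):
                node.body = body[1:] or [ast.Pass()]
    # ast.dump omits line and column numbers by default.
    dumped = ast.dump(tree)
    return hashlib.sha256(dumped.encode("utf-8")).hexdigest()[:16]


base = "def area(r):\n    return 3.14159 * r * r\n"
commented = "def area(r):\n    # circle area\n    return 3.14159 * r * r\n"
documented = 'def area(r):\n    """Area of a circle."""\n    return 3.14159*r*r\n'
changed = "def area(r):\n    return 3.0 * r * r\n"

print(canonical_hash(base) == canonical_hash(commented))   # True
print(canonical_hash(base) == canonical_hash(documented))  # True
print(canonical_hash(base) == canonical_hash(changed))     # False
```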

A short JavaScript canonicalisation runs as soon as the page loads, so the editor responds immediately. Pyodide loads in the background, and once it's ready the same source goes through the real rote.identity.canonical_source function (libcst plus hashlib). Both hashes are displayed; if they disagree, it's the JS approximation that's wrong.


08b · the file-dependency adversarial edit

Paper §3.5 keyed file deps on (size, mtime). rote also tracks ctime_ns and a content hash. Toggle the scenarios to see which signal catches each edit.

test_file_hash_cache.py

signal	value	vs baseline	used by
size	4096 B	unchanged	paper + rote
mtime_ns	1716080400000000000	unchanged	paper + rote
ctime_ns	1716080400000000000	unchanged	rote only
content_hash	7f3a91bd4e2c8f4a	unchanged	rote only

cache hits · Nothing has changed yet.
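
The adversarial edit is easy to reproduce with os.stat and os.utime. In this sketch the snapshot function and its field names are illustrative, and sha256 stands in for whichever content hash rote uses. Whether ctime_ns visibly moves depends on the platform and on filesystem timestamp granularity, so the content hash is the signal to rely on here.

```python
import hashlib
import os
import tempfile


def snapshot(path):
    """Collect the four signals from the table above for one file."""
    st = os.stat(path)
    hasher = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):  # stream in 1 MiB chunks
            hasher.update(chunk)
    return {
        "size": st.st_size,
        "mtime_ns": st.st_mtime_ns,
        "ctime_ns": st.st_ctime_ns,
        "content_hash": hasher.hexdigest(),
    }


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.csv")
    with open(path, "wb") as fh:
        fh.write(b"a,b\n1,2\n")
    original = os.stat(path)
    before = snapshot(path)

    with open(path, "wb") as fh:
        fh.write(b"a,b\n9,9\n")  # same size, different content
    # Backdate mtime to its original value, as touch -r would.
    os.utime(path, ns=(original.st_atime_ns, original.st_mtime_ns))
    after = snapshot(path)

changed = sorted(name for name in before if before[name] != after[name])
print(changed)  # always includes content_hash; ctime_ns depends on the platform
```

A check keyed only on (size, mtime) sees no difference between the two snapshots.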
