Frontier

Directly comparable claims, grouped by their profile's frontier keys (e.g. workload + metric + hardware). Verified results rank first, and unresolved tensions are flagged. This is the leaderboard layer that Papers with Code pioneered, recomputed live from machine-verifiable claims.
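The grouping, ranking and tension flagging described above can be sketched as follows. This is a minimal illustration, not the site's implementation. The `Claim` record, the `build_frontier` function, the "higher is better" ordering within a tier, and the 10% relative threshold for a "material" difference are all assumptions.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Claim:
    # Hypothetical record; real claims carry more fields (baseline, level, ...).
    subject: str
    value: float
    verified: bool
    workload: str
    metric: str
    hardware: str


def build_frontier(claims, rel_tol=0.10):
    """Group claims by (workload, metric, hardware), rank them, flag tensions."""
    groups = defaultdict(list)
    for claim in claims:
        groups[(claim.workload, claim.metric, claim.hardware)].append(claim)

    frontier = {}
    for key, members in groups.items():
        # Verified claims first; within a tier, higher values first
        # (assumes higher is better for the metric).
        ranked = sorted(members, key=lambda c: (not c.verified, -c.value))
        # Assumed tension rule: same subject under the same key, values that
        # differ by more than rel_tol relative to the larger magnitude.
        tensions = [
            (a, b)
            for i, a in enumerate(ranked)
            for b in ranked[i + 1:]
            if a.subject == b.subject
            and abs(a.value - b.value) > rel_tol * max(abs(a.value), abs(b.value))
        ]
        frontier[key] = (ranked, tensions)
    return frontier


key = ("zipf-1.1-static", "hit-rate-delta-pp", "any-cpu")
demo = [Claim("LFU", 11.81, True, *key), Claim("LFU", 5.81, True, *key)]
ranked, tensions = build_frontier(demo)[key]
print([c.value for c in ranked], len(tensions))  # [11.81, 5.81] 1
```

Fed the two LFU rows of the first table, this rule flags them as a tension because 11.81 and 5.81 differ by far more than 10%.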

[email protected]|workload=zipf-1.1-static|metric=hit-rate-delta-pp|hardware=any-cpu

	subject	value		baseline	claim	discovery
✓ L3	LFU	11.81 pp		LRU	On a static zipfian stream (s=1.1, catalog 10k, 200k requests, capacity 100), LFU eviction…	LFU admission beats LRU by ~12pp hit
✓ L4	LFU	5.81 pp		LRU	On the same static zipfian stream (s=1.1, catalog 10k, 200k requests) but with capacity 10…	LFU's edge over LRU halves at genero

⚠ 2 unresolved tension(s): same measurement, materially different values — treat with scrutiny.
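The LFU-vs-LRU setting above (static Zipf stream, s=1.1, catalog 10k, 200k requests, capacity 100) can be explored with a small simulator. A minimal sketch, assuming "LFU" means always-admit LFU with counts kept over the whole history and "LRU" is the textbook policy. The function names and the seed are hypothetical, and the claims' own harness and LFU variant are not given here.

```python
import random
from collections import Counter, OrderedDict


def zipf_stream(n_requests, catalog, s, seed=0):
    """Static Zipf stream: the item of rank r (1-based) has weight 1 / r**s."""
    rng = random.Random(seed)
    weights = [1.0 / rank ** s for rank in range(1, catalog + 1)]
    return rng.choices(range(catalog), weights=weights, k=n_requests)


def lru_hit_rate(stream, capacity):
    cache = OrderedDict()
    hits = 0
    for key in stream:
        if key in cache:
            hits += 1
            cache.move_to_end(key)          # mark as most recently used
        else:
            if len(cache) >= capacity:
                cache.popitem(last=False)   # evict the least recently used item
            cache[key] = None
    return hits / len(stream)


def lfu_hit_rate(stream, capacity):
    counts = Counter()                      # frequencies over the whole history
    cache = set()
    hits = 0
    for key in stream:
        counts[key] += 1
        if key in cache:
            hits += 1
        else:
            if len(cache) >= capacity:
                # Evict the cached item with the lowest lifetime count.
                cache.remove(min(cache, key=counts.__getitem__))
            cache.add(key)
    return hits / len(stream)


stream = zipf_stream(200_000, catalog=10_000, s=1.1, seed=42)
delta_pp = 100 * (lfu_hit_rate(stream, 100) - lru_hit_rate(stream, 100))
print(f"LFU - LRU at capacity 100: {delta_pp:.2f} pp")
```

The gap this prints depends on the seed and on the LFU variant (in-cache counts or admission filters behave differently), so it need not reproduce 11.81 pp exactly.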

[email protected]|workload=zipf-1.1-static|metric=hit-rate|hardware=any-cpu

	subject	value		baseline	claim	discovery
✓ L4	LFU	83.76 %		—	At capacity 1000 with warmup excluded, LFU reaches an 83.76% (±10% rel.) steady-state hit…	LFU's edge over LRU halves at genero
✓ L3	LFU	64.49 %		LRU (52.68 %)	LFU reaches a 64.5% (±10% rel.) hit rate on this workload.	LFU admission beats LRU by ~12pp hit

⚠ 2 unresolved tension(s): same measurement, materially different values — treat with scrutiny.
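Under an idealized model, an LFU cache with full-history counts on a static Zipf stream converges to holding the C most popular items, so its steady-state hit rate approaches the probability mass of that head. The two rows above differ in capacity (1000 in the L4 claim; 100 in the L3 discovery's setting, per its hit-rate-delta claim), and a quick calculation shows how much that matters. The helper name is hypothetical, and this is a model bound, not the method behind either claim.

```python
def zipf_head_mass(capacity, catalog, s):
    """Share of requests that hit the `capacity` most popular items of a Zipf(s) catalog."""
    weights = [1.0 / rank ** s for rank in range(1, catalog + 1)]
    return sum(weights[:capacity]) / sum(weights)


for capacity in (100, 1000):
    print(capacity, round(zipf_head_mass(capacity, catalog=10_000, s=1.1), 4))
```

This comes out at roughly 0.65 for capacity 100 and 0.84 for capacity 1000, close to the two reported values (64.5 % and 83.76 %). That suggests the gap between the rows reflects the different capacities rather than conflicting measurements.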

[email protected]|workload=membership-mixed-queries|metric=crossover-n|hardware=any-cpu

	subject	value		baseline	claim	discovery
✓ L3	binary-search	8 elements		linear-scan	In CPython, bisect-based binary search becomes faster than linear scan for sorted-list mem…	Binary search overtakes linear scan
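One way to probe this crossover is a small timing harness. A sketch, assuming "linear scan" means the built-in `in` operator on a list and that the query mix is roughly half hits and half misses; the claim's actual harness and mix are truncated above, and the function names are hypothetical. Timings vary by machine and CPython version, so the crossover this reports may differ from 8.

```python
import bisect
import random
import timeit


def linear_contains(sorted_list, x):
    return x in sorted_list                  # element-by-element scan


def binary_contains(sorted_list, x):
    i = bisect.bisect_left(sorted_list, x)   # O(log n) probes
    return i < len(sorted_list) and sorted_list[i] == x


def crossover_n(sizes=(2, 4, 8, 16, 32, 64), n_queries=200, repeat=5, seed=0):
    """Return the first size at which binary search beats the scan, else None."""
    rng = random.Random(seed)
    for n in sizes:
        data = list(range(0, 2 * n, 2))                            # even numbers only
        queries = [rng.randrange(2 * n) for _ in range(n_queries)]  # odd values miss
        t_lin = min(timeit.repeat(lambda: [linear_contains(data, q) for q in queries],
                                  number=10, repeat=repeat))
        t_bin = min(timeit.repeat(lambda: [binary_contains(data, q) for q in queries],
                                  number=10, repeat=repeat))
        if t_bin < t_lin:
            return n
    return None


print(crossover_n())
```

Using a hand-written Python loop for the scan instead of `in` would slow the baseline and move the crossover to smaller n.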

[email protected]|workload=membership-mixed-queries|metric=speedup-at-1024|hardware=any-cpu

	subject	value		baseline	claim	discovery
✓ L3	binary-search	45.7 x		linear-scan	At n=1024 binary search is at least 10x faster than linear scan for the same query mix (me…	Binary search overtakes linear scan
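A rough comparison count shows why the gap widens with n. Assume a scan compares about n/2 elements for a hit and all n for a miss, so a mixed workload sits in between with a factor α set by the hit fraction, while binary search needs about log2 n probes:

```latex
\frac{C_{\text{linear}}(n)}{C_{\text{binary}}(n)} \approx \frac{\alpha\, n}{\log_2 n},
\qquad \tfrac{1}{2} \le \alpha \le 1,
\qquad
\left.\frac{\alpha\, n}{\log_2 n}\right|_{n=1024} = \frac{512 \text{ to } 1024}{10} \approx 51 \text{ to } 102
```

The measured 45.7x sits a little below this range. Per-probe costs and call overheads differ between the two paths, so comparison counts are only a first-order guide.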

[email protected]|workload=3x-hbm-oversubscription-sim|metric=prefetch-hit-rate|hardware=h100-class (simulated)

	subject	value		baseline	claim	discovery
≈ L1	TierKV	100.0 %		reactive-LRU	In discrete-event simulation at 3x HBM oversubscription, scheduler-lookahead prefetching (…	TierKV: Prefetch-Aware Memory Tierin
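A toy model shows why reactive LRU fares so badly at 3x oversubscription. This sketch rests on strong assumptions and is not TierKV's simulator: KV blocks are touched in round-robin decode order, the working set is 3x the HBM capacity, and a prefetch issued during one step lands before the next. The names `lookahead_hit_rate`, `_make_room` and `lookahead` are hypothetical.

```python
from collections import OrderedDict


def lru_hit_rate(trace, capacity):
    """Reactive LRU: fetch on miss, evict the least recently used block."""
    cache = OrderedDict()
    hits = 0
    for key in trace:
        if key in cache:
            hits += 1
            cache.move_to_end(key)
        else:
            if len(cache) >= capacity:
                cache.popitem(last=False)
            cache[key] = None
    return hits / len(trace)


def _make_room(resident, capacity, protect):
    """Evict one block not in `protect` if HBM is full; False if impossible."""
    if len(resident) < capacity:
        return True
    victims = resident - protect
    if not victims:
        return False
    resident.remove(next(iter(victims)))
    return True


def lookahead_hit_rate(trace, capacity, lookahead):
    """The scheduler knows the next `lookahead` steps and prefetches their blocks."""
    resident = set()
    hits = 0
    for i, key in enumerate(trace):
        if key in resident:
            hits += 1
        else:
            _make_room(resident, capacity, protect=set())
            resident.add(key)                 # demand fetch (a stall in practice)
        window = trace[i + 1:i + 1 + lookahead]
        protect = set(window)
        for upcoming in window:               # prefetch for the next steps
            if upcoming not in resident:
                if not _make_room(resident, capacity, protect):
                    break                     # window larger than HBM
                resident.add(upcoming)
    return hits / len(trace)


hbm_blocks = 100
trace = list(range(3 * hbm_blocks)) * 20      # 3x oversubscribed, round-robin
print(lru_hit_rate(trace, hbm_blocks), lookahead_hit_rate(trace, hbm_blocks, lookahead=8))
```

On this cyclic trace LRU evicts every block before its next use, so every access misses, while the lookahead policy misses only the first, cold access. Real traces, transfer latency and bandwidth limits would narrow the gap; the sketch only shows the mechanism.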

[email protected]|workload=3x-hbm-oversubscription-sim|metric=tpot-speedup|hardware=h100-class (simulated)

	subject	value		baseline	claim	discovery
≈ L1	TierKV	3.6 x		reactive-LRU	Simulated mean time-per-output-token improves 3.6x over reactive LRU eviction at 3x oversu…	TierKV: Prefetch-Aware Memory Tierin

[email protected]|workload=3x-hbm-oversubscription-sim|metric=throughput-speedup|hardware=h100-class (simulated)

	subject	value		baseline	claim	discovery
≈ L1	TierKV	2.9 x		reactive-LRU	Simulated system throughput improves 2.9x over reactive LRU at 3x oversubscription; larger…	TierKV: Prefetch-Aware Memory Tierin

[email protected]|workload=mixed-gpu-cluster-sim|metric=throughput|hardware=h100+a100+l40s (simulated)

	subject	value		baseline	claim	discovery
≈ L1	HeteroServe	36772 tokens/s		uniform-scheduling	On a simulated mixed-GPU cluster, HeteroServe achieves 36,772 tokens/sec — 2.13x over unif…	HeteroServe: Capability-Weighted Bat
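The scheduling policy itself is not described here. As a hypothetical illustration of capability weighting in general, and not HeteroServe's algorithm, a batch can be split across GPU types in proportion to per-device throughput weights. Largest-remainder rounding keeps the parts summing to the batch size. The function name and the weights are made up.

```python
import math


def weighted_split(batch_size, weights):
    """Split `batch_size` requests across devices in proportion to `weights`
    (largest-remainder apportionment, so the parts always sum to batch_size)."""
    total = sum(weights.values())
    quotas = {dev: batch_size * w / total for dev, w in weights.items()}
    alloc = {dev: math.floor(q) for dev, q in quotas.items()}
    leftover = batch_size - sum(alloc.values())
    # Hand the remaining slots to the devices with the largest fractional parts.
    by_remainder = sorted(quotas, key=lambda d: quotas[d] - alloc[d], reverse=True)
    for dev in by_remainder[:leftover]:
        alloc[dev] += 1
    return alloc


# Hypothetical relative-throughput weights, NOT measured values.
print(weighted_split(64, {"h100": 5, "a100": 3, "l40s": 2}))
# {'h100': 32, 'a100': 19, 'l40s': 13}
```

A uniform scheduler corresponds to equal weights, which leaves the fastest GPUs underloaded relative to their capacity.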

[email protected]|workload=mixed-gpu-cluster-sim|metric=slo-compliance|hardware=h100+a100+l40s (simulated)

	subject	value		baseline	claim	discovery
≈ L1	HeteroServe	68.8 %		uniform-scheduling (26.4 %)	SLO compliance reaches 68.8% vs 26.4% for uniform scheduling (+42.4pp).	HeteroServe: Capability-Weighted Bat

[email protected]|task=research-synthesis|dataset=50-seed-multimodal-suite|metric=key-point-recall

	subject	value		baseline	claim	discovery
≈ L1	MARCO-high	0.925		unbounded-search (0.88)	At the High tier, MARCO surpasses unbounded recall (0.925 vs 0.880) at 42% lower cost.	MARCO: Budget-Constrained Multi-Moda
≈ L1	MARCO-medium	0.843		unbounded-search (0.88)	At the Medium tier ($0.01/seed), MARCO reaches 0.843 key-point recall vs 0.880 for an unbo…	MARCO: Budget-Constrained Multi-Moda
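Key-point recall is the fraction of reference key points that an output covers. This minimal sketch assumes exact-match key points and a macro average over seeds. The suite's actual matching rule and averaging are not stated here; evaluations of this kind often use a judge model or fuzzy matching instead. The function names are hypothetical.

```python
def key_point_recall(reference, covered):
    """Fraction of reference key points present in the output's covered set."""
    ref = set(reference)
    if not ref:
        raise ValueError("reference key points must be non-empty")
    return len(ref & set(covered)) / len(ref)


def mean_recall(per_seed):
    """Macro average over seeds: each (reference, covered) pair counts equally."""
    scores = [key_point_recall(ref, cov) for ref, cov in per_seed]
    return sum(scores) / len(scores)


print(mean_recall([(["a", "b"], ["a"]), (["c"], ["c", "z"])]))  # 0.75
```

Recall ignores extra, unmatched points (like `"z"` above), which is why it is typically reported alongside cost or precision.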

[email protected]|task=multimodal-parsing|dataset=50-seed-multimodal-suite|metric=entity-f1

	subject	value		baseline	claim	discovery
≈ L1	MARCO-parser	0.962		—	The multi-modal parser achieves 0.962 entity F1 across 50 seeds spanning five modalities.	MARCO: Budget-Constrained Multi-Moda
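Entity F1 is the harmonic mean of precision and recall over extracted entities. A minimal sketch, assuming exact-match entity sets and micro averaging (pooling counts across seeds). The claim does not say which matching or averaging it uses, and the function names are hypothetical.

```python
def entity_f1(gold, predicted):
    """F1 between a gold and a predicted entity set for one document."""
    gold, predicted = set(gold), set(predicted)
    tp = len(gold & predicted)
    if tp == 0:
        return 0.0
    precision = tp / len(predicted)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)


def micro_f1(pairs):
    """Pool true/false positives and false negatives across all (gold, predicted) pairs."""
    tp = fp = fn = 0
    for gold, predicted in pairs:
        g, p = set(gold), set(predicted)
        tp += len(g & p)
        fp += len(p - g)
        fn += len(g - p)
    return 0.0 if tp == 0 else 2 * tp / (2 * tp + fp + fn)


print(entity_f1({"A", "B"}, {"A", "C"}))  # 0.5
```

Micro averaging weights seeds with many entities more heavily; a macro average of per-seed `entity_f1` scores would weight every seed equally.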