Frontier

Directly comparable claims are grouped by their profile's frontier keys (e.g. workload + metric + hardware). Verified results rank first; unresolved tensions are flagged. This is the leaderboard layer that Papers-with-Code pioneered, recomputed live from machine-verifiable claims.
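
The grouping described above can be sketched roughly as follows. This is an illustration only, not the site's implementation: the `Claim` fields, the `verified` flag, the "higher value ranks first" ordering, and the 10% relative threshold for a "materially different" value are all assumptions.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Claim:
    subject: str
    value: float
    keys: tuple             # frontier keys, e.g. (("workload", "..."), ("metric", "..."))
    verified: bool = False  # hypothetical flag; the page does not document its schema


def build_frontier(claims, rel_tol=0.10):
    """Group claims by frontier keys, rank verified first, flag tensions."""
    groups = defaultdict(list)
    for claim in claims:
        groups[claim.keys].append(claim)

    frontier = {}
    for keys, rows in groups.items():
        # Verified claims first, then larger values (assumes higher is better).
        ranked = sorted(rows, key=lambda c: (not c.verified, -c.value))

        # Assumed tension rule: the same subject reports values whose spread
        # exceeds rel_tol of the largest magnitude reported for that subject.
        by_subject = defaultdict(list)
        for claim in ranked:
            by_subject[claim.subject].append(claim.value)
        tension = any(
            max(vals) - min(vals) > rel_tol * max(abs(v) for v in vals)
            for vals in by_subject.values()
            if len(vals) > 1
        )
        frontier[keys] = (ranked, tension)
    return frontier
```

Under these assumptions, two LFU claims of 11.81 pp and 5.81 pp that share the same keys land in one group and raise the tension flag.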

[email protected]|workload=zipf-1.1-static|metric=hit-rate-delta-pp|hardware=any-cpu
subject | value | baseline | claim | discovery
L3 LFU | 11.81 pp | LRU | On a static zipfian stream (s=1.1, catalog 10k, 200k requests, capacity 100), LFU eviction… | LFU admission beats LRU by ~12pp hit
L4 LFU | 5.81 pp | LRU | On the same static zipfian stream (s=1.1, catalog 10k, 200k requests) but with capacity 10… | LFU's edge over LRU halves at genero
⚠ 2 unresolved tension(s): same measurement, materially different values — treat with scrutiny.
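
A minimal sketch of the kind of experiment behind these two claims, for readers who want to probe the tension themselves. The claim text is truncated, so the LFU variant is an assumption: here frequency counts are kept for every key ever seen and the least-frequent cached key is evicted. The function names and the seed are hypothetical, and the sketch is not expected to reproduce the listed values exactly.

```python
import random
from collections import Counter, OrderedDict
from itertools import accumulate


def zipf_stream(catalog, requests, s, seed=0):
    """Draw `requests` item ids from a static Zipf(s) popularity law."""
    rng = random.Random(seed)
    cum = list(accumulate(1.0 / rank ** s for rank in range(1, catalog + 1)))
    return rng.choices(range(catalog), cum_weights=cum, k=requests)


def lru_hit_rate(stream, capacity):
    """Hit rate of a least-recently-used cache over the stream."""
    cache = OrderedDict()
    hits = 0
    for key in stream:
        if key in cache:
            hits += 1
            cache.move_to_end(key)         # mark as most recently used
        else:
            if len(cache) >= capacity:
                cache.popitem(last=False)  # evict least recently used
            cache[key] = None
    return hits / len(stream)


def lfu_hit_rate(stream, capacity):
    """Hit rate of an LFU cache that remembers counts for all keys seen."""
    freq = Counter()
    cache = set()
    hits = 0
    for key in stream:
        freq[key] += 1
        if key in cache:
            hits += 1
        else:
            if len(cache) >= capacity:
                # Evict the cached key with the lowest request count so far.
                cache.remove(min(cache, key=freq.__getitem__))
            cache.add(key)
    return hits / len(stream)
```

Calling `zipf_stream(10_000, 200_000, 1.1)` and evaluating both functions at two different capacities gives the hit-rate delta in percentage points as `(lfu - lru) * 100`. Whether warmup is excluded, and which LFU variant is used, can both shift the result.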
[email protected]|workload=zipf-1.1-static|metric=hit-rate|hardware=any-cpu
subject | value | baseline | claim | discovery
L4 LFU | 83.76 % | - | At capacity 1000 with warmup excluded, LFU reaches an 83.76% (±10% rel.) steady-state hit… | LFU's edge over LRU halves at genero
L3 LFU | 0.6449 fraction | LRU (0.5268) | LFU reaches a 64.5% (±10% rel.) hit rate on this workload. | LFU admission beats LRU by ~12pp hit
⚠ 2 unresolved tension(s): same measurement, materially different values — treat with scrutiny.
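
The numbers in this frontier connect to the hit-rate-delta frontier by simple arithmetic: 0.6449 − 0.5268 = 0.1181, i.e. the 11.81 pp listed there. The sketch below shows that conversion, plus one possible reading of "±10% rel." as a tolerance band around the claimed value. That reading, and both function names, are assumptions rather than the site's definition.

```python
def delta_pp(subject_rate, baseline_rate):
    """Difference between two hit-rate fractions, in percentage points."""
    return (subject_rate - baseline_rate) * 100.0


def within_rel_tol(observed, claimed, rel=0.10):
    """True if `observed` lies within ±rel (relative) of `claimed`."""
    return abs(observed - claimed) <= rel * abs(claimed)


# LFU 0.6449 vs LRU 0.5268, as listed in the row above.
print(round(delta_pp(0.6449, 0.5268), 2))

# 83.76% expressed as a fraction, checked against the 0.6449 claim.
print(within_rel_tol(0.8376, 0.6449))
```

Under this reading, 0.8376 falls outside the ±10% band around 0.6449 (roughly 0.580 to 0.709), which is consistent with the two rows being flagged as a tension.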
[email protected]|workload=membership-mixed-queries|metric=crossover-n|hardware=any-cpu
subject | value | baseline | claim | discovery
L3 binary-search | 8 elements | linear-scan | In CPython, bisect-based binary search becomes faster than linear scan for sorted-list mem… | Binary search overtakes linear scan
[email protected]|workload=membership-mixed-queries|metric=speedup-at-1024|hardware=any-cpu
subject | value | baseline | claim | discovery
L3 binary-search | 45.7 x | linear-scan | At n=1024 binary search is at least 10x faster than linear scan for the same query mix (me… | Binary search overtakes linear scan
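
A rough sketch of how a speedup like this could be measured in CPython. The claim text is truncated, so the details are assumptions: linear scan is taken to be the built-in `in` operator on a list, the query mix is half hits and half misses, and the helper names are hypothetical. Timings vary by machine, so no expected ratio is given.

```python
import bisect
import timeit


def linear_contains(sorted_items, x):
    """Membership by linear scan (the list `in` operator)."""
    return x in sorted_items


def bisect_contains(sorted_items, x):
    """Membership by binary search on a sorted list."""
    i = bisect.bisect_left(sorted_items, x)
    return i < len(sorted_items) and sorted_items[i] == x


def speedup(n, number=200):
    """Ratio of linear-scan time to binary-search time for a list of n items."""
    data = list(range(0, 2 * n, 2))  # n even numbers, already sorted
    queries = list(range(2 * n))     # even queries hit, odd queries miss
    t_lin = timeit.timeit(
        lambda: [linear_contains(data, q) for q in queries], number=number
    )
    t_bis = timeit.timeit(
        lambda: [bisect_contains(data, q) for q in queries], number=number
    )
    return t_lin / t_bis
```

Evaluating `speedup(n)` over a range of small `n` and finding where the ratio first exceeds 1 gives a crossover estimate; `speedup(1024)` corresponds to the metric in this frontier.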
[email protected]|workload=3x-hbm-oversubscription-sim|metric=prefetch-hit-rate|hardware=h100-class (simulated)
subject | value | baseline | claim | discovery
L1 TierKV | 100.0 % | reactive-LRU | In discrete-event simulation at 3x HBM oversubscription, scheduler-lookahead prefetching (… | TierKV: Prefetch-Aware Memory Tierin
[email protected]|workload=3x-hbm-oversubscription-sim|metric=tpot-speedup|hardware=h100-class (simulated)
subject | value | baseline | claim | discovery
L1 TierKV | 3.6 x | reactive-LRU | Simulated mean time-per-output-token improves 3.6x over reactive LRU eviction at 3x oversu… | TierKV: Prefetch-Aware Memory Tierin
[email protected]|workload=3x-hbm-oversubscription-sim|metric=throughput-speedup|hardware=h100-class (simulated)
subject | value | baseline | claim | discovery
L1 TierKV | 2.9 x | reactive-LRU | Simulated system throughput improves 2.9x over reactive LRU at 3x oversubscription; larger… | TierKV: Prefetch-Aware Memory Tierin
[email protected]|workload=mixed-gpu-cluster-sim|metric=throughput|hardware=h100+a100+l40s (simulated)
subject | value | baseline | claim | discovery
L1 HeteroServe | 36772 tokens/s | uniform-scheduling | On a simulated mixed-GPU cluster, HeteroServe achieves 36,772 tokens/sec, 2.13x over unif… | HeteroServe: Capability-Weighted Bat
[email protected]|workload=mixed-gpu-cluster-sim|metric=slo-compliance|hardware=h100+a100+l40s (simulated)
subject | value | baseline | claim | discovery
L1 HeteroServe | 68.8 % | uniform-scheduling (26.4) | SLO compliance reaches 68.8% vs 26.4% for uniform scheduling (+42.4pp). | HeteroServe: Capability-Weighted Bat
[email protected]|task=research-synthesis|dataset=50-seed-multimodal-suite|metric=key-point-recall
subject | value | baseline | claim | discovery
L1 MARCO-high | 0.925 | unbounded-search (0.88) | At the High tier, MARCO surpasses unbounded recall (0.925 vs 0.880) at 42% lower cost. | MARCO: Budget-Constrained Multi-Moda
L1 MARCO-medium | 0.843 | unbounded-search (0.88) | At the Medium tier ($0.01/seed), MARCO reaches 0.843 key-point recall vs 0.880 for an unbo… | MARCO: Budget-Constrained Multi-Moda
[email protected]|task=multimodal-parsing|dataset=50-seed-multimodal-suite|metric=entity-f1
subject | value | baseline | claim | discovery
L1 MARCO-parser | 0.962 | - | The multi-modal parser achieves 0.962 entity F1 across 50 seeds spanning five modalities. | MARCO: Budget-Constrained Multi-Moda