Frontier
Directly comparable claims, grouped by their profile's frontier keys (e.g. workload + metric + hardware). Verified results rank first; unresolved tensions are flagged. This is the leaderboard layer that Papers-with-Code pioneered, recomputed live from machine-verifiable claims.
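
The grouping described above can be sketched roughly as follows. This is a minimal illustration, not the site's actual implementation: the `Claim` fields, the `build_frontier` and `has_tension` names, the profile ids, the secondary sort by value, and the 10% relative threshold for a "materially different" value are all assumptions made for the example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Claim:
    profile: str      # hypothetical profile id
    keys: tuple       # frontier key/value pairs, e.g. (("workload", ...), ("metric", ...))
    subject: str
    value: float
    verified: bool


def has_tension(rows, rel_tolerance):
    """Return True if one subject has values that differ by more than rel_tolerance."""
    by_subject = defaultdict(list)
    for claim in rows:
        by_subject[claim.subject].append(claim.value)
    for values in by_subject.values():
        lo, hi = min(values), max(values)
        # Assumed rule: a spread above rel_tolerance of the larger magnitude is "material".
        if hi - lo > rel_tolerance * max(abs(lo), abs(hi)):
            return True
    return False


def build_frontier(claims, rel_tolerance=0.10):
    """Group claims by (profile, frontier keys), rank them, and flag tensions."""
    groups = defaultdict(list)
    for claim in claims:
        groups[(claim.profile, claim.keys)].append(claim)

    frontier = {}
    for group_key, rows in groups.items():
        # Verified claims first; within each block, larger values first (assumed ordering).
        ranked = sorted(rows, key=lambda c: (not c.verified, -c.value))
        frontier[group_key] = (ranked, has_tension(ranked, rel_tolerance))
    return frontier
```

Each group maps to a ranked list of claims plus a boolean tension flag, which corresponds to one table and its optional warning line on this page.
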
[email protected]|workload=zipf-1.1-static|metric=hit-rate-delta-pp|hardware=any-cpu
| status | subject | value | baseline | claim | discovery |
|---|---|---|---|---|---|
| ✓ L3 | LFU | 11.81 pp | LRU | On a static zipfian stream (s=1.1, catalog 10k, 200k requests, capacity 100), LFU eviction… | LFU admission beats LRU by ~12pp hit |
| ✓ L4 | LFU | 5.81 pp | LRU | On the same static zipfian stream (s=1.1, catalog 10k, 200k requests) but with capacity 10… | LFU's edge over LRU halves at genero |
⚠ 2 unresolved tension(s): same measurement, materially different values — treat with scrutiny.
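
For readers who want to see what a measurement of this kind involves, here is a small sketch of an LFU versus LRU comparison on a static Zipfian stream. It is not the harness behind the claims above. The function names, the seed, and the choice of an LFU variant that keeps global frequency counts (even for evicted items) are assumptions, so the numbers it produces should not be expected to match the reported 11.81 pp.

```python
import random
from collections import Counter, OrderedDict
from itertools import accumulate


def zipf_stream(n_requests, catalog, s, seed=0):
    """Draw n_requests item ids from a static Zipf(s) distribution over `catalog` items."""
    rng = random.Random(seed)
    cum_weights = list(accumulate(1.0 / rank ** s for rank in range(1, catalog + 1)))
    return rng.choices(range(catalog), cum_weights=cum_weights, k=n_requests)


def lru_hit_rate(stream, capacity):
    """Hit rate of a least-recently-used cache, warmup included."""
    cache = OrderedDict()
    hits = 0
    for item in stream:
        if item in cache:
            hits += 1
            cache.move_to_end(item)        # mark as most recently used
        else:
            if len(cache) >= capacity:
                cache.popitem(last=False)  # evict the least recently used item
            cache[item] = True
    return hits / len(stream)


def lfu_hit_rate(stream, capacity):
    """Hit rate of an LFU cache using global request counts (an assumed variant)."""
    freq = Counter()   # counts survive eviction, so returning items keep their history
    cache = set()
    hits = 0
    for item in stream:
        freq[item] += 1
        if item in cache:
            hits += 1
            continue
        if len(cache) >= capacity:
            # Evict the resident item with the lowest request count so far.
            cache.remove(min(cache, key=freq.__getitem__))
        cache.add(item)
    return hits / len(stream)
```

Calling `zipf_stream(200_000, 10_000, 1.1)` and passing the result to both functions with `capacity=100` mirrors the parameters quoted in the first claim; the difference of the two hit rates, multiplied by 100, gives a delta in percentage points.
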
[email protected]|workload=zipf-1.1-static|metric=hit-rate|hardware=any-cpu
| status | subject | value | baseline | claim | discovery |
|---|---|---|---|---|---|
| ✓ L4 | LFU | 83.76 % | — | At capacity 1000 with warmup excluded, LFU reaches an 83.76% (±10% rel.) steady-state hit… | LFU's edge over LRU halves at genero |
| ✓ L3 | LFU | 0.6449 fraction | LRU (0.5268) | LFU reaches a 64.5% (±10% rel.) hit rate on this workload. | LFU admission beats LRU by ~12pp hit |
⚠ 2 unresolved tension(s): same measurement, materially different values — treat with scrutiny.
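
The two rows above report the same metric in different units (percent and fraction), so a like-for-like comparison needs a normalization step first. The sketch below shows that arithmetic using only the values in the table; the `to_percent` helper and the reading of "±10% rel." as a relative band around the stated value are assumptions for illustration.

```python
def to_percent(value, unit):
    """Normalize a hit rate to percent; `unit` is either 'fraction' or '%'."""
    if unit == "fraction":
        return value * 100.0
    if unit == "%":
        return value
    raise ValueError(f"unknown unit: {unit!r}")


# Values copied from the table above.
lfu_small = to_percent(0.6449, "fraction")   # LFU hit rate in the second row
lru_small = to_percent(0.5268, "fraction")   # its LRU baseline
lfu_large = to_percent(83.76, "%")           # LFU hit rate at capacity 1000

# Difference in percentage points between LFU and its LRU baseline.
delta_pp = lfu_small - lru_small

# Assumed reading of "±10% rel.": a band of ±10% around the stated value.
band = (lfu_small * 0.9, lfu_small * 1.1)
inside_band = band[0] <= lfu_large <= band[1]
```

Here `delta_pp` comes out at about 11.81, which agrees with the hit-rate-delta-pp value listed for the same discovery, and `inside_band` is false because 83.76 lies above the upper edge of the band (about 70.94). That gap is one plausible reason the group is flagged, although the two claims also state different cache capacities.
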
[email protected]|workload=membership-mixed-queries|metric=crossover-n|hardware=any-cpu
| status | subject | value | baseline | claim | discovery |
|---|---|---|---|---|---|
| ✓ L3 | binary-search | 8 elements | linear-scan | In CPython, bisect-based binary search becomes faster than linear scan for sorted-list mem… | Binary search overtakes linear scan |
[email protected]|workload=membership-mixed-queries|metric=speedup-at-1024|hardware=any-cpu
| status | subject | value | baseline | claim | discovery |
|---|---|---|---|---|---|
| ✓ L3 | binary-search | 45.7 x | linear-scan | At n=1024 binary search is at least 10x faster than linear scan for the same query mix (me… | Binary search overtakes linear scan |
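
As a rough illustration of the comparison behind the two binary-search claims, the sketch below times a `bisect`-based membership test against a linear scan on a sorted list. The claims do not say how the linear scan or the query mix were implemented, so the use of the `in` operator, the five-query mix, and the `time_ratio` helper are assumptions; results will vary by machine and Python version.

```python
import bisect
import timeit


def linear_contains(sorted_list, x):
    """Membership by linear scan (here: the list `in` operator, an assumed baseline)."""
    return x in sorted_list


def bisect_contains(sorted_list, x):
    """Membership by binary search using the standard-library bisect module."""
    i = bisect.bisect_left(sorted_list, x)
    return i < len(sorted_list) and sorted_list[i] == x


def time_ratio(n, repeats=2000):
    """Return linear-scan time divided by binary-search time for a list of n items."""
    data = list(range(0, 2 * n, 2))                 # sorted even numbers
    queries = [0, n, 2 * n - 2, 1, 2 * n + 1]       # assumed mix of hits and misses

    def run(contains):
        for q in queries:
            contains(data, q)

    t_linear = timeit.timeit(lambda: run(linear_contains), number=repeats)
    t_bisect = timeit.timeit(lambda: run(bisect_contains), number=repeats)
    return t_linear / t_bisect
```

Evaluating `time_ratio(n)` over a range of `n` and looking for the point where the ratio first exceeds 1 gives a crossover estimate comparable in spirit to the crossover-n metric, and `time_ratio(1024)` corresponds to the speedup-at-1024 metric.
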
[email protected]|workload=3x-hbm-oversubscription-sim|metric=prefetch-hit-rate|hardware=h100-class (simulated)
| status | subject | value | baseline | claim | discovery |
|---|---|---|---|---|---|
| ≈ L1 | TierKV | 100.0 % | reactive-LRU | In discrete-event simulation at 3x HBM oversubscription, scheduler-lookahead prefetching (… | TierKV: Prefetch-Aware Memory Tierin |
[email protected]|workload=3x-hbm-oversubscription-sim|metric=tpot-speedup|hardware=h100-class (simulated)
| status | subject | value | baseline | claim | discovery |
|---|---|---|---|---|---|
| ≈ L1 | TierKV | 3.6 x | reactive-LRU | Simulated mean time-per-output-token improves 3.6x over reactive LRU eviction at 3x oversu… | TierKV: Prefetch-Aware Memory Tierin |
[email protected]|workload=3x-hbm-oversubscription-sim|metric=throughput-speedup|hardware=h100-class (simulated)
| status | subject | value | baseline | claim | discovery |
|---|---|---|---|---|---|
| ≈ L1 | TierKV | 2.9 x | reactive-LRU | Simulated system throughput improves 2.9x over reactive LRU at 3x oversubscription; larger… | TierKV: Prefetch-Aware Memory Tierin |
[email protected]|workload=mixed-gpu-cluster-sim|metric=throughput|hardware=h100+a100+l40s (simulated)
| status | subject | value | baseline | claim | discovery |
|---|---|---|---|---|---|
| ≈ L1 | HeteroServe | 36772 tokens/s | uniform-scheduling | On a simulated mixed-GPU cluster, HeteroServe achieves 36,772 tokens/sec — 2.13x over unif… | HeteroServe: Capability-Weighted Bat |
[email protected]|workload=mixed-gpu-cluster-sim|metric=slo-compliance|hardware=h100+a100+l40s (simulated)
| status | subject | value | baseline | claim | discovery |
|---|---|---|---|---|---|
| ≈ L1 | HeteroServe | 68.8 % | uniform-scheduling (26.4) | SLO compliance reaches 68.8% vs 26.4% for uniform scheduling (+42.4pp). | HeteroServe: Capability-Weighted Bat |
[email protected]|task=research-synthesis|dataset=50-seed-multimodal-suite|metric=key-point-recall
| status | subject | value | baseline | claim | discovery |
|---|---|---|---|---|---|
| ≈ L1 | MARCO-high | 0.925 | unbounded-search (0.88) | At the High tier, MARCO surpasses unbounded recall (0.925 vs 0.880) at 42% lower cost. | MARCO: Budget-Constrained Multi-Moda |
| ≈ L1 | MARCO-medium | 0.843 | unbounded-search (0.88) | At the Medium tier ($0.01/seed), MARCO reaches 0.843 key-point recall vs 0.880 for an unbo… | MARCO: Budget-Constrained Multi-Moda |
[email protected]|task=multimodal-parsing|dataset=50-seed-multimodal-suite|metric=entity-f1
| status | subject | value | baseline | claim | discovery |
|---|---|---|---|---|---|
| ≈ L1 | MARCO-parser | 0.962 | — | The multi-modal parser achieves 0.962 entity F1 across 50 seeds spanning five modalities. | MARCO: Budget-Constrained Multi-Moda |