⚛ AttentionHub
A live study · running on this hub

We re-ran the code behind 83 papers.

Not "does the PDF look right" — we built each repo, ran the experiment, and checked the headline number against what the paper claimed, with a tolerance fixed in advance. Here's what came out.
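The comparison step can be pictured as a small verdict function that the hub applies to whatever number a run prints. This is a minimal sketch under stated assumptions, not the hub's implementation. The `Claim` record, the `relative_error` and `verdict` helpers, and the 5% figure are all hypothetical. It also assumes a symmetric relative tolerance, while the study's real tolerances may be absolute or one-sided.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Claim:
    """One paper's headline number and a tolerance fixed before any run (hypothetical schema)."""
    name: str
    claimed: float
    rel_tol: float  # assumed form: max allowed |got - claimed| / |claimed|


def relative_error(claimed: float, got: float) -> float:
    """|got - claimed| / |claimed|; falls back to |got - claimed| when the claim is 0."""
    if claimed == 0:
        return abs(got - claimed)
    return abs(got - claimed) / abs(claimed)


def verdict(claim: Claim, got: float) -> str:
    """Judge a run that produced a number. Runs that never produce one
    (build failure, crash, timeout) are classified before this check."""
    if relative_error(claim.claimed, got) <= claim.rel_tol:
        return "REPRODUCED"
    return "DIVERGED"


# Two rows from the table on this page, judged with a hypothetical 5% tolerance.
print(verdict(Claim("sklearn-digits-svm-acc", 0.97, 0.05), 0.9689))   # REPRODUCED
print(verdict(Claim("gensim-word2vec-analogy", 0.6, 0.05), 0.2883))  # DIVERGED
```

The tolerance sits in the claim record, written before the run. The check runs on the hub's side, so the repo under test never decides its own verdict.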

83 attempted · updating as the batch runs
34%
reproduced their headline result as shipped
28 of 83 papers, within a pre-registered tolerance
46%
built, ran, and produced a number
38 of 83; the other 45 failed to build, crashed, or exceeded the time budget
74%
matched, given the code produced a number
28 of the 38 runs that produced a number; even when the code runs, roughly one number in four disagrees

Every outcome, honestly

One bar, 83 papers. Failures aren't noise — they're the finding.

28 reproduced as shipped
10 ran, but the number didn't match
18 built, but the experiment crashed
12 environment wouldn't build
15 exceeded the time budget
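The three headline percentages follow directly from these five counts. The third one is conditional: it divides by the runs that produced a number at all (reproduced plus didn't match), not by all 83. Here is a quick arithmetic check that uses only the counts reported on this page. The outcome labels are the ones used in the full table.

```python
from collections import Counter

# Outcome counts for the 83 attempted papers, as reported in the bar.
counts = Counter({
    "REPRODUCED": 28,    # reproduced as shipped
    "DIVERGED": 10,      # ran, but the number didn't match
    "RUN_FAILED": 18,    # built, but the experiment crashed
    "BUILD_FAILED": 12,  # environment wouldn't build
    "TIMEOUT": 15,       # exceeded the time budget
})

total = sum(counts.values())                                 # 83 papers
produced_number = counts["REPRODUCED"] + counts["DIVERGED"]  # 38 runs with a number

reproduced_rate = counts["REPRODUCED"] / total               # 28 / 83
ran_rate = produced_number / total                           # 38 / 83
match_given_ran = counts["REPRODUCED"] / produced_number     # 28 / 38

print(f"{reproduced_rate:.0%} reproduced, {ran_rate:.0%} ran, "
      f"{match_given_ran:.0%} matched given a number")
# prints: 34% reproduced, 46% ran, 74% matched given a number
```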

Every paper, one tile

83 papers, 83 tiles, colored by what happened. The hub, not the authors, owns the pass/fail assertion, so no paper can self-pass.

✓ reproduced
≠ ran, but off / no number
✗ wouldn't build / crashed / timed out
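Full table: claimed vs. got, per paper
err is the relative error |got - claimed| / |claimed| in percent (the absolute difference x100 where the claim is 0). Outcomes are judged against each claim's pre-registered tolerance, so no single err cutoff separates REPRODUCED from DIVERGED. Runs that failed or timed out have no got or err.
paper outcome claimed got err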
catboost-default-vs-tuned REPRODUCED 0.27 0.2749 1.815%
graph-surprise-svd-ml100k-rmse REPRODUCED 0.934 0.9364 0.257%
hdbscan-blobs-ari REPRODUCED 0.9 0.9692 7.689%
hmmlearn-gaussian-hmm REPRODUCED 0.0 0.0074 0.74%
imbalanced-learn-smote-f1 REPRODUCED 0.0 0.1724 17.24%
lightgbm-speedup-claim REPRODUCED 20.0 11.17 44.15%
ml-copod-breastw-auc REPRODUCED 0.9936 0.9944 0.081%
ml-copod-cardio-auc REPRODUCED 0.8974 0.9219 2.73%
ml-denmune-aggregation-ari REPRODUCED 0.99 0.9927 0.273%
ml-gmm-iris-ari REPRODUCED 0.9 0.9039 0.433%
ml-isoforest-digits-auc REPRODUCED 0.95 0.9865 3.842%
ml-lof-synthetic-auc REPRODUCED 0.99 0.999 0.909%
ml-pyod-knn-synthetic-auc REPRODUCED 1.0 1.0 0.0%
ml-spectral-moons-ari REPRODUCED 1.0 1.0 0.0%
ml-umap-digits-trustworthiness REPRODUCED 0.97 0.9889 1.948%
nlp-crfsuite-conll2002-f1 REPRODUCED 0.77 0.7965 3.442%
nlp-langid-identification-acc REPRODUCED 0.94 1.0 6.383%
nlp-nltk-naivebayes-movie-acc REPRODUCED 0.8 0.81 1.25%
nlp-rankbm25-retrieval-mrr REPRODUCED 1.0 1.0 0.0%
prophet-cv-mape REPRODUCED 0.1 0.0743 25.7%
river-phishing-acc REPRODUCED 0.8879 0.8928 0.552%
sb3-ppo-cartpole REPRODUCED 500.0 500.0 0.0%
sentence-transformers-sts-spearman REPRODUCED 0.85 0.8203 3.494%
sklearn-20newsgroups-tfidf REPRODUCED 0.88 0.882 0.227%
sklearn-digits-svm-acc REPRODUCED 0.97 0.9689 0.113%
ts-mabwiser-sim REPRODUCED 0.9 0.986 9.556%
ts-pymc-coinflip REPRODUCED 0.6667 0.6654 0.195%
ts-qlearning-taxi REPRODUCED 7.9 7.9 0.0%
gensim-word2vec-analogy DIVERGED 0.6 0.2883 51.95%
ml-denmune-jain-ari DIVERGED 1.0 0.2355 76.45%
ml-finch-mnist10k-nmi DIVERGED 0.8905 0.9755 9.545%
ml-kmeans-digits-nmi DIVERGED 0.74 0.6264 15.351%
nanogpt-shakespeare-gpu DIVERGED 1.47 1.8857 28.279%
node2vec-linkpred-auc DIVERGED 0.97 0.7599 21.66%
ts-statsmodels-sarimax-airline DIVERGED 1022.299 922.205 9.791%
ts-thompson-bernoulli-regret DIVERGED 21.0 10.96 47.81%
tslearn-dtw-knn-ucr DIVERGED 0.9 1.0 11.111%
umap-mnist-runtime DIVERGED 42.0 129.54 208.429%
graph-deepwalk-blogcatalog-microf1 RUN_FAILED 0.4151
graph-ncf-neumf-ml1m-hr10 RUN_FAILED 0.73
ml-devnet-annthyroid-auc RUN_FAILED 0.783
ml-pidforest-mammography-auc RUN_FAILED 0.84
ml-pidforest-satimage2-auc RUN_FAILED 0.982
ml-pidforest-thyroid-auc RUN_FAILED 0.876
ml-quickshiftpp-blobs-ari RUN_FAILED 1.0
ml-suod-cardio-iforest-auc RUN_FAILED 0.9216
nlp-doc2vec-imdb-acc RUN_FAILED 0.87
nlp-fasttext-dbpedia-p1 RUN_FAILED 0.98
nlp-glove-analogy-acc RUN_FAILED 75.0
nlp-nbsvm-imdb-acc RUN_FAILED 91.55
nlp-sif-sts-correlation RUN_FAILED 0.717
nlp-vader-tweets-f1 RUN_FAILED 0.96
ts-minirocket-ucr RUN_FAILED 0.969
ts-pmdarima-wineind RUN_FAILED 2908.093
ts-rocket-ucr RUN_FAILED 0.969
xgboost-higgs-auc RUN_FAILED 0.84
beir-bm25-anserini-ndcg BUILD_FAILED 0.65
graph-edmot-cora-modularity BUILD_FAILED 0.4088
graph-openne-node2vec-wiki-microf1 BUILD_FAILED 0.651
graph-vgae-cora-auc BUILD_FAILED 0.914
huggingface-bert-glue-gpu BUILD_FAILED 0.93
nlp-brightmart-textcnn-acc BUILD_FAILED 0.65
nlp-flair-sentiment-acc BUILD_FAILED 1.0
nlp-textcnn-mindspore-sst2 BUILD_FAILED 0.7971
pomegranate-hmm-speedup BUILD_FAILED 13
pyserini-bm25-beir-ndcg BUILD_FAILED 0.679
ts-deeppilco-cartpole BUILD_FAILED 0.1
vit-pytorch-cifar-gpu BUILD_FAILED 0.88
graph-pygat-cora-acc TIMEOUT 0.84
graph-pygcn-cora-acc TIMEOUT 0.815
nlp-fasttext1607-agnews-acc TIMEOUT 92.5
nlp-han-agnews-acc TIMEOUT 92.7
nlp-scapt-absa-restaurant-acc TIMEOUT 90.0
nlp-simcse-sts-spearman TIMEOUT 76.25
nlp-textcnn-mr-acc TIMEOUT 76.1
nlp-textcnn-sst2-acc TIMEOUT 85.99
ts-darts-airpassengers TIMEOUT 5.11
ts-dlinear-etth1 TIMEOUT 0.375
ts-nbeats-m4 TIMEOUT 13.114
ts-nhits-ettm2 TIMEOUT 0.255
ts-pyro-eightschools TIMEOUT 4.4
ts-sb3-dqn-mountaincar TIMEOUT -100.849
ts-statsforecast-m4 TIMEOUT 0.94
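The err column appears to be the relative error in percent, with the plain absolute difference (x100) used when the claim is 0 (hmmlearn-gaussian-hmm, imbalanced-learn-smote-f1). That reading is inferred from the numbers, not stated by the hub. This sketch recomputes a few rows under that assumption. `err_percent` is a hypothetical helper. The rows include ts-mabwiser-sim (REPRODUCED) and ml-finch-mnist10k-nmi (DIVERGED), whose errors sit within a hundredth of a percent of each other.

```python
def err_percent(claimed: float, got: float) -> float:
    """Relative error in percent; absolute difference x100 when the claim is 0 (assumed rule)."""
    if claimed == 0:
        return abs(got - claimed) * 100
    return abs(got - claimed) / abs(claimed) * 100


# (paper, claimed, got, err as shown in the table) copied from rows above.
rows = [
    ("catboost-default-vs-tuned", 0.27, 0.2749, 1.815),
    ("hmmlearn-gaussian-hmm", 0.0, 0.0074, 0.74),
    ("lightgbm-speedup-claim", 20.0, 11.17, 44.15),
    ("ml-finch-mnist10k-nmi", 0.8905, 0.9755, 9.545),  # DIVERGED
    ("ts-mabwiser-sim", 0.9, 0.986, 9.556),            # REPRODUCED, slightly larger err
    ("umap-mnist-runtime", 42.0, 129.54, 208.429),
]

for paper, claimed, got, shown in rows:
    recomputed = err_percent(claimed, got)
    # The table rounds err to at most three decimals, so allow half a unit in the last place.
    matches = abs(recomputed - shown) < 5e-4
    print(f"{paper:28} shown {shown:>8}%  recomputed {recomputed:8.3f}%  match={matches}")
```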

Read this number honestly