References: every cited work, linked to the real paper.

| # | Reference | Link |
|---|---|---|
| 1 | Azaria, A. & Mitchell, T. The Internal State of an LLM Knows When It's Lying. Findings of EMNLP, 2023. arXiv:2304.13734. arXiv title: The Internal State of an LLM Knows When It's Lying | arXiv 2304.13734 ↗ link verified |
| 2 | Orgad, H. et al. LLMs Know More Than They Show. ICLR, 2025. arXiv:2410.02707. arXiv title: LLMs Know More Than They Show: On the Intrinsic Representation of LLM Hallucinations | arXiv 2410.02707 ↗ link verified |
| 3 | Chen, C. et al. INSIDE: LLMs' Internal States Retain the Power of Hallucination Detection. ICLR, 2024. arXiv:2402.03744. arXiv title: INSIDE: LLMs' Internal States Retain the Power of Hallucination Detection | arXiv 2402.03744 ↗ link verified |
| 4 | Sriramanan, G. et al. LLM-Check. NeurIPS, 2024. | Google Scholar ↗ Google Scholar search |
| 5 | Farquhar, S. et al. Detecting hallucinations in large language models using semantic entropy. Nature 630, 2024. | Google Scholar ↗ Google Scholar search |
| 6 | Kuhn, L. et al. Semantic Uncertainty. ICLR, 2023. arXiv:2302.09664. arXiv title: Semantic Uncertainty: Linguistic Invariances for Uncertainty Estimation in Natural Language Generation | arXiv 2302.09664 ↗ link verified |
| 7 | Kossen, J. et al. Semantic Entropy Probes. arXiv:2406.15927, 2024. arXiv title: Semantic Entropy Probes: Robust and Cheap Hallucination Detection in LLMs | arXiv 2406.15927 ↗ link verified |
| 8 | Ma, H. et al. Semantic Energy: Detecting LLM Hallucination Beyond Entropy. arXiv:2508.14496, 2025. arXiv title: Semantic Energy: Detecting LLM Hallucination Beyond Entropy | arXiv 2508.14496 ↗ link verified |
| 9 | Karpowicz, M.P. On the Fundamental Impossibility of Hallucination Control in Large Language Models. arXiv:2506.06382, 2025. arXiv title: On the Fundamental Impossibility of Hallucination Control in Large Language Models | arXiv 2506.06382 ↗ link verified |
| 10 | Simhi, A. et al. Distinguishing Ignorance from Error in LLM Hallucinations. arXiv:2410.22071, 2024. arXiv title: Distinguishing Ignorance from Error in LLM Hallucinations | arXiv 2410.22071 ↗ link verified |
| 11 | Simhi, A. et al. HACK: Hallucinations Along Certainty and Knowledge Axes. arXiv:2510.24222, 2025. arXiv title: HACK: Hallucinations Along Certainty and Knowledge Axes | arXiv 2510.24222 ↗ link verified |
| 12 | Marin, J. A Geometric Taxonomy of Hallucinations in LLMs. arXiv:2602.13224, 2026. arXiv title: A Geometric Taxonomy of Hallucinations in LLMs | arXiv 2602.13224 ↗ link verified |
| 13 | Cherukuri, K. & Varshney, L.R. Hallucination Basins: A Dynamic Framework for Understanding and Controlling LLM Hallucinations. arXiv:2604.04743, 2026. arXiv title: Hallucination Basins: A Dynamic Framework for Understanding and Controlling LLM Hallucinations | arXiv 2604.04743 ↗ link verified |
| 14 | Akarlar, G.A. Hallucination as Trajectory Commitment: Causal Evidence for Asymmetric Attractor Dynamics in Transformer Generation. arXiv:2604.15400, 2026. arXiv title: Hallucination as Trajectory Commitment: Causal Evidence for Asymmetric Attractor Dynamics in Transformer Generation | arXiv 2604.15400 ↗ link verified |
| 15 | Kalai, A.T. et al. Why Language Models Hallucinate. arXiv:2509.04664, 2025. arXiv title: Why Language Models Hallucinate | arXiv 2509.04664 ↗ link verified |
| 16 | Fernando, T. & Guitchounts, G. Dynamics of the Transformer Residual Stream. arXiv:2605.14258, 2026. arXiv title: Dynamics of the Transformer Residual Stream: Coupling Spectral Geometry to Network Topology | arXiv 2605.14258 ↗ link verified |
| 17 | Lieberum, T. et al. Gemma Scope: Open Sparse Autoencoders Everywhere All At Once on Gemma 2. arXiv:2408.05147, 2024. arXiv title: Gemma Scope: Open Sparse Autoencoders Everywhere All At Once on Gemma 2 | arXiv 2408.05147 ↗ link verified |
| 18 | Lu, Y. et al. Beyond Finite Layer Neural Networks. ICML, 2018. | Google Scholar ↗ Google Scholar search |
| 19 | Li, K. et al. Inference-Time Intervention. NeurIPS, 2023. arXiv:2306.03341. arXiv title: Inference-Time Intervention: Eliciting Truthful Answers from a Language Model | arXiv 2306.03341 ↗ link verified |
| 20 | Zou, A. et al. Representation Engineering. arXiv:2310.01405, 2023. arXiv title: Representation Engineering: A Top-Down Approach to AI Transparency | arXiv 2310.01405 ↗ link verified |
| 21 | Rimsky, N. et al. Steering Llama 2 via Contrastive Activation Addition. ACL, 2024. arXiv:2312.06681. arXiv title: Steering Llama 2 via Contrastive Activation Addition | arXiv 2312.06681 ↗ link verified |
| 22 | Obeso, O., Arditi, A., Ferrando, J., Freeman, J., Holmes, C. & Nanda, N. Real-Time Detection of Hallucinated Entities in Long-Form Generation. arXiv:2509.03531, 2025. arXiv title: Real-Time Detection of Hallucinated Entities in Long-Form Generation | arXiv 2509.03531 ↗ link verified |
| 23 | Yeom, J., Sok, J., Kim, H., Park, S., Park, J. & Kim, T. Hallucination as Commitment Failure: Larger LLMs Misfire Despite Knowing the Answer. arXiv:2605.22007, 2026. arXiv title: Hallucination as Commitment Failure: Larger LLMs Misfire Despite Knowing the Answer | arXiv 2605.22007 ↗ link verified |
| 24 | Hendrycks, D. et al. Measuring Massive Multitask Language Understanding (MMLU). ICLR, 2021. arXiv:2009.03300. arXiv title: Measuring Massive Multitask Language Understanding | arXiv 2009.03300 ↗ link verified |
| 25 | Clark, P. et al. Think you have Solved Question Answering? Try ARC. arXiv:1803.05457, 2018. arXiv title: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge | arXiv 1803.05457 ↗ link verified |
| 26 | Zellers, R. et al. HellaSwag: Can a Machine Really Finish Your Sentence? ACL, 2019. arXiv:1905.07830. arXiv title: HellaSwag: Can a Machine Really Finish Your Sentence? | arXiv 1905.07830 ↗ link verified |
| 27 | Lin, S. et al. TruthfulQA: Measuring How Models Mimic Human Falsehoods. ACL, 2022. arXiv:2109.07958. arXiv title: TruthfulQA: Measuring How Models Mimic Human Falsehoods | arXiv 2109.07958 ↗ link verified |
| 28 | Mihaylov, T. et al. Can a Suit of Armor Conduct Electricity? (OpenBookQA). EMNLP, 2018. arXiv:1809.02789. arXiv title: Can a Suit of Armor Conduct Electricity? A New Dataset for Open Book Question Answering | arXiv 1809.02789 ↗ link verified |
| 29 | Talmor, A. et al. CommonsenseQA: A Question Answering Challenge Targeting Commonsense Knowledge. NAACL, 2019. arXiv:1811.00937. arXiv title: CommonsenseQA: A Question Answering Challenge Targeting Commonsense Knowledge | arXiv 1811.00937 ↗ link verified |
| 30 | Joshi, M. et al. TriviaQA: A Large Scale Distantly Supervised Challenge Dataset for Reading Comprehension. ACL, 2017. arXiv:1705.03551. arXiv title: TriviaQA: A Large Scale Distantly Supervised Challenge Dataset for Reading Comprehension | arXiv 1705.03551 ↗ link verified |
| 31 | Marcenko, V.A. & Pastur, L.A. Distribution of eigenvalues for some sets of random matrices. Math. USSR-Sbornik, 1967. | Google Scholar ↗ Google Scholar search |
| 32 | Bai, Z. & Silverstein, J.W. Spectral Analysis of Large Dimensional Random Matrices. Springer, 2010. | Google Scholar ↗ Google Scholar search |
| 33 | Guo, C. et al. On Calibration of Modern Neural Networks. ICML, 2017. arXiv:1706.04599. arXiv title: On Calibration of Modern Neural Networks | arXiv 1706.04599 ↗ link verified |
| 34 | Geifman, Y. & El-Yaniv, R. Selective Classification for Deep Neural Networks. NeurIPS, 2017. arXiv:1705.08500. arXiv title: Selective Classification for Deep Neural Networks | arXiv 1705.08500 ↗ link verified |
| 35 | Hu, E.J. et al. LoRA: Low-Rank Adaptation of Large Language Models. ICLR, 2022. arXiv:2106.09685. arXiv title: LoRA: Low-Rank Adaptation of Large Language Models | arXiv 2106.09685 ↗ link verified |
| 36 | Bricken, T. et al. Towards Monosemanticity: Decomposing Language Models With Dictionary Learning. Transformer Circuits Thread, 2023. | Google Scholar ↗ Google Scholar search |
| 37 | Templeton, A. et al. Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet. Transformer Circuits Thread, 2024. | Google Scholar ↗ Google Scholar search |
| 38 | Lindsey, J. et al. On the Biology of a Large Language Model. Transformer Circuits Thread, 2025. | Google Scholar ↗ Google Scholar search |