Next Generation Benchmark for LLM-Driven Fake News Detection
Evaluating the capabilities of Large Language Models in detecting and analyzing misinformation in real-time scenarios
A comprehensive benchmark designed to evaluate LLM performance in fake news detection
Evaluate models on current events and emerging misinformation patterns
Multi-dimensional evaluation across three temporal stages: before, during, and after event occurrence
Regular benchmark updates to reflect the evolving landscape of misinformation
Testing across worldwide events and emerging misinformation patterns
Current rankings of LLMs on the LiveFact benchmark
| Rank | Model | Organization | Overall Score | Before Event (CLS) | Before Event (INF) | During Event (CLS) | During Event (INF) | After Event (CLS) | After Event (INF) |
|---|---|---|---|---|---|---|---|---|---|
How we evaluate LLM performance on fake news detection
We curate a diverse dataset of verified true and false news from multiple sources, covering current global events.
Each news item is tested at three temporal points, before, during, and after the event occurs, to evaluate temporal reasoning.
Each model is tested on identical prompts and evaluated based on accuracy, confidence calibration, and reasoning quality.
The benchmark is updated regularly with new test cases to reflect emerging misinformation trends.
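The evaluation described above can be sketched in a few lines of Python. This is a minimal illustration, not the actual LiveFact evaluation scripts: the `Prediction` record, function names, and the choice of Expected Calibration Error as the calibration metric are all assumptions made for the sketch.

```python
from dataclasses import dataclass

# Hypothetical record: one model verdict on one news item at one temporal stage.
@dataclass
class Prediction:
    stage: str        # "before", "during", or "after" the event
    label: bool       # model's verdict: True = real news, False = fake
    truth: bool       # verified ground-truth label
    confidence: float # model's self-reported confidence in [0, 1]

def accuracy(preds):
    """Fraction of items where the model's verdict matches the verified label."""
    return sum(p.label == p.truth for p in preds) / len(preds)

def expected_calibration_error(preds, bins=10):
    """Average gap between stated confidence and actual accuracy over confidence bins."""
    total = 0.0
    for b in range(bins):
        lo, hi = b / bins, (b + 1) / bins
        bucket = [p for p in preds
                  if lo <= p.confidence < hi
                  or (b == bins - 1 and p.confidence == 1.0)]
        if not bucket:
            continue
        acc = accuracy(bucket)
        conf = sum(p.confidence for p in bucket) / len(bucket)
        total += len(bucket) / len(preds) * abs(acc - conf)
    return total

def per_stage_scores(preds):
    """Report accuracy and calibration separately for each temporal stage."""
    return {
        stage: {
            "accuracy": accuracy(subset),
            "ece": expected_calibration_error(subset),
        }
        for stage in ("before", "during", "after")
        if (subset := [p for p in preds if p.stage == stage])
    }
```

Scoring per stage is what lets the leaderboard report separate before/during/after columns: a model that classifies well only after an event has been widely fact-checked will show a visible gap between its "before" and "after" scores.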
Join our community and contribute to advancing fake news detection research
Have a model you'd like to see on the leaderboard? Submit it for evaluation.
Submit Model
Download the LiveFact dataset and evaluation scripts for your research on Hugging Face.
Download Dataset
Join the LiveFact maintenance team and participate in the ongoing research and development of this benchmark.
Join the Team