Large language models (LLMs) sometimes generate content that sounds plausible but is factually incorrect. This is called hallucination.
Hallucination: AI-generated content that is nonsensical, unfaithful to the source, or factually incorrect, presented with apparent confidence.
Detecting hallucinations is critical for building trustworthy AI systems, especially in high-stakes domains like healthcare, law, and education.
Marie Curie was a Polish-French physicist and chemist who conducted pioneering research on radioactivity. She was born in Warsaw in 1867. She was the first woman to win a Nobel Prize, the first person to win Nobel Prizes in two different sciences (Physics in 1903 and Chemistry in 1911), and the first female professor at the University of Paris. She discovered the elements polonium and radium, and her research contributed to the development of X-ray machines during World War I.
Identify each factual claim in the response and rate its accuracy:
Source: "The 2024 Paris Olympics featured 32 sports. The opening ceremony took place along the Seine River on July 26, 2024. Over 10,000 athletes participated from 206 countries."
Summary: The 2024 Paris Olympics included 32 sports with over 10,000 athletes from 206 nations. The spectacular opening ceremony was held on July 26, 2024, along the Seine River, featuring stunning light shows and performances by famous French artists. It was considered one of the most innovative opening ceremonies in Olympic history.
Which parts of the summary are grounded in the source vs. hallucinated?
Grounded (in source):
Hallucinated (not in source):
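A rough first pass at the grounded vs. hallucinated split can be automated by checking which words in each summary sentence also appear in the source. The sketch below is only an illustration, not a real hallucination detector: the stopword list, the function names, and the idea of scoring by word overlap are all assumptions made for this example. It also shows why human judgment is still needed, since a faithful paraphrase such as "nations" for "countries" is flagged as unsupported.

```python
import re

# Hypothetical, deliberately small stopword list for this sketch.
STOPWORDS = {"the", "a", "an", "of", "in", "on", "it", "was", "with", "from",
             "and", "by", "to", "one", "most", "as", "over", "were", "is"}

def content_words(text):
    """Return the set of lowercase alphanumeric tokens, minus stopwords."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS}

def split_sentences(text):
    """Naive sentence splitter: break after ., ! or ? followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def support_report(source, summary):
    """For each summary sentence, return (sentence, overlap ratio, unsupported words)."""
    source_words = content_words(source)
    report = []
    for sentence in split_sentences(summary):
        words = content_words(sentence)
        unsupported = sorted(words - source_words)
        ratio = 1.0 if not words else (len(words) - len(unsupported)) / len(words)
        report.append((sentence, ratio, unsupported))
    return report

SOURCE = ("The 2024 Paris Olympics featured 32 sports. The opening ceremony took "
          "place along the Seine River on July 26, 2024. Over 10,000 athletes "
          "participated from 206 countries.")
SUMMARY = ("The 2024 Paris Olympics included 32 sports with over 10,000 athletes "
           "from 206 nations. The spectacular opening ceremony was held on July 26, "
           "2024, along the Seine River, featuring stunning light shows and "
           "performances by famous French artists. It was considered one of the "
           "most innovative opening ceremonies in Olympic history.")

for sentence, ratio, unsupported in support_report(SOURCE, SUMMARY):
    print(f"{ratio:.2f}  {sentence}")
    print(f"      not in source: {', '.join(unsupported)}")
```

A low overlap ratio marks a sentence worth a closer look; it does not by itself prove the sentence is hallucinated, and a high ratio does not prove it is faithful.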
Einstein famously said, "Imagination is more important than knowledge. Knowledge is limited. Imagination encircles the world." He made this statement in a 1929 interview with George Sylvester Viereck for the Saturday Evening Post, where he discussed his views on creativity and scientific thinking.
This response contains a mix of accurate and potentially hallucinated information. Rate each element:
"Aspirin is generally safe for most adults. The typical dose for pain relief is 325-650mg every 4-6 hours, not exceeding 4000mg per day. However, aspirin should be avoided by children under 16 due to the risk of Reye's syndrome, and by people with bleeding disorders. It can also interact with blood thinners like warfarin. Always consult a doctor before starting any medication regimen."
Even if most information is accurate, small errors in medical contexts can be dangerous. Assess the severity of potential errors:
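One check that needs no outside knowledge is whether the numbers in the response agree with each other. The sketch below takes the largest dose and shortest interval stated in the response and compares the implied daily total with the stated daily limit. The function and variable names are hypothetical, and counting doses as 24 hours divided by the interval is an assumption of this example.

```python
def max_daily_intake_mg(max_dose_mg, min_interval_hours):
    """Largest daily total implied by taking the top dose at the shortest interval."""
    doses_per_day = 24 // min_interval_hours  # assumed way of counting doses per day
    return max_dose_mg * doses_per_day

# Figures copied from the response being annotated, not from a medical reference.
MAX_DOSE_MG = 650
MIN_INTERVAL_HOURS = 4
STATED_DAILY_LIMIT_MG = 4000

implied = max_daily_intake_mg(MAX_DOSE_MG, MIN_INTERVAL_HOURS)
consistent = implied <= STATED_DAILY_LIMIT_MG
print(f"Implied maximum: {implied} mg/day; within stated limit: {consistent}")
```

Passing this check only shows that the response does not contradict itself. Whether the figures are correct still has to be verified against an authoritative source.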
"The Eiffel Tower was completed in 1889 for the World's Fair. It took two years and two months to build, with construction starting in 1887. The tower stands at 324 meters (1,063 feet) tall, making it the tallest structure in Paris. When it was first built, it was 300 meters tall, but the addition of broadcasting antennas increased its height. The tower was originally designed as a temporary structure for the 1889 World's Fair, but it was so popular that it was kept permanently."
This response may contain internal contradictions. Can you identify any?
How should self-contradictory responses be labeled?
Compare your annotations with your group. Where did you disagree?
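To put a number on how much two annotators disagree, one common option is Cohen's kappa, which corrects raw agreement for the agreement expected by chance. This statistic is not prescribed by the worksheet; it is offered here as one possible way to compare annotations. The label names and the two annotation lists below are made up for illustration.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("Both annotators must label the same non-empty item list.")
    n = len(labels_a)
    # Fraction of items where the two annotators chose the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    # Agreement expected if each annotator labeled at random with their own label rates.
    expected = sum(counts_a[label] * counts_b[label]
                   for label in set(labels_a) | set(labels_b)) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (observed - expected) / (1 - expected)

# Hypothetical annotations of six claims by two annotators.
annotator_1 = ["accurate", "accurate", "hallucinated", "unverifiable", "accurate", "hallucinated"]
annotator_2 = ["accurate", "hallucinated", "hallucinated", "unverifiable", "accurate", "accurate"]

kappa = cohens_kappa(annotator_1, annotator_2)
disagreements = [i for i, (a, b) in enumerate(zip(annotator_1, annotator_2)) if a != b]
print(f"kappa = {kappa:.2f}; claims to discuss: {disagreements}")
```

The list of disagreeing item indices is often more useful in group discussion than the score itself, because it points at the specific claims where the labeling criteria were read differently.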
Why is hallucination detection difficult?
Hallucination detection requires both factual knowledge and judgment about which errors matter in a given context.