Worksheet: LLM Hallucination Detection

Identifying when AI makes things up
Course: Natural Language Annotation for Machine Learning
Task Type: Factuality verification / Span labeling
Author: Jin Zhao

Background

Large language models (LLMs) sometimes generate content that sounds plausible but is factually incorrect. This behavior is called hallucination.

Hallucination: AI-generated content that is nonsensical, unfaithful to the source, or factually incorrect, presented with apparent confidence.

Types of Hallucinations

Intrinsic: Output contradicts the source material
Extrinsic: Output cannot be verified from the source (may or may not be true)
Fabrication: Completely made-up facts (fake citations, non-existent events)
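In an annotation tool, this three-way taxonomy typically becomes a label set attached to spans of model output. The sketch below is one illustrative way to encode it; the class names, fields, and character offsets are hypothetical, not a prescribed schema for this course.

```python
from dataclasses import dataclass
from enum import Enum

class HallucinationType(Enum):
    INTRINSIC = "intrinsic"      # contradicts the source material
    EXTRINSIC = "extrinsic"      # cannot be verified from the source
    FABRICATION = "fabrication"  # entirely made up (fake citations, events)

@dataclass
class LabeledSpan:
    text: str                    # the flagged span of model output
    start: int                   # character offset in the response (illustrative)
    end: int
    label: HallucinationType

# Example annotation: an unverifiable claim marked as extrinsic
span = LabeledSpan("born in Warsaw in 1867", 55, 77, HallucinationType.EXTRINSIC)
print(span.label.value)  # -> extrinsic
```

Keeping the label as an enum rather than a free-text string makes inter-annotator agreement easier to compute later.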

Why This Matters

Detecting hallucinations is critical for building trustworthy AI systems, especially in high-stakes domains like healthcare, law, and education.

Part 1: Factual Claim Identification

LLM Response (Question: "Tell me about Marie Curie")

Marie Curie was a Polish-French physicist and chemist who conducted pioneering research on radioactivity. She was born in Warsaw in 1867. She was the first woman to win a Nobel Prize, the first person to win Nobel Prizes in two different sciences (Physics in 1903 and Chemistry in 1911), and the first female professor at the University of Paris. She discovered the elements polonium and radium, and her research contributed to the development of X-ray machines during World War I.

Question 1

Identify each factual claim in the response and rate its accuracy:

1. "Marie Curie was Polish-French"
2. "Born in Warsaw in 1867"
3. "First woman to win a Nobel Prize"
4. "First person to win Nobel Prizes in two different sciences"
5. "First female professor at the University of Paris"
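Before claims like these can be rated, a response has to be decomposed into claim-sized units. Production pipelines use trained claim-decomposition models; the sketch below is only a naive sentence-splitting baseline to show where that step sits in the workflow.

```python
import re

def extract_candidate_claims(response: str) -> list[str]:
    """Naively split a response into sentence-level candidate claims.

    A regex split on sentence-final punctuation is a crude proxy:
    one sentence may bundle several claims (as in the Curie example),
    so annotators still need to sub-divide by hand.
    """
    sentences = re.split(r"(?<=[.!?])\s+", response.strip())
    return [s for s in sentences if s]

response = (
    "Marie Curie was born in Warsaw in 1867. "
    "She was the first woman to win a Nobel Prize."
)
for claim in extract_candidate_claims(response):
    print(claim)
```

Note that the third sentence of the actual Curie response packs three "first" claims into one sentence, which is exactly why sentence boundaries are not claim boundaries.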

Part 2: Grounded vs. Ungrounded Claims

Source Document:

"The 2024 Paris Olympics featured 32 sports. The opening ceremony took place along the Seine River on July 26, 2024. Over 10,000 athletes participated from 206 countries."

LLM Summary:

The 2024 Paris Olympics included 32 sports with over 10,000 athletes from 206 nations. The spectacular opening ceremony was held on July 26, 2024, along the Seine River, featuring stunning light shows and performances by famous French artists. It was considered one of the most innovative opening ceremonies in Olympic history.

Question 2

Which parts of the summary are grounded in the source vs. hallucinated?

Grounded (in source):

Hallucinated (not in source):
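The grounded/hallucinated split you just made can be roughly approximated with lexical overlap: claims whose content words barely appear in the source are candidates for the "hallucinated" bucket. The function below is a minimal sketch with an assumed stopword list; it cannot replace human judgment (paraphrases score low, coincidental word overlap scores high).

```python
def grounding_score(claim: str, source: str) -> float:
    """Fraction of the claim's content words that appear in the source.

    A crude lexical proxy for groundedness: low overlap suggests the
    claim may not be supported by the source, but paraphrase and
    synonymy will fool it, so treat scores as triage hints only.
    """
    stop = {"the", "a", "an", "of", "in", "on", "was", "were", "is", "from", "with"}
    claim_words = {w.strip(".,").lower() for w in claim.split()} - stop
    source_words = {w.strip(".,").lower() for w in source.split()} - stop
    if not claim_words:
        return 0.0
    return len(claim_words & source_words) / len(claim_words)

source = ("The 2024 Paris Olympics featured 32 sports. The opening ceremony "
          "took place along the Seine River on July 26, 2024.")
print(grounding_score("The Olympics included 32 sports", source))                # high overlap
print(grounding_score("stunning light shows by famous French artists", source))  # no overlap
```

Modern factuality pipelines replace this lexical check with natural language inference models, but the triage logic is the same: score each claim against the source, then route low scorers to annotators.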

Part 3: Subtle Hallucinations

LLM Response (Question: "What did Einstein say about imagination?")

Einstein famously said, "Imagination is more important than knowledge. Knowledge is limited. Imagination encircles the world." He made this statement in a 1929 interview with George Sylvester Viereck for the Saturday Evening Post, where he discussed his views on creativity and scientific thinking.

Question 3

This response contains a mix of accurate and potentially hallucinated information. Rate each element:

1. The quote itself
2. Interview with George Sylvester Viereck
3. Published in the Saturday Evening Post, 1929

Part 4: Hallucination Severity

LLM Response (Medical Context):

"Aspirin is generally safe for most adults. The typical dose for pain relief is 325-650mg every 4-6 hours, not exceeding 4000mg per day. However, aspirin should be avoided by children under 16 due to the risk of Reye's syndrome, and by people with bleeding disorders. It can also interact with blood thinners like warfarin. Always consult a doctor before starting any medication regimen."

Question 4

Even if most information is accurate, small errors in medical contexts can be dangerous. Assess the severity of potential errors:
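One way annotation projects operationalize severity is a fixed rubric mapping error types to review priorities. The error-type names and priority levels below are hypothetical examples for illustration, not clinical guidance or an established standard.

```python
# Hypothetical severity rubric: categories and levels are illustrative only.
SEVERITY = {
    "dosage_error": "critical",             # wrong mg or frequency could cause harm
    "contraindication_omitted": "critical", # e.g. missing the Reye's syndrome warning
    "interaction_error": "major",           # wrong or missing drug interaction
    "stylistic_inaccuracy": "minor",        # imprecise but harmless wording
}

def triage(error_type: str) -> str:
    """Map an annotated error type to a review priority."""
    return SEVERITY.get(error_type, "unrated")

print(triage("dosage_error"))  # -> critical
```

A rubric like this makes the "small errors can be dangerous" point concrete: the same factual slip gets a different priority depending on whether it touches dosage or style.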

Part 5: Self-Contradictory Hallucinations

LLM Response:

"The Eiffel Tower was completed in 1889 for the World's Fair. It took two years and two months to build, with construction starting in 1887. The tower stands at 324 meters (1,063 feet) tall, making it the tallest structure in Paris. When it was first built, it was 300 meters tall, but the addition of broadcasting antennas increased its height. The tower was originally designed as a temporary structure for the 1889 World's Fair, but it was so popular that it was kept permanently."

Question 5

This response may contain internal contradictions. Can you identify any?

How should self-contradictory responses be labeled?
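Some internal contradictions can be surfaced automatically by grouping numeric claims: if the same unit word is paired with two different values, a human should check whether the text reconciles them (as the Eiffel Tower response does via the antenna explanation) or genuinely contradicts itself. This is a minimal sketch; real systems use NLI models over sentence pairs.

```python
import re
from collections import defaultdict

def numeric_claims(text: str) -> dict[str, set[str]]:
    """Group numbers by the unit word that immediately follows them.

    Two different values attached to the same unit (e.g. 'meters')
    flag a possible internal contradiction for an annotator to review;
    the text may still reconcile them, so this is a hint, not a verdict.
    """
    claims = defaultdict(set)
    for value, unit in re.findall(r"(\d[\d,.]*)\s+([a-z]+)", text.lower()):
        claims[unit].add(value)
    return dict(claims)

text = ("The tower stands at 324 meters tall. "
        "When first built, it was 300 meters tall.")
conflicts = {u: v for u, v in numeric_claims(text).items() if len(v) > 1}
print(conflicts)  # flags 'meters' with two conflicting values
```

The Eiffel Tower example is exactly the hard case: the two heights are flagged, but the response explains the discrepancy, so the final label still requires judgment.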

Part 6: Group Discussion

Question 6

Compare your annotations with your group. Where did you disagree?

Part 7: Reflection

Question 7

Why is hallucination detection difficult?

Key Takeaway

Hallucination detection requires both factual knowledge and judgment about which errors matter in a given context.

  • Not all unverifiable claims are false; not all false claims are equally harmful
  • Context determines severity (medical vs. creative writing)
  • The line between "embellishment" and "fabrication" is fuzzy
  • Annotation guidelines must balance thoroughness with practicality