Worksheet: Question Answering Annotation

Span extraction and answer validation
Course: Natural Language Annotation for Machine Learning
Task Type: Extractive QA / Span selection
Author: Jin Zhao

Background

Extractive Question Answering (QA) involves selecting a span of text from a passage that answers a given question. This is the task format of benchmark datasets such as SQuAD (the Stanford Question Answering Dataset).

Extractive QA: Given a passage and a question, identify the minimal span of text that correctly and completely answers the question.
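To make "span" concrete: datasets in the SQuAD family store each answer as its text plus a character offset (`answer_start`) into the passage. The sketch below assumes that SQuAD v1.1-style record layout. `make_annotation` and `span_is_consistent` are hypothetical helper names, not part of any library.

```python
def make_annotation(context: str, question: str, answer_text: str) -> dict:
    """Build a SQuAD v1.1-style record (hypothetical helper).

    Uses the FIRST occurrence of answer_text; if the text appears more than
    once, a real annotation tool should record the offset the annotator selected.
    """
    start = context.find(answer_text)
    if start == -1:
        raise ValueError(f"answer {answer_text!r} is not a span of the context")
    return {
        "context": context,
        "question": question,
        "answers": [{"text": answer_text, "answer_start": start}],
    }


def span_is_consistent(record: dict) -> bool:
    """Check that each stored character offset points at exactly the stored text."""
    context = record["context"]
    for ans in record["answers"]:
        start = ans["answer_start"]
        if context[start:start + len(ans["text"])] != ans["text"]:
            return False
    return True


context = ("The tower is 330 meters tall and was the tallest man-made "
           "structure in the world for 41 years.")
record = make_annotation(context, "How tall is the Eiffel Tower?", "330 meters")
print(record["answers"])           # [{'text': '330 meters', 'answer_start': 13}]
print(span_is_consistent(record))  # True
```

Storing the offset, not just the text, matters when the same string occurs more than once in a passage: the offset records which occurrence the annotator actually chose.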

Annotation Challenges

Part 1: Basic Span Selection

Passage 1:

The Eiffel Tower is a wrought-iron lattice tower on the Champ de Mars in Paris, France. It was constructed from 1887 to 1889 as the entrance to the 1889 World's Fair. The tower is 330 meters tall and was the tallest man-made structure in the world for 41 years until the Chrysler Building was completed in 1930.

Q1: When was the Eiffel Tower built?
Q2: How tall is the Eiffel Tower?
Question 1

For Q1, which answer span is most appropriate?
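Boundary choices matter because benchmark scoring is sensitive to them. The sketch below follows the answer normalization (lowercasing, stripping punctuation and the articles a/an/the) and token-overlap F1 used by the SQuAD v1.1 evaluation script. The reference span is a hypothetical choice for illustration, not an answer key for Q1.

```python
import re
import string
from collections import Counter

PUNCT = set(string.punctuation)


def normalize_answer(s: str) -> str:
    """Lowercase, strip punctuation, drop the articles a/an/the, collapse spaces."""
    s = s.lower()
    s = "".join(ch for ch in s if ch not in PUNCT)
    s = re.sub(r"\b(a|an|the)\b", " ", s)
    return " ".join(s.split())


def exact_match(prediction: str, reference: str) -> bool:
    """True if the two spans are identical after normalization."""
    return normalize_answer(prediction) == normalize_answer(reference)


def f1_score(prediction: str, reference: str) -> float:
    """Harmonic mean of token precision and recall (bag-of-tokens overlap)."""
    pred_tokens = normalize_answer(prediction).split()
    ref_tokens = normalize_answer(reference).split()
    num_same = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


reference = "1887 to 1889"  # hypothetical reference span, for illustration only
candidates = ["1887 to 1889", "from 1887 to 1889",
              "It was constructed from 1887 to 1889", "1889"]
for candidate in candidates:
    print(f"{candidate!r}: EM={exact_match(candidate, reference)}, "
          f"F1={f1_score(candidate, reference):.2f}")
```

This prints F1 scores of 1.00, 0.86, 0.60 and 0.50. Extra words such as "from" break exact match and dilute precision, a truncated span loses recall, and differences in case, punctuation or a leading "the" are forgiven by normalization.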

Part 2: Multiple Valid Answers

Passage 2:

Albert Einstein was born in Ulm, Germany on March 14, 1879. The theoretical physicist developed the theory of relativity, one of the two pillars of modern physics. Einstein received the Nobel Prize in Physics in 1921 for his discovery of the law of the photoelectric effect.

Q3: Where was Einstein born?
Question 2

Several spans could answer this question. Which of them are valid? (Check all that apply.)
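When more than one span is acceptable, SQuAD-style evaluation keeps several reference answers and credits a prediction with its best score against any of them. Below is a self-contained sketch using SQuAD v1.1-style normalization and token F1; `best_score` is a hypothetical helper, and the reference list for Q3 is illustrative only and is not meant to settle which spans are valid.

```python
import re
import string
from collections import Counter

PUNCT = set(string.punctuation)


def normalize_answer(s: str) -> str:
    """SQuAD v1.1-style normalization: lowercase, no punctuation or articles."""
    s = "".join(ch for ch in s.lower() if ch not in PUNCT)
    s = re.sub(r"\b(a|an|the)\b", " ", s)
    return " ".join(s.split())


def f1_score(prediction: str, reference: str) -> float:
    """Token-overlap F1 between a predicted span and one reference span."""
    pred = normalize_answer(prediction).split()
    ref = normalize_answer(reference).split()
    num_same = sum((Counter(pred) & Counter(ref)).values())
    if num_same == 0:
        return 0.0
    precision, recall = num_same / len(pred), num_same / len(ref)
    return 2 * precision * recall / (precision + recall)


def best_score(metric, prediction: str, references: list) -> float:
    """Credit a prediction with its best score against any acceptable reference."""
    return max(metric(prediction, ref) for ref in references)


# Hypothetical reference set for Q3, for illustration only.
references = ["Ulm", "Ulm, Germany"]
for prediction in ["Ulm", "in Ulm, Germany", "Germany"]:
    score = best_score(f1_score, prediction, references)
    print(f"{prediction!r}: best F1 = {score:.2f}")
```

Here "Ulm" scores 1.00, "in Ulm, Germany" 0.80 and "Germany" 0.67. With "Ulm" as the only reference, the last two would drop to 0.50 and 0.00, which is why recording every valid span changes how a system is scored.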

Part 3: Unanswerable Questions

Passage 3:

The Amazon rainforest produces approximately 20% of the world's oxygen. It spans across nine countries in South America, with Brazil containing 60% of the forest. The Amazon River, which flows through the forest, is the largest river by discharge volume of water in the world.

Q4: How many species live in the Amazon rainforest?
Question 3

This question cannot be answered from the passage. How should it be annotated?
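One common convention, used by SQuAD 2.0, is to mark such questions with an `is_impossible` flag and an empty answer list rather than forcing a span. The validator below is a hypothetical sketch of consistency checks an annotation tool might run on records in that style; it catches format errors, not wrong answers.

```python
def validate_qa(record: dict) -> list:
    """Return consistency problems in a SQuAD 2.0-style record (empty if none).

    Hypothetical checks: an unanswerable question carries no spans, and an
    answerable one has at least one span whose offset matches its text.
    """
    problems = []
    context = record["context"]
    answers = record.get("answers", [])
    if record.get("is_impossible", False):
        if answers:
            problems.append("marked unanswerable but has answer spans")
        return problems
    if not answers:
        problems.append("marked answerable but has no answer span")
    for ans in answers:
        start, text = ans["answer_start"], ans["text"]
        if context[start:start + len(text)] != text:
            problems.append(f"offset {start} does not point at {text!r}")
    return problems


passage = ("The Amazon rainforest produces approximately 20% of the world's "
           "oxygen. It spans across nine countries in South America, with "
           "Brazil containing 60% of the forest.")

unanswerable = {
    "context": passage,
    "question": "How many species live in the Amazon rainforest?",
    "is_impossible": True,
    "answers": [],
}
# An annotator "forcing" a span onto an unanswerable question.
forced_span = dict(unanswerable, answers=[
    {"text": "nine countries", "answer_start": passage.find("nine countries")}
])

print(validate_qa(unanswerable))  # []
print(validate_qa(forced_span))   # ['marked unanswerable but has answer spans']
```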

Part 4: Inference-Required Questions

Passage 4:

The company reported revenue of $50 million in Q1 and $75 million in Q2. Operating costs remained stable at $30 million per quarter throughout the first half of the year.

Q5: What was the company's profit in the second quarter (Q2)?
Question 4

The answer ($45 million) does not appear anywhere in the passage; it has to be computed (75 - 30 = 45). Can this question be answered with an extractive span?
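A quick way to see why this falls outside strict extractive QA is to compute the answer and check whether it occurs as a substring of the passage. The sketch assumes operating costs are the company's only costs, as the question implicitly does.

```python
passage = ("The company reported revenue of $50 million in Q1 and $75 million "
           "in Q2. Operating costs remained stable at $30 million per quarter "
           "throughout the first half of the year.")

# Figures read off the passage, in millions of dollars.
q2_revenue = 75
costs_per_quarter = 30

# Assumption: operating costs are the only costs, so profit = revenue - costs.
q2_profit = q2_revenue - costs_per_quarter
answer = f"${q2_profit} million"

print(answer)             # $45 million
print(answer in passage)  # False: no span of the passage contains this answer
```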

Part 5: Question Writing

Passage 5:

Marie Curie was awarded two Nobel Prizes: one in Physics (1903, shared with Pierre Curie and Henri Becquerel) and one in Chemistry (1911). She remains the only person to have won Nobel Prizes in two different sciences.

Question 5

Write 3 questions that can be answered from this passage:
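Some problems with hand-written questions can be caught mechanically. The checks below are a hypothetical illustration, not a standard tool: the answer must be an exact span of the passage, the question must not contain its own answer, and it should end with a question mark. They are crude substring tests, so very short answers can trigger false alarms.

```python
def check_question(passage: str, question: str, answer: str) -> list:
    """Hypothetical sanity checks for a hand-written extractive QA item."""
    issues = []
    if answer not in passage:
        issues.append("answer is not an exact span of the passage")
    # Crude substring test: very short answers may match inside other words.
    if answer.lower() in question.lower():
        issues.append("question contains its own answer")
    if not question.rstrip().endswith("?"):
        issues.append("not phrased as a question")
    return issues


passage = ("Marie Curie was awarded two Nobel Prizes: one in Physics (1903, "
           "shared with Pierre Curie and Henri Becquerel) and one in Chemistry "
           "(1911). She remains the only person to have won Nobel Prizes in two "
           "different sciences.")

items = [
    ("In what year did Marie Curie win the Nobel Prize in Chemistry?", "1911"),
    ("Who shared the 1903 Nobel Prize in Physics with Marie Curie?",
     "Pierre Curie and Henri Becquerel"),
    ("Marie Curie won the Chemistry prize in 1911", "1911"),
]
for question, answer in items:
    print(question, "->", check_question(passage, question, answer) or "ok")
```

The first two items pass; the third is flagged both for containing its answer and for not being phrased as a question.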

Part 6: Group Comparison

Question 6

Compare your span selections with your group. Where did you disagree?
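To put a number on group disagreement, one option is the mean pairwise token-level F1 between annotators' spans for each question. The spans below are hypothetical, and `mean_pairwise_f1` is an illustrative helper. Note that this is a simple overlap score, not a chance-corrected agreement statistic such as Cohen's kappa.

```python
import re
import string
from collections import Counter
from itertools import combinations

PUNCT = set(string.punctuation)


def normalize_answer(s: str) -> str:
    """Lowercase, strip punctuation and articles, collapse whitespace."""
    s = "".join(ch for ch in s.lower() if ch not in PUNCT)
    s = re.sub(r"\b(a|an|the)\b", " ", s)
    return " ".join(s.split())


def token_f1(a: str, b: str) -> float:
    """Symmetric token-overlap F1 between two annotators' spans."""
    ta, tb = normalize_answer(a).split(), normalize_answer(b).split()
    num_same = sum((Counter(ta) & Counter(tb)).values())
    if num_same == 0:
        return 0.0
    precision, recall = num_same / len(ta), num_same / len(tb)
    return 2 * precision * recall / (precision + recall)


def mean_pairwise_f1(spans: list) -> float:
    """Average token F1 over every pair of annotators for one question."""
    if len(spans) < 2:
        raise ValueError("need spans from at least two annotators")
    pairs = list(combinations(spans, 2))
    return sum(token_f1(a, b) for a, b in pairs) / len(pairs)


# Hypothetical spans from a three-person group, for illustration only.
group = {
    "Q1": ["1887 to 1889", "from 1887 to 1889", "1887 to 1889"],
    "Q3": ["Ulm", "Ulm, Germany", "Ulm, Germany"],
}
for qid, spans in group.items():
    score = mean_pairwise_f1(spans)
    note = "  (discuss)" if score < 1.0 else ""
    print(f"{qid}: mean pairwise F1 = {score:.2f}{note}")
```

With these spans, Q1 averages 0.90 and Q3 averages 0.78. A low score points to a boundary convention the group has not yet agreed on, not necessarily to a wrong answer.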

Part 7: Reflection

Question 7

Why is QA annotation challenging?

Key Takeaway

QA annotation requires balancing precision (selecting the exact, minimal span) with completeness (including enough text to answer the question fully), while handling the reality that some questions simply cannot be answered from the given text.

  • Span boundaries are conventions, not absolute truths
  • Multiple valid answers should be acknowledged in annotation
  • Unanswerable questions are as important as answerable ones
  • Question quality affects data quality