Extractive Question Answering (QA) involves selecting a span of text from a passage that answers a given question. This is the task behind benchmark datasets such as SQuAD.
Extractive QA: Given a passage and a question, identify the minimal span of text that correctly and completely answers the question.
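To make the definition concrete, here is a minimal sketch of how a span is often stored: as character offsets into the passage plus the covered text. The `Span` type, the `find_span` helper, and the sample answer string are illustrative assumptions, not part of any particular annotation tool.

```python
from typing import NamedTuple, Optional


class Span(NamedTuple):
    start: int  # inclusive character offset into the passage
    end: int    # exclusive character offset
    text: str   # the covered text, kept for readability


def find_span(passage: str, answer: str) -> Optional[Span]:
    """Return the first occurrence of `answer` in `passage`, or None.

    Hypothetical helper: real tools record the offsets the annotator
    highlighted rather than searching for the string afterwards.
    """
    start = passage.find(answer)
    if start == -1:
        return None
    return Span(start, start + len(answer), answer)


passage = (
    "The Eiffel Tower is a wrought-iron lattice tower on the Champ de Mars "
    "in Paris, France."
)
span = find_span(passage, "Paris, France")
```

Slicing the passage with the stored offsets (`passage[span.start:span.end]`) should always reproduce the answer text exactly. That is the property that makes an answer extractive.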
The Eiffel Tower is a wrought-iron lattice tower on the Champ de Mars in Paris, France. It was constructed from 1887 to 1889 as the entrance to the 1889 World's Fair. The tower is 330 meters tall and was the tallest man-made structure in the world for 41 years until the Chrysler Building was completed in 1930.
For Q1, which answer span is most appropriate?
Albert Einstein was born in Ulm, Germany on March 14, 1879. The theoretical physicist developed the theory of relativity, one of the two pillars of modern physics. Einstein received the Nobel Prize in Physics in 1921 for his discovery of the law of the photoelectric effect.
Multiple spans could answer this question. Which are valid? (Check all that apply.)
The Amazon rainforest produces approximately 20% of the world's oxygen. It spans nine countries in South America, with Brazil containing 60% of the forest. The Amazon River, which flows through the forest, is the largest river in the world by discharge volume.
This question cannot be answered from the passage. How should it be annotated?
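One common convention, used by SQuAD 2.0, is to mark such a question with an `is_impossible` flag and an empty answer list rather than forcing a span. The sketch below assumes that record layout. Both sample questions and the `validate` helper are hypothetical and only serve to illustrate the idea.

```python
passage = (
    "The Amazon rainforest produces approximately 20% of the world's oxygen. "
    "It spans across nine countries in South America, with Brazil containing "
    "60% of the forest."
)

# Answerable record: the span is stored as text plus its start offset.
answerable = {
    "question": "How many countries does the Amazon rainforest span?",  # hypothetical
    "is_impossible": False,
    "answers": [{"text": "nine", "answer_start": passage.index("nine")}],
}

# Unanswerable record: no span is selected at all.
unanswerable = {
    "question": "How many species live in the Amazon rainforest?",  # hypothetical
    "is_impossible": True,
    "answers": [],
}


def validate(record: dict, passage: str) -> bool:
    """Check that a record is internally consistent with the passage."""
    if record["is_impossible"]:
        # An unanswerable question must not carry any answer span.
        return record["answers"] == []
    # An answerable question needs at least one span, and every span's
    # offset must point at exactly the stored text.
    return bool(record["answers"]) and all(
        passage[a["answer_start"]:a["answer_start"] + len(a["text"])] == a["text"]
        for a in record["answers"]
    )
```

A check like this catches two frequent annotation slips: a span left on a question that was later marked unanswerable, and offsets that no longer match the passage after it was edited.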
The company reported revenue of $50 million in Q1 and $75 million in Q2. Operating costs remained stable at $30 million per quarter throughout the first half of the year.
The answer ($45 million) requires arithmetic (75 - 30) and never appears verbatim in the passage. Can this be answered extractively?
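A quick way to see the issue is to test whether the candidate answer occurs verbatim in the passage. The sketch below assumes a strict extractive convention in which only literal substrings count. `is_extractable` is a hypothetical helper name.

```python
passage = (
    "The company reported revenue of $50 million in Q1 and $75 million in Q2. "
    "Operating costs remained stable at $30 million per quarter throughout "
    "the first half of the year."
)


def is_extractable(passage: str, answer: str) -> bool:
    """True only if the answer appears verbatim as a substring of the passage."""
    return answer in passage


# The figures needed for the calculation are present in the text...
revenue_present = is_extractable(passage, "$75 million")
costs_present = is_extractable(passage, "$30 million")

# ...but the derived result is not, so no span can be selected for it.
derived = f"${75 - 30} million"
derived_present = is_extractable(passage, derived)

print(revenue_present, costs_present, derived_present)  # True True False
```

Whether such questions are labeled unanswerable or handled by a separate "requires reasoning" category is a guideline decision. The substring check only shows that no single span contains the answer.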
Marie Curie was awarded two Nobel Prizes: one in Physics (1903, shared with Pierre Curie and Henri Becquerel) and one in Chemistry (1911). She remains the only person to have won Nobel Prizes in two different sciences.
Write 3 questions that can be answered from this passage:
Compare your span selections with your group. Where did you disagree?
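Disagreements can also be quantified. The sketch below compares two annotators' spans using exact match and a token-overlap F1, in the spirit of the SQuAD evaluation metrics. It is simplified: it lowercases and splits on word characters, and it does not strip articles. The function names and sample spans are assumptions for illustration.

```python
import re
from collections import Counter


def tokens(text: str) -> list:
    """Lowercase and split on word characters, dropping punctuation."""
    return re.findall(r"\w+", text.lower())


def exact_match(span_a: str, span_b: str) -> bool:
    """True if both spans contain the same token sequence."""
    return tokens(span_a) == tokens(span_b)


def token_f1(span_a: str, span_b: str) -> float:
    """Harmonic mean of token precision and recall between two spans."""
    ta, tb = tokens(span_a), tokens(span_b)
    if not ta or not tb:
        # Two empty spans agree fully; one empty span agrees with nothing.
        return float(ta == tb)
    overlap = sum((Counter(ta) & Counter(tb)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(ta)
    recall = overlap / len(tb)
    return 2 * precision * recall / (precision + recall)


# Two annotators pick different boundaries for the same answer.
score = token_f1("Ulm, Germany", "Ulm")  # 2/3: one shared token out of two vs. one
```

A low exact-match rate combined with a high token F1 usually points to boundary disagreements (how much context to include) rather than disagreement about what the answer is.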
Why is QA annotation challenging?
QA annotation requires balancing precision (exact span) with completeness (sufficient answer), while handling the reality that some questions simply cannot be answered from the given text.