Evaluating summarization quality requires assessing multiple dimensions simultaneously. Unlike simple classification, summary evaluation involves subjective judgments about what information is important and how well it's conveyed.
Key Evaluation Dimensions: faithfulness, relevance, and coherence.
Source text: Scientists at MIT have developed a new battery technology that could charge electric vehicles in just 10 minutes. The research, published in Nature Energy, shows that the lithium-ion batteries use a modified electrode structure. Lead researcher Dr. Sarah Chen noted that while promising, the technology is still 3-5 years from commercial production. Current EV batteries typically require 30-60 minutes for a full charge.
Summary A: MIT researchers have created a revolutionary battery that charges EVs in 10 minutes, making range anxiety a thing of the past. The breakthrough, led by Dr. Sarah Chen, will be available to consumers within 2 years.
Identify specific faithfulness errors in Summary A:
Summary B: A study in Nature Energy describes lithium-ion batteries with modified electrode structures.
Faithfulness:
Relevance:
Summary B is faithful but omits key information. Which is worse: an unfaithful summary or an incomplete one?
Source text: The city council approved a $50 million budget for road repairs. Mayor Johnson praised the decision, calling it "long overdue." Critics argue the money should go to schools instead. The repairs will focus on the downtown area first, then expand to suburbs over three years.
Summary C: Road repairs will start downtown. $50 million was approved by the council. Mayor Johnson praised it. Critics want school funding. Suburbs will be repaired later.
Summary D: The city council approved $50 million for road repairs that will begin downtown and expand to suburbs over three years. While Mayor Johnson praised the decision, critics argue the funding should go to schools instead.
Summaries C and D contain nearly the same information. Rate the coherence of each:
Summary C coherence:
Summary D coherence:
Rank the summaries from best to worst overall:
Rank these summary errors from most to least severe:
Compare ratings with your group. Where did you disagree?
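One way to make the group comparison concrete is to quantify agreement between two annotators and list the items they rated differently. The sketch below is only an illustration: the item names, the 1-5 rating scale, and the example ratings are all invented for demonstration and are not part of this exercise's answer key. It computes raw percent agreement and Cohen's kappa, which corrects raw agreement for the agreement expected by chance.

```python
from collections import Counter


def percent_agreement(a, b):
    """Fraction of items on which two annotators gave the same rating."""
    if len(a) != len(b) or not a:
        raise ValueError("rating lists must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohens_kappa(a, b):
    """Cohen's kappa for two annotators, treating ratings as unordered labels."""
    n = len(a)
    p_observed = percent_agreement(a, b)
    counts_a, counts_b = Counter(a), Counter(b)
    # Chance agreement: probability both annotators pick the same label
    # if each rated independently according to their own label frequencies.
    labels = counts_a.keys() | counts_b.keys()
    p_expected = sum(counts_a[k] * counts_b[k] for k in labels) / (n * n)
    if p_expected == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (p_observed - p_expected) / (1 - p_expected)


def disagreements(items, a, b):
    """Return (item, rating_a, rating_b) for every item rated differently."""
    return [(item, x, y) for item, x, y in zip(items, a, b) if x != y]


# Hypothetical data: four rated items and two annotators' 1-5 scores.
items = ["item-1", "item-2", "item-3", "item-4"]
annotator_1 = [1, 5, 2, 5]
annotator_2 = [1, 5, 3, 5]

agreement = percent_agreement(annotator_1, annotator_2)
kappa = cohens_kappa(annotator_1, annotator_2)
differing = disagreements(items, annotator_1, annotator_2)

print(f"percent agreement: {agreement:.2f}")
print(f"Cohen's kappa:     {kappa:.2f}")
print(f"disagreements:     {differing}")
```

Kappa as written treats a 2-versus-3 disagreement the same as a 1-versus-5 disagreement. For ordinal rating scales a weighted variant may be more appropriate, and the list of differing items is usually the most useful starting point for group discussion.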
Why is summarization evaluation difficult?
Summary evaluation is inherently multi-dimensional, and annotators must balance competing criteria while maintaining consistent standards.