Worksheet: A Hard NLP Classification Task — Toxicity Detection

Experience why annotation is difficult even for "simple" NLP tasks
Course: Natural Language Annotation for Machine Learning
Task Type: Binary classification (with ambiguity)
Author: Jin Zhao

Background

You are part of a team building an AI system that classifies online comments as either Toxic or Not Toxic.

This system will be used for content moderation, meaning mistakes can affect real people.

Your goal is to:

  • Annotate a set of comments individually
  • Compare your labels with your classmates'
  • Decide how the task definition and dataset should be revised

Part 1: Label Definition (Before Annotating)

Read the following preliminary label definitions:

Toxic: Language that is insulting, demeaning, or hostile toward a person or group.

Not Toxic: Language that does not meet the above criteria.

Question 1

Are these definitions sufficient to label comments consistently?

Part 2: Individual Annotation

For each comment below:

  • Assign a label (Toxic or Not Toxic)
  • Write why you chose it
  • Note any uncertainty

Example 1
"Wow, great job."
Example 2
"Nice work, genius."
Example 3
"This idea is stupid."
Example 4
"Sure, that'll work. Just like last time."
Example 5
"I can't believe you actually think this makes sense."

Part 3: Group Comparison

Discuss your answers with 2–3 classmates.
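Before discussing, you can quantify how much your group agrees. A standard measure for two annotators is Cohen's kappa, which corrects raw agreement for chance. Below is a minimal pure-Python sketch; the two label sequences are hypothetical annotations of the five examples, not results from the worksheet.

```python
from collections import Counter

def cohen_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two annotators on the same items."""
    n = len(labels_a)
    # Observed agreement: fraction of items where both annotators agree.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement: probability both pick the same label by chance,
    # given each annotator's label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical labels for Examples 1-5 from two group members.
ann_1 = ["Not Toxic", "Toxic", "Toxic", "Toxic", "Toxic"]
ann_2 = ["Not Toxic", "Not Toxic", "Toxic", "Not Toxic", "Toxic"]
print(round(cohen_kappa(ann_1, ann_2), 2))  # 0.29
```

A kappa near 0 means agreement is barely better than chance; values above roughly 0.8 are usually considered strong. Low kappa on a two-label task is a signal that the guidelines, not the annotators, need work.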

Question 2

Which examples had disagreement in your group? (Check all that apply.)

  • Example 1
  • Example 2
  • Example 3
  • Example 4
  • Example 5

Question 3

What caused the disagreement? Consider factors such as sarcasm, missing context, and differing thresholds for what counts as hostile.

Part 4: Dataset-Level Decisions

You now see the full annotation results for Example 2:

"Nice work, genius."

Out of 10 annotators:

  • 6 labeled Toxic
  • 4 labeled Not Toxic
Question 4

How should this example be handled in the dataset?
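Two common options are to keep a hard label chosen by majority vote, or to keep the full vote distribution as a soft label. A minimal sketch of both, using the 6/4 split shown above (the function name and threshold are illustrative assumptions):

```python
def aggregate(votes):
    """Turn per-annotator labels into a hard label and a soft label."""
    n = len(votes)
    toxic = sum(1 for v in votes if v == "Toxic")
    hard = "Toxic" if toxic > n / 2 else "Not Toxic"  # simple majority vote
    soft = toxic / n                                  # P(Toxic) as a soft label
    return hard, soft

votes = ["Toxic"] * 6 + ["Not Toxic"] * 4  # the Example 2 annotations
hard, soft = aggregate(votes)
print(hard, soft)  # Toxic 0.6
```

The hard label discards the fact that 4 of 10 annotators disagreed; the soft label preserves it, letting a model learn that this comment is genuinely ambiguous.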

Part 5: Revising the Task Definition

Question 5

What guideline changes would most reduce disagreement? Propose at least one concrete revision.

Part 6: Reflection

Question 6

Why is this classification task difficult even though it has only two labels?

Question 7

If you trained a model on this data without resolving disagreement, what might happen?
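One consequence can be made concrete: if the same text appears in the training data with conflicting labels, no deterministic classifier can get all of those rows right. A small sketch of this upper bound, using hypothetical training rows built from the Example 2 vote split:

```python
from collections import Counter

# Hypothetical training rows: the same comment appears with conflicting
# labels because annotator disagreement was never resolved.
rows = ([("Nice work, genius.", "Toxic")] * 6
        + [("Nice work, genius.", "Not Toxic")] * 4)

def best_possible_accuracy(rows):
    """Upper bound for any deterministic classifier: it predicts one label
    per distinct text, so it can match at most the majority label count."""
    by_text = {}
    for text, label in rows:
        by_text.setdefault(text, Counter())[label] += 1
    correct = sum(max(counts.values()) for counts in by_text.values())
    return correct / len(rows)

print(best_possible_accuracy(rows))  # 0.6
```

Here training accuracy is capped at 0.6 on these rows no matter the model, and the conflicting gradients from identical inputs can also make predictions unstable across training runs.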

Key Takeaway (for class discussion)

Classification difficulty is driven by human judgment, not label count.

Even simple label sets can produce:

  • High disagreement
  • Biased models
  • Unstable predictions