Semester Project
50% of your final grade
Overview
Groups of 3-4 students will complete a semester-long annotation project following the MATTER/MAMA cycle methodology. This is an essential component of the course where you will:
- Design an annotation specification and guidelines for an NLP task
- Annotate a dataset with your team
- Evaluate inter-annotator agreement
- Refine task guidelines based on disagreement analysis
- Train and evaluate baseline NLP models
- Report your findings
Project Documents
Getting to Know Your Group
Before diving into the project, take time to establish your team:
- Discuss shared interests in annotation tasks
- Find regular times to meet in person outside of class; this is critical for project quality
- Identify the skills and resources each person brings:
  - Knowledge of potential datasets
  - Ideas for an initial schema, workflow, etc.
  - Specific skills (programming, data processing, writing, project management, etc.)
Project Timeline
Group Contract
A document assigning responsibility for specific tasks within the project. Divide work by skill, but every member must participate in annotation. Include group members, their skills, task assignments, a communication plan, a meeting schedule, and a data-sharing plan. Submit the signed document on LATTE.
Draft Annotation Schema
In-class presentation and write-up on your topic, dataset, and goals. Include an intuitive design for tagset and attributes, a small pilot annotation, and a brief literature review. Schema does not need to be fully formalized yet.
Annotation Goal Presentations
10-minute group presentation covering: planned dataset and sampling technique, document balance (genre, time period, authorship), data source and size, copyright/licensing, existing annotations, planned task and goals, and discussion of 2-3 relevant papers.
MAMA/MATTER Cycle
Perform parallel, independent pilot annotation on a subset of your corpus. Measure inter-annotator agreement (IAA), review disagreements, consider task complexity. Revise guidelines and repeat until you achieve satisfactory IAA.
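For two annotators labeling the same items, Cohen's kappa is a standard IAA metric: it corrects observed agreement for the agreement expected by chance. Below is a minimal self-contained sketch (the labels are illustrative); for more than two annotators, or items with partial overlap, measures such as Fleiss's kappa or Krippendorff's alpha are common choices.

```python
from collections import Counter

def cohen_kappa(ann_a, ann_b):
    """Cohen's kappa for two annotators labeling the same items."""
    assert len(ann_a) == len(ann_b) and ann_a
    n = len(ann_a)
    # Observed agreement: fraction of items labeled identically.
    p_o = sum(a == b for a, b in zip(ann_a, ann_b)) / n
    # Chance agreement: from each annotator's marginal label distribution.
    freq_a, freq_b = Counter(ann_a), Counter(ann_b)
    p_e = sum(freq_a[lab] * freq_b.get(lab, 0) for lab in freq_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# Two annotators' labels on six items (illustrative data).
a = ["POS", "POS", "NEG", "NEU", "POS", "NEG"]
b = ["POS", "NEG", "NEG", "NEU", "POS", "NEG"]
print(round(cohen_kappa(a, b), 3))  # → 0.739
```

Values above roughly 0.8 are often read as strong agreement, but interpret the number alongside a qualitative review of the disagreements rather than as a pass/fail threshold.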
Full Annotation Specification
Formal annotation schema, v1.0 of guidelines, and presentation. The specification must be clear enough for non-group annotators, with relevant examples (positive and negative) and expected edge cases. Schema should be operationalized within an annotation tool.
Full Annotation Task
Your group will not annotate its own task — each group will be given the schema and specification from another group. Grading is based on meeting deadlines and following the other group's instructions.
Annotation Report & Final Presentation
In-class presentation, ACL-style paper (4 pages), and peer evaluation.
Deliverables
Final Paper (ACL-style, 4 pages)
- Overview of task goals and annotation specification
- Characterization of your dataset (data distribution, annotation distribution, etc.)
- Difficulties during data collection (solved and unsolved), with possible improvements for future iterations
- Annotation quality:
- Quantitative analysis of annotation reliability and interpretation
- Qualitative analysis of annotator disagreements
- Machine learning experiment:
- Experimental design: baseline system, baseline features, and features engineered from your annotations
- Experimental results
Final Presentation
- 15-minute in-class presentation
- Present annotation task, methodology, and results
- Q&A with class
Dataset Submission
- Annotated corpus with documentation
- Annotation guidelines document
- Data in a standard format (JSON, XML, or another specified format)
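If you choose JSON, a standoff-style layout is a common convention: annotations store character offsets into the raw text, so the source document is never modified. The record below is a hypothetical sketch (field names like `doc_id` and `annotator` are illustrative, not a required schema):

```python
import json

# Hypothetical standoff-style record: spans index into the raw text.
record = {
    "doc_id": "doc-0001",
    "text": "Brandeis University is in Waltham.",
    "annotations": [
        {"start": 0, "end": 19, "label": "ORG", "annotator": "A1"},
        {"start": 26, "end": 33, "label": "LOC", "annotator": "A1"},
    ],
}

# Round-trip through JSON and verify a span against the text.
serialized = json.dumps(record, indent=2)
parsed = json.loads(serialized)
span = parsed["annotations"][1]
print(parsed["text"][span["start"]:span["end"]])  # → Waltham
```

Whatever format you pick, document it in the corpus README so the group annotating your task (and future users) can parse it without guesswork.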
Peer Evaluation
- Evaluate group members' contributions
- Individual accountability component
- Submitted confidentially
Grading Rubric
| Component | Weight | Description |
|---|---|---|
| Annotation Schema Design | 20% | Clarity, completeness, and appropriateness of the schema |
| Guidelines Quality | 20% | Clear, comprehensive, with good examples |
| Dataset Quality | 20% | Consistency, coverage, proper adjudication |
| IAA Analysis | 15% | Appropriate metrics, thoughtful analysis of disagreements |
| Final Paper | 15% | Writing quality, organization, academic rigor |
| Presentation | 10% | Clarity, engagement, Q&A handling |
Project Ideas
Here are some potential annotation tasks to consider:
Traditional NLP Tasks
- Named entity recognition for a specific domain
- Sentiment analysis with aspect-level annotations
- Relation extraction for a knowledge domain
- Event detection and argument extraction
- Coreference resolution for a specific genre
LLM-Related Tasks
- Preference annotation for chatbot responses
- Hallucination detection in LLM outputs
- Safety/toxicity annotation
- Instruction-following evaluation
- Code generation quality assessment
Resources
- ACL Paper Format Templates
- Annotation Tools
- Dataset sources: Hugging Face, Kaggle, LDC, academic papers