COSI 230B | Spring 2026

Natural Language Annotation for Machine Learning

Resources & Readings


Textbooks, tools, and readings

Required Textbook

Natural Language Annotation for Machine Learning: A Guide to Corpus-Building for Applications
James Pustejovsky & Amber Stubbs
O'Reilly Media, 2012

Available online through O'Reilly

Required Software

Annotation Tools

Recommended LLM Access

Foundational Readings

Inter-annotator Agreement
Ron Artstein
Handbook of Linguistic Annotation, 2017

Computing Krippendorff's Alpha-Reliability
Klaus Krippendorff
2011

The Benefits of a Model of Annotation
Rebecca Passonneau & Bob Carpenter
TACL, 2014

LLM-Based Annotation

ChatGPT Outperforms Crowd-Workers for Text-Annotation Tasks
Gilardi, Alizadeh, & Kubli
arXiv:2303.15056, 2023

Large Language Models for Data Annotation and Synthesis: A Survey
Tan et al.
EMNLP, 2024

AnnoLLM: Making Large Language Models to Be Better Crowdsourced Annotators
He et al.
arXiv:2303.16854, 2023

Automated Annotation with Generative AI Requires Validation
Pangakis, Wolken, & Fasching
arXiv:2306.00176, 2023

RLHF and Preference Learning

Training language models to follow instructions with human feedback
Ouyang et al.
NeurIPS, 2022

Constitutional AI: Harmlessness from AI Feedback
Bai et al.
arXiv:2212.08073, 2022

Direct Preference Optimization: Your Language Model is Secretly a Reward Model
Rafailov et al.
NeurIPS, 2023

LLM-as-Judge

Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena
Zheng et al.
arXiv:2306.05685, 2023

G-Eval: NLG Evaluation using GPT-4 with Better Human Alignment
Liu et al.
EMNLP, 2023

Low-Resource and Multilingual

MasakhaNER: Named Entity Recognition for African Languages
Adelani et al.
TACL, 2021

MEGA: Multilingual Evaluation of Generative AI
Ahuja et al.
EMNLP, 2023

Brandeis Library Resources

The Brandeis Library offers resources and services including: