Homework Assignments
Five assignments, together worth 40% of your final grade
Assignment Policy
Each homework assignment has equal weight. Assignments are announced and submitted via MOODLE.
Homework 0: Dataset Exploration
Description: Explore existing annotated NLP datasets using a provided worksheet. Analyze datasets to understand annotation schemas, data formats, and task characteristics.
Deliverables: Completed Excel worksheet with dataset analysis
Skills: Dataset analysis, understanding annotation schemas, data format familiarity
Homework 1: Annotation Tools Exploration
Description: Hands-on experience with annotation tools and data formats.
Tasks:
- Set up and use brat annotation tool
- Set up and use Label Studio
- Annotate sample data from provided corpora (wikinews, reddit)
- Compare annotation interfaces and workflows
Deliverables:
- Annotated files from brat
- Annotated files from Label Studio
- Reflection on tool comparison
Homework 2: Data Wrangling with Pandas
Description: Implement an NLPDataFrame class that extends pandas for NLP annotation tasks.
Tasks:
- Load data from multiple formats (JSON, CSV, XML)
- Implement preprocessing methods (lowercase, normalize numbers, strip whitespace/punctuation)
- Implement summary statistics methods
- Serialize processed data
Deliverables:
- Completed hw2.py file
- Serialized nlp_data.bz2 file
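As a rough illustration of the preprocessing steps listed above, here is a minimal sketch using only the standard library. The function name `preprocess`, the `NUM` placeholder, and the order of the steps are assumptions for illustration, not the interface required by the assignment; follow the assignment handout where it differs.

```python
import re
import string

# Hypothetical placeholder for numbers; the assignment may specify another.
NUM_TOKEN = "NUM"

# Matches integers and numbers with '.' or ',' separators, e.g. 3.5 or 1,200.
_NUMBER_RE = re.compile(r"\d+(?:[.,]\d+)*")
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def preprocess(text: str) -> str:
    """Lowercase, normalize numbers, strip punctuation, collapse whitespace."""
    text = text.lower()
    # Replace numbers before removing punctuation so "3.5" stays one token.
    text = _NUMBER_RE.sub(NUM_TOKEN, text)
    text = text.translate(_PUNCT_TABLE)
    # split() with no arguments also trims leading and trailing whitespace.
    return " ".join(text.split())


print(preprocess("  The Price rose 3.5%, to $1,200!  "))
# the price rose NUM to NUM
```

In a pandas-based class, a function like this could be applied to a text column with `Series.map`.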
Homework 3: Inter-Annotator Agreement
Description: Calculate agreement metrics for annotation data.
Tasks:
- Calculate observed agreement for span annotations (token-level and entity-level)
- Compute Cohen's Kappa for two annotators
- Compute Fleiss' Kappa for multiple annotators
- Interpret agreement scores and discuss implications
Deliverables: Written solutions showing all calculations and work
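For checking hand calculations, the two kappa statistics can be sketched in a few lines of Python. This is a sketch under stated assumptions, not a required implementation: the function names and input layouts are hypothetical. `cohen_kappa` takes two equal-length label lists, and `fleiss_kappa` takes a table of per-item category counts in which every item is rated by the same number of annotators. Both compute kappa = (P_o - P_e) / (1 - P_e), where P_o is observed agreement and P_e is the agreement expected by chance.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement: product of the annotators' marginal proportions.
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_e == 1.0:
        return float("nan")  # undefined when both use one identical label
    return (p_o - p_e) / (1 - p_e)


def fleiss_kappa(counts):
    """Fleiss' kappa; counts[i][j] = number of raters giving item i category j."""
    n_items = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("every item needs the same number (>= 2) of ratings")
    # Per-item agreement: proportion of agreeing rater pairs.
    p_items = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = sum(p_items) / n_items
    # Chance agreement from the overall category proportions.
    totals = [sum(col) for col in zip(*counts)]
    p_e = sum((t / (n_items * n_raters)) ** 2 for t in totals)
    if p_e == 1.0:
        return float("nan")  # undefined when only one category is ever used
    return (p_bar - p_e) / (1 - p_e)


# Toy data: 50 items, annotators agree on 20 "y" and 15 "n".
a = ["y"] * 25 + ["n"] * 25
b = ["y"] * 20 + ["n"] * 5 + ["y"] * 10 + ["n"] * 15
print(round(cohen_kappa(a, b), 3))  # 0.4
```

In the toy example, P_o = 35/50 = 0.7 and P_e = (25·30 + 25·20)/50² = 0.5, so kappa = 0.2/0.5 = 0.4.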
Homework 4: Sentiment Analysis Fine-tuning
Description: Fine-tune a sentiment classifier on annotated movie review data.
Tasks:
- Load and preprocess the movie review dataset
- Fine-tune a pre-trained model for sentiment classification
- Evaluate model performance
- Analyze results and errors
Deliverables: Completed Jupyter notebook with code, results, and analysis
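The evaluation and error-analysis steps can be sketched independently of the model. The example below is a minimal sketch that assumes binary sentiment labels stored as the strings "pos" and "neg"; the function name `binary_metrics` and the label names are hypothetical, and the assignment notebook may use a different label encoding or a library for these metrics.

```python
def binary_metrics(gold, pred, positive="pos"):
    """Accuracy, precision, recall, and F1 for one positive class,
    plus the indices of misclassified examples for error analysis."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("need two non-empty label lists of equal length")
    pairs = list(zip(gold, pred))
    tp = sum(g == positive and p == positive for g, p in pairs)
    fp = sum(g != positive and p == positive for g, p in pairs)
    fn = sum(g == positive and p != positive for g, p in pairs)
    correct = sum(g == p for g, p in pairs)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    errors = [i for i, (g, p) in enumerate(pairs) if g != p]
    return {
        "accuracy": correct / len(gold),
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "errors": errors,
    }


gold = ["pos", "pos", "neg", "neg", "pos"]
pred = ["pos", "neg", "neg", "pos", "pos"]
result = binary_metrics(gold, pred)
print(result["accuracy"], result["errors"])  # 0.6 [1, 3]
```

The `errors` list gives the positions of the misclassified reviews, which is a convenient starting point for reading them and grouping the mistakes by cause.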
Submission Guidelines
- All assignments are submitted through MOODLE
- Follow the naming conventions specified in each assignment
- Include your name and student ID in all submissions
- If using generative AI tools, disclose their use as required by course policy