Instructor: Manning, Chris


Assignments

Programming Assignments

1. N-Gram Language Models (Lectures 1-4)
- We'll build a language model based on n-gram statistics estimated from a large corpus, and test our model's ability to help with a speech recognition task
2. Word Alignment Models for Machine Translation (Lectures 5-9)
- (Read: an update on the decoder)
- We'll build word alignment models based on IBM Models 1 and 2. They will be trained and tested on the Hansard corpus, which consists of parallel English and French sentences.
- Plug in your language model from PA1 and, together with the provided Greedy Decoder, you have a complete statistical machine translation system to try out on the provided French, German, and Spanish corpora.
3. Maximum Entropy Markov Models & Treebank Parsing (Lectures 10-3)
- This assignment looks at named entity recognition and parsing. The aim is to examine whether pre-chunking of named entities can improve the performance of a statistical parser trained on financial newswire text when applied to the task of parsing biomedical research articles. You will build a maximum entropy classifier, which will be incorporated into a maximum entropy Markov model for doing named entity recognition on biomedical text. You will also implement the parsing algorithm for a broad coverage statistical treebank parser. We have included in the support code the ability to chunk entities into a single word, and then to pass this chunked sentence to the parser, so that you can then informally compare the performance of the parser on chunked and unchunked input.
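
To give a rough feel for the n-gram idea behind the first assignment, here is a minimal sketch of a bigram language model used to rank candidate transcriptions, as a speech recognizer might. It is not the assignment's support code: the class name `BigramModel`, the add-one smoothing, the `<unk>` handling, and the toy training sentences are all assumptions made for illustration.

```python
import math
from collections import Counter


class BigramModel:
    """Add-one (Laplace) smoothed bigram model; an illustrative sketch only."""

    def __init__(self, sentences):
        self.history = Counter()   # how often each token appears as a history
        self.bigrams = Counter()   # how often each (previous, current) pair appears
        self.vocab = {"</s>", "<unk>"}
        for words in sentences:
            tokens = ["<s>"] + list(words) + ["</s>"]
            self.vocab.update(words)
            self.history.update(tokens[:-1])
            self.bigrams.update(zip(tokens, tokens[1:]))

    def _normalize(self, words):
        # Map words never seen in training to a single unknown-word token.
        return [w if w in self.vocab else "<unk>" for w in words]

    def log_prob(self, words):
        """Natural-log probability of a whole sentence, including the end marker."""
        tokens = ["<s>"] + self._normalize(words) + ["</s>"]
        v = len(self.vocab)
        total = 0.0
        for prev, cur in zip(tokens, tokens[1:]):
            # Add-one smoothing: every possible next token gets one extra count.
            total += math.log((self.bigrams[(prev, cur)] + 1) / (self.history[prev] + v))
        return total


# Toy training data (hypothetical, far smaller than a real corpus).
train = [["the", "cat", "sat"], ["the", "dog", "sat"], ["a", "cat", "ran"]]
model = BigramModel(train)

# Pick the candidate transcription the model finds most probable.
candidates = [["sat", "the", "cat"], ["the", "cat", "sat"]]
best = max(candidates, key=model.log_prob)
print(" ".join(best))
```

Add-one smoothing is used here only because it is the simplest way to avoid zero probabilities; it tends to perform poorly on real corpora, and stronger smoothing methods would be a natural thing to try in an actual model.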

Final Project

There will be a final programming project on a topic of your own choosing. See the final project guide for more information.
Final Programming Project Guidelines
