Course Details

Course Description

This course is designed to introduce students to the fundamental concepts and ideas in natural language processing (NLP), and to get them up to speed with current research in the area. It develops an in-depth understanding of both the algorithms available for the processing of linguistic information and the underlying computational properties of natural languages. Word-level, syntactic, and semantic processing are considered from both a linguistic and an algorithmic perspective. The focus is on modern quantitative techniques in NLP: the use of large corpora and of statistical models for acquisition, disambiguation, and parsing. The course also examines and constructs representative systems.

Prerequisites:
• Adequate experience with programming and formal structures (e.g., CS106B/X and CS103B/X).
• Programming projects will be written in Java 1.5, so knowledge of Java (or a willingness to learn on your own) is required.
• Knowledge of standard concepts in artificial intelligence and/or computational linguistics (e.g., CS121/221 or Ling 180).
• Basic familiarity with logic, vector spaces, and probability.

Intended Audience:
• Graduate students and advanced undergraduates specializing in computer science, linguistics, or symbolic systems.

Due to copyright issues, video downloads and lecture slides are not available for Natural Language Processing.

Instructor

Manning, Christopher D.

Manning works on systems that can intelligently process and produce human languages. Particular research interests include probabilistic models of language, statistical natural language processing, information extraction, text mining, robust textual inference, statistical parsing, grammar induction, constraint-based theories of grammar, and computational lexicography.

My current research focuses on robust but linguistically sophisticated probabilistic natural language processing, and opportunities to use it in real-world domains. Particular topics include richer models for probabilistic parsing, grammar induction, text categorization and clustering, incorporating probabilistic models into constraint-based syntactic theories such as Head-driven Phrase Structure Grammar and Lexical Functional Grammar, electronic dictionaries and their usability, particularly for indigenous languages, information extraction and presentation, and linguistic typology.

My research at Stanford is currently supported by an IBM Faculty Partnership Award, ARDA, Scottish Enterprise, and DARPA. Previous funding at Stanford came from a Terman Fellowship, NSF (for GIB), NTT, NHK, and the Australian Research Council.

Assignments

Programming Assignments

1. N-Gram Language Models (Lectures 1-4)
- We'll build a language model based on n-gram statistics estimated from a large corpus, and test our model's ability to help with a speech recognition task
2. Word Alignment Models for Machine Translation (Lectures 5-9)
- Read: an update on the decoder
- We'll build word alignment models based on IBM Models 1 and 2. They will be trained and tested on the Hansard corpus, which consists of parallel English and French sentences.
- Paste in your language model from PA1, and with the provided Greedy Decoder, you have a complete statistical machine translation system, to try out on the provided French, German, and Spanish corpora.
3. Maximum Entropy Markov Models & Treebank Parsing (Lectures 10-13)
- This assignment looks at named entity recognition and parsing. The aim is to examine whether pre-chunking of named entities can improve the performance of a statistical parser trained on financial newswire text when applied to the task of parsing biomedical research articles. You will build a maximum entropy classifier, which will be incorporated into a maximum entropy Markov model for doing named entity recognition on biomedical text. You will also implement the parsing algorithm for a broad coverage statistical treebank parser. We have included in the support code the ability to chunk entities into a single word, and then to pass this chunked sentence to the parser, so that you can then informally compare the performance of the parser on chunked and unchunked input.
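As a rough illustration of the idea behind the first programming assignment above, the sketch below estimates a bigram language model from a toy corpus and uses perplexity to compare two word sequences. It is written in Python for brevity, although the assignments themselves use Java. The add-one smoothing, the function names, the toy corpus, and the single shared slot for unseen words are all assumptions made for this example; they are not taken from the course's support code.

```python
import math
from collections import Counter


def train_bigram_counts(sentences):
    """Count contexts and bigrams, padding each sentence with <s> and </s>."""
    context_counts = Counter()
    bigram_counts = Counter()
    vocab = set()
    for sentence in sentences:
        tokens = ["<s>"] + sentence + ["</s>"]
        vocab.update(tokens[1:])  # every token the model may predict
        for prev, word in zip(tokens, tokens[1:]):
            context_counts[prev] += 1
            bigram_counts[(prev, word)] += 1
    return context_counts, bigram_counts, vocab


def bigram_prob(prev, word, context_counts, bigram_counts, vocab, alpha=1.0):
    """Add-alpha smoothed estimate of P(word | prev).

    Assumption: all out-of-vocabulary words share one extra slot, so the
    distribution sums to 1 over the vocabulary plus a single unknown token.
    """
    num_outcomes = len(vocab) + 1
    numerator = bigram_counts[(prev, word)] + alpha
    denominator = context_counts[prev] + alpha * num_outcomes
    return numerator / denominator


def perplexity(sentences, context_counts, bigram_counts, vocab):
    """Per-token perplexity of the model on the given sentences."""
    log_prob = 0.0
    num_tokens = 0
    for sentence in sentences:
        tokens = ["<s>"] + sentence + ["</s>"]
        for prev, word in zip(tokens, tokens[1:]):
            p = bigram_prob(prev, word, context_counts, bigram_counts, vocab)
            log_prob += math.log(p)
            num_tokens += 1
    return math.exp(-log_prob / num_tokens)


# Toy training corpus (hypothetical data, for illustration only).
corpus = [["the", "cat", "sat"], ["the", "dog", "sat"]]
contexts, bigrams, vocabulary = train_bigram_counts(corpus)

print(bigram_prob("the", "cat", contexts, bigrams, vocabulary))  # 0.25

# A lower perplexity means the model finds the word sequence more plausible.
print(perplexity([["the", "cat", "sat"]], contexts, bigrams, vocabulary))
print(perplexity([["sat", "the", "dog"]], contexts, bigrams, vocabulary))
```

In a speech recognition setting, scores of this kind could be used to prefer the more plausible of several candidate transcriptions. A model trained on a large corpus would normally need a better smoothing method than add-one, such as the Kneser-Ney smoothing listed under Lecture 3.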

Final Project

There will be a final programming project on a topic of your own choosing. See the final project guide for more information.
Final Programming Project Guidelines

Exams

Quiz 1: Lectures 1-3
Quiz 2: Lectures 4-5
Quiz 3: Lectures 6-7
Quiz 4: Lectures 8-9
Quiz 5: Lectures 10-11
Quiz 6: Lectures 12-13
Quiz 7: Lectures 14-15
Quiz 8: Lectures 16-17
*Solutions are not available for the quizzes.

Course Sessions (18):

Lecture 1

Duration: 1 hr 13 min*
Topics: Logistics, Goals Of The Field Of NLP, Is The Problem Just Cycles?, Why NLP Is Difficult: The Hidden Structure Of Language, Why NLP Is Difficult: Newspaper Headlines, Machine Translation, Machine Translation History, Centauri/Arcturan Example
* Segments of this lecture have been edited out due to copyright restrictions.

Transcripts

Lecture 2

Duration: 1 hr 14 min*
Topics: Questions That Linguistics Should Answer, Machine Translation (MT), Probabilistic Language Models, Evaluation, Sparsity, Smoothing, How Much Mass To Withhold?
* Segments of this lecture have been edited out due to copyright restrictions.

Transcripts

Lecture 3

Duration: 1 hr 15 min
Topics: Finish Smoothing From Last Lecture, Kneser-Ney Smoothing, Practical Considerations, Machine Translation (Lecture 3), Tokenization (Or Segmentation), Statistical MT Systems, IBM Translation Models

Transcripts

Lecture 4

Duration: 1 hr 15 min
Topics: Review Of Statistical MT, Model 1, The EM Algorithm, EM And Hidden Structure, EM Algorithm Demonstration In Excel Spreadsheet, Assignment 1

Transcripts

Lecture 5

Duration: 1 hr 10 min*
Topics: IBM Model 1-2 (Review), IBM Model 3, IBM Model 4, IBM Model 5, MT Evaluation, BLEU Evaluation Metric, A Complete Translation System, Flaws Of Word-Based MT, Phrase-Based Stat-MT
* Segments of this lecture have been edited out due to copyright restrictions.

Transcripts

Lecture 6

Duration: 1 hr 13 min*
Topics: Continuation Of Machine Translation, Syntax-Based Model, Information Extraction & Named Entity Recognition, Information Extraction, Named Entity Extraction, Precision And Recall, Naive Bayes Classifiers
* Segments of this lecture have been edited out due to copyright restrictions.

Transcripts

Lecture 7

Duration: 1 hr 15 min
Topics: Continuation Of Naive Bayes Classifier, Joint Vs. Conditional Models, Features, Examples, Feature-Based Classifiers, Comparison To Naive Bayes, Building A Maxent Model

Transcripts

Lecture 8

Duration: 1 hr 16 min*
Topics: Details Of Maxent Model, Maxent Examples, Convexity, Feature Interaction, Classification, Smoothing, Inference In Systems
* Segments of this lecture have been edited out due to copyright restrictions.

Transcripts

Lecture 9

Duration: 1 hr 7 min*
Topics: MEMM, HMM POS Tagging Model, Summary Of Tagging, NER, Information Extraction And Integration, Landscape Of IE Tasks, Machine Learning Methods, Relation Extraction, Clustering
* Segments of this lecture have been edited out due to copyright restrictions.

Transcripts

Lecture 10

Duration: 1 hr 17 min
Topics: Parsing, Classical NLP Parsing, Two Views Of Linguistic Structure, Attachment Ambiguities, A Simple Prediction, What Is Parsing?, Top-Down Parsing, Bottom-Up Parsing, Parsing Of PCFGs

Transcripts

Lecture 11

Duration: 1 hr 17 min
Topics: Chomsky Normal Form, Cocke-Kasami-Younger (CKY) Constituency Parsing, Extended CKY Parsing, Efficient CKY Parsing, Evaluating Parsing Accuracy, How Good Are PCFGs?, Improve PCFG Parsing Via Unlexicalized Parsing, Markovization

Transcripts

Lecture 12

Duration: 1 hr 5 min
Topics: Guest Lecturer: Dan Jurafsky, Syntactic Variations Versus Semantic Roles, Some Typical Semantic Roles, Two Solutions To The Difficulty Of Defining Semantic Roles, PropBank, FrameNet, Information Extraction Versus Semantic Role Labeling, Evaluation Measures, Parsing Algorithm, Combining Identification And Classification Models, Summary

Transcripts

Lecture 13

Duration: 1 hr 16 min
Topics: Lexicalized Parsing, Parsing Via Classification Decisions: Charniak (1997), Sparseness & The Penn Treebank, Complexity Of Lexicalized PCFG Parsing, Overview Of Collins’ Model, Choice Of Heads, The Latest Parsing Results, Parsing And Search Algorithms

Transcripts

Lecture 14

Duration: 1 hr 18 min
Topics: Parsing As Search, Agenda-Based Parsing, What Can Go Wrong?, Search In Modern Lexicalized Statistical Parsers, Dependency Parsing, Naïve Recognition/Parsing, Discriminative Parsing, Discriminative Models

Transcripts

Lecture 15

Duration: 1 hr 7 min*
Topics: Why Study Computational Semantics?, Precise Semantics. An Early Example: Chat-80, Programming Language Interpreter, Logic: Some Preliminaries, Compositional Semantics, A Simple DCG Grammar With Semantics, Augmented CFG Rules, Semantic Grammars
* Segments of this lecture have been edited out due to copyright restrictions.

Transcripts

Lecture 16

Duration: 1 hr 15 min
Topics: An Introduction To Formal Computational Semantics, Database/Knowledgebase Interfaces, Typed Lambda Calculus, Types Of Major Syntactic Categories, Adjective And PP Modification, Why Things Get More Complex, Generalized Quantifiers, Representing Proper Nouns With Quantifiers, Questions With Answers!, How Could We Learn Such Representations?

Transcripts

Lecture 17

Duration: 1 hr 12 min*
Topics: Lexical Semantics, Lexical Information And NL Applications, Polysemy Vs Homonymy, WordNet, Word Sense Disambiguation, Corpora Used For WSD Work, Evaluation, Lexical Acquisition, Vector-Based Lexical Semantics, Measures Of Semantic Similarity
* Segments of this lecture have been edited out due to copyright restrictions.

Transcripts

Lecture 18

Duration: 1 hr 15 min*
Topics: Question Answering Systems And Textual Inference, A Brief (Academic) History, Top Performing Systems, Answer Types In State-Of-The-Art QA Systems, Semantics And Reasoning For QA, The Textual Inference Task, Why We Need Sloppy Matching, QA Beyond TREC
* Segments of this lecture have been edited out due to copyright restrictions.

Transcripts