Lecture 16 - Applications of Reinforcement Learning

Course Details

Course Description

This course provides a broad introduction to machine learning and statistical pattern recognition.

Topics include: supervised learning (generative/discriminative learning, parametric/non-parametric learning, neural networks, support vector machines); unsupervised learning (clustering, dimensionality reduction, kernel methods); learning theory (bias/variance tradeoffs; VC theory; large margins); reinforcement learning and adaptive control.
The course will also discuss recent applications of machine learning, such as to robotic control, data mining, autonomous navigation, bioinformatics, speech recognition, and text and web data processing.
Students are expected to have the following background:

Prerequisites:
- Knowledge of basic computer science principles and skills, at a level sufficient to write a reasonably non-trivial computer program.
- Familiarity with basic probability theory. (Stat 116 is sufficient but not necessary.)
- Familiarity with basic linear algebra. (Any one of Math 51, Math 103, Math 113, or CS 205 would be much more than necessary.)

Instructor

Ng, Andrew

Ng's research is in the areas of machine learning and artificial intelligence. He leads the STAIR (STanford Artificial Intelligence Robot) project, whose goal is to develop a home assistant robot that can perform tasks such as tidying up a room, loading/unloading a dishwasher, fetching and delivering items, and preparing meals using a kitchen. Since its birth in 1956, the AI dream has been to build systems that exhibit "broad spectrum" intelligence. However, AI has since splintered into many different subfields, such as machine learning, vision, navigation, reasoning, planning, and natural language processing. To realize its vision of a home assistant robot, STAIR will unify tools drawn from all of these AI subfields into a single platform. This is in sharp contrast to the 30-year-old trend of working on fragmented AI subfields, which also makes STAIR a unique vehicle for driving research toward true, integrated AI.

Ng also works on machine learning algorithms for robotic control, in which, rather than relying on months of human hand-engineering to design a controller, a robot learns automatically how best to control itself. Using this approach, Ng's group has developed by far the most advanced autonomous helicopter controller, which is capable of flying spectacular aerobatic maneuvers that even experienced human pilots often find extremely difficult to execute. As part of this work, Ng's group also developed algorithms that can take a single image and turn it into a 3-D model that one can fly through and view from different angles.

Handouts

Course Handouts

info.pdf Course Information
schedule.pdf Course Schedule
AI-classes.pdf Other AI Courses

Lecture Handouts

cs229-notes1.pdf Linear Regression, Classification and logistic regression, Generalized Linear Models
cs229-notes2.pdf Generative Learning algorithms
cs229-notes3.pdf Support Vector Machines
cs229-notes4.pdf Learning Theory
cs229-notes5.pdf Regularization and model selection
cs229-notes6.pdf The perceptron and large margin classifiers
cs229-notes7a.pdf The k-means clustering algorithm
cs229-notes7b.pdf Mixtures of Gaussians and the EM algorithm
cs229-notes8.pdf The EM algorithm
cs229-notes9.pdf Factor analysis
cs229-notes10.pdf Principal components analysis
cs229-notes11.pdf Independent Components Analysis
cs229-notes12.pdf Reinforcement Learning and Control

Review Notes

Linear Algebra Review and Reference cs229-linalg.pdf
Probability Theory Review cs229-prob.pdf
Matlab Review logistic_grad_ascent.txt, sigmoid.txt, matlab_session.txt
Convex Optimization Overview, Part I cs229-cvxopt.pdf
Convex Optimization Overview, Part II cs229-cvxopt2.pdf
Hidden Markov Models cs229-hmm.pdf
Gaussian Processes cs229-gp.pdf, compute_kernel_matrix.txt, gp_demo.txt, sample_gp_prior.txt

Resources

Advice on applying machine learning:
Slides from Andrew's lecture on getting machine learning algorithms to work in practice can be found here.
Previous projects:
A list of last year's final projects can be found here.
Matlab Resources
Here are a couple of Matlab tutorials that you might find helpful: Matlab Tutorial and A Practical Introduction to Matlab. For Emacs users only: if you plan to run Matlab in Emacs, here are matlab.el and a helpful .emacs file.
Octave Resources
For a free alternative to Matlab, check out GNU Octave. The official documentation is available here. Some useful tutorials on Octave include Octave Tutorial and Octave on Wiki.
Viewing PostScript and PDF files:
Depending on the computer you are using, you may be able to download a PostScript viewer or PDF viewer for it if you don't already have one.

Course Sessions (20):

Lecture 1

Duration: 1 hr 9 min
Topics: The Motivation & Applications of Machine Learning, The Logistics of the Class, The Definition of Machine Learning, The Overview of Supervised Learning, The Overview of Learning Theory, The Overview of Unsupervised Learning, The Overview of Reinforcement Learning

Lecture 2

Duration: 1 hr 16 min
Topics: An Application of Supervised Learning - Autonomous Driving, ALVINN, Linear Regression, Gradient Descent, Batch Gradient Descent, Stochastic Gradient Descent (Incremental Descent), Matrix Derivative Notation for Deriving Normal Equations, Derivation of Normal Equations

Lecture 3

Duration: 1 hr 13 min
Topics: The Concept of Underfitting and Overfitting, The Concept of Parametric Algorithms and Non-parametric Algorithms, Locally Weighted Regression, The Probabilistic Interpretation of Linear Regression, The Motivation of Logistic Regression, Logistic Regression, Perceptron

Lecture 4

Duration: 1 hr 13 min
Topics: Newton's Method, Exponential Family, Bernoulli Example, Gaussian Example, Generalized Linear Models (GLMs), Multinomial Example, Softmax Regression

Lecture 5

Duration: 1 hr 16 min
Topics: Discriminative Algorithms, Generative Algorithms, Gaussian Discriminant Analysis (GDA), GDA and Logistic Regression, Naive Bayes, Laplace Smoothing

Lecture 6

Duration: 1 hr 14 min
Topics: Multinomial Event Model, Non-linear Classifiers, Neural Networks, Applications of Neural Networks, Intuitions about Support Vector Machines (SVMs), Notation for SVMs, Functional and Geometric Margins

Lecture 7

Duration: 1 hr 16 min
Topics: Optimal Margin Classifier, Lagrange Duality, Karush-Kuhn-Tucker (KKT) Conditions, SVM Dual, The Concept of Kernels

Lecture 8

Duration: 1 hr 17 min
Topics: Kernels, Mercer's Theorem, Non-linear Decision Boundaries and Soft Margin SVM, Coordinate Ascent Algorithm, The Sequential Minimal Optimization (SMO) Algorithm, Applications of SVM

Lecture 9

Duration: 1 hr 14 min
Topics: Bias/Variance Tradeoff, Empirical Risk Minimization (ERM), The Union Bound, Hoeffding Inequality, Uniform Convergence - The Case of Finite H, Sample Complexity Bound, Error Bound, Uniform Convergence Theorem & Corollary

Lecture 10

Duration: 1 hr 13 min
Topics: Uniform Convergence - The Case of Infinite H, The Concept of 'Shatter' and VC Dimension, SVM Example, Model Selection, Cross Validation, Feature Selection

Lecture 11

Duration: 1 hr 22 min
Topics: Bayesian Statistics and Regularization, Online Learning, Advice for Applying Machine Learning Algorithms, Debugging/fixing Learning Algorithms, Diagnostics for Bias & Variance, Optimization Algorithm Diagnostics, Diagnostic Example - Autonomous Helicopter, Error Analysis, Getting Started on a Learning Problem

Lecture 12

Duration: 1 hr 14 min
Topics: The Concept of Unsupervised Learning, K-means Clustering Algorithm, K-means Algorithm, Mixtures of Gaussians and the EM Algorithm, Jensen's Inequality, The EM Algorithm, Summary

Lecture 13

Duration: 1 hr 15 min
Topics: Mixture of Gaussians, Mixture of Naive Bayes - Text Clustering (EM Application), Factor Analysis, Restrictions on a Covariance Matrix, The Factor Analysis Model, EM for Factor Analysis

Lecture 14

Duration: 1 hr 21 min
Topics: The Factor Analysis Model, EM for Factor Analysis, Principal Component Analysis (PCA), PCA as a Dimensionality Reduction Algorithm, Applications of PCA, Face Recognition by Using PCA

Lecture 15

Duration: 1 hr 17 min
Topics: Latent Semantic Indexing (LSI), Singular Value Decomposition (SVD) Implementation, Independent Component Analysis (ICA), The Application of ICA, Cumulative Distribution Function (CDF), ICA Algorithm, The Applications of ICA

Lecture 16

Duration: 1 hr 13 min
Topics: Applications of Reinforcement Learning, Markov Decision Process (MDP), Defining Value & Policy Functions, Value Function, Optimal Value Function, Value Iteration, Policy Iteration
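Value iteration, one of this lecture's topics, can be illustrated with a minimal sketch. It assumes a finite MDP with state rewards R(s), a discount factor gamma, and transition probabilities P_sa stored as lists of (probability, next_state) pairs. It repeatedly applies the Bellman backup V(s) := R(s) + gamma * max_a sum_{s'} P_sa(s') V(s') until the values stop changing, then reads off the greedy policy. The function name `value_iteration`, the toy two-state MDP, and the stopping tolerance are illustrative assumptions, not taken from the course materials.

```python
from typing import Dict, List, Tuple

# Assumed representation (hypothetical): P[s][a] is a list of (probability, next_state).
Transitions = Dict[int, Dict[str, List[Tuple[float, int]]]]


def value_iteration(
    states: List[int],
    actions: List[str],
    P: Transitions,
    R: Dict[int, float],
    gamma: float = 0.9,
    tol: float = 1e-8,
    max_iters: int = 10_000,
) -> Tuple[Dict[int, float], Dict[int, str]]:
    """Synchronous value iteration for a finite MDP with state rewards R(s)."""
    V = {s: 0.0 for s in states}
    for _ in range(max_iters):
        # Bellman backup: V(s) <- R(s) + gamma * max_a sum_{s'} P_sa(s') V(s')
        V_new = {
            s: R[s] + gamma * max(
                sum(p * V[s2] for p, s2 in P[s][a]) for a in actions
            )
            for s in states
        }
        delta = max(abs(V_new[s] - V[s]) for s in states)
        V = V_new
        if delta < tol:  # stop once a full sweep barely changes V
            break
    # Greedy policy: pi(s) = argmax_a sum_{s'} P_sa(s') V(s')
    policy = {
        s: max(actions, key=lambda a: sum(p * V[s2] for p, s2 in P[s][a]))
        for s in states
    }
    return V, policy


# Toy MDP (hypothetical): state 1 pays reward 1; "move" switches state, "stay" does not.
states = [0, 1]
actions = ["stay", "move"]
P = {
    0: {"stay": [(1.0, 0)], "move": [(1.0, 1)]},
    1: {"stay": [(1.0, 1)], "move": [(1.0, 0)]},
}
R = {0: 0.0, 1: 1.0}
V, pi = value_iteration(states, actions, P, R, gamma=0.9)
# V is approximately {0: 9.0, 1: 10.0}; pi == {0: "move", 1: "stay"}
```

In this toy problem the optimal values can be checked by hand: staying in state 1 gives V(1) = 1 + 0.9 V(1), so V(1) = 10, and moving from state 0 gives V(0) = 0.9 V(1) = 9. Policy iteration would reach the same policy by alternating exact policy evaluation with greedy improvement.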

Lecture 17

Duration: 1 hr 17 min
Topics: Generalization to Continuous States, Discretization & Curse of Dimensionality, Models/Simulators, Fitted Value Iteration, Finding Optimal Policy

Lecture 18

Duration: 1 hr 17 min
Topics: State-action Rewards, Finite Horizon MDPs, The Concept of Dynamical Systems, Examples of Dynamical Models, Linear Quadratic Regulation (LQR), Linearizing a Non-Linear Model, Computing Rewards, Riccati Equation

Lecture 19

Duration: 1 hr 16 min
Topics: Advice for Applying Machine Learning, Debugging Reinforcement Learning (RL) Algorithm, Linear Quadratic Regulation (LQR), Differential Dynamic Programming (DDP), Kalman Filter & Linear Quadratic Gaussian (LQG), Predict/update Steps of Kalman Filter, Linear Quadratic Gaussian (LQG)

Lecture 20

Duration: 1 hr 17 min
Topics: Partially Observable MDPs (POMDPs), Policy Search, Reinforce Algorithm, Pegasus Algorithm, Pegasus Policy Search, Applications of Reinforcement Learning