Stanford Engineering Everywhere | CS229 - Machine Learning

Lecture 18 - State-action Rewards

Bookmarks
- 00:01:25
  The Review Of The Previous Class
- 00:04:13
  State-action Rewards
- 00:09:59
  Finite Horizon MDPs
- 00:11:50
  The Concept Of Dynamical Systems
- 00:14:32
  Examples Of Dynamical Models
- 00:28:29
  Linear Quadratic Regulation (LQR)
- 00:41:18
  Linearizing A Non-linear Model
- 00:53:17
  Computing Rewards
- 01:05:55
  Riccati Equation
About the Lecture

TITLE:

Lecture 18 - State-action Rewards

DURATION:

1 hr 17 min

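TOPICS:

State-action Rewards, Finite Horizon MDPs, The Concept of Dynamical Systems, Examples of Dynamical Models, Linear Quadratic Regulation (LQR), Linearizing a Non-Linear Model, Computing Rewards, Riccati Equation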

Course Details

Course Description

This course provides a broad introduction to machine learning and statistical pattern recognition.

Topics include: supervised learning (generative/discriminative learning, parametric/non-parametric learning, neural networks, support vector machines); unsupervised learning (clustering, dimensionality reduction, kernel methods); learning theory (bias/variance tradeoffs; VC theory; large margins); reinforcement learning and adaptive control.
The course will also discuss recent applications of machine learning, such as robotic control, data mining, autonomous navigation, bioinformatics, speech recognition, and text and web data processing.

Prerequisites

Students are expected to have the following background:
- Knowledge of basic computer science principles and skills, at a level sufficient to write a reasonably non-trivial computer program.
- Familiarity with basic probability theory. (Stat 116 is sufficient but not necessary.)
- Familiarity with basic linear algebra. (Any one of Math 51, Math 103, Math 113, or CS 205 would be much more than necessary.)

Instructor

Ng, Andrew

Ng's research is in the areas of machine learning and artificial intelligence. He leads the STAIR (STanford Artificial Intelligence Robot) project, whose goal is to develop a home assistant robot that can perform tasks such as tidying up a room, loading and unloading a dishwasher, fetching and delivering items, and preparing meals in a kitchen. Since its birth in 1956, the AI dream has been to build systems that exhibit "broad spectrum" intelligence. However, AI has since splintered into many different subfields, such as machine learning, vision, navigation, reasoning, planning, and natural language processing. To realize its vision of a home assistant robot, STAIR will unify into a single platform tools drawn from all of these AI subfields. This is in distinct contrast to the 30-year-old trend of working on fragmented AI subfields, so STAIR is also a unique vehicle for driving forward research towards true, integrated AI.

Ng also works on machine learning algorithms for robotic control, in which, rather than relying on months of human hand-engineering to design a controller, a robot instead learns automatically how best to control itself. Using this approach, Ng's group has developed by far the most advanced autonomous helicopter controller, which is capable of flying spectacular aerobatic maneuvers that even experienced human pilots often find extremely difficult to execute. As part of this work, Ng's group also developed algorithms that can take a single image and turn the picture into a 3-D model that one can fly through and see from different angles.

Handouts

Course Handouts

info.pdf	Course Information
schedule.pdf	Course Schedule
AI-classes.pdf	Other AI Courses

Lecture Handouts

cs229-notes1.pdf	Linear Regression, Classification and logistic regression, Generalized Linear Models
cs229-notes2.pdf	Generative Learning algorithms
cs229-notes3.pdf	Support Vector Machines
cs229-notes4.pdf	Learning Theory
cs229-notes5.pdf	Regularization and model selection
cs229-notes6.pdf	The perceptron and large margin classifiers
cs229-notes7a.pdf	The k-means clustering algorithm
cs229-notes7b.pdf	Mixtures of Gaussians and the EM algorithm
cs229-notes8.pdf	The EM algorithm
cs229-notes9.pdf	Factor analysis
cs229-notes10.pdf	Principal components analysis
cs229-notes11.pdf	Independent Components Analysis
cs229-notes12.pdf	Reinforcement Learning and Control

Review Notes

Linear Algebra Review and Reference

cs229-linalg.pdf

Probability Theory Review

cs229-prob.pdf

Matlab Review

logistic_grad_ascent.txt	sigmoid.txt
matlab_session.txt

Convex Optimization Overview, Part I

cs229-cvxopt.pdf

Convex Optimization Overview, Part II

cs229-cvxopt2.pdf

Hidden Markov Models

cs229-hmm.pdf

Gaussian Processes

cs229-gp.pdf	compute_kernel_matrix.txt
gp_demo.txt	sample_gp_prior.txt

Resources

Advice on applying machine learning:
Slides from Andrew's lecture on getting machine learning algorithms to work in practice can be found here.

Previous projects:
A list of last year's final projects can be found here.

Matlab Resources
Here are a couple of Matlab tutorials that you might find helpful: Matlab Tutorial and A Practical Introduction to Matlab. For emacs users only: If you plan to run Matlab in emacs, here are matlab.el and a helpful emacs file.

Octave Resources
For a free alternative to Matlab, check out GNU Octave. The official documentation is available here. Some useful tutorials on Octave include Octave Tutorial and Octave on Wiki.

Viewing PostScript and PDF files:
Depending on the computer you are using, you may be able to download a PostScript viewer or PDF viewer for it if you don't already have one.

Assignments

Assignment	Assignment Data Files	Solution	Solution Data Files
Problem Set 1	PS1-data.zip	Solution Set 1	ps1_solution-data.zip
Problem Set 2	PS2-data.zip	Solution Set 2
Problem Set 3	PS3-data.zip	Solution Set 3	ps3_solution-data.zip
Problem Set 4	PS4-data.zip	Solution Set 4	ps4_solution-data.zip

Course Sessions (20):

Lecture 1

Duration: 1 hr 9 min
Topics: The Motivation & Applications of Machine Learning, The Logistics of the Class, The Definition of Machine Learning, The Overview of Supervised Learning, The Overview of Learning Theory, The Overview of Unsupervised Learning, The Overview of Reinforcement Learning

Transcripts

HTML
PDF

Lecture 2

Duration: 1 hr 16 min
Topics: An Application of Supervised Learning - Autonomous Driving, ALVINN, Linear Regression, Gradient Descent, Batch Gradient Descent, Stochastic Gradient Descent (Incremental Descent), Matrix Derivative Notation for Deriving Normal Equations, Derivation of Normal Equations

Transcripts

HTML
PDF

Lecture 3

Duration: 1 hr 13 min
Topics: The Concept of Underfitting and Overfitting, The Concept of Parametric Algorithms and Non-parametric Algorithms, Locally Weighted Regression, The Probabilistic Interpretation of Linear Regression, The motivation of Logistic Regression, Logistic Regression, Perceptron

Transcripts

HTML
PDF

Lecture 4

Duration: 1 hr 13 min
Topics: Newton's Method, Exponential Family, Bernoulli Example, Gaussian Example, Generalized Linear Models (GLMs), Multinomial Example, Softmax Regression

Transcripts

HTML
PDF

Lecture 5

Duration: 1 hr 16 min
Topics: Discriminative Algorithms, Generative Algorithms, Gaussian Discriminant Analysis (GDA), GDA and Logistic Regression, Naive Bayes, Laplace Smoothing

Transcripts

HTML
PDF

Lecture 6

Duration: 1 hr 14 min
Topics: Multinomial Event Model, Non-linear Classifiers, Neural Network, Applications of Neural Network, Intuitions about Support Vector Machine (SVM), Notation for SVM, Functional and Geometric Margins

Transcripts

HTML
PDF

Lecture 7

Duration: 1 hr 16 min
Topics: Optimal Margin Classifier, Lagrange Duality, Karush-Kuhn-Tucker (KKT) Conditions, SVM Dual, The Concept of Kernels

Transcripts

HTML
PDF

Lecture 8

Duration: 1 hr 17 min
Topics: Kernels, Mercer's Theorem, Non-linear Decision Boundaries and Soft Margin SVM, Coordinate Ascent Algorithm, The Sequential Minimal Optimization (SMO) Algorithm, Applications of SVM

Transcripts

HTML
PDF

Lecture 9

Duration: 1 hr 14 min
Topics: Bias/variance Tradeoff, Empirical Risk Minimization (ERM), The Union Bound, Hoeffding Inequality, Uniform Convergence - The Case of Finite H, Sample Complexity Bound, Error Bound, Uniform Convergence Theorem & Corollary

Transcripts

HTML
PDF

Lecture 10

Duration: 1 hr 13 min
Topics: Uniform Convergence - The Case of Infinite H, The Concept of 'Shatter' and VC Dimension, SVM Example, Model Selection, Cross Validation, Feature Selection

Transcripts

HTML
PDF

Lecture 11

Duration: 1 hr 22 min
Topics: Bayesian Statistics and Regularization, Online Learning, Advice for Applying Machine Learning Algorithms, Debugging/fixing Learning Algorithms, Diagnostics for Bias & Variance, Optimization Algorithm Diagnostics, Diagnostic Example - Autonomous Helicopter, Error Analysis, Getting Started on a Learning Problem

Transcripts

HTML
PDF

Lecture 12

Duration: 1 hr 14 min
Topics: The Concept of Unsupervised Learning, K-means Clustering Algorithm, K-means Algorithm, Mixtures of Gaussians and the EM Algorithm, Jensen's Inequality, The EM Algorithm, Summary

Transcripts

HTML
PDF

Lecture 13

Duration: 1 hr 15 min
Topics: Mixture of Gaussians, Mixture of Naive Bayes - Text Clustering (EM Application), Factor Analysis, Restrictions on a Covariance Matrix, The Factor Analysis Model, EM for Factor Analysis

Transcripts

HTML
PDF

Lecture 14

Duration: 1 hr 21 min
Topics: The Factor Analysis Model, EM for Factor Analysis, Principal Component Analysis (PCA), PCA as a Dimensionality Reduction Algorithm, Applications of PCA, Face Recognition by Using PCA

Transcripts

HTML
PDF

Lecture 15

Duration: 1 hr 17 min
Topics: Latent Semantic Indexing (LSI), Singular Value Decomposition (SVD) Implementation, Independent Component Analysis (ICA), The Application of ICA, Cumulative Distribution Function (CDF), ICA Algorithm, The Applications of ICA

Transcripts

HTML
PDF

Lecture 16

Duration: 1 hr 13 min
Topics: Applications of Reinforcement Learning, Markov Decision Process (MDP), Defining Value & Policy Functions, Value Function, Optimal Value Function, Value Iteration, Policy Iteration

Transcripts

HTML
PDF

Lecture 17

Duration: 1 hr 17 min
Topics: Generalization to Continuous States, Discretization & Curse of Dimensionality, Models/Simulators, Fitted Value Iteration, Finding Optimal Policy

Transcripts

HTML
PDF

Lecture 18

Duration: 1 hr 17 min
Topics: State-action Rewards, Finite Horizon MDPs, The Concept of Dynamical Systems, Examples of Dynamical Models, Linear Quadratic Regulation (LQR), Linearizing a Non-Linear Model, Computing Rewards, Riccati Equation

Transcripts

HTML
PDF

Lecture 19

Duration: 1 hr 16 min
Topics: Advice for Applying Machine Learning, Debugging Reinforcement Learning (RL) Algorithm, Linear Quadratic Regulation (LQR), Differential Dynamic Programming (DDP), Kalman Filter & Linear Quadratic Gaussian (LQG), Predict/update Steps of Kalman Filter, Linear Quadratic Gaussian (LQG)

Transcripts

HTML
PDF

Lecture 20

Duration: 1 hr 17 min
Topics: Partially Observable MDPs (POMDPs), Policy Search, Reinforce Algorithm, Pegasus Algorithm, Pegasus Policy Search, Applications of Reinforcement Learning

Transcripts

HTML
PDF

Stanford University
