Instructor (Stephen Boyd): I guess we're on. This is the last lecture. You can turn off all amplification in here; I don't need any amplification here. Thanks. We'll finish up a couple of topics in reachability and I'll just do a couple of topics in observability. It's not a hugely important topic. And then I wanna reserve some time at the end of this class to just say a few things at the very highest level about what the class is, how all this stuff fits together, and all that sort of thing. So, continuous time reachability we looked at last time, which in fact, for the people here in real time, was this morning. We looked at what the reachable subspace is for a continuous time system, that is, the set of all points you can hit in T seconds starting from zero. The answer turned out to be real simple in this case. It's not interesting or subtle like in a discrete time system, where at each step, for a while anyway, interesting things can happen: you can hit a few points, then some more, and all that kind of stuff. Here it all happens at once. What happens is, if T is positive, then the set of points you can hit is the range of the controllability matrix, period. It doesn't change with T as long as T is positive. What will change, of course, is the size of the input required to hit a target state if you do this real quickly, as opposed to doing it on a long, leisurely timeframe. We also looked at this last time: if you want the least norm input for reachability, you're looking for the input that minimizes the integral of the square of the norm of u. We worked through that by discretizing it, and the formula that came out was quite straightforward. It was this integral of a positive semidefinite matrix; the integral itself is positive definite. There's its inverse, and then that's multiplied by this thing over here, which is B transpose times e to the (T minus tau) A transpose.
Now this is actually, that's the system running backwards in time; I think I commented on it this morning. And all the parts of this correspond perfectly to the discrete time case. Instead of having an exponential, you have a power. These are just time propagators, for example. The other thing I should mention is that this whole thing is really something just like C transpose times (C C transpose) inverse. The difference is that C is now this kind of big complex operator; it's not a matrix as it was in the discrete time case. But this here is the analog of C C transpose, and this over here is the analog of C transpose. What happens then is that you can work out a formula for the minimum energy to hit a state in an amount of time T. It's a quadratic form, and it's a quadratic form that depends on Q of T inverse, where Q is this integral: it's an integral of e to the tau A, B B transpose, e to the tau A transpose. Very familiar, because this is nothing more than the time propagator for what an action now has as its effect on the state tau seconds in the future. You integrate this action, which is sort of the same as summing in the discrete time case, and you invert. And this, in fact, I guess you had a homework problem on this, is a Gram matrix. It's an integral of something times something transpose; that's a Gram matrix. Now we can conclude all sorts of interesting things here. For example, if T is bigger, you integrate, you add more positive semidefinite terms, or whatever you wanna call an integrand, to this thing, and Q gets bigger, and that means Q inverse gets smaller. So here again you have the idea that as the time horizon gets longer, the minimum energy required to hit a target goes down. The difference now is that T can be arbitrarily short, and you may need a very large input.
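As an entirely hypothetical numerical sketch of this, here is how you might approximate the Gramian Q(T) and the minimum-energy quadratic form in Python. The system matrices, target state, and horizons below are made up, and the integral is simply discretized with a midpoint rule:

```python
import numpy as np
from scipy.linalg import expm

def ctrb_gramian(A, B, T, n_steps=400):
    """Midpoint-rule approximation of Q(T) = int_0^T e^{tA} B B^T e^{tA^T} dt."""
    dt = T / n_steps
    Q = np.zeros((A.shape[0], A.shape[0]))
    for k in range(n_steps):
        M = expm((k + 0.5) * dt * A) @ B
        Q += M @ M.T * dt
    return Q

def min_energy(A, B, x_target, T):
    """Minimum value of int_0^T ||u(t)||^2 dt to steer the state from 0 to x_target."""
    Q = ctrb_gramian(A, B, T)
    return float(x_target @ np.linalg.solve(Q, x_target))

# hypothetical stable single-input system (not the lecture's example)
A = np.array([[0.0, 1.0],
              [-1.0, -0.5]])
B = np.array([[0.0],
              [1.0]])
x_des = np.array([1.0, 0.0])

E_fast = min_energy(A, B, x_des, 0.5)   # hit the target quickly
E_slow = min_energy(A, B, x_des, 5.0)   # hit it leisurely
assert 0 < E_slow < E_fast              # Q(T) grows with T, so Q(T)^{-1} shrinks
```

Making T larger adds positive semidefinite terms to Q, so the energy x^T Q(T)^{-1} x can only go down; that is exactly what the final assertion checks.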
By the way, as T gets shorter and shorter, this generates inputs which in fact converge to something like the impulsive inputs that we looked at earlier. So there are impulsive inputs that transfer you to a state immediately. Of course, an impulsive input is extremely large. Here, this will actually construct the impulsive inputs. Now we could also look at what happens when T goes to infinity. This limit always exists, because this thing is monotone increasing in the matrix sense. Q is nondecreasing, and therefore its inverse, which is a positive definite matrix, is decreasing as a function of T. It's the quadratic form that tells you how much energy it requires to hit any point in state space, and it goes down. It has a limit, therefore, and we'll call that limit P. It's the same as before. And that gives you the minimum energy to reach a point arbitrarily leisurely; that's what P means. It's the same story. If the matrix A is stable, then this integral converges; nothing goes to infinity here. Why? Well, because if A is stable, all the terms in e to the tau A look like polynomials times decaying exponentials. So this term simply converges, and therefore P converges to the inverse of that, and is therefore positive definite. And that says, basically, that if the system is stable, then hitting a point, even arbitrarily leisurely, is always gonna cost energy. So there's no point you can get to for free, for example; there's no point you can get to with arbitrarily low energy. It makes sense. If it's stable, that sort of means that, left to its own dynamics, the system state will go back to zero. So if you wanna get to some state, you have to fight against that, and that's what tells you that P is gonna be positive definite. So that's basically the idea here. Now we can look at the idea of general state transfer.
That's transferring the state from an initial condition at time t_i to some desired final state at time t_f. It's all very similar; it's the same as the discrete time case. What you do is you say this: the final state is equal to this. It's the initial state propagated forward in time t_f minus t_i seconds; that's this part. So this is where you would land if the input were zero. If you did nothing, this is where you would land. And this is the effect of the input over that interval. So for the whole thing, you realize the following: this is equal to x desired if and only if, putting this term on the other side, u steers the state from zero, over t_f minus t_i seconds, to x desired minus this term, which compensates for the drift. And in fact all of this kinda makes sense. It says that if you wanted to steer something to a certain place, what you'd do is this: you first find out what would happen if you did nothing, and that would be over here. I subtract that from the desired state, and then I act as if I'm going from zero to hit that point, and everything will work out. That's by linearity, so this all works out. Now, continuous time is a lot easier because basically the system is either controllable or it's uncontrollable, and that means you don't get any of the subtleties. It means basically that if it's controllable, you can get from anywhere to anywhere else in arbitrarily small time, possibly with a very large input; you can go from anywhere to anywhere else in any amount of time as long as it's positive. You can even do it in zero time if you're allowing yourself to use impulsive inputs. So that's how that works. Okay, so let's just look at an example to see how this works. Here's three masses stuck between two walls, and we have some springs, and we have some dampers, and we can apply two tensions here. I think we've seen this example before.
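The drift-compensation step just described is a one-liner. A minimal sketch, using a hypothetical double-integrator system (the matrices and times here are made up, not the lecture's example):

```python
import numpy as np
from scipy.linalg import expm

# hypothetical double integrator: state is (position, velocity)
A = np.array([[0.0, 1.0],
              [0.0, 0.0]])
x_init = np.array([1.0, 0.0])      # where we start at t_i
x_des = np.array([0.0, 0.0])       # where we want to be at t_f
T = 2.0                            # t_f - t_i

drift = expm(T * A) @ x_init       # where we'd land with u = 0
target_from_zero = x_des - drift   # equivalent target for a zero-start transfer
```

By linearity, any least norm formula for steering zero to a target over T seconds can now be applied to `target_from_zero`, and the resulting input steers x_init to x_des.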
So okay, your state is six-dimensional. It's the positions and the, sorry, it's the displacements and the velocities of the three masses, and the system looks like this. The blocks here make sense. The top three states are the positions, and you read this block row as zero, I, so it says that the derivatives of the top three are equal to the bottom three. That's by definition, because the bottom three states are the derivatives. Then these two blocks come from the stiffness and from the damping. And these two tell you how the two tensions apply forces to the different masses. That's what this is. So we wanna steer the state, let's say, from e1. So e1 means the following: this mass is displaced one unit to the right, the other two are at zero, and everyone is at rest. That's what e1 is. And we wanna take that to the zero state. By the way, if we do nothing, it's gonna go to the zero state asymptotically anyway. I can't remember what the eigenvalues are, but we looked at this example before, and I think the eigenvalues are on the order of one; maybe there's a slow one on the order of 0.2 or something like that. It doesn't matter. The point is that the whole thing will decay to zero at some rate. By the way, that builds the intuition here easily. It says that, left to its own devices, this is stable; the state is going to go to zero anyway. Therefore, if I give you a long enough time period, long enough that just through the natural dynamics the state has decayed, it's gonna cost very little to drive it exactly to zero. By the way, with its own natural dynamics the state is not gonna go to zero exactly. It will just get very small. But the point is, once it's small, it takes a very little kick to actually move it directly to zero, and therefore we expect it to take very little energy. So what happens is the following.
Here's a plot of the energy required to steer it from displacement (1, 0, 0) and zero initial velocity to zero in T seconds. Very interesting plot. It says basically that for six seconds and more, the amount of energy required is extremely small. By the way, is it zero? It can't possibly be. If it were zero, it would mean you'd apply no input, and if you apply no input, it never gets exactly to zero. But the point is, it's extremely small. This is explained by the fact that the natural dynamics has made it small anyway. What's interesting here is that this thing now ramps up very aggressively. By the way, does this thing go to infinity, or stop like in the discrete time case? No, this keeps going. So you could do this in 0.01 seconds. There's an input; the energy required is absolutely enormous, and you can work out what it is, but you can do it arbitrarily fast. You can see exactly what's happening here: you can move the whole thing to zero as quickly as you like, it just takes more and more energy, and this is quite obvious here. By the way, the practical use would be somewhere around here, where you would be pulling this thing back to zero a lot faster than if you just waited for it to settle out. Actually, I should mention something about this. There are lots of applications of this, tons. A lot of them have to do with mechanical systems where you move something from one place to another; I just thought of another one that's not mechanical. You move from one place to another, and when it arrives there it's sort of wiggling a little bit, but you can't use it until it stabilizes. The first example is a disk drive head positioner. So you say, please seek track 255 or something. This thing zooms over, lands close, but when it gets there it's sort of shaky. It's gotta track the track very carefully, and you can't use it when it's shaking. So in fact, what you do when you land there is something like this.
You actually actively apply an input that will basically do active damping, that will remove all the wiggling and allow you to track faster. So things like this are actually used now. The other application I know of is in X-Y stages, so in lithography. These are extremely expensive machines for printing integrated circuits and things like that. They move over a couple of centimeters at a time, and then they align themselves to accuracies that are just unbelievable; it's at the nanometer level. It has to be, if you're looking at the print technology, which is actually quite ridiculous to imagine: something that moves two centimeters and then stabilizes itself absolutely, to the point where the stage isn't even wiggling at the nanometer level. So the way that's done, oh, and by the way, it's of course stable, so you could move over there and just wait for the whole thing to stop wiggling, and then expose your wafer. The problem is that this machine costs some huge sum of money, so what matters is the cost divided by the throughput. If 90 percent of the time you're waiting for the thing to stop wiggling so that you can do your lithography, that's not a good investment. So in fact they use active control to do this: you move it over two centimeters, and you deliberately apply inputs that will do active damping and will cause it to stop shaking early. So that's one. And I'll mention one more just for fun, from MRI. So in MRI you do these things. I don't know any of the details, I don't know any of the physics, but you line up all the magnetic axes, and then you let go, and they slowly drift down like this, and you don't do another scan slice until they've totally relaxed again, or something like this. That's the method. You can actually use a method where you act instead of waiting. I guess they have times like T1 and, does anyone here do MRI?
Too bad, someone could've got me out of this; they could've told me what T1 and T2 are. There are, like, T1 and T2, the two relaxation times or something like this, and you wait one of these amounts of time or something like that to do it again. If you wanna do fast throughput MRI, you don't wait. You actively force the spins down to zero. How? Using these formulas here, like this one. And you improve the throughput. And again, you have a big expensive machine whose throughput is now better because you're not waiting for something to stabilize; you're forcing it to settle fast, and therefore it's ready for reuse again. Sorry, that was just a weird aside. If you wanna look at what some of these inputs look like, we can do that. For the dot that I drew, which was at three seconds, this is the input that you would apply. These are the two forces. It's symmetric, but then the whole problem is symmetric. These are the two tensions you would apply; I don't think they're obvious. So this is what the least norm input for bringing the system to zero in three seconds looks like. If you relax that to just four seconds, it gets a lot smaller. Why? Because now the natural dynamics itself is helping you. So that's four. And if you look at the output for u equals zero, I guess now we have our answer. This says that if you just kind of let the thing go with no input, you'd see that it's pretty small in eight seconds. Of course it's not zero, but it's already kinda small. So when you're taking it to zero in three seconds, you can see you're actually doing something. Depending on the accuracy with which this thing has to be at zero before it can be reused, you can see that you're doing something substantially better than just letting the thing settle out by itself. So I think these are all kind of obvious here.
Let's see if I can, yeah, these just show things like the output for three and four seconds, and things like that, but they're kind of obvious. So I think that finishes up this topic of controllability. I wanna emphasize that, except for the fact that some of it was continuous time, there's absolutely nothing new in controllability. It's just applied linear algebra. If you know what range is, if you know how to solve linear equations, if you know how to find the least norm solution of a set of linear equations, then all of this is just an application of that. Okay. Now we'll look at the last topic in the class. Like controllability, it's not important. I think you've even done, or maybe, the controllability stuff I think you were doing well before it had a name. Didn't we have a problem on the midterm where you had to find an input that took you somewhere? There was one. Yeah, okay. So that's my point. All we did now is just give a name to it. That's all. So the same is true of observability and state estimation. These are just exercises. So what we'll do is look at this idea of state estimation and discrete time observability and stuff like that, but let's get the setup here. So we consider a discrete time system; in the general case the setup would be something like this. You'd have x of t plus one is A x of t plus B u of t plus w of t, and w of t is called a state disturbance or noise. Another name for this is process noise; that's another one you hear. And you have all of this in economics too, and it's got a different name there, and I forget what it's called. In things like chemical process control, this would be called the process noise. I'm trying to remember what it is in economics, in econometrics. I can't remember, but maybe I'll remember during, it doesn't matter. It's got some other name, but it's identical to this. And here you also have a v of t, which is a noise on your measurement.
So that's called sensor noise or error. It's also called measurement noise; it's got all sorts of names. The assumption here is that A, B, C, and D are known, but what you want to do is observe the input and output over some time interval and estimate the state, and there are lots of problems about estimating the state. You might wanna estimate the current state, or the initial state. You might wanna estimate the next state; in finance, that would be of tremendous interest, right? To estimate what the returns for tomorrow would be, that would be of great interest. So there are lots of problems you might wanna do, and this is gonna be based on u and y. And this is basically gonna be a particular application of just this, and I'm gonna do some horrible notation. What I'm writing down now is notation from Week 5 or something like that. It's gonna be just a big problem of this form, which basically is: you have some parameters you wanna estimate, call it x. You have a linear mapping that maps x into some measurements, plus a corrupting noise, and then you ask a question like, based on y, what can you say about x? Okay? Well, of course, we can do the platonic case. We'll get to that; that's easy. The answer is it depends on the null space of A or something like that, but we'll get to that later. Then, once you introduce the noise, it gets more interesting, and you can say interesting things about how to do this. For example, you might use least squares here, and we'll see that's what we're gonna do here too, but it's gonna be more complicated. You could map all of this big complicated thing into something that looks like that. It's bad notation. Why? Because we're already using x and A here to mean something different. So let's see how that works. So the state estimation problem is this.
You want to estimate the state at some time s from the inputs and outputs up to time t minus one. If s is zero, you're estimating the initial state; that's initial state estimation. If you're doing s equals t minus one, that's current state estimation. If you're doing s equals t, that's called the one-step-ahead predictor, because you're trying to guess what it's gonna be on the next step. And there are all sorts of other variations on it. If s is something like t minus five, some people call that a smoother or something like that, because you're sort of using future observations to go back and make your most intelligent guess about what your state was five samples ago. If s is t plus five, you're doing a five-epoch-ahead predictor: you're trying to predict what the state will be five samples, five epochs, in the future. By the way, in a lot of these problems, despite using all sorts of fancy methods, needless to say, your answer could be useless. That kind of goes without saying. If you can't predict tomorrow's returns, you certainly can't predict next summer's returns or something like that; that's silly. And that's fine. Of course, that comes up here. We don't dwell on it really, but it's a fact, of course. If I don't give you good enough measurements with which to estimate the parameter, then you can be as sophisticated as you like and you won't predict anything that's worthwhile at all, and the same is true here. Maybe here it's kinda more dramatic. So if you have something that estimates a state, it's sometimes called an observer or a state estimator, and it's got other names in other fields, and it's used in lots of other fields. Now, a generic notation which is pretty widely used is something like this: you would say x hat of s given t minus one is an estimate.
It denotes an estimate of x of s given input and output information all the way up to t minus one. Now, the notation of course is meant to suggest something like a conditional expectation, if you have that background, and there are in fact cases where x hat of s given t minus one is nothing more than the conditional expectation of x of s given input and output information up to t minus one. But here I'm just using it to say it is an estimate of x of s based on that information. It doesn't have to be a statistical method you use; we'll show you how to use least squares or something. That's the idea. Now, by the way, I claimed that you could just do this right now, because all you would do is put it in this form. You would have to stuff the right matrix A, and it would be a pain, and there'd be some debugging you'd have to get right, but I promise you all of those state estimation [inaudible] look just like this. You have to figure out what x is, you would have to stuff A in the right way, and all that kind of stuff, and then you'd use least squares, and that would be that. Actually, that wouldn't be bad; you'd actually make estimators that would be pretty good. We'll go through the more standard discussion of all this. Let's start with the noiseless case. Actually, that's probably a good place to start for anything, because if you can't do it in the noiseless case, then you shouldn't be trying to do it when there's noise. So we'll start with this. Let's just find the initial state with no state or measurement noise. Then you have x of t plus one is A x of t plus B u of t, y of t is C x of t plus D u of t, and what you can do is write down the output, first of all, as a function of the initial state. That's what we wanna estimate. That's a matrix O sub T. O sub T is this matrix here, because the output y of zero is C x of zero, y of one is C x of one, and the portion of that due to x of zero is C A x of zero.
That's this, and so on. Notice this thing is O sub T; O is for observer, as opposed to script C, which is for controller. And it looks very similar to your friend B, AB, A squared B, and so on, except this one is stacked vertically, and the C is on the other side. In fact, the components should be totally obvious here. For example, C A to the sixth is a block row in this, and it's easy to describe exactly what C A to the sixth is. It takes the initial state, propagates it forward six samples, and then C maps that state to an output measurement. So C A to the sixth is the matrix which maps the state now to what the output will see in six samples. I think I got that right. So that's what all of these are: a time propagator followed by a measurement matrix, nothing else. Now, the matrix T, it may have been on, like, your first homework or something like that. T basically describes the mapping of how the input maps to the output. I think it was, was that on the first homework or something? I think so; something like this was, anyway. That's easy to work out. Again, you just have to stuff the matrices in the right way. T is for Toeplitz, because that's a Toeplitz matrix, a block [inaudible], and it looks like this. This is what tells you how y of zero depends on u of zero, and the fact that this is lower block triangular means something: it means causality. So I put u of zero down to u of t minus one. By the way, note that I've decided now to go back to writing the u's in order, not in reverse time like we did before. I can write them any way I like; I just have to be clear about it. So if I do this, then this first row says that y of zero is D u of zero and has nothing whatsoever to do with u of one, u of two, up to u of t minus one.
That makes perfect sense, because the output at time zero has nothing to do with what you put into the system at times one, two, three, and so on. It's causal, and that's what this is. And all of these entries should make perfect sense. C A to the t minus two B: B takes an input and gives its immediate effect on the state, A to the t minus two propagates you forward in time, and C maps the effect on the state to the effect on the output. So this makes perfect sense here.
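To make the structure concrete, here is a hedged sketch in Python that builds the observability matrix O and the block lower-triangular Toeplitz matrix just described, then checks the identity (stacked y) = O x(0) + T (stacked u) against a simulation. The little system below is made up purely for illustration:

```python
import numpy as np

def obsv_matrix(A, C, T):
    """Stack C, CA, ..., C A^(T-1): maps x(0) to (y(0), ..., y(T-1)) when u = 0."""
    blocks, M = [], C.copy()
    for _ in range(T):
        blocks.append(M)
        M = M @ A
    return np.vstack(blocks)

def io_toeplitz(A, B, C, D, T):
    """Block lower-triangular Toeplitz map from (u(0), ..., u(T-1)) to the outputs, x(0) = 0."""
    p, m = C.shape[0], B.shape[1]
    # Markov parameters: D on the diagonal, C A^(k-1) B below it
    markov = [D] + [C @ np.linalg.matrix_power(A, k) @ B for k in range(T - 1)]
    Tmat = np.zeros((p * T, m * T))
    for i in range(T):
        for j in range(i + 1):
            Tmat[i*p:(i+1)*p, j*m:(j+1)*m] = markov[i - j]
    return Tmat

# hypothetical 2-state, 1-input, 1-output system
A = np.array([[0.9, 0.2], [0.0, 0.8]])
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.0]])
D = np.array([[0.0]])
T = 6
x0 = np.array([1.0, -1.0])
u = np.random.default_rng(0).standard_normal((T, 1))

# simulate and check y = O x(0) + T u
x, ys = x0.copy(), []
for t in range(T):
    ys.append(C @ x + D @ u[t])
    x = A @ x + B @ u[t]
y_stack = np.concatenate(ys)
O = obsv_matrix(A, C, T)
Tmat = io_toeplitz(A, B, C, D, T)
assert np.allclose(y_stack, O @ x0 + Tmat @ u.ravel())
```

The lower triangular structure of `Tmat` is the causality statement from the lecture: output block i only multiplies input blocks j with j at most i.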
Student:[Inaudible] dynamics [inaudible]?
Instructor (Stephen Boyd): No, it's actually causal because we fix the initial state.
Student: So you set it up this way only if you are looking at the initial state?
Instructor (Stephen Boyd): That's right. No, sorry. No, it's not. It's as if the initial state, the term for the initial state, is here, so I've separated out, well, no, I guess actually I do accept what you said: because we're looking at the initial state, the initial state appeared here. If we were looking at the final state -
Student: Then we'd be on [inaudible].
Instructor (Stephen Boyd): You've got it. Exactly. But in any case, what I can say is this: this is just an exercise in matrix stuffing. You just need to put it in a form that looks like this: a measurement that you know, equal to a known matrix times the state you're looking for, plus another term which is something you know. I wrote it down for s equals zero, but it could've been s equals five; it doesn't matter. It could've been t plus five, meaning look five steps into the future. And this term here is very interesting: it's the effect on the output due to the input. Now, you know the input, and you know that matrix, so in fact you know this term. It's actually your predicted output, is what it really is; it's what you'd predict the output to be if the initial state were zero. And actually, that term will just go away. So you can rewrite these equations this way: O sub T times x of zero, that's what we want, is equal to something we know, the output, we measured it, minus, and this is a beautiful thing, something we know times something we know. So this whole right-hand side is something we know. But the interpretation of the right-hand side is quite nice. This is what we directly measured, minus this thing we also know; possibly we measured it. When we measure this and propagate it through the matrix, presumably that exists as 15 lines of C somewhere, so this actually happens in a computer. It's basically your estimate of what you would see if the state were zero, and you subtract that from what you did see. Now, this may exist only in some processor; it may not have to exist in practice as an analog signal.
This signal over here on the right is beautiful. It's the output kind of cleaned up from, let me try that again. I've got it; I'm gonna try it again. It's the output with the effects of the input removed, but it was removed synthetically. It wasn't done in analog; it was done in a processor. That's what this is. Then what's left is only the component of the output which is due to the initial state, although this would be x of s, it'd be some other state, if you're estimating something else. And now you're back to Week 3, which is to say you have Ax minus b with A skinny. Presumably you have more measurements than you have unknowns, and that's it. And you can do all sorts of things with it. If there's no noise at all, you can just solve the equation with your favorite left inverse. If there's noise, you might wanna do least squares. That might be the right thing to do in some cases, but you wouldn't do it always, or you might not do it always. Okay, so with no noise we can answer everything. It goes like this. It says we can determine the initial state of the system if and only if the null space of this observability matrix is zero. That says the columns of the observability matrix are independent. By the way, note the nice kind of duality with what we looked at earlier with controllability. So in controllability you have B, AB, A squared B, and so on, and everything came down to something very simple: that fat matrix should be full rank, should be onto. In this case, it's a skinny matrix and it should be full rank. The meaning in the first case is that there's an input which will take you from anywhere to anywhere else in N steps. In this case, it says something like this.
If I observe N or more samples of the input and output, I can determine the state exactly if there's no noise. That's what this says; that's the analog of what it means to be full rank. In this case, you call the thing observable. Notice that the input here has no effect whatsoever on your ability to estimate the state, because basically it can just be subtracted off immediately. Now, people would say that doesn't make any sense. One person would say, I'd rather estimate the state when you're not doing anything; someone else would say, no, you gather better data, you can estimate the state better, if you're wiggling the control stick or something. It turns out these are all just wrong: it makes no difference, because you can subtract the input's effect off. There is one issue. If your model is not quite what the thing actually is, then the true T is not quite what you think it is, and when you do the subtracting you might get left with some residuals or something. But that's a secondary issue; in this idealized case, it doesn't affect anything at all. Okay. So again there's a dual to all of this, and I think I'll speed up, because it's actually kind of boring and there's some stuff I wanna cover at the end. So by the Cayley-Hamilton theorem, you know that when you go C, CA, and so on, by the time you get up to C A to the N you've just added a bunch more rows which are linear combinations of the rows above. So it says basically that if you can work out what the state is from U and Y at all, you can do it in N steps. As in the control case, that doesn't mean it's a good idea to do it that way. You may want to take much more data and then average. I guess so far we have no noise, so that comment doesn't make any sense, but when you add noise it will make sense. So I think what I'll do is just go forward to what would be obvious here.
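Here is the noiseless recovery spelled out on a made-up two-state system: take n = 2 output samples, subtract the known effect of the input in software, and invert O. This is only an illustrative sketch of the exact-recovery claim, not something you'd do with noisy data:

```python
import numpy as np

# hypothetical observable system: n = 2 states, 1 input, 1 output
A = np.array([[0.9, 0.2], [0.0, 0.8]])
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.0]])
D = np.array([[0.0]])
n = 2
x0_true = np.array([2.0, -1.0])
u = np.array([[1.0], [-0.5]])       # any known input

# simulate n output samples with the true initial state
x, y = x0_true.copy(), []
for t in range(n):
    y.append((C @ x + D @ u[t])[0])
    x = A @ x + B @ u[t]
y = np.array(y)

# what the output would have been with x(0) = 0: this is the "T u" term
x, y0 = np.zeros(n), []
for t in range(n):
    y0.append((C @ x + D @ u[t])[0])
    x = A @ x + B @ u[t]

# O x(0) = y - T u; with n samples, O is square and invertible here
O = np.vstack([C @ np.linalg.matrix_power(A, t) for t in range(n)])
x0_hat = np.linalg.solve(O, y - np.array(y0))
assert np.allclose(x0_hat, x0_true)
```

Exactly as in the lecture, the input drops out completely: once its effect is subtracted, the estimate depends only on the observability matrix and the cleaned-up output.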
I've lost that [inaudible]: least squares observers. Thank you. There we go, I've got it. So the most obvious thing to do when there's actual measurement noise is to set this up exactly as a least squares problem, and you'd have this: you estimate the state using the pseudo-inverse of the observability matrix, which for an observable system, since O is skinny and full rank, is (O transpose O) inverse O transpose, applied to this signal here. And this signal has all sorts of names. It's something like your guess of the output due to the initial state: this term is your guess of the output that would be due to the input you actually measured if there were no initial state; this is what you really did measure; and the difference is the output due to the nonzero initial state. So that's how you get all of this. Let me mention one thing, though I won't go on and do the rest: if you wrote this in recursive form, you would have a very rudimentary form of something called a Kalman filter, which you may or may not have heard of. This thing is just an exercise from Week 4 of the class, very simple to state, and it will work unbelievably well. It beats any kind of intuitive scheme you could make up if you wanted to track a vehicle or something like that. Something this simple, which would end up being just a handful of lines, would make estimates of states and positions that beat any intuitive method you could possibly imagine, but that's kind of the theme of the class. So actually, I think I'll quit there, because I claim all the rest is just variations and applications. We could look at one application.
It was going to be an application with our friend the range measurements, so maybe I'll just do that very quickly. Here's the example, just to give you a rough flavor of how these things work. By the way, there are entire classes you can take on state estimation: you can take one in finance, or one in aero-astro, where it's used to track vehicles and for navigation. You can take entire classes, probably three or four in wildly different fields, on just this material, and a lot of them are kind of cool. So we'll do a very simple baby example: a particle moving in R2 with uniform velocity, and we have linear noisy range measurements from various directions, like that. You can see they're all kind of concentrated over here. By the way, if someone gave you a bunch of range sensors and you got to arrange them to make your estimation job easy, you'd spread them around, maybe even at four cross points like this, because then you're measuring different things. A bad thing to do would be to put all the range sensors in the same place, like this. If you put all the range sensors in the same place, are they making different measurements? Yes, but not very different ones, so you can imagine that your estimate might not be that good. So that's the setup. We'll make the range noises have an error on the order of one, and we'll assume the RMS value of the velocity is not much more than two. We can write this out as a linear system; describing a particle with uniform velocity is just basically this.
It says x(t+1) is given like this: the bottom block says that the bottom block of x(t), which is the velocity, is multiplied by the identity, meaning it doesn't change. So the bottom two entries of x(t) are constant, and the top two, the position, change; in fact they're linear in t. Then your measurement matrix looks like that. The true initial position and velocity are these; this makes no difference, but here's an example of a track. The line shows the true trajectory of the particle moving at constant speed, and that's what the range measurements would be if they were perfect. But you can see we put a lot of noise on the range measurements, so any single snapshot of range measurements would be completely useless: you'd make an outrageously bad estimate of the position and velocity from it. What you can do now is carry out this least squares observer fitting, work out the actual RMS position error, and you get something like this. It's hardly surprising what you see: the velocity error goes to zero, and the position error won't go to zero, but it goes to a small steady-state number. So that's how these things work, and this is just a taste of it. You could write all the code to make this plot and work everything out yourself, because it's just an application. So that officially finishes the material for the class. The truth is that the last two topics were really just applications, maybe a little more than applications, but not core material; the singular value decomposition was the last real material. What I want to do now is make some comments about how it all fits together, and that's what I wanted to look at next.
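The particle-tracking setup can be sketched as follows. Everything concrete here is a made-up stand-in for the lecture's numbers: the initial state, the noise level (smaller than the lecture's order-one noise, for a short example), the random measurement directions, and the horizon are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# State x = (p1, p2, v1, v2): position and velocity in R^2.
# Uniform velocity: p(t+1) = p(t) + v(t), v(t+1) = v(t).
I2 = np.eye(2)
A = np.block([[I2, I2],
              [np.zeros((2, 2)), I2]])

x_true = np.array([1.0, -2.0, 0.5, 0.3])   # made-up initial state

# Linearized range measurement at time t: y(t) = k(t)^T p(t) + noise,
# with measurement directions k(t) chosen here at random.
T = 50
rows, y = [], []
for t in range(T):
    k = rng.standard_normal(2)
    k /= np.linalg.norm(k)                # unit measurement direction
    C = np.concatenate([k, [0.0, 0.0]])   # measures position only
    rows.append(C @ np.linalg.matrix_power(A, t))   # row C A^t
    y.append(rows[-1] @ x_true + 0.1 * rng.standard_normal())
Obs = np.array(rows)

# Least squares estimate of the initial state from all T measurements
x_hat, *_ = np.linalg.lstsq(Obs, np.array(y), rcond=None)
print(np.linalg.norm(x_hat - x_true))   # small compared to the noise
```

As in the lecture's plot, the velocity components are estimated increasingly well as T grows, because their coefficients in C A^t grow with t.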
The first thing I wanted to do is say a few things about linear algebra, not that I haven't been talking about linear algebra for a quarter, but let me zoom out a little here, just to organize my thoughts so it doesn't come out totally randomly. The first point should be kind of obvious: it comes up in a lot of practical contexts. It comes up in EE, mechanical engineering, civil engineering, aero-astro, operations research, economics, machine learning, vision; it goes on and on. What's interesting is that this wasn't always the case. Signal processing 25 years ago basically went on and on about how you process a scalar signal; you could grab an issue of [inaudible] signal processing and not find a matrix in it. Now it's totally the opposite: it's extremely rare to find something that doesn't involve these ideas. The same is true in economics, in finance, and in fields like that. In aero-astro and control 25 years ago, a multivariable system with multiple inputs and outputs was an exotic thing that only PhD-level students would study. That's completely changed: all real systems are designed this way now, no exceptions. You don't design a control system for a modern airplane, or a process, or even an ABS system, without matrices. And this is a change; it was simply not true 20 years ago. 20 years ago, there were people who talked about matrices, and they were made fun of, in fact, by the people who were designing systems that worked anyway. I wasn't one of those people. The second point, which should also be kind of obvious, is that nowadays you can actually do linear algebra. If you're fiddling around, you can play with MATLAB.
Now, MATLAB is a toy in my opinion; I have arguments with people about this, and there are people who say that's not true, but real implementations are not done this way, and that's often the case. 20 years ago a class like this would have had zero computation component, none, absolutely zero. You can imagine how boring that would be; it was kind of boring, to tell you the truth. Luckily for you, you were born later than that. Now of course you have MATLAB, and you've been playing around with it. By the way, there are other variations on MATLAB you can use; there's Octave and things like that. And the truth is that these days, if you know what you're doing with some object-oriented system, you can make it look like MATLAB except that it will actually be a real language. Python could easily be made to look like MATLAB, for example: you could write y = A x + v and all the right things would happen. Then there are the real codes. LAPACK is one name you should have heard; I don't want you to leave this class without hearing it. Even though we talked about nothing of how to actually carry out these computations, it's important to know that the actual numerics behind MATLAB, which does nothing but parse what you type in, is done by open-source, public-domain software written by professors and graduate students: that's LAPACK. When you get into large systems, there are other packages as well, but this is a good one to know about. And just to give you a rough idea, I'm sure you kind of know this, but maybe no one made a big deal about it: suppose you have, let's say, 1000 variables, and you want to solve a least squares problem that looks like this, with 1000 variables and, let's make it, 2000 equations.
It can be anything you like; say I have 2000 measurements to estimate 1000 parameters. This is obviously not a small problem: that's two million real numbers just to describe the matrix. Everybody see what I'm talking about? This is serious business. You could make a big story about how sophisticated this is, a big long song and dance about blending 2000 measurements to get optimal estimates of 1000 parameters. How long do you think it takes to do that? You might have some rough idea. Does anyone know? You type A\b in MATLAB for this.
Instructor (Stephen Boyd):What? You're right, about a second. So the answer's about a second; might be two seconds, I don't know, something like that. What? 2000 by 1000. Do it twice, though, so everything gets loaded in first; okay, sorry. But you know, it's on the order of a second or two, something like that.
By the way, this is just stunning, the fact that you can do this. This is something unheard of. This is what happens when you ignore Moore's Law for, let's say, 20 years, and then come back and check again how fast things are. Now in fact, and this is a beautiful topic some of you might want to look at, I think things like this have not really propagated very far. There are a lot of times when you might want to do least squares with 2000 measurements to estimate 1000 things. That's a big thing, a big system, and the fact that you can do this in seconds with total reliability has not, I think, diffused into general knowledge. I had a professor in my office, and I asked him to estimate how long this would take, and he said half an hour. I won't reveal his name.
Instructor (Stephen Boyd):How many? Four. On your laptop.
Student:No, on [inaudible].
Instructor (Stephen Boyd):Okay, four seconds. There we go. So it's four; that's because it's too warm. Okay, fine. Four seconds gives you a rough idea.
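The timing experiment is easy to reproduce. Here is a sketch using NumPy's least squares routine, which, like MATLAB's backslash, calls LAPACK underneath; the measured time is machine-dependent, so the only thing we can check is that the answer is actually the least squares solution.

```python
import time
import numpy as np

rng = np.random.default_rng(1)
m, n = 2000, 1000                 # 2000 measurements, 1000 unknowns
A = rng.standard_normal((m, n))   # two million numbers, as in the lecture
b = rng.standard_normal(m)

t0 = time.perf_counter()
x, *_ = np.linalg.lstsq(A, b, rcond=None)   # the analog of A\b
elapsed = time.perf_counter() - t0
print(f"solved in {elapsed:.2f} s")          # on the order of seconds

# Optimality check: the residual is orthogonal to the columns of A
print(np.linalg.norm(A.T @ (A @ x - b)))    # essentially zero
```

The orthogonality of the residual to range(A) is exactly the normal-equations characterization of the least squares solution from earlier in the course.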
Actually, here's an interesting thing we haven't talked about at all. It turns out that if you want to do things like least squares or solving linear equations, you can handle much bigger, way bigger matrices, provided they have special structure. The typical special structure is sparsity, meaning a lot of the entries are zero, in which case there are methods that are just amazing and extend to 10,000, 40,000, a million variables, depending on your luck. There are other structures too, for example banded or tridiagonal matrices, and there's a whole world of knowing these structures and the fast methods they imply. That would take you to how people solve PDEs and all sorts of other things. That's worth knowing, and I encourage you to learn about these things in the future. Please don't live your life just in MATLAB; even if you do, know what's under the hood, and know what a real implementation would do, even if you have no intention of ever writing one. That's a good thing. There are whole classes you could take on this at Stanford, entire ones, literally two on numerical linear algebra that I know of, and maybe a couple of others, so two or three; you can even take one in geophysics. Now I want to say something about linear algebra and my view of how it all fits together. I think there are really several levels of understanding; you could add a fourth one, but the most basic is at the high school level, and it goes like this. What you should know is this.
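The structure-exploiting point can be made concrete with the tridiagonal case. Below is a minimal sketch of the classic Thomas algorithm, which solves a tridiagonal system in O(n) work instead of the O(n³) of a dense solve; it's not from the lecture, and as written it assumes a diagonally dominant system (no pivoting is done). The 1-D Laplacian example is a made-up test case.

```python
import numpy as np

def solve_tridiagonal(sub, diag, sup, b):
    """Solve a tridiagonal system in O(n) (Thomas algorithm).

    Assumes diagonal dominance; no pivoting is performed.
    sub: subdiagonal (n-1), diag: diagonal (n), sup: superdiagonal (n-1).
    """
    n = len(diag)
    c = np.array(sup, dtype=float)
    d = np.array(diag, dtype=float)
    y = np.array(b, dtype=float)
    for i in range(1, n):                 # forward elimination
        w = sub[i - 1] / d[i - 1]
        d[i] -= w * c[i - 1]
        y[i] -= w * y[i - 1]
    x = np.empty(n)
    x[-1] = y[-1] / d[-1]
    for i in range(n - 2, -1, -1):        # back substitution
        x[i] = (y[i] - c[i] * x[i + 1]) / d[i]
    return x

# Example: the 1-D discrete Laplacian; n can be made very large
n = 10_000
sub = -np.ones(n - 1)
sup = -np.ones(n - 1)
diag = 2.0 * np.ones(n)
b = np.ones(n)
x = solve_tridiagonal(sub, diag, sup, b)   # fast even for large n
```

A dense solver would need to store and factor n-by-n entries; here only three diagonals are ever touched, which is the point about structure implying fast methods.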
If you have 17 variables and 17 equations, there's usually a solution; that's from when you were tortured by solving three equations in three unknowns for no particular reason. And maybe sometimes with 17 equations and 17 variables there wasn't a solution, or there was more than one, and somebody waved their hands. This high school view is really just degree-of-freedom accounting. For example, if you have 80 variables and 60 equations, you do the accounting and say, "Well, I've got 20 extra degrees of freedom": roughly, a 20-dimensional subspace of freedom. You could use those degrees of freedom for anything you like. You could use them to minimize the two-norm, the Euclidean norm, which is what we've been doing, and you get a least norm solution; or you could use them to minimize something else. Of course, if it's an estimation problem, this is probably not good news, because it basically means you don't have enough measurements to estimate what you want. It's very important, I think, when you look at a problem, to start from this point of view and just do the basic accounting: what are we talking about here? Do I have more measurements than unknowns, or fewer? How many degrees of freedom do I have? Now, at the next level, you learn about singularity, rank, range, null space, and so on. For example, someone says, "Look, I have 17 equations and 17 variables, but there's no solution. Why is that?" And you go, "Ah well, you see, that 17 by 17 matrix was singular." And they go, "What does that mean?" And you say, "Well, there are two ways to view it."
"The first is: you thought you had 17 different equations. Actually, you don't; you have 16, because equation No. 13 is derivable from the others." Everybody know what I'm talking about? So I'm talking about the rows being dependent. Or they could say: "You thought you had 17 variables, 17 knobs to wiggle, or whatever you want to call them. Actually, you had 16, because the action of this fourth variable can be exactly replicated by an action of the other variables." So you thought you had 17 knobs, 17 design variables, 17 inputs to mess with, but you didn't, and that was hidden in this matrix, which was 17 by 17, so it wasn't obvious to the eye. That's the next step. Now, the platonic view, which is basically called math, is very important. It says things like: this matrix has rank 5; this one is full rank, so it certifies that there really are 20 extra degrees of freedom. I'll tell you the good thing about it: everything is precise and unambiguous. There's no waffling, no nonsense. A matrix is either singular or it's not, period. Its rank is 17, or it's 15, but not anything in between, and it doesn't depend on who's looking at it. Things are true or false. That's what's nice about it: it's very clear and unambiguous. Unfortunately, it can be misleading in practice, and we've already seen at least one example of that. If I have a 17 by 17 matrix that's singular, and I pick literally any entry and perturb it by an arbitrarily small amount, the matrix becomes nonsingular, and I say: done. It's all done. Now you can invert it.
And then you'd better leave quickly before they actually try, but theoretically they can now invert it, and there's no problem. In this case it would be misleading to make the statement, which you could on the basis of a purely platonic mathematical view, that singularity is not a problem in practice. They'd say, "Why would you say that?" You'd say, "No problem: take any matrix that's singular, and if I perturb its entries by random numbers so small that they couldn't possibly affect the practical application, I guarantee you with probability one that the resulting matrix is nonsingular. Done. End of story." At that level, you can't argue with it; it's correct. That's interesting: it's a mathematical argument for why you don't need to know the math. Did you notice that? Now the next level, Level 3, is the quantitative level. It's based not on qualitative statements like rank 3, but on ideas like least squares and the SVD. There, someone asks, "What's the rank of that?" And you go, "Ah, it's around six." And they say, "What are you talking about? How can the rank be about six?" And you go, "It depends on your noise level, and on what you consider small and big." So it's a bit wishy-washy, but it's actually more useful in practice, as long as you don't think you're actually doing math at that point, which you're not; you have to keep the two very separate. So this is very useful, and at this level no one would be fooled by someone coming along and offering to sprinkle some perturbation dust on your matrix to make it nonsingular. No one would be fooled, because they'd say, "Yeah, go ahead."
"Sprinkle all the dust you like." You sprinkle all the dust on it, and all that happens is that one of the singular values, the zero, now becomes ten to the minus ten, and you say, "Are you done?" And the person says, "Yes, it's nonsingular. I'm leaving now." And you say, "Well, it may be nonsingular, but the minimum singular value is ten to the minus ten. I can't do anything with it that I would want to do with a nonsingular matrix, because, for example, the inverse is gigantic; the condition number is still ten to the ten or more." The interpretation in this analysis really depends on the practical context, and these ideas are extremely useful in practice. Arrested development at the platonic level is maybe not so good, arrested mathematical development I'm talking about. Curiously, arrested development at the high school level is probably okay in my opinion, because you know you don't know a lot of things, and when you type A\b and it says the matrix is singular, you go, "Oh well, I'll try something else; this is not a big deal." At the platonic level you can actually be a little bit dangerous, because you can walk around and talk about things like controllability, singularity, rank, and so on. The quantitative level, though, is actually very useful in practice. And I want to talk about one more level above this: algorithmic and computational sophistication. That means knowing how fast and how well you could actually carry out those calculations in practice. That's what it would take, for example, if you were going to, let's say, start Google, because that's nothing but a singular value decomposition, or an eigenvector computation, on a billion by billion matrix.
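The perturbation-dust exchange is easy to demonstrate numerically. This sketch uses a made-up 17 by 17 singular matrix: the dust makes it technically nonsingular, but the minimum singular value stays tiny and the condition number is enormous, which is exactly the quantitative-level objection.

```python
import numpy as np

rng = np.random.default_rng(2)

# A deliberately singular 17 x 17 matrix: last row is the sum of the rest
A = rng.standard_normal((17, 17))
A[-1] = A[:-1].sum(axis=0)
print(np.linalg.svd(A, compute_uv=False)[-1])   # smallest singular value ~ 0

# Sprinkle "perturbation dust": now technically nonsingular...
B = A + 1e-10 * rng.standard_normal((17, 17))
smin = np.linalg.svd(B, compute_uv=False)[-1]
print(smin)                  # tiny: nonsingular, but only just
print(np.linalg.cond(B))     # huge condition number: the inverse is useless
```

Mathematically B is invertible with probability one; quantitatively, nothing changed, and that's the whole point of the story.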
And I can tell you right now, you can't type a billion by billion matrix into MATLAB and ask for its SVD. So there's a next level where you know how fast you can do things when you do them for real. You know things like: that's a big matrix, but it's very sparse; I bet we can do this. We didn't talk about any of these things, but let me give just one example, and then I'll move on to the next topic. You remember early on there was this idea of representing a huge matrix as an outer product, a low rank factorization. If you succeeded, you had a method to speed up simulation by some huge amount. Later you learned that the smallest factorization you can do depends on the rank; it is the rank. And you learned a method for doing it, QR. Fine. Later in the class, when we studied the SVD, you learned a method for getting an approximation that's low rank. Then you can take a matrix that's absolutely full rank, giant, and say, "This rank 3 approximation is good enough," and you win big, because you can now multiply by it amazingly fast. You can do compression; you can do all sorts of crazy stuff. Now that works if A is up to, say, 1000 by 2000 or so, maybe 2000 by 2000, maybe a bit more. But the really interesting case is when A is a million by a million. You can't store a million by million matrix, let alone compute its SVD using the LAPACK routines, obviously. And yet that's exactly the case where a low rank approximation would be super cool, because then you could say, "Hey, you know your million by million matrix..." and people would laugh at you.
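The moderate-size version of this, where you can still form the matrix, is a direct application of the SVD from the course. The sizes, noise level, and matrix below are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)

# A matrix that is nearly low rank: a rank-3 part plus small noise
m, n, r = 300, 200, 3
A = rng.standard_normal((m, r)) @ rng.standard_normal((r, n)) \
    + 1e-3 * rng.standard_normal((m, n))

U, s, Vt = np.linalg.svd(A, full_matrices=False)
A3 = (U[:, :r] * s[:r]) @ Vt[:r]     # best rank-3 approximation

# Relative error is on the order of the noise level
err = np.linalg.norm(A - A3) / np.linalg.norm(A)
print(err)

# The payoff: multiply in O((m+n)r) instead of O(mn)
x = rng.standard_normal(n)
y_fast = U[:, :r] @ (s[:r] * (Vt[:r] @ x))
print(np.linalg.norm(y_fast - A3 @ x))   # same operator, factored form
```

Storing the factors costs (m + n + 1) r numbers instead of m n, which is where the compression and the fast multiply both come from.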
They would say, "You can't even store a million by million matrix, so don't tell me about it." And you go, "Yes, but what I'm telling you is that I have a rank 3 approximation of it." So there you can't just form the matrix and call SVD; that's ridiculous. There are methods that will calculate the SVD sequentially, and they're extremely efficient and fast. The only thing they need of A is that you be able to multiply by A and by A transpose; you have to provide routines for doing that. So if A is, for example, the forward mapping in MRI, or PET, or anything like that where you have a fast method based, say, on Fourier transforms, you can actually do low rank approximation. I can tell no one has any idea what I'm talking about; that's fine. All I'm saying is that there's another level of sophistication where you can use all of these ideas on very large systems. So let me address one more issue: what's next. There are lots and lots of classes. Now of course, after you take the final, you may have a different view of all these things. Oh, I forgot to tell you: there's no way we can construct our final to be within the registrar's rules; it's a blatant violation, by the way. At first I thought we could just call it Homework 10, but it counts for a lot, and it turns out you're not allowed to assign homework then either. Anyway, I should mention that when you're doing the final, if you want to take a break, you can go fill out the teaching evaluations, which is fine with me. That's why we have tenure: you can do anything you like. And if you get really pissed off, you can always do that just to unwind for a little bit.
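The matrix-free idea can be illustrated with the simplest such method: power iteration on A transpose A, touching A only through two user-supplied multiply routines. This is a toy sketch, not one of the serious sequential SVD methods the lecture alludes to (production codes use Lanczos-type iterations, e.g. SciPy's `scipy.sparse.linalg.svds` with a `LinearOperator`); the small dense test matrix is made up so we can check the answer.

```python
import numpy as np

def top_singular_triple(matvec, rmatvec, n, iters=500, seed=0):
    """Estimate the largest singular value/vectors of an operator A,
    given only x -> A x and y -> A^T y (power iteration on A^T A)."""
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(n)
    v /= np.linalg.norm(v)
    for _ in range(iters):
        w = rmatvec(matvec(v))          # one step on A^T A
        v = w / np.linalg.norm(w)
    Av = matvec(v)
    sigma = np.linalg.norm(Av)
    return Av / sigma, sigma, v         # u, sigma, v

# Toy check against a small dense matrix. In a real use case A would be
# an operator (say, a fast transform) that is never formed as a matrix.
rng = np.random.default_rng(4)
A = rng.standard_normal((50, 30))
u, sigma, v = top_singular_triple(lambda x: A @ x, lambda y: A.T @ y, 30)
print(abs(sigma - np.linalg.svd(A, compute_uv=False)[0]))   # small
```

Deflating (subtracting sigma u v^T, again only through the multiply routines) and repeating gives the next triples, which is the rank-3-approximation-without-forming-A story.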
And that's possible, I think, because of our flagrant, open, and now on-video violation of the registrar's rules. Sorry, we'll go back to that. Yes. So there's a sequel to this class, but it won't be taught until next year. Now, I am teaching this class and 364A and B in winter and spring, so I should say a little bit about what that class is about; there are zillions of other classes you could take on all the topics I've been talking about, but let me say a little about this one. There's a simple way to describe it. A lot of the techniques in this class relied on least squares, least norm, and variations on those, so there were analytical formulas like (A transpose A) inverse A transpose. What 364 does is extend least squares to other objectives, like the maximum value or the sum of the absolute values. Also, in 263 we don't really have any way to handle a condition like saying that a vector is positive. You see what I'm talking about? If we do y = Ax + v, where x is a vector of powers, y is some observed temperatures or something like that, and v is measurement error, we have no way of telling least squares that x is positive. We would just calculate a least squares solution, and if we're really lucky, the estimated x will come out positive. If not, it'll have negative entries and we'll have a bit of a problem. By the way, people have dealt with this for years using all sorts of ad hoc methods, in fact using all the ideas you know about, like regularization, and they're completely ad hoc. It turns out you can handle those conditions flawlessly. You lose your analytical formulas, but you can calculate just as fast as you could before with least squares. It's even more so when you design inputs, which we've been doing.
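The positivity point can be made concrete. The sketch below is not the course's method (364 covers the proper convex-optimization treatment): it is a minimal projected-gradient solver for least squares with x >= 0, and the power-estimation data, noise level, and iteration count are all made up for the example.

```python
import numpy as np

def nnls_projected_gradient(A, b, iters=5000):
    """Minimize ||Ax - b||^2 subject to x >= 0 by projected gradient.
    A minimal sketch; real solvers use far better methods."""
    L = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        x = np.clip(x - (A.T @ (A @ x - b)) / L, 0.0, None)
    return x

rng = np.random.default_rng(5)
A = rng.standard_normal((40, 10))
x_true = np.clip(rng.standard_normal(10), 0.0, None)   # nonnegative "powers"
b = A @ x_true + 0.01 * rng.standard_normal(40)

x_ls = np.linalg.lstsq(A, b, rcond=None)[0]   # may have negative entries
x_nn = nnls_projected_gradient(A, b)
print(x_ls.min())   # plain least squares can go negative
print(x_nn.min())   # constrained solution is always >= 0
```

Each iteration is an ordinary gradient step for the least squares objective followed by clipping to the nonnegative orthant, so it costs no more than a couple of matrix-vector multiplies, which is the "just as fast as least squares" claim in spirit.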
We just did least squares inputs. For the least squares input that takes a system from one state to another, you could add conditions like this, which is actually kind of cool, and these are quite real. You will not find, if you go up to an MRI machine, a note that says under no circumstances shall the sum of the squares of the RF pulse be more than this; it will not say that. But when you buy a motor for a disk drive or something like that, there will be current limits, there will be things like that, and they're going to look like this. It turns out you can solve those problems exactly, and that's what 364 is. When you get to 364, there are a lot more fun problems you can solve; you can really do all sorts of things in finance, machine learning, and so on. So that's what that class is about. I'd be happy to answer any questions if anybody has any. Actually, it's a bit embarrassing: I believe next week I'm going to have to give the first lecture for it, because I'll be in India when the actual class starts. I've never taped the first lecture ahead before; I'm deeply embarrassed, but it was going to happen sometime, so it will happen. All right. So with that, I'll say the class is over. I have to thank Jacob, Yung, and Thanos for an enormous amount of help putting it all together. And unless there are questions, we'll just quit here; I'll be happy to answer questions about these or other classes, but otherwise thanks for slogging through it. Oh, and enjoy the final. I forgot to say I'll be in Austin while you're taking most of it, but I'll be back Saturday early morning, and I'll be in touch. I'll be around Saturday. Good.
[End of Audio]
Duration: 69 minutes