Instructor (Stephen Boyd): Some announcements. Actually, you can turn off all amplification in here. Thanks. And you can go down to the pad. I'll make a couple of announcements. The first is today's lecture, we're gonna finish just a few minutes early because I have to dash over to give a talk at 11:00 a.m. in CISX. In fact, you're all welcome if you wanna come. I don't know why you would, but it's a talk on circuit design and optimization. That's in CISX auditorium. So remind me, actually, if we get - if I'm not out the door and walking towards CISX at 10:45 a.m., you can do something like wave your hands or anyway, that kind of thing.
As a result of that, I'll be canceling my office hours today. A little bit late notice, but I will, and I'm moving them to Thursday. That doesn't work really well with, for example, Homework 7, but I'll also be around sort of on and off in the afternoon today, so if there is something you really thought you needed to get me today for, you'd be able to find me some time in the afternoon probably. Or you can send an email.
One other announcement is I got one inquiry. I think it had to do with someone planning to go away or something like that over the next week, and they asked could I possibly be so cruel as to assign Homework 8 this Thursday. What do you think the answer is to that? I'm just sort of curious.
Anyway, I don't even have to answer that. Of course we're gonna have a Homework 8. That was never in question, so we will indeed assign a Homework 8 on Thursday and it'll be due maybe the Tuesday after the Thanksgiving week, something like that.
I saw someone - are you okay?
Instructor (Stephen Boyd): You'll live? Okay. Her head kind of listlessly fell backwards. That's okay. All right. Okay, any other questions about this? No? I guess the people coming in now will be shocked when I get up and leave at 10:40 a.m. Okay, any questions about last time's material? Otherwise, we'll finish that and then, actually, later today we'll get on to the essentially final topic in the class. We're gonna spend a good deal of time on it, but it will in fact be the last actual topic, so.
All right. We're studying linear dynamical systems with inputs and outputs, so systems of the form x-dot = Ax + Bu and y = Cx + Du, like this. The transfer matrix is C(sI - A)^{-1}B + D, and if you plug in s = 0, you get, of course, -CA^{-1}B + D. It's a famous matrix. It's one you'll see arising often, and the more descriptive term for it would be the static gain matrix.
But a rather cool retro term to use is the DC gain matrix. That's direct current, so this goes back to, I don't know, 1910 or something like that, but it's kind of - I use this, but mostly just to irritate people and stuff like that because it's so retro. All right. And I'll say a little bit in a minute about what happens if A happens not to be invertible.
So what this describes is how the inputs relate to the outputs under static conditions. That is exactly what this does. If you have static conditions, that means u, y, and x are all constant. Then of course x-dot is zero, since x is constant. So you get 0 = Ax + Bu and y = Cx + Du, and if you eliminate x from these equations by solving for x = -A^{-1}Bu here and you plug that in here, you get this. Okay? So this is assuming A is invertible here, so that's what it describes.
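Just to make the static gain concrete, here's a little numerical sketch (the matrices below are made up for illustration; they're not an example from the lecture). It computes H(0) = -CA^{-1}B + D and checks it against the static equilibrium condition 0 = Ax + Bu:

```python
import numpy as np

# A made-up stable system, just for illustration.
A = np.array([[-1.0, 0.5],
              [0.0, -2.0]])
B = np.array([[1.0],
              [1.0]])
C = np.array([[1.0, 0.0]])
D = np.array([[0.0]])

# Static (DC) gain: H(0) = -C A^{-1} B + D, assuming A is invertible.
H0 = -C @ np.linalg.solve(A, B) + D

# Sanity check: at equilibrium, 0 = A x + B u, so x = -A^{-1} B u
# and y = C x + D u = H(0) u.  Here for u = 1:
x_eq = -np.linalg.solve(A, B)
y_eq = C @ x_eq + D
print(H0, y_eq)
```

For a stable system you could equally well estimate H(0) by simulating with a constant input and waiting for the transients to settle out.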
Now, if A is not invertible, what it says is that there can be u's for which you cannot solve this equation here, so it says there actually are, or there can be, inputs for which there is no static equilibrium. You don't have to worry about that, but that's the meaning of it.
Now, if the system is stable, this is the integral of the impulse response, and this just follows from the Laplace transform: H(s) is the integral from 0 to infinity of e^{-st} h(t) dt, and if you plug in s = 0, it's just the integral of h.
Now that, of course, requires h(t), this integral, to make sense, and that would be the case if h(t) decays.
Okay, it's also the same as the limit as t goes to infinity of the step response matrix, and the step response matrix here is, of course, the integral from 0 to t; that again follows just from the definition of the integral.
Okay, and if you wanna know sort of what the static or DC transfer matrix tells you, it basically tells you this. It says if the input vector converges to some constant u_infinity - so the inputs can wiggle around, they can do whatever they like, but if they converge to a constant value - then the output will also wiggle around, but it will converge to a constant value, and that constant value is obtained by multiplying by the matrix H(0).
So H(0) is very important. It's the thing that tells you, roughly, how the input affects the output when all the transients and so on have settled out. And if you work out - we can work out some examples and they should make perfect sense.
For a mass-spring system, the DC gain matrix - our mass-spring system is right here. Our DC gain matrix, and let's think about what it's gonna do: it's gonna map the two tensions you apply here into the displacements of these three masses. So that's what it's gonna do. That's what the DC gain matrix is.
So, of course, it's a 3x2 matrix, and it would tell you, for example, the first column is obtained by pulling one newton on this tension, letting the whole thing come to equilibrium, and then recording the offsets of the three masses. And it's kind of obvious what happens: if you pull a newton here, this mass displaces to the right, this one displaces to the left, and this one displaces to the left a bit less. This one probably moves a bit more to the left than this one goes to the right. I'm making that up, and I will change my story if we look at the actual numbers now and it's different.
And let's see. Did I get my story right? Yes, I did get my story right, so this says that if you apply one newton, the left mass - that's the first position - moves a quarter meter. The next one moves minus a half; that's twice the displacement, and the other one moves minus a quarter. And then you get a similar thing over here. There's a symmetry.
And this you can work out. Well, it's horrible - I don't recommend doing it by hand - but for example, if you took this thing and worked out C times (sI minus this thing) inverse times that, at s = 0, you would indeed get this matrix.
Now for the circuit, the DC gain matrix is actually quite simple. Again, you can work it out, but let's see what it means. I have to find it somewhere. Here we go.
So for the circuit, the DC gain matrix has the following interpretation. It basically says: apply one volt here, wait for all the transients to settle, and then look at the mapping from the voltage here to the voltages at all the nodes, and of course it's one, right? Because if you put a volt here and everything is static, there are no voltages changing anywhere, no currents flowing; this is an equipotential here. So these all have to be the exact same voltage. So that's the DC transfer matrix. Actually, in this case, that's literally the DC transfer matrix from that input to those outputs; it's like that.
Now, these are silly cases. Obviously, if you have something more complicated, it's interesting immediately. I mean, it tells you immediately how something works under static conditions.
We're gonna cover a couple more topics on systems with inputs and outputs. The first is discretization with piecewise constant input, so here you have your system x-dot = Ax + Bu, y = Cx + Du, and I put in a sequence I'm gonna call u_d; the d is for discrete or something like that. And what we'll do is, well, the continuous-time input, that's the input to this system here, is going to be equal to - I'm going to index into the sequence - and that's going to hold over the time interval from kh to (k+1)h.
So by the way, there are all sorts of names for this. This is sometimes called zero-order hold; that's one name you'll hear for it. It just means the input is piecewise constant. There are probably some other names. I'm just not thinking of them at the moment, but that's basically what it is.
Okay, and now we'll define sequences, so x_d - that's for discrete - that's a sequence. In other words, it takes as argument an integer, and it's gonna be x at kh. So x is a function from R_+ to R^n; x_d is a sequence, a function from Z_+ to R^n. That's what x_d is. And the sequence came by sampling here, so these would be - you can call this perfect sampling or something like that.
So here you'd refer to h as the sample interval. Actually, that's for x and y. Up here, it would be very common to call it the update interval. So it would be very common to just use a different name for it. And in fact, it turns out that in many systems the update and the sample intervals are different. We'll get to that in a minute.
So this would be the update interval. So for example, you'd say, oh, the update frequency is 50 hertz, the sample frequency is 400 hertz. They could be the same, but that's the idea.
Okay. All right, so you can just think of these as sampled versions of x and y. Let's work out how they're related. Well, x_d(k+1) is x((k+1)h), but we can get that because we can propagate forward. This is what the state was at kh. This propagates it forward h seconds. So this tells you how the state would propagate, actually, if there were no input over that interval.
Then this convolution here tells you what the effect of the input is, but it's only over that interval. Here u only refers to the interval between kh and (k+1)h. Now over that time period, this thing is a constant and therefore comes out of the integral. These are matrices. I obviously can't pull it out on the left. I have to pull it out on the right; it's the only correct place to put it, and it goes here. B is also a constant; that goes out, and so you get something like this. I shouldn't say that. You don't get something like this. You get this.
So the terms make perfect sense. This is basically what the state would be if there were no input applied. This, that's complicated, but this thing here, that's a matrix, and it just multiplies what the input is, its constant value over that interval, and it's an update. And in fact, if you look at this closely, you can write it this way. It's a discrete-time dynamical system. It says x_d(k+1) = A_d x_d(k) + B_d u_d(k), and everything works this way. D is the same, C is the same; those don't change, but A_d and B_d are given by this. And some people call this the discrete-time system, so that's what this is.
This is simple enough. This integral, by the way, can be evaluated numerically, but in fact there are lots of cases where it can be analyzed, or you can actually just give an explicit formula for it from this thing. And I'll give a case like that in a minute.
So if A is invertible - which, by the way, need not be the case; there are plenty of interesting cases where A is not invertible - you can express that integral as this, and this makes perfect sense: if someone walks up to you on the street and A, for example, was a number, you would know that the integral of e^{ta} between zero and h is something like (1/a)(e^{ha} - 1). So you would guess that a formula like this would hold. And the questions would be - now, that's still just a guess. That's how these overloadings should work. You should just figure out what this would be if they were scalars. Then you have to put things in the right place.
Actually, it doesn't tell you whether the A^{-1} should go on the left or the right here. That it doesn't tell you. That's the first thing. And then there's the bigger question as to whether or not it's true, because arguing by analogy from a scalar formula will often give you the right answer, but in a large number of cases it will just give you something that's completely wrong.
So here, I can tell you why you're okay, or actually, maybe someone can tell me why. What is it about the terms here that actually makes this safe? By the way, safe means you should still go and check it, but what do you imagine makes this safe?
Everything commutes here. You imagine this as a power series in A. Everything commutes: A commutes with A^{-1}, obviously, because AA^{-1} = A^{-1}A = I. When everything commutes, that's actually kind of the safe case. That's exactly when your scalar formulas are gonna lead you to correct results. So that's how that works.
And if you wanted to show this, it wouldn't be hard. You'd actually just write out the power series for e^{tau A}, integrate it term by term, look at the new power series you have and say, hey, what do you know, that's the power series for this. Now, you might not wanna do that because of the A^{-1} there, but still. Well, anyway, that would be that. So you have this.
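You can also just check the guess numerically. Here's a quick sketch (made-up matrices; `scipy.linalg.expm` computes the matrix exponential) comparing the closed form A^{-1}(e^{hA} - I)B against a brute-force trapezoidal quadrature of the integral definition of B_d:

```python
import numpy as np
from scipy.linalg import expm

A = np.array([[-1.0, 2.0],
              [0.0, -3.0]])   # invertible, made up for illustration
B = np.array([[1.0],
              [0.5]])
h = 0.1

# Closed form, valid when A is invertible:
Bd_formula = np.linalg.solve(A, expm(h * A) - np.eye(2)) @ B

# Brute-force trapezoidal quadrature of  integral_0^h e^{tau A} dtau  B:
taus = np.linspace(0.0, h, 2001)
vals = np.stack([expm(t * A) @ B for t in taus])
dt = taus[1] - taus[0]
Bd_quad = (vals[0] + vals[-1]) * (dt / 2) + vals[1:-1].sum(axis=0) * dt

print(np.max(np.abs(Bd_formula - Bd_quad)))  # tiny
```

Since everything commutes here, putting the A^{-1} on the left or the right of (e^{hA} - I) gives the same answer, which is exactly the "safe" situation described above.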
Now, interestingly, we can actually point out something quite interesting here: stability is preserved by discretization. So if the eigenvalues of A - that's in x-dot = Ax + Bu - are lambda_1 through lambda_n, the eigenvalues of A_d are e^(h lambda_1) up to e^(h lambda_n). Did we put that on some additional - I have a vague memory we made an additional homework problem on this or something. Spectral mapping theorem. Well, I have a vague memory of it. Maybe you have - a few people nodded. You weren't just being polite, right? You actually saw the spectral mapping theorem in this course sometime? Okay, good. All right.
So here you would know that the eigenvalues of A_d, which is given by just the matrix exponential like this, would be e^(h lambda_1) up to e^(h lambda_n). And that's interesting because it turns out that the real part of a complex number is less than 0 if, and only if, the magnitude of its exponential is less than 1. And the reason is very simple: the magnitude of e^u, if u is a complex number, is exactly equal to e^(Re u), like that. So that's almost by definition. So if this here has a negative real part, then the magnitude of this is less than one. Now, this is just the simplest form, but there are all sorts of horrible little things. They're mostly horrible in bookkeeping, but you could actually, in principle, work out all of them now. Here are the kinds of horrible things that could happen. You could have offsets. You could say, well, you know, we measured the state at these times, which are offset. They'd give you some horrible timing diagram, and then they would say the whole thing operates at 50 hertz, but the state sample and the update are offset by 8 milliseconds, or something insane like this. It's not hard. You just plug in the right e^((8 ms)A) in the right places and you could work out what happens and all that sort of stuff. It's not fun, but it can be done.
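A small sketch of that spectral mapping fact (made-up stable A): the eigenvalues of A_d = e^{hA} are the exponentials of h times the eigenvalues of A, and negative real parts turn into magnitudes less than one:

```python
import numpy as np
from scipy.linalg import expm

A = np.array([[-0.5, 2.0],
              [-2.0, -0.5]])   # stable: eigenvalues -0.5 +/- 2j (made up)
h = 0.3
Ad = expm(h * A)

lam = np.linalg.eigvals(A)
lam_d = np.linalg.eigvals(Ad)

# Spectral mapping: eig(e^{hA}) = e^{h * eig(A)} (as sets, so sort both)
print(np.sort_complex(lam_d))
print(np.sort_complex(np.exp(h * lam)))

# Stability carries over: Re(lambda) < 0  <=>  |e^{h lambda}| < 1
print(lam.real.max() < 0, np.abs(lam_d).max() < 1)
```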
Very, very common is multi-rate, and so here you could actually have different inputs updated at different times, different outputs sampled at different intervals. They're usually multiples of a common interval, so that's extremely common. So, for example, a jet engine controller on an airplane might run at 400 hertz, and then something wrapped around that, the update might be at only 100 hertz, but the sampling might be 50 hertz. Your navigation update might be running at 10 hertz. Your radar altimeter might be running at 2 hertz and all this kind of stuff, and it's just a lot of horrible exponentials flying around and it's not fun, but somebody has to do it. And I claim, in principle, you could.
Okay, the next topic, which is dual systems, I'm gonna actually skip over because I decided I wasn't - I'm feeling slightly pressed for time, but only just slightly. So don't worry. You'll know when I get panicked. I'm not going there. I control it by the number of espressos I have before coming here. I'm at two right now, but I can go up to four if I need to. We're not there yet.

Next topic: causality. So I'll just say a little bit about this. Now, one interpretation as you write the state is this. It's basically the state at time t; the state propagated forward t seconds. That's what this term is. Plus - that's if you do nothing, so some people call this the zero-input term, or I don't know, something like that. This thing is the zero-initial-state term. This is the effect of the input. Now notice that the state at time t depends on the initial state and the input only over the interval between zero and t. So you would say something like this: the current state and the current output depend only on the past input. Well, that's what causality is. Causality says that the things you do now affect the future, not the other way around.

Now it's a bit strange because you look at x-dot = Ax + Bu, y = Cx + Du, and I don't see any asymmetry in time in these equations. I don't see anything that would make things running backward in time look any different from running forward in time. I don't see any asymmetry here. Actually, the asymmetry is right here, in fact. That's the asymmetry, so it's a subtle point. It's probably worthwhile to mention. So the asymmetry doesn't come from these equations. These equations are very happy. They run backwards in time. They run forwards in time. No problem. It's actually our considering an initial value problem - that's a problem in which the initial state is fixed - which makes it appear causal to us. For example, suppose you fixed the final state, x(T). Well, then I can write another formula that looks like this.
It says that the current state is equal to the final state, propagated backwards in time to the current value, plus an integral that has to do only with the future. So this is an integral. And I mean, you can just check this formula. It's absolutely correct.
So in this case, if you fix the final state, the system is, in fact, not only not causal, it's anti-causal, or something like that. I mean, this had to be. So these are both related to the concept of state, which we, so far, have used only to mean x(t) in x-dot = Ax + Bu. But in fact, there's an idea that when you say state, it ties into a much bigger idea, and let me say what that is. So first, let me just say what state is abstractly. How many people have seen the concept of state, like in a computer science course, some very abstract computer science course or something like that? Actually, [inaudible] theory, where you have to know that, right? You have to know the state of a processor. So in fact, the state of a processor - now that's not abstract at all. The state of a processor or something like that is very simple. It's all the values of the registers and all the stuff like that. It's basically whatever you would need to save in that processor so that, if you restored the state and ran it forward in time from there, its behavior would be absolutely identical. That's what the state is. So for example, when you call a function, you push the state on some stack or something, and when you finish calling the function, you pop the state, and the idea now is, aside from side effects that happen during the function call, it should actually continue absolutely as if nothing had happened. By the way, if you forget to restore a register or something like that, you're in deep, deep trouble, and that's because you failed to restore the state. There's also an abstract idea of state. The abstract idea of state is something like this. It is this, and this is worth thinking about and understanding. This is gonna be abstract, so you don't have to worry about it. It's just for fun.
The state is what you need to know about a dynamical system at a given time so that, if you knew the inputs in the future, you could perfectly predict the future behavior. That's what it means. So another way to think of it is that it is a sufficient statistic for what has happened in the past, for the purposes of predicting the future. So that's what state is. Okay, so for example, if you have a model of how prices are dynamically changing - they depend on certain other things, like interest rates and things like that - you would say that the state in that process is sort of everything you need to know so that, moving forward, you can predict these prices correctly, given the future inputs. Have a question?
Student: So this would work for, like, a time-invariant system?

Instructor (Stephen Boyd): No. No, it works perfectly well. The question was, does it work for time-varying systems. It works perfectly well - the same concept works for time-varying systems. And by the way, discrete systems as well. So that's the idea.
So there are lots of ways to think about it. You'd say something like this: the future output depends only on the current state and future input. Or you could say the future output depends on the past input only through the current state. Or you could say the state summarizes the effect of the past inputs on the future. These are all ways to sort of say the same concept. And another way to say it is that the state is actually the link, or the bridge, between the past and the future. So another way to say it - you'd say this in machine learning - you'd say something like this: the past and the future are conditionally independent given the state. Again, if you don't know what I'm talking about, that's fine. These are vague ideas, but that's okay. It's a very useful one to know about.
Okay. Okay, so that's the concept of state, and that's just to sort of point out what it means.
Another topic is change of coordinates. We've already done this for an autonomous system. For a system with inputs and outputs, a non-autonomous system, the same sort of thing happens, except you'll see that some things don't transform. So if we choose to change the state to x~, we write x = Tx~, so x~ gives the coefficients of the state in the expansion in the columns t_i of T. Then you've got x~-dot is - you get your familiar term here, that's T^{-1}AT, that's the similarity transform, times x~, plus T^{-1}Bu. That makes perfect sense because Bu was the effect on the state derivative in the original coordinates, and that's what it is in the new coordinates.
Now, the linear dynamical system then looks like this: x~-dot = A~x~ + B~u, y = C~x~ + D~u, and here A~ is the familiar similarity transform, B gets transformed by this T^{-1}, C gets transformed by T on the right, and D doesn't change at all. But these make perfect sense, absolutely perfect sense. And the reason is this: you're only changing the state coordinates, so there'd be no reason for D to change, because D maps inputs directly to outputs. It doesn't go through the state.
This is a read-out map, and this basically says how you map x~. The T here transforms from the new coordinates to the old state coordinates, and so on. Now, when you do this, the transfer function is the same. In fact, the input and output have not changed at all here, and you just work out what happens here if you form C~(sI - A~)^{-1}B~. By the way, this means the impulse response, for example, is the same. So you can check that Ce^{tA}B - well, I guess you'd say plus delta(t) times D, that's the impulse response - is the same as if I put tildes all over these things. It will be identical because, in fact, u and y haven't changed. So this will be the same.
Okay, so you get this, and again, you have to work this out. This is not immediate. You have to throw in T's and T^{-1}'s, and then the usual tricks with T and T^{-1} help; for example, you write I as TT^{-1}, or something like that. You pull things out, mess with a few inverses and things, then there's a lot of smoke, a lot of cancellations, and in the end you get this. So it's the same.
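You can also just check the invariance numerically rather than pushing the T's around by hand. In this sketch (random made-up matrices, with a random T, which is almost surely invertible), the transfer matrix C(sI - A)^{-1}B + D and the impulse response Ce^{tA}B come out the same before and after the change of coordinates:

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3)); B = rng.standard_normal((3, 2))
C = rng.standard_normal((1, 3)); D = rng.standard_normal((1, 2))

T = rng.standard_normal((3, 3))        # any invertible T
Tinv = np.linalg.inv(T)
At, Bt, Ct, Dt = Tinv @ A @ T, Tinv @ B, C @ T, D   # D is unchanged

# Transfer matrix H(s) = C (sI - A)^{-1} B + D is unchanged:
s = 1.0 + 2.0j
H  = C  @ np.linalg.solve(s * np.eye(3) - A,  B)  + D
Ht = Ct @ np.linalg.solve(s * np.eye(3) - At, Bt) + Dt
print(np.max(np.abs(H - Ht)))   # around machine precision

# Impulse response C e^{tA} B is unchanged too:
t = 0.7
print(np.max(np.abs(C @ expm(t * A) @ B - Ct @ expm(t * At) @ Bt)))
```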
Okay, this allows you to have ideas like standard forms for linear dynamical systems. So you can change coordinates to put A in various forms, like you can put diagonal real modal Jordan. Iíll say a little bit about the use of this. It actually has a huge use, which Iíll say about in a minute.
So here if you put it in certain forms, then you get very interesting block diagrams because, for example, if A is diagonalized, you get something like this, and that is the so-called diagonal form of a linear dynamic system. Thatís the diagonal form, just as an example.
Now, you might ask why would you want to change coordinates like this. Well, you might wanna change coordinates of a real system just to understand how it works. So for example, this might be a modal Ė this would be a modal expansion in the middle, and then if one of these numbers were really small, youíd say something like, well, the input doesnít really strongly drive the third mode. That would be it. Or youíd say something like you look at a number over here, it might be small, or if thereís a scalar, or if you know this is a matrix, itís small. Youíd say the output doesnít particularly see a large contribution from the second mode. That would be the types of things youíd say.
Now, thereís another real use of changes of coordinates and this is real, and it is absolutely not Ė you do not change coordinates in the purpose Iím gonna describe now, to things like Jordan canonical form or diagonal or anything like that. Itís this.
If this represents some sort of signal processing system that you need to implement, for example, this might represent your equalizer, letís say, in a communication system, and you have to implement it. This change of coordinates is your friend. It is a degree of freedom because basically itís a knob. It says down here, the point is I can choose any T I like and I will have implemented the same signal. It has the same input/output properties, so I can change coordinates anyway I like.
I could do this for lots of reasons. I could change coordinates to get an A, for example, that has a lot of zeros in it. That might be very useful because when I implement this, it means itís much simpler to implement, for example. And thereís actually forms you would use.
You might also, if this was actually some kind of analogue implementation, you would actually change coordinates to make the resulting system one that was less sensitive to perturbations in the parameters, just for example. These would be the types of things you would do.
Okay, that was just for fun, but itís important to know that the change in coordinates actually has real use beyond merely sort of understanding how a system works, which is the most obvious application.
Okay. So we'll finish up with discrete-time systems. They're pretty simple because there's not even any differential equation; they're just linear iterations. This is x(t+1) = Ax(t) + Bu(t). The block diagram looks like this. This is sometimes called - you can call it a register or a memory bank or something like that. That's what it is. Or a single unit delay or something like that, a single-sample delay. Well, 1/z, z^{-1}, would be a typical label.
That means that's the output sequence. What goes into it is actually the sequence advanced in time, because if you're looking through a delay at a signal, what went in is something that was advanced in time by one unit, so that's what this is. That's the block diagram.
You'll notice it's exactly the same, except the s is replaced with z, so there's really nothing new here.
Oh, I should - I guess I'll also mention, if you know about digital circuits and things, you can also imagine that there's a latch, so that this thing doesn't - like you do two-phase clocking or something like this, or there's a small delay or something like that, so that this thing doesn't race around and come back. Basically there's a one-sample delay there.
Now, here the analysis - well, I mean, it could've been done on day one. It requires nothing more than knowledge of matrix multiplication, and that's about it. So x(1) is this; x(2) is that. You multiply this out and you get that. And the pattern is very simple. The state is A^t x(0). That's basically what A^t is for a discrete-time system: the time propagator. That's what that is. That's time propagation, A^t.
So this propagates the initial state, and this is actually discrete-time convolution. It's nothing more. This is just discrete-time convolution here. And you can write it this way: y(t) = CA^t x(0), that's the contribution from the initial state, plus h convolved with u, that's discrete-time convolution, and the impulse response in this case is D at t = 0 and then CA^{t-1}B for t positive. So that's the impulse response. It's D, CB, CAB, CA^2B, and so on.
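Here's a minimal sketch of that (made-up matrices): iterate x(t+1) = Ax(t) + Bu(t) with a unit impulse as input, and compare the output against the formula h(0) = D, h(t) = CA^{t-1}B:

```python
import numpy as np

A = np.array([[0.5, 0.1],
              [0.3, 0.8]])    # made-up discrete-time system
B = np.array([[1.0], [0.0]])
C = np.array([[1.0, 1.0]])
D = np.array([[0.2]])

def simulate(x0, u, T):
    """Run x(t+1) = A x(t) + B u(t), y(t) = C x(t) + D u(t)."""
    x, ys = x0, []
    for t in range(T):
        ys.append(C @ x + D @ u[t])
        x = A @ x + B @ u[t]
    return ys

T = 5
u = [np.array([1.0])] + [np.array([0.0])] * (T - 1)   # unit impulse at t = 0
y = simulate(np.zeros(2), u, T)

# Impulse response: h(0) = D, h(t) = C A^{t-1} B for t >= 1
h = [D] + [C @ np.linalg.matrix_power(A, t - 1) @ B for t in range(1, T)]
for yt, ht in zip(y, h):
    print(yt, ht.flatten())   # simulated output matches the impulse response
```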
Okay. Now, suppose that you have a sequence - and we'll just cover the z-transform very quickly. There's not much here; it's mostly just a review. Suppose you have a sequence. We'll just make it matrix-valued to cover all cases right off the bat, and we'll make it not a double-sided sequence but a sequence defined on the indices 0, 1, 2, and so on. Then the z-transform is simply this function here. And whether this makes sense depends on how violently w diverges. If it diverges no faster than an exponential, then this is guaranteed to make sense for large enough complex numbers z, because if z is large enough in magnitude, these terms are falling off quickly.
So you'd refer to this as the domain of W, and as with Laplace transforms, it's not a big deal; you don't really need it - all that can happen there is things can get complicated.
Now, if you define a new signal which is time-advanced like this - so it's the same as another signal, but it's what the signal will be one sample later - you work out what the z-transform is. Again, it's like baby algebra. You multiply this out, I'm gonna change t to, whatever, t + 1, and I'll write it this way where the z comes out, and then you can write that as zW(z) - zw(0). It looks like the derivative formula for the Laplace transform; the only difference right there is that extra z on the second term. But otherwise, there's no difference.
Okay. The discrete-time transfer function is simple. You just take the z-transform. I mean, there's a big difference here: we don't need this to get the solution. We already know the solution. We just worked it out by multiplying matrices, so this is, in a sense, not particularly needed. I just wanna show you how this works. And there actually are people who are more comfortable in the frequency domain, depending on their cultural upbringing, personality type, things like that; there are just people more comfortable there. That's fine. No problem. And so this is addressed to them.
By the way, if you're perfectly comfortable with matrix multiplication, then I think everything we did over here described exactly what happens in a discrete-time system. But anyway, all right.
So you take the z-transform here. You get the usual thing. This is the analog of sX(s) - x(0), except we have this extra z in there. You get this, and you solve for the z-transform of the state to get this formula here. And the only difference is there's an extra z here, and the z-transform of the output is gonna look like this: it's H(z)U(z), where H(z) = C(zI - A)^{-1}B + D. That's the discrete-time transfer function. Got a question?
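And a quick check of that (same kind of made-up matrices): H(z) = C(zI - A)^{-1}B + D should match the z-transform of the impulse response, the sum over t of h(t)z^{-t}, truncated here since A is stable and |z| exceeds the eigenvalue magnitudes:

```python
import numpy as np

A = np.array([[0.5, 0.1],
              [0.3, 0.8]])    # stable discrete-time system (made up)
B = np.array([[1.0], [0.0]])
C = np.array([[1.0, 1.0]])
D = np.array([[0.2]])

z = 2.0   # any |z| bigger than the eigenvalue magnitudes of A
Hz = C @ np.linalg.solve(z * np.eye(2) - A, B) + D

# z-transform of the impulse response: h(0) = D, h(t) = C A^{t-1} B
series = D.copy()
At = np.eye(2)                 # holds A^{t-1} at iteration t
for t in range(1, 200):
    series = series + C @ At @ B * z ** (-t)
    At = At @ A
print(Hz, series)              # they agree to many digits
```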
Student:Yes, [inaudible] rotation of that extra z?
Instructor (Stephen Boyd):That's a good question. I think the answer is yes. I can defend its existence, at least this way -- or rather, I can't argue for it, but I can tell you why you shouldn't be shocked to see it there. If you sample something faster and faster and faster, this is, basically -- where is it? I lost it. Here it is. Okay, that one. This basically says you're off by one sample in calculating the effect of the initial state, for example, on the output. That's kind of what this says. If you sample faster and faster, the z has no effect, because you're sort of moving something by a smaller and smaller time interval, so it's okay. It would be like saying, well, no problem, I'm just using x at the end of the interval as opposed to the beginning, and as the interval gets small, the effect goes away. So that's not an argument as to why it should be there; it's an argument as to why it shouldn't bother us that it's there. That's just an additional argument, because the main argument for why it shouldn't bother you is that it's correct, which is a strong argument. Not always completely persuasive, but that's it.

Okay, so this finishes up a section of the course, and we're now gonna enter the last section. In fact, we're gonna do essentially one more topic and then the course is over. It's gonna take a while -- we're gonna do a lot of stuff on it. It's very useful; it's really cool stuff. It has to do with the singular value decomposition. You may have heard of this somewhere. Actually, how many people have heard about things like singular value decompositions? That's very cool. Where did you hear about it?
Instructor (Stephen Boyd):A linear algebra class, so it's gotten there. It's funny -- it's only about 100 years old, and traditionally the material hasn't gotten into -- that was taught in the math department?
Student:No, actually, [inaudible], but under them.
Instructor (Stephen Boyd):Oh, taught in the math department?
Student:No, it's EE.
Instructor (Stephen Boyd):Oh, okay, sorry. It was taught in an EE department. Okay. So you know, I think, actually, it's about time. It's been around for about 80 years now, so it's about time you might see its appearance in math linear algebra courses. Did anyone here actually hear about it in a linear algebra course taught in a math department? Cool. Where?
Student:At the University of Maryland.
Instructor (Stephen Boyd):Aha. Cool. So that was, by the way, one hand in a sea of -- for those watching this later or something. Okay, all right, fine. So we'll look it up. How about in statistics? Anyone hear about this as principal component analysis? There's a couple, okay. What's that?
Student:We used it in machine learning.
Instructor (Stephen Boyd):P -- so in machine learning, you know about it through PCA? But other than that, I guess people in, like, CS never heard of this. Okay. Cool. I'm trying to think of some other areas where it's used. Okay, all right.
So we'll do the last topic. We'll start in with some basic stuff. It's actually quite basic and I'll explain that in just a minute. We're gonna look first at the special case of the eigenvectors of symmetric matrices -- what you can say -- and then we'll get to quadratic forms. That's actually a bit different. Actually, it's the first time -- so far, a matrix has always meant the same thing to you. It's gonna mean something different here. We'll see that soon. We'll look at inequalities -- we're gonna overload inequalities to symmetric matrices. We're gonna overload norm to matrices. These are not gonna be simple overloadings; they're gonna be overloadings in the sense that some things you'd guess are true would be true, and a bunch of things you would guess are true are false. They're not what you think they are, and they're really useful. And this will culminate in the singular value decomposition, or principal component analysis, depending on your background.

Okay, let's start with the eigenvalues of symmetric matrices. So suppose you have a symmetric matrix; obviously it's gotta be square. And here's the first fact: the eigenvalues of a symmetric matrix are real. Oh, by the way, for whole groups of people -- for example, if you do physics, depending on what kind of physics you do -- what happens is all the matrices you see are symmetric, or, in the complex case, there's another name: self-adjoint is what you'd call it. And it means that all the eigenvalues you'd ever encounter would be real.
Or, by the way, sometimes there's an i in front, in which case all the eigenvalues are purely imaginary or something like that. So if you're in one of these fields, what happens is, after a while, you get the idea that all matrices are symmetric or self-adjoint. Then you actually start imagining things, and even people who have done this for years get very confused when you go back to matrices that are non-symmetric. Or they've even completely suppressed it and forgotten it. Okay, but for you -- I should say this -- for symmetric matrices, there are very special things that obtain, in terms of the eigenvalues, eigenvectors, all that. It's very useful to know. Just don't spend all your time dealing with these; if you're one of the other types, make sure you know what happens when matrices are non-symmetric.

But anyway, let's see how this works. Suppose you have Av = λv; v is non-zero, so v is an eigenvector, and v is complex here. I'm gonna look at v conjugate transpose. By the way, that's an extremely common thing; people write it as v^H or v*, and I should mention that in Matlab, if you type v prime and v is complex, you will get this -- you will get the conjugate transpose. Now, you should be very slow with these arguments, because one little mistake and -- in fact, when you first look at these arguments, they look like they can't be right, like you missed something. They look like magic. But let's take a look at it. We're gonna take v conjugate transpose times Av; I parse it as v* times (Av). But Av we know is λv, so the λ comes outside: I'm gonna write λ times v conjugate transpose v.
v conjugate transpose v is the sum of the squared magnitudes of the complex numbers v_i, because ā times a is |a|² for a complex number a. Now we're gonna do the same calculation, but we're gonna do it a little bit differently. I'm gonna take this thing here and replace these two with that expression, and that's fair -- let's see why. Well, we can do the transpose first. This is v conjugate transpose times A conjugate transpose. Now, v conjugate transpose, that's what I've got here; A conjugate transpose is A transpose, because I'm assuming A is real, so I get that. And Av is λv. I plug that in here, but there's still the conjugate up top, and that comes out as λ bar times this. Now these two are equal, so that's equal to that. This is a positive number and they're equal, and the only conclusion is λ equals λ bar. You can go over this and look at it yourself to check that I am not pulling a fast one on you, because if you first do this calculation, you'd assume, no problem, I just lost a conjugate somewhere. But in fact, no: these are two valid derivations, that's equal to that, and therefore λ equals λ bar. That's the same as saying λ is real.

Now we get to a basic fact about symmetric matrices, and it's important to understand the logic of it -- it's quite subtle. It says this: there is a set of orthonormal eigenvectors of A. That's what it says. Now, in slang, you would say things like this: you might say the eigenvectors of A are orthonormal. In fact, that's how you'd say it informally. But that is wrong. Okay?
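The fact just proved -- a real symmetric matrix has only real eigenvalues -- is easy to check numerically. A quick numpy sketch with a random symmetric test matrix (not from the lecture), using the general eigenvalue routine so nothing about symmetry is assumed:

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((5, 5))
A = B + B.T                      # a random symmetric matrix

lam = np.linalg.eigvals(A)       # general routine; knows nothing about symmetry
assert np.max(np.abs(lam.imag)) < 1e-10   # imaginary parts are numerically zero
print("eigenvalues:", np.sort(lam.real))
```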
This is the correct statement, so as with many other things, you might want to practice thinking and saying the correct statement for a while. After a while, when you realize people are looking at you weirdly, wondering why you sound like that -- when it's actually causing social troubles -- then you switch to the slang, and the slang is: the eigenvectors of a symmetric matrix are orthonormal. That's wrong in so many ways if you parse it. It's sad. We'll go over the ways in which that's wrong in a minute.

So let's see what the correct statement says. It says I can choose eigenvectors q1 through qn, which are eigenvectors of A with eigenvalues λi, and qi transpose qj is δij. That's the same as saying there's an orthogonal matrix Q for which Q inverse A Q, which is the same as Q transpose A Q, is Λ. So another way to say it is: you can diagonalize A with an orthogonal matrix if A is symmetric. That's the condition. Okay, now, that says you can express A as Q Λ Q inverse, but Q inverse is Q transpose. Now this I can write lots of ways, but here's one: I can write it as a dyadic expansion. This is the sum of λi times qi qi transpose. We're gonna look at this and it's a beautiful thing. These are n-by-n matrices; some people call them outer products, and they're also called dyads, so this is sometimes called a dyadic expansion of A, because you write A as a linear combination of a bunch of dyads. Now, we have seen that matrix before. It is projection onto the line that goes through qi, so you express A as a sum of projections -- in fact, they're orthogonal projections, these matrices. And I think -- I have another vague memory of a homework problem on this or something like that. Maybe not. Some of my vague memories are, well, wrong. Okay.
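Both facts just stated -- the factorization A = QΛQ^T with Q orthogonal, and the dyadic expansion A = Σ λi qi qi^T -- can be verified in a few lines of numpy, again on a random symmetric test matrix:

```python
import numpy as np

rng = np.random.default_rng(1)
B = rng.standard_normal((4, 4))
A = B + B.T                                  # symmetric test matrix

lam, Q = np.linalg.eigh(A)                   # orthonormal eigenvectors in the columns of Q
assert np.allclose(Q.T @ Q, np.eye(4))       # Q is orthogonal: Q^T Q = I
assert np.allclose(Q @ np.diag(lam) @ Q.T, A)

# dyadic (spectral) expansion: A = sum_i lambda_i * q_i q_i^T
A_dyadic = sum(lam[i] * np.outer(Q[:, i], Q[:, i]) for i in range(4))
assert np.allclose(A_dyadic, A)
print("spectral expansion verified")
```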
So these are projections, so there can be a lot of interpretations of what this means. Before we go on, though, let's talk about the slang statement. So here's the slang statement; this is what you would say: the eigenvectors of a symmetric matrix are orthonormal. There's your statement. This is among friends, casual get-together; this is what you would say. You're just fooling around doing research, no one's looking, this is what you'd say. Actually, you could even say this at a conference. There'd be some social cues, though. When people like me hear someone say this, we get a little bit on edge, and we listen for a little while to figure out whether they have no idea what they're talking about, or they know exactly what they're talking about and are choosing to use informal language. It's usually clear very quickly. Okay. This statement doesn't make any sense, and any sense that you could assign to it is completely wrong. First of all, you can't talk about *the* eigenvectors of a matrix, even though everyone does, because it doesn't make any sense: there are zillions of eigenvectors. Take any eigenvector, multiply it by any non-zero number -- that's an eigenvector. So you don't talk about *the* eigenvectors; let's start with that. Okay. So if I have a matrix -- here's one: I, the identity. That matrix is quite symmetric. What are its eigenvectors? So what are the eigenvectors of I?
Student:All the non-zero vectors.
Instructor (Stephen Boyd):All non-zero vectors are eigenvectors of I. So let's make it two by two, and I could say, okay, here's my choice of eigenvectors for I: (1, 1) and (1, 2). There, that's my choice of eigenvectors. That's v1 and that's v2. It is now false that these eigenvectors are normalized. They don't have norm 1: that one has norm square root of 2, and that one has norm square root of 5. So that's false. They're not normalized, and they're most certainly not orthogonal by any means. Okay?
And in fact, if someone said get me the eigenvectors of I, and I returned these two things, no one can complain. These are eigenvectors of I, period. Okay? It's absolutely the case that I times this is 1 times this, and I times that is 1 times that. So that's fine.
Okay. Now, there's actually a situation in which you can say something close to this, so let's forget the normalized part, because that's silly. Can you say the eigenvectors of a symmetric matrix are orthogonal? This case shows the answer's no, because it's not true. Here's a matrix that's symmetric; that's an eigenvector, that's an eigenvector; they're independent and they're by no means orthogonal. I think that's enough on critiquing this thing.
So the right semantics, the right statement, is: you can *choose* the eigenvectors to be orthonormal, and that statement is certainly true for I because, for example, I could choose (1, 0) and (0, 1). For that matter, I could choose (1, 1) divided by the square root of 2 and (1, −1) divided by the square root of 2, and that would also be an orthonormal set of eigenvectors for I. Okay.
Let's interpret this formula. This is A = Q Λ Q transpose, and remember, Q transpose is Q inverse. So let's look at some interpretations. The first is to look at the three matrices separately, and it says that if you wanna operate by A on a vector, here's what you do. The first thing you do is multiply by Q transpose: x comes in, and you get Q transpose x.
And we know exactly what Q transpose x does -- that's essentially Q inverse. Q transpose x resolves x into the qi coordinates. That's what it means. So this vector is the coordinates of x in the qi expansion.
Now we multiply by a symmetric matrix -- I mean, sorry, a diagonal matrix. That's very simple to visualize, because you're just scaling each coordinate.
By the way, if it's a negative eigenvalue, you're flipping -- you're switching the orientation. Okay? That's here.
Now, when we multiply by Q, we know exactly what that is: that's reconstituting the output. So if you like, you can think of this as a coding, a scaling, and a reconstruction. Your question?
Student:Yeah, sorry, I still don't have any [inaudible] orthogonal eigenvectors. Why is it that for a symmetric matrix you can find orthogonal eigenvectors, but if a matrix is not symmetric, you can't necessarily find orthogonal ones?
Instructor (Stephen Boyd):It's a great question. I haven't answered it yet, but I'm going to -- I think I am. Might be Thursday, but I'm gonna get to it. So we're just gonna push that on the stack and we'll pop it later, maybe.
So the question was why, and I haven't said. So first, we're just gonna look at it -- I stated it as a fact, and we're gonna look at what it means, what the implications are. Then we're gonna come back and see why it's true; we'll see part of the why in a minute. Now, by the way, I have shown why the eigenvalues should be real. I have not shown that you can choose the eigenvectors to be orthonormal.
Oh, by the way, one implication of this: it says that for a symmetric matrix, the Jordan form is really simple -- it's always diagonal. You cannot have a non-trivial Jordan form for a symmetric matrix. So we're gonna get to that later, I hope. I think we are.
Okay. Now, this is actually a very interesting interpretation. Oh, and by the way, it's worthwhile knowing this; this comes up all the time. Roughly this exact operation is carried out, for example, in the current standard for DSL. It's also done in JPEG: in JPEG, you do a DCT transform on an 8 x 8 block of pixels. You don't have to know this. I'm just saying these are not abstract block diagrams -- this type of thing is done all the time in all sorts of stuff all around you.
In JPEG, at least in one of the older standards, you take 8 x 8 blocks of pixels, you form a DCT transformation, and then, in fact, you don't multiply here -- what you do is you quantize in the middle, and then you transmit these, and then it's reconstituted when you decode the image. Okay? So pictures like this are all around you, widely used, and so on. It comes up in signal processing, compression, communications -- I mentioned one in communications. It comes up all over the place.
So that's the picture: you resolve into the qi coordinates, you scale, flipping if λi is negative, and you reconstitute.
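The resolve/scale/reconstitute picture corresponds line by line to code; a small numpy sketch with a random symmetric test matrix:

```python
import numpy as np

rng = np.random.default_rng(2)
B = rng.standard_normal((3, 3))
A = B + B.T                      # symmetric test matrix
lam, Q = np.linalg.eigh(A)

x = rng.standard_normal(3)
coords = Q.T @ x                 # resolve x into the q_i coordinates
scaled = lam * coords            # scale each coordinate by lambda_i (flip if negative)
y = Q @ scaled                   # reconstitute the output
assert np.allclose(y, A @ x)
print("resolve / scale / reconstitute matches A @ x")
```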
Now, geometrically, there's a beautiful interpretation, because we know the geometric interpretation of an orthogonal matrix: it's an isometric mapping. It's something that preserves lengths, angles, and distances.
Now, it can flip -- for example, you can have a reflection through a plane -- and roughly speaking, you should think of these as rotations or reflections. So this basically says -- I'm gonna call it a rotation even if it's a reflection -- rotate the input, for example around some axis by 30 degrees; scale in that new coordinate system; and then undo it, meaning rotate around the same axis 30 degrees the other direction. So that's the idea.
We've already mentioned this. Oh, by the way, when you diagonally scale a real vector, there are lots of ways to say it. Well, I found both dilation and dilatation, so somehow there are two. I thought for a while dilation was the only correct one. No, it turns out it's also English to say dilatation, and I tried to blame it on some weird people in some field; I couldn't identify the field that committed this crime. Or country of origin -- I also tried to pin it maybe on the British or something like that. That seemed promising -- that would be the kind of thing, that extra syllable. But I couldn't blame it on them. Couldn't chase it down.
So you'll see dilatation. And by the way, on a couple of occasions, I have had students and people say that these actually mean different things, and one or two of them tried to explain it to me. The distinction seemed very clear to them, but it never sunk in with me. There may be a difference, but if there is, I for one haven't got it. That's probably me. There could be some field where they say, no, no, no, totally different things.
This is where you multiply each component of the vector by λi. So this is the picture. Now, this decomposition we've already talked about: it's a dyadic expansion. Oh, by the way, some people call it simply the spectral expansion -- this is the spectral expansion of A, this thing over here.
And these qi qi transpose are called projectors. In fact, a very common way to see this would be this -- this would be very familiar -- but actually, in a lot of cases, there would be an infinity here. In a bunch of fields you'd see things like that, and you'd also see the same thing with an integral and all sorts of stuff, but it would look just like this. And they'd call that the spectral expansion of the operator A, depending on what field you're in. Okay.
So let's look at -- this is just a stupid example, but just to see that something happens here. So here's a silly matrix. You clearly don't need anything to figure out what this matrix does to a vector. But as usual with these examples, the point isn't the 2x2 matrix: you'd do exactly the same thing for a 30x30 matrix, or for that matter, a 3,000 x 3,000 matrix, where it is by no means obvious what a symmetric 3,000 x 3,000 matrix does.
So here you work it out. The eigenvectors turn out to be (1, 1) and (1, −1) -- oh, did you hear that? That was slang, big time slang. So let me wind back and say it again without slang. But then I'll stop after this lecture and go back to slang. Okay, so I'll say it precisely.
For this matrix, I chose the eigenvectors (1, 1) divided by the square root of 2 and (1, −1) divided by the square root of 2, which are orthonormal. Actually, that involved a small bit of slang, because I shouldn't say the vectors are orthonormal. I should say: I chose the set of two eigenvectors consisting of, first eigenvector, (1, 1) divided by the square root of 2; second eigenvector, (1, −1) divided by the square root of 2; end of set. And that set of eigenvectors is orthonormal. There, that was formal.
That's why people don't talk this way, and why, if you see people who do talk this way, it's weird. But you should think that way. You should speak casually -- though maybe you don't know when the right time is to -- but you should never think casually. That's called sloppy or something. Okay. Or you could do it; you should just do it in private. You shouldn't write it down, and you shouldn't mistake it for clear thinking.
So here's the picture. There's q1 and q2, and here's some x that comes in. The first thing you do is resolve it: you project it onto the q1 line and the q2 line. Those are orthogonal projections; that's these and these. Then you scale these by the eigenvalues. Let's see what happens.
This first guy, nothing happens. And the second one gets multiplied by two and flipped in orientation, so this guy gets flipped over here and doubled, and so you get something like that, and then thatís the output.
Now, you sure did not need a spectral decomposition to find out what this 2x2 matrix does to a vector, but it's just to show an example.
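The 2x2 matrix itself is on the slide rather than in the transcript, but it can be reconstructed from the eigenpairs described: eigenvalue 1 with q1 = (1, 1)/√2 (nothing happens to that component), and eigenvalue −2 with q2 = (1, −1)/√2 (that component is doubled and flipped). A numpy sketch, with the input vector made up:

```python
import numpy as np

q1 = np.array([1.0, 1.0]) / np.sqrt(2)
q2 = np.array([1.0, -1.0]) / np.sqrt(2)

# reconstruct A from its spectral expansion: A = 1 * q1 q1^T + (-2) * q2 q2^T
A = 1.0 * np.outer(q1, q1) + (-2.0) * np.outer(q2, q2)
# A works out to [[-0.5, 1.5], [1.5, -0.5]] (up to rounding)

x = np.array([2.0, 1.0])               # an arbitrary input vector
c1, c2 = q1 @ x, q2 @ x                # resolve x onto the q1 and q2 lines
y = 1.0 * c1 * q1 + (-2.0) * c2 * q2   # q1 part unchanged; q2 part doubled and flipped
assert np.allclose(y, A @ x)
```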
Okay, now we're gonna show what I said I was gonna show, except we're not gonna do the full case. We're gonna do the case where the eigenvalues λi are distinct. In this case, when the eigenvalues are distinct, there is a statement that can be made somewhere in between the nonsense slang and the correct formal statement -- there's actually a true statement that lies in between them. So we're gonna do the case of distinct λi, and in this case, I can say the following -- all right, let me see if I can get it right.
All right, it's gonna come out right. Here we go. I have to concentrate. Here it goes. If the eigenvalues are distinct for a symmetric matrix, then any set of eigenvectors is orthogonal. Independent eigenvectors -- oh, I was close. I was close. All right, let me try it again. If a matrix is real and symmetric with distinct eigenvalues, then any set of n independent eigenvectors is orthogonal. Yes. It came out right. I think that's correct.
So that's a different statement from the original one I made; the quantifiers are in different places, or something like that. Notice that in this case, you don't have to choose. There is no choice.
And the slang reading of that would be -- in normal speech you would say the eigenvectors of a real symmetric matrix with distinct eigenvalues are orthogonal. And then if you simply choose to normalize them, you could say "and therefore can be normalized to be orthonormal," or something like that.
So A vi = λi vi, and norm of vi is 1. Well, let's see what's gonna happen here. Notice I didn't have to choose the v's in any special way; I just chose them to be eigenvectors, period. I have asked you to normalize them, which seems entirely reasonable: the eigenvectors are non-zero, so you can obviously divide them by their norm and get something which is still an eigenvector and has norm 1.
So we'll work out vi transpose A vj, this thing, and we'll do it two ways. I'll first parse it by associating as vi transpose times (A vj). That's gonna give you λj times vi transpose vj. But I'm also going to rewrite it as (A vi) transpose vj. Now you have to check: (A vi) transpose is vi transpose A transpose, and that's where I use the fact that A = A transpose, and then that's the same as this.
Here, I get λi. Now, when you do things like this, you have to be very careful -- it's just like the calculation we did before; it looks like you've probably made a mistake. But you can check: there is no mistake, and you see the following, which is actually quite interesting. Just by setting this equal to that, you get (λi − λj) times vi transpose vj equals zero. Now there are only two possibilities. If i equals j, this is a non-statement, because it says 0 = 0. If i is not equal to j, λi − λj is a number which is non-zero -- that's our assumption that the eigenvalues are distinct. Therefore vi transpose vj has to be zero, and we're done.
So in this case, with a small bit of slang, you can say the eigenvectors of a symmetric matrix with distinct eigenvalues are orthogonal, and therefore can easily be normalized to be orthonormal -- something like that. There's a little bit of slang.
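The distinct-eigenvalue case can be seen numerically: even a generic eigenvalue routine that knows nothing about symmetry returns orthogonal eigenvectors for a symmetric matrix, with no choosing required. A numpy sketch (random test matrix; its eigenvalues are distinct with probability 1):

```python
import numpy as np

rng = np.random.default_rng(3)
B = rng.standard_normal((4, 4))
A = B + B.T                          # symmetric; distinct eigenvalues almost surely

lam, V = np.linalg.eig(A)            # generic routine, no symmetry assumed
V = V / np.linalg.norm(V, axis=0)    # normalize each eigenvector to norm 1

# any set of n independent eigenvectors comes out orthogonal automatically
assert np.allclose(V.conj().T @ V, np.eye(4), atol=1e-8)
print("eigenvectors are orthonormal without being 'chosen'")
```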
Okay, now in the general case, the distinction is this: you have to say the eigenvectors *can be chosen* to be orthogonal. An example would be the identity matrix where, of course, you could choose an orthonormal set of eigenvectors, but if you're just weird and perverse, you could choose any set of independent vectors and say, what? Those are my eigenvectors. Don't like them? They're independent; what's your problem? So that's that. Okay.
So let's look at some examples of this. The first example is an RC circuit. Here I have a resistive circuit with the capacitors pulled out on the left like this, and the state of this dynamical system is gonna be -- I can take the voltages on the capacitors, or I could take the charges just as well. The dynamics: Ck vk is the charge on capacitor k, and Ck vk dot is the current flow into the capacitor, which is the negative of the current flow into the resistive circuit. So that's the dynamics of that.
And I have i = Gv; G is the conductance matrix of this circuit, and it maps the voltages appearing at these ports to the currents that flow into the ports. So it's a conductance matrix, like that.
So this describes the dynamics of the system, and you can write it this way: v dot = −C inverse G v, and so we have an autonomous dynamical system. There's A, like that.
Now, by the way, G is symmetric for a resistive circuit. That's actually a basic principle from physics: it basically says that even if this is an arbitrarily complicated thing with resistors and things like that inside, the conductance matrix you see at the terminals is symmetric. And you get similar things in mechanics and other stuff, and economics, too.
Okay, however, this matrix is not symmetric. But we're gonna change coordinates to make it symmetric, and to do that, we use a rather strange state: the square root of the capacitance times the voltage, xi = Ci^(1/2) vi. Now that's kind of weird, because voltage is a reasonable choice of state -- that's measured in volts -- and charge, Ci vi, is an entirely reasonable choice of state -- that's in coulombs -- but in fact you're using something that's halfway in between, Ci to the one-half times vi. It doesn't look right. I forget the physical units of this -- I guess it's volts times square root farads, which you could write out in all sorts of other ways. Whatever it is, it's weird. It seems odd.

When you change coordinates like this, this is a scaling, and you actually end up with x dot = −C^(−1/2) G C^(−1/2) x, so this is now a symmetric matrix here. We can conclude that the eigenvalues of this matrix -- which is similar to the original one -- are real. So we recover something you probably know from undergraduate days or intuition: an RC circuit has real eigenvalues, period, so you can't have oscillations. Okay? Cannot. So, for example, in an interconnect circuit, if it's well modeled by resistances and capacitances, you cannot have oscillations, period. You have to have inductance to have oscillations.

Okay, that's that. I will say a little bit about this -- I quit in a minute, literally, and I'll tell you why in a second. If you really do wanna know why you would use coordinates like this, there is actually something interesting here, and I'll tell you what it is. What is norm x squared for this thing? Well, it's the sum of (Ci^(1/2) vi) squared, which is what? It's the sum of Ci vi squared -- and somebody in EE tell me what this is.
It's the electrostatically stored energy in the capacitors, times 2. Thank you. I was just hanging on that last thing. That is twice the electrostatically stored energy. So these coordinates might look sick, and indeed, they are sick, I think. However, these are coordinates in which the norm corresponds exactly to energy, so if someone said defend that, you could say, well, it's quite natural from the energy point of view. I have to quit a few minutes early today -- I told people at the very beginning -- because I have to rush over and give a talk at the CISX auditorium, so we'll quit here.
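The change of coordinates can be sketched numerically: with made-up capacitances C and a made-up symmetric conductance matrix G (values are not from the lecture), the matrix −C^(−1/2) G C^(−1/2) is symmetric and similar to −C^(−1) G, so the RC eigenvalues come out real:

```python
import numpy as np

rng = np.random.default_rng(4)
C = np.diag([1.0, 2.0, 0.5])             # capacitances (made-up values)
B = rng.standard_normal((3, 3))
G = B @ B.T + np.eye(3)                  # a symmetric conductance matrix (made-up)

A = -np.linalg.inv(C) @ G                # v' = -C^{-1} G v; A itself is NOT symmetric
Ci = np.diag(np.diag(C) ** -0.5)         # C^{-1/2}
As = -Ci @ G @ Ci                        # symmetric matrix similar to A: C^{1/2} A C^{-1/2}

assert np.allclose(As, As.T)
lamA = np.sort(np.linalg.eigvals(A).real)
lamS = np.sort(np.linalg.eigvalsh(As))
assert np.allclose(lamA, lamS)           # similar matrices share eigenvalues
assert np.max(np.abs(np.linalg.eigvals(A).imag)) < 1e-10   # and they are real
print("RC eigenvalues are real:", lamS)
```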
[End of Audio]
Duration: 72 minutes