Instructor (Stephen Boyd):I guess we've started. Let me start with a couple of announcements. The first is I guess I should remind you that Homework 1 is due today at 5:00. Several people have asked what they should do with their homework, and I should have said this maybe on the first day, although I guess it's most relevant now. One thing you should not do is give it to me. I am highly unreliable. I will accept it, by the way; that's why I'm warning you. You'll give it to me. It'll be in my hand. By the time I get back to Packard, who knows where it will be?
The TAs are more reliable. You can give your homework to a TA if you personally know them and trust them. But really, the correct I/O method is that there is a box; it's actually a file cabinet; it's explained on the website. But it's near my office in Packard. That's the right place where you should submit the homework.
Okay, the next is that we have gotten more than just a handful of requests or suggestions to do something like a linear algebra or matrix review session. That's for people who either never had a detailed course on this or have successfully repressed the memories of the course they may have taken. So for those people, the TAs and I have been talking about it, and we'll probably do something. It won't be a formal session. It might simply be one of the office hours. One block of office hours will simply be devoted to this topic. And we would of course announce that by email and on the website. So that would sort of be the idea.
Just out of curiosity, by a show of hands, if such a session were to be put together, how many people would go to it? We'll have a session, that simple. That's what we'll do. Okay, any questions about last time? If not, we'll continue.
So, last time we started looking at this idea of the null space of a matrix. Now the null space is a subset. It's a set of vectors; it's a subspace, in fact, as its name implies. And it is a subset of R^n, the input space, so its elements have the input dimension. It's the vectors that are mapped to zero. Now what it gives is exactly the ambiguity in solving Y = AX. So, it tells you exactly what you know and don't know about X when I give you Y in Y = AX.
In particular, anything in the null space is going to map to zero. So for example, in a sensor or estimation problem, X corresponds to some pattern or input for which your sensors all read zero. So for all you know, it's either zero or it's something in the null space. If the only point of the null space is zero, then that means that's the only way that can happen. This is what it means.
Now in a design problem or in a control problem, something in the null space is a very interesting thing. It's something that has, you might say, zero net effect. So in the case of the mass problem, it would be a sequence of forces that is non-zero, so the mass goes flying off to the left, to the right, goes all over the place. But the point is that N seconds later, it is back at the origin with zero velocity. So that means that basically nothing happened. Okay? So that's the idea.
Now one case of great interest is when the null space is zero -- I can't see if -- it's not me, but that's twisted at an angle, so maybe you guys can rotate your camera there a little bit. There we go. Too much. There we go. Actually, that's still too much. I'll let you guys figure it out back there. So you say that you have zero null space; that's the slang. That means, simply, not that the null space is zero. The null space can't be zero. The null space, the data type, is a set of vectors. So when you say it's zero, you mean it's a set whose only element is the zero vector. But the slang is "the null space is zero." And some people call this 1:1, and what it means is this: it is the same as saying that X can always be uniquely determined from Y = AX. It basically says when does the transformation A not lose information. And 1:1, of course, means that different X's get mapped to different Y's. That's actually, literally, what 1:1 means. Otherwise you might call it many-to-one, which would be the other way.
In fact, one-to-many is not a function, because the definition of a function is that only one thing comes out. Another way to say this is that the columns of A are independent. So when you say something like zero null space, you're really just saying that the columns of A are independent. So, it's two different languages for exactly the same thing. And you can check that. Why does it mean that? Because when you say AX = 0 implies X = 0, one interpretation of AX is that it's a linear combination of the columns of A, where the X's give you the recipe, the coefficients. So basically it says that the only way to add up a linear combination of the columns and get zero, the only way, is if the recipe is trivial, if all the X's are zero. That's the same as saying the columns are independent.
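As a quick aside in code (not part of the lecture): the equivalence between "zero null space" and "independent columns" can be spot-checked numerically. This is a sketch using numpy; the matrices here are made up for illustration.

```python
import numpy as np

A_indep = np.array([[1.0, 0.0],
                    [0.0, 1.0],
                    [1.0, 1.0]])   # 3 x 2, columns independent

A_dep = np.array([[1.0, 2.0],
                  [2.0, 4.0],
                  [3.0, 6.0]])     # 3 x 2, second column = 2 * first

def has_zero_nullspace(A):
    # N(A) = {0} exactly when rank(A) equals the number of columns,
    # i.e. when the columns are independent.
    return np.linalg.matrix_rank(A) == A.shape[1]

print(has_zero_nullspace(A_indep))  # True
print(has_zero_nullspace(A_dep))    # False
```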
Okay, now this part, everything up here, is easy to check; it's basically just a restatement -- I mean, there's nothing to it. This is not obvious, and so a few things here are not going to be obvious. I'm not going to show them now. I'll state them as unproven fact; later we'll come back and you'll be able to show all these things. This is an unproven fact. It says A has zero null space if and only if it has a left inverse. Now a left inverse is a matrix B which, when it multiplies A on the left, gives you the identity. That's what BA = I is.
Now you may or may not have seen this. You have probably seen the idea of a matrix inverse, but maybe not a left inverse. Actually, how many people have even heard of the concept of a left inverse for a non-square matrix? That's good, because it used to be basically no one. So a left inverse means you multiply on the left and you get I. By the way, the meaning of the left inverse is actually great: it's a decoder is what it is. Because if BA = I and Y = AX, then if you multiply by B, what you get is X. So X = BY: when you see a left inverse, it means it is a decoder, or in communications, it's a perfect equalizer. In any kind of signal processing it's a perfect deconvolver or something like that. That's what a left inverse is.
Now what's interesting about this is that it is, of course, a very strong statement compared to the first one. The first one says yeah, you can uniquely determine X, given Y = AX. This says, actually, not only can you determine it, but you can get it by applying a linear function. That's not obvious, right? It could have been that you could uniquely get X, but you had to jump through all sorts of hoops and do all sorts of signal processing or something like that to get X. It turns out no. In fact, you can decode with a linear decoder.
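To make the "linear decoder" point concrete, here is a hedged numpy sketch (not from the lecture): when A has independent columns, B = (A^T A)^-1 A^T is one particular left inverse, and applying it recovers X from Y = AX.

```python
import numpy as np

rng = np.random.default_rng(0)

# A skinny matrix; random entries give independent columns almost surely.
A = rng.standard_normal((5, 3))

# One particular left inverse. This is a sketch; any B with B @ A = I
# works equally well as a decoder.
B = np.linalg.inv(A.T @ A) @ A.T

x = np.array([1.0, -2.0, 0.5])
y = A @ x          # "encode": y = A x
x_hat = B @ y      # "decode": B y = B A x = x

print(np.allclose(B @ A, np.eye(3)))  # True: B is a left inverse
print(np.allclose(x_hat, x))          # True: x recovered exactly
```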
And the next one is this. It turns out this one is relatively easy to show. It says that the determinant of A transpose A is non-zero. At the moment, that means nothing to you, but it will soon. And I think this is one of the ones that's -- here I have to be very careful, because some of these are extremely simple -- they're either three lines, or they're actually very difficult. And all of the difficult ones, by the way, are equivalent to each other.
Shall I try showing this? What do you think? So I reserve the right to absolutely give up one minute into this, and to pretend that I never thought -- that I knew ahead of time this was actually one of the tough ones. Okay, so we're just gonna see. We'll see if this is one of the easy ones. What we want to show is this: A is 1:1 if and only if det(A transpose A) is non-zero. Okay, let's try that. All right, so let's try a few things here. Let's start by saying, assume A is not 1:1. Okay? That means, well, by definition, that its null space is not just zero. That means that there is some X which is non-zero, but AX = 0. That's what it means. It means there's a non-zero element in the null space.
Now if AX is zero, then surely A transpose AX is zero, because I could write it this way -- that's the zero vector, and A transpose times the zero vector is zero. But I could also write it this way, as A transpose A times X; I can re-associate. Ah, but now we have a problem. We have a square matrix here multiplying a non-zero vector and we get zero. And that implies that the determinant of A transpose A is zero. Okay? So we just showed that if this fails to hold, this fails to hold.
And now we do the opposite; that was one way. Let's assume now det(A transpose A) is zero. Assume that's zero. That means the matrix A transpose A is singular. That means there's some non-zero vector which gets mapped to zero, like that, where X is non-zero. Now let me ask you a question here. If I had scalars, and I wrote this down -- actually, can I just cancel the A transpose off of here? That's an excellent question. Can I just write this as A transpose times AX = 0, and now say A transpose is non-zero, and just remove it? Can I do that? Can you cancel matrices? No, in general you absolutely cannot. So it's entirely possible to have the product of two matrices, both non-zero, and get zero. So you can't cancel matrices. Okay?
That's not going to help, but watch this. Here I'll show you a trick; you may have to pan up here. It's going to set the pace for the class. I can move it up. All right, what happens is this. We know A transpose AX = 0, and here's the trick. If this is true, I can certainly multiply on the left by X transpose and get X transpose A transpose AX = 0, because X transpose times a vector that's zero is surely zero. Now I'm going to rewrite this in a different way. I'm going to write it as (AX) transpose times AX. And that's just one of the rules for transposition: you transpose a product by transposing each thing and reversing the order. Okay?
Now you may know that -- well, I guess it's part of the prerequisite, and I think actually we're going to cover it in a few slides anyway. But in fact, the transpose of a vector times itself is the square of its norm. That's what this is. This is nothing more than the sum of the squares of the entries: the sum over i of (AX)_i squared. Now when a sum of squares of numbers is zero, every single number is zero, period. And in fact, what this says is, therefore, AX is zero. That's just what we wanted, because now I have an X in the null space of A, and it's non-zero, and therefore A is not 1:1. Okay?
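The equivalence just proved can also be spot-checked numerically. A sketch (not from the lecture), with made-up matrices: one with independent columns, and one with a deliberately repeated column.

```python
import numpy as np

rng = np.random.default_rng(1)

# 6 x 3 with independent columns (almost surely, for random entries).
A_oneone = rng.standard_normal((6, 3))

# 6 x 3 with the first column repeated as the third: not 1:1.
A_not = np.hstack([A_oneone[:, :2], A_oneone[:, :1]])

d1 = np.linalg.det(A_oneone.T @ A_oneone)
d2 = np.linalg.det(A_not.T @ A_not)

print(abs(d1) > 1e-8)   # True: det(A^T A) nonzero, zero null space
print(abs(d2) < 1e-8)   # True: singular, nontrivial null space
```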
By the way, I won't do this very often. I would actually encourage you to see how much of this you can do yourself -- most of the statements in the notes are things you should check. These things get boring after a while, but you should actually try to check as much as you can. Most of it is quite straightforward. Some of this stuff, for now, is not.
Okay, so let me say something about the interpretations of the null space. Well, suppose Y = AX describes a measurement setup or something like that. Then what it says is that something in the null space is undetectable. It is a pattern that is undetectable from the sensors, because when you have that pattern, all the sensors just give you a flat zero reading. And you don't know if maybe what you're looking at is zero; in fact, it gives you the exact same signature, zero. It also says this: X, and X plus anything in the null space -- you take any vector and you add anything in the null space -- are completely indistinguishable according to your measurements, totally indistinguishable, because they give you the same readings, the same Y, the same AX.
Now in fact it characterizes the ambiguity in the measurement. That's what it does. Now if Y = AX represents something like an output resulting from an input or action X, then basically it says that something in the null space is a non-zero action that has no net result. That's what it means, and in fact in this case, the null space characterizes the freedom of input choice for a given result. So actually having a big null space in a measurement problem is, generally speaking, bad, because it means there's lots of ambiguity, if what you're trying to do is recover X.
If on the other hand, X represents an action that you can take, and Y is an outcome, having a big null space is great, because it says there are lots of X's. And you have an organized way, in fact, of finding X's that will satisfy the requirements. So that's good, and it means that you can drop back and optimize something else. You can drop back and find a small X or whatever you like.
Now the range of a matrix is another one of these fundamental subspaces. It's denoted with a calligraphic R or a script R or something like that: range of A. This is also a subspace, but it is a subspace of the output space, so its elements have the output dimension. It is simply the set of all vectors you can hit: the set of all vectors that have the form AX, where X ranges over all of R^n. So this is a subset of R^m. Okay, unless your matrix is square, the null space and range are sets of vectors of completely different dimensions.
Now you can interpret it this way: it's the set of vectors that can be hit by the linear mapping Y = AX. That's what it means. It's the set of all things you can achieve, or something like that. But in terms of the columns of A, it's also very simple. Null space zero means that the columns are independent; here, the range is simply the span of the columns. Okay? And you can also think of it this way: it's the set of right-hand sides Y for which AX = Y has a solution. So that's the idea.
Now here it has all sorts of interesting interpretations and applications in different areas. So for example, Y = AX represents a design problem where X is an action, Y is an outcome. Range of A is the set of all things you can actually achieve. So someone says, "I would like this outcome." But if that outcome is not in the range, it means, "Sorry, I can't do it. I can't get that outcome. Maybe I can get close." We'll talk about that. "But I can't get that outcome." In the case of a sensor reading, it is also very interesting. For Y = AX, being in the range means it is a possible sensor measurement. What would be a practical use for the range in a case like that?
Instructor (Stephen Boyd):Got it, that's it precisely. So what would happen is this: if you believe the system is modeled by Y = AX, Y is a vector of measurements. If Y comes in, the first thing you do with Y is check: is it in the range? If the answer is yes, no problem; you can then proceed to whatever you're doing. If the answer is no, you want to throw an exception. You want to throw an exception to whoever passed you the Y, and basically say sorry, that Y could not possibly have been something of the form AX. By the way, that might mean one of your sensors is off. So this could be a sensor fault detection routine or something like that. And you could do all sorts of interesting stuff there. How would you check? Suppose you knew that one sensor was bad, what do you do? What's the algorithm? How do you check it? That can be a homework problem, by the way. Want to make a note of that? That's simple.
So let's talk about it. How do you do that? If I give you Y = AX, X is in R^10, Y is in R^20. Roughly speaking, you've got about a 2:1 redundancy of sensors over parameters you're estimating. But there's a trick. One of the sensors has failed. How would you find it? So, I want you to tell me how to find out which sensor. I want you to come back and say, "Sensor 13 is no good." How do you do it?
Student:You put in two inputs.
Instructor (Stephen Boyd):You know what, actually it's passive. You don't have -- you can't do that. It's passive. You can't control X; you're just given readings, that's all. You had a suggestion.
Student:Take the [inaudible] of Y [inaudible].
Instructor (Stephen Boyd):I don't think you can take the inner product of Y with the rows of the sensor matrix, because they have different dimensions. The rows of the sensor matrix have the input size, R^10 in our example; the output has the other size, R^20. But you're close. What would you do? [Crosstalk]
Instructor (Stephen Boyd):Find the null space of what?
Student:[inaudible]
Instructor (Stephen Boyd):Let me ask you this. Suppose I have a sensor suite, and I say, "Please take out sensor 13." What does that do to the matrix? How do you express that in terms of what you do with Y = AX? Y used to be 20 high; now it's 19. How do you get that Y, and what do you do with A? What do you do?
Student:You delete the 13th row.
Instructor (Stephen Boyd):You delete the 13th row of A. Okay, so that's what you do. So basically, if everything is fine, it's Y = AX; that's what it's been for a while. And I come along and say, "Nuh-uh, sensor 13 is going out for maintenance." Then you have a reduced matrix, A reduced: you remove the 13th row. That's what it means. Does everybody have this? So now you have a 19 x 10 matrix. Okay.
You can calculate the range of that 19 x 10 matrix. Let's suppose -- let's say you can do that. You will within one lecture. You can calculate the range of that matrix, and you can check if the reduced measurement is in the range. If it's in the range, what does it mean after I remove the 13th row? You have to be careful. What does it mean? It's kind of complicated.
Instructor (Stephen Boyd):I'm going to turn it around and say what it means. If you remove a row, and you look at the reduced Y, and it's in the range of the reduced matrix, what it means is this: the 13th sensor might not have been bad. Sorry, it's a strange statement, but that's exactly what it means. What if I remove the 13th row, I look at the range of that matrix, and actually now the reduced Y is not in the range? It means you still have a bad sensor. It means you're still wrong. It means the 13th was not the problem. I'm assuming there's only one sensor failure.
I have a feeling that I'm just making things more confused. This is what we call a homework problem. You will be seeing this shortly. Trust me. Okay. Or you can go back and see if any of this makes sense later. All right?
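For what it's worth, the sensor-isolation idea just discussed can be sketched in numpy. Everything here is hypothetical: a made-up 20 x 10 sensor matrix, a single corrupted reading, and a range-membership test via the least-squares residual.

```python
import numpy as np

rng = np.random.default_rng(2)

A = rng.standard_normal((20, 10))   # hypothetical sensor matrix
x_true = rng.standard_normal(10)
y = A @ x_true
y[12] += 5.0                        # sensor 13 (index 12) fails

def in_range(A, y, tol=1e-8):
    # y is in range(A) iff the least-squares residual is (numerically) zero.
    x_ls, *_ = np.linalg.lstsq(A, y, rcond=None)
    return np.linalg.norm(A @ x_ls - y) < tol

print(in_range(A, y))   # False: some sensor is bad

# Delete one row at a time; consistency is restored only when the
# deleted row is the faulty sensor's.
suspects = [i for i in range(20)
            if in_range(np.delete(A, i, axis=0), np.delete(y, i))]
print(suspects)         # only deleting row 13 (index 12) is consistent
```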
Now, there's a special case, which is when you can hit all vectors in the output space. That's when the range is R^m. That means you can hit everything. That says you can solve AX = Y for any Y. That has all sorts of obvious interpretations. We'll get to that. Another way to say it is that the columns of A span R^m. That's what it means. Now in this case, this is not obvious: it says there is a right inverse of A. The first statement says, for any Y you give me, I can find an X for which AX = Y. This third bullet here is much more specific. It actually says, not only that, but in fact I can construct the X as a linear function of Y. That's what it means.
So here, if in fact there's a B with AB = I, that's a right inverse. What it says is this: A(BX) is X, well, of course, because IX is X. On the other hand -- you know what, I shouldn't have used X. Sorry. There we go. It was perfectly okay, just horrible notation.
What this says is -- this is quite specific. It says, "Oh, you want an X for which AX = Y? No problem. Take your Y and multiply by B." So B actually is an automated design procedure. That's what it is. If this is a design problem, it's a design procedure. It says, "You want this velocity and position? Multiply by B; that gives you a force program that does it." Okay, so that's what it is.
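The "automated design procedure" can be sketched in numpy (not from the lecture): for a fat A with independent rows, B = A^T (A A^T)^-1 is one particular right inverse, and multiplying any target Y by B produces an X that achieves it.

```python
import numpy as np

rng = np.random.default_rng(3)

# A fat matrix; random entries give independent rows almost surely,
# so range(A) is all of R^3.
A = rng.standard_normal((3, 7))

# One particular right inverse. A sketch; any B with A @ B = I serves
# as an automated design procedure.
B = A.T @ np.linalg.inv(A @ A.T)

y_target = np.array([1.0, 2.0, -1.0])  # desired outcome
x = B @ y_target                       # the action that achieves it

print(np.allclose(A @ B, np.eye(3)))   # True: B is a right inverse
print(np.allclose(A @ x, y_target))    # True: target outcome achieved
```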
This is the same as saying that the rows of A are independent. Again, you can check. And now it gets tricky: this is the same as saying the null space of A transpose is zero. Now, some of these look complicated. Some actually are; most are not. But I am going to defer until a later lecture to show all of the details. So this says the null space of A transpose is zero. And these two are actually basically the same thing, because to say that the null space of a matrix is just zero is to say that its columns are independent. But the columns of A transpose are the rows of A. And so, this one and this one are simply the English equivalent of that statement. Okay? And this one would follow from one on the previous page or something like that.
I want to warn you, some of these are not obvious. But by the way, you should sit down and see how many of these you can actually show. You won't be able to show some of them, like that one, right away. Maybe that one, I'm not sure. But within a lecture you will be able to.
Okay, so let's look at various interpretations of range. Suppose a vector V is in the range of a matrix, and a vector W is not in the range. We'll see what this means. Suppose Y = AX represents a measurement. Then if you observe V, that means it's a possible, or maybe a better name is a consistent, sensor signal. We just talked about that a few minutes ago, when I confused all of you -- most of you, anyway. If, however, you get a sensor measurement which is not in the range, that basically says it's impossible, if you really believe Y = AX, or it means it's inconsistent. That's what it means.
Now if Y = AX represents an output resulting from an action, or an input or something like that, then it says, if you're in the range, it's something that's possible. It's achievable. However, something not in the range is something that simply cannot be achieved. That's what it means. So, in this case the range of A characterizes possible results or achievable outputs. That's what it does.
Okay, now so far the size of A has been arbitrary; it doesn't have to be square. And in fact, here you find my greatest complaint with first linear algebra classes, aside from the fact that they are profoundly boring: they focus on square matrices too much, which in engineering has essentially no use. We'll see why. In fact, most times in engineering, in any kind of engineering application, when you have a square matrix -- we'll see some examples later that are exceptions -- in general it means something is deeply wrong. We'll see why. You'll know later why, in fact, matrices should either be fat or skinny or something like that.
It means kind of that you set things up so there are just enough knobs to turn to do what you need to do. That's silly. The whole point is you either want to exploit redundancy or something like that. There are cases where you want to do that. If you want to measure six things, and sensors are super duper expensive, and for some reason you can only afford six, okay, fine. But in general, this makes absolutely no sense. It's not robust either. In fact, if you have seven sensors to measure six things, at least then you can actually add a nice layer of software that will detect and remove a sensor that fails, automatically -- using the method which I deeply confused all of you with, and which you will see on the homework shortly.
Hey, what do you think? Do you think we're allowed to, even though we've posted Homework 2? I think we can do it; we'll see. All right. We just have to make -- it may have to wait until Homework 3, we'll see.
Now we're going to look at a square matrix. We say it's invertible or non-singular if its determinant is non-zero. And this is equivalent to many, many things. Things like: the columns of A are a basis for R^n. That means they span R^n, which means the range is everything. And it means that they're independent, which is, in this case, the other half of being a basis. It means the null space is zero -- that's the slang; the null space is the set consisting of the zero vector. I don't think -- I quit. I'm going to just say the null space is zero.
It also says the rows of A are a basis for R^n. It says Y = AX has a unique solution X for every Y in R^n. And it means there's a matrix that is both its left and its right inverse, and it's the only one. So, if you're a left inverse of a square matrix, you're also a right inverse. Okay? It's the same as saying the null space of A is just zero, and the range is everything. And it's the same as saying the determinant of A transpose A, and the determinant of A A transpose, is non-zero. By the way, for a square matrix, this is easy to show, because here I can write det(A transpose A) as det(A transpose) times det(A), right? That I can do. Can I do that here, in the non-square case? If I wrote here that it's det(A) times det(A transpose), what would you say to me?
Instructor (Stephen Boyd):You'd say syntax error, exactly. Why? Because det takes as argument only a square matrix, and in this problem there is M and there is N and they need not be the same. So this is a syntax error, which is a terrible, terrible thing, right? Because it means the most casual parsing would reveal the error. We'll see later that there are some deeper semantic errors. That's where the syntax works out but the meaning is wrong. Strangely, a semantic error is actually less of a crime. Okay, but in this case, they're square, so we can write these out, and these are very straightforward.
So what's the interpretation of an inverse? Well, I mean, this is kind of obvious. It basically undoes the mapping associated with A. Actually, the cool part is it can work equally well as a post-equalizer or a pre-equalizer, a prefilter -- I'm using some of the language of communications here. In other words, in this case, you can send something through a channel and then apply B to reconstruct what happened. That's a traditional equalizer. Or, before you send the signal, you can process it through B, and that has lots of names, like a pulse shaper or pre-equalizer or something like that, in communications. So that's the idea.
Okay, so that's that. That you've seen. I'll mention one other thing involving that. One interpretation of the inverse -- in fact, you can also use it for things like left inverses and right inverses, but let's take B = A^-1 -- is to look at the rows of the inverse of a matrix. Well, let's write this: this is Y = AX. If you write it out column by column, it is Y = X_1 A_1 + ... + X_n A_n, and of course X = BY. The first one is Y = AX, and this is just X = BY, because B is the inverse here. Now, I wrote Y out column by column; I'm going to write X out row by row. This is X_i = B_i transpose Y, because when you multiply a matrix B by a vector Y, the row interpretation says you take the inner product of each row of B with Y. Did I get that right? I did.
Hey, that says, look: these X_i's I can plug in here, and I can rewrite it this way. And now you see the most beautiful thing. You see what the rows of the inverse of a matrix are. They are things that extract the coefficients -- so this basically says something like Y is a linear combination of the A's. Now if you want to know what linear combination -- how do you get the mix, what's the recipe? -- it says you take the inner products with the rows of the inverse matrix, and these give you exactly the coefficients in the expansion of Y in the basis A_1 through A_n. Okay?
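This coefficient-extraction property can be checked numerically. A sketch (not from the lecture), with a made-up invertible matrix: the rows of B = A^-1 recover the coefficients of any Y in the basis of columns of A.

```python
import numpy as np

rng = np.random.default_rng(4)

A = rng.standard_normal((4, 4))   # invertible almost surely
B = np.linalg.inv(A)

# Entry (i, j) of B @ A is (row i of B) dot (column j of A),
# so B @ A = I is exactly the statement b_i^T a_j = delta_ij.
print(np.allclose(B @ A, np.eye(4)))   # True

y = rng.standard_normal(4)
coeffs = B @ y                          # x_i = b_i^T y extracts coefficients
print(np.allclose(A @ coeffs, y))       # True: y = sum_i x_i a_i
```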
Sometimes people call A_1 through A_n and B_1 through B_n dual bases. That's actually kind of common now in some areas of signal processing. This goes back a long way, and it's interesting, actually; it will work in with some other stuff we're going to look at later. But let me ask you something. What can you say about this? What is that equal to? What is it?
Instructor (Stephen Boyd):Yep, that's one if i = j, and it's zero if i ≠ j. Okay? By the way, very soon we're going to talk about things like orthogonality. So some people refer to these as biorthogonal sets or something like that. In other words, it's kind of weird: it says that the B's are orthogonal to the other A's. But what do you know about this? What can you say about A_i transpose A_j? Not much. There is only one thing I can say. I can say something intelligent about A_i transpose A_i. What can I say intelligent about that? It's what?
Student:It's non-zero.
Instructor (Stephen Boyd):How do you know it's non-zero?
Instructor (Stephen Boyd):So I would have a column of A that's zero -- what's wrong with that? I could remove it. So, remove it.
Instructor (Stephen Boyd):Thank you. It's all the same thing. Yeah, A could not be invertible if it had a column that was zero. Right. Okay. All right. All of these things are very abstract now, but we'll see them in applications and it will -- yeah?
Instructor (Stephen Boyd):No, they're not. Ah, but a basis doesn't mean that they're orthogonal. We'll get to that. Yeah. We'll get to that. By the way, if your background is physics, or something like applied physics, a lot of the matrices you encounter will be not only square but symmetric. And so a lot of operators will be self-adjoint. And that means a lot of things, because that's what you've always seen. If that's the case, how's my diagnosis? That's a diagnosis there. Okay, so in that case this class is great for you. You're going to see all sorts of stuff, and you're going to learn all the things that are almost always true -- like eigenvalues are real, for you, right? Yeah. Got it. Good. You'll be fine, everything will be great.
Actually, then it comes up. You'll see it's going to come up in physics, too. So, we'll see. There will be cool cases where it's not symmetric. And then weird things will happen, and all your fellow physicists will be deeply and profoundly confused, because for them everything is symmetric, all bases are orthogonal, and so on.
Rank of a matrix. The rank of a matrix is simply the dimension of its range. And here are some facts; these are not obvious. Things like: the rank of A is the rank of A transpose. Another way to say this is that the rank of A is the maximum number of independent columns or rows of A you can have. And that means immediately that the rank of a matrix cannot be any more than the minimum of its number of rows and columns, the output and input dimensions. I actually said those backwards, but that's fine. That's what it means. If you have a 3 x 5 matrix, its rank could not possibly be more than three. It is at most three.
Here's a very basic fact. Some people actually make a big deal about this: they circle it, and it's in some fancy box and everything, and it's sometimes called the fundamental theorem of linear algebra or something like that. They make a big deal. They have a theme song, and anyway. This is sometimes called the fundamental theorem of linear algebra. It's beautiful, but it's not that big a deal. Actually, you'll find out it's actually much cooler to be able to do stuff with linear algebra than to celebrate pretty formulas. You know, it's like e^(i pi) + 1 = 0; you get over that and everything's fine.
What this says is very interesting. It says that the rank of the matrix -- that's the dimension of the set of things you can hit -- plus the dimension of the null space -- that's the dimension of, sort of, the ambiguity set, the size of the set of things that get crunched to zero -- is equal to N, which is the input dimension. My interpretation of this, and I'm sure lots of people interpret it this way, is that this is something like conservation of dimension, or if you like, something like conservation of degrees of freedom. Maybe that's a better way to say it. Don't take any of this too seriously, but it's just to give you the idea.
So the rank of a matrix is the dimension of the set hit, but let's make this very specific. Let's take A in R^(20 x 10); we just had something that was 20 x 10. By the way, the technical name for that is a skinny matrix, because it's got 20 rows and ten columns. I'll use that terminology; it's going to come up a lot. So I have a skinny matrix, 20 x 10, and I'm going to tell you the rank of A. By the way, could it be 11? No. But I'm going to make it eight. Okay? So there it is: it's a 20 x 10 matrix whose rank is eight.
Now what does this mean? It means that if you ask, what can you hit, what's the set of things that have the form AX? Well, the answer to that is this. First of all, that's a subspace of R20, so you have twenty-long vectors and there's a subspace. It could have a dimension up to ten, but in fact it only has a dimension of eight. That's what it says. It's got a dimension of eight. So basically it says, if X is an action and Y is a result, it says that the set of things I can effect or do is actually eight dimensional. That's what I can do. I can do eight dimensions worth of stuff.
But wait a minute, in this case I had ten input knobs, so roughly speaking, you would look at that and say, if what you can do is only eight dimensional and you have ten input knobs, you have two redundant knobs. Guess what. That's the dimension of the null space, so the dimension of the null space of A must be two because they have to add up to ten. That's the idea here. It basically says something about the total number of degrees of freedom going in minus the degree of ambiguity is equal to the total dimension of what you can affect. That's kind of what it says. It's a good idea to have a rough idea of what this is, and it will get much more specific in specific cases.
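The 20 x 10, rank-eight example above can be checked numerically. This is a sketch, not from the lecture: the specific random matrix below is a made-up stand-in, built as a product of a 20 x 8 and an 8 x 10 factor so that it has rank eight.

```python
import numpy as np

# A made-up 20 x 10 matrix of rank 8: the product of a random 20 x 8
# and a random 8 x 10 Gaussian factor has rank 8 with probability one.
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 8)) @ rng.standard_normal((8, 10))

rank = np.linalg.matrix_rank(A)
n = A.shape[1]
print(rank)        # 8 -- dimension of the range (what you can hit)
print(n - rank)    # 2 -- dimension of the null space (the redundant knobs)
```

The two printed numbers add up to N = 10, the input dimension, which is exactly the "conservation of dimension" statement.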
This one is very loose. Each dimension of input is either crushed to zero or ends up in the output. That's -- boy, I wouldn't put that in anything but notes. Because it's actually, in some sense, I didn't even say what the dimension of an input is. What I'm saying is that if any of you said this, boy, we'd come down hard on you. I mean, unless it was casual, in casual conversation, but in anything other than that you would not want to. On the other hand, as a vague idea it kind of explains what's going on.
Now there's another interpretation of rank, which is actually very, very nice. It's the idea of coding, and it comes up in ideas like transmission and actually it comes up in compression, all sorts of things. So here it is, if you have a product of matrices, then the rank is less than or equal to the minimum of the ranks of the matrices, of the things that you multiply. This sort of makes sense because if you look over here, and you think about what you can do -- this one is actually very, very easy to show. This one up here. But now what this says is if I write a matrix A in factored form, if I write A as B, an M x R matrix, multiplied by C, an R x N matrix, you can think of R as an intermediary dimension, the intermediate dimension. It basically says, I have something that looks like this, that's A, but I'm going to write it this way, and I'm going to factor it, and that means basically I'm going to write it as a product of two things. Actually, if you like, instead of one piece of code, it's gonna call two, one after the other. Okay?
And what this says is, in this case R is the intermediary dimension, it's the intermediate dimension here. That's what it is. Now if you factor something like this, the rank is at most R, but in fact, conversely, if the rank is R you can factor it this way. So I can write it this way. Now this doesn't look like it has any -- It's interesting, it's actually extremely interesting, and tons of super practical stuff is based on just this idea. But we'll just look at one now. Actually later in the class we're going to do something even cooler. Very soon you'll know how to factor a matrix to make a factorization like this.
This has lots of applications immediately. Like, suppose I want to transmit Y, given X, but I pay for the number of real numbers I have to transfer between them, right? If N is 100, and M is 200, but R is 3, this makes a lot of sense because what happens is first I do this processing on the transmit side. I transmit three numbers, or whatever I said the rank was, and then they're reconstructed over here. Already this should suggest ideas of efficient transmission, compression; all that kind of stuff is close to here. It is very close to this.
Let's just look at one application. Let's suppose I need to calculate a matrix vector product Y = AX. Now you would think that can't be that big a deal. Why would that be a big deal? And it would not be a big deal if A were, for example, 5,000 x 5,000. Right? Or it depends on what we're talking here, A could be 20,000 x 20,000, and you could do this using MPI and a whole bunch of processors, or something like that. Or A could be 100 x 100 and you could do this in a -- actually, 40 x 40 and you could do this in a shockingly fast way. Okay? I'm just multiplying Y = AX. This could be -- What are you doing when you're multiplying by A? This could be an equalizer. It could be a precoder. It could be some estimation you're doing, it could be anything. So we're not worried about that.
Now, suppose you knew that A could be factored, and let's suppose -- The interesting case is when the rank is much smaller than either M or N. But let's look at this. Now, if I compute Y = AX directly, that's MN operations because I calculate the inner product of each row of A with X, and each of those requires N multiplies and N-1 adds. You know, roughly. Okay? That would be MN operations. With a typical sort of gigaflop machine -- well my laptop comes out faster than that, so it's pretty fast. That's okay because as fast as you can do this, there'll be some application where you want to do it faster, or you want to handle bigger things. That would be MN operations.
Now suppose you do it this way, you write A as BC, and you associate it this way. You first operate by C, and then by this. Well, if you multiply by C first, it's going to cost you RN operations, then it costs you MR, and that's (M + N) x R. Okay? Just to plug in some numbers here, let's make both M and N equal to 1,000, and let's take R = 10. So it's a 1,000 x 1,000 matrix and its rank is 10. The saving now is pretty obvious. In the first case it's 10^6 and in the second one it's 2 x 10^4, so it's a million versus 20K. Okay? That's a 50:1 savings in multiplying AX. Has everybody got this? This is the idea.
By the way, later in the class we're going to look at very good methods for this -- very soon you'll know how to factor a matrix this way. If it is low rank, you'll factor it. Later in the class, we'll do some things that are very cool. We'll actually look at how to do approximate factorizations. That's even cooler. You will be given a matrix A; you will know how to find, for example, an approximation of A that's rank ten. Okay? This has tons of uses all over the place. Okay, that's what it means. If you see a matrix that's low rank, it says actually, you could operate on it quickly, or it means it's got something to do with compression or something like that.
Student:How do you know if a matrix is low rank in the first place?
Instructor (Stephen Boyd):How do you know if a matrix is low rank in the first place? You will know the sort of exact answer, if you want exact rank, in one lecture. If you want to know -- a much more interesting question, much more practical, is this: given a matrix, how do you know if it's approximately rank 10? You'll know that in about five weeks. By the way, did you notice that I didn't answer your question at all? Okay. But I did it forcefully. So, that's okay. We put your question on a stack; it will be popped. Okay?
One last topic is change of coordinates. Now, the standard basis vectors are these EI's like this, and you obviously have this expansion. I mean, that's kind of silly, but it's a pedantic way of writing this out. But you would say in this case that the XI are the coordinates of X in the standard basis. Normally, you just say XI are the coordinates, or the entries are the coordinates, or something like that. That's how you write it.
Now, if you have another basis in Rn, if T1 through TN is another basis, you can expand a vector X in that basis, and we'll call the coordinates ~X1 up to ~XN. Okay? And of course, the fact that that's a basis tells us two things. It says that the T's are actually independent, which means there's only one way to write it this way. And it means, also, that any X on the left hand side here can be written in this form. Okay? It turns out we can get these ~X very easily, by a couple of matrix operations. So here ~X are the coordinates, and what we're going to do is write this out as a matrix, so we'll write that as X = T ~X. Now, this is basically a compact representation of this, so you write it this way. But now, immediately we get the answer. T is invertible, and therefore ~X is simply T^(-1) X, and indeed it turns out that the inner product with the Ith row of the inverse of the matrix extracts the Ith coordinate. Okay?
For example, if someone says -- I mean actually, where would this come up all the time? It comes up in things like vision and robotics. And actually, it comes up in navigation as well. So, the rest of us just use a fixed coordinate system and seem to go through life just fine, but for some reason in things like dynamics and navigation and things like that, everyone has to have their own personal coordinate system, and so you have to transform things from one to the other. For example, in robotics, every link of the robot arm has its own personal coordinate system, and you have to transform them. I don't know why people do this, but they do. This is kind of the thing that would allow you to do that. Now you know what it is, it's sort of the inverse of a matrix, and its rows extract the coordinates, and things like that.
Now, you can also look at what happens to a matrix. If you have Y = AX, this is all in the standard coordinate system, and now suppose you write X in the T coordinates. And then we're going to write Y also in the T coordinates, and the T coordinates will be ~X and ~Y; then just by working it out you get ~Y = (T^(-1) A T) ~X, that's what it looks like. So, this thing is something that's going to come up a bunch of times -- not a bunch of times, but it will come up several times -- and it should light up a light in your brain. First of all, it's got a name, it's called a similarity transformation, that's the first thing. For the moment this is all very abstract, but you should think of it as basically representing Y = AX, but you represent both X and Y in a different basis, and you'll get something that looks like that. So we'll get back to that later.
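The coordinate change and the similarity transformation can be sketched numerically. This is an illustration under an assumption the lecture makes implicitly: the columns of T are a basis, i.e. T is invertible (a random Gaussian square matrix is, with probability one). The matrices here are made up, not from the lecture.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4
T = rng.standard_normal((n, n))   # columns are the basis t1, ..., tn
A = rng.standard_normal((n, n))
x = rng.standard_normal(n)

x_t = np.linalg.solve(T, x)       # coordinates of x in the T basis: x = T x_t
y = A @ x                         # y = A x in standard coordinates
A_t = np.linalg.solve(T, A @ T)   # similarity transformation T^-1 A T
y_t = A_t @ x_t                   # the same map, in the T coordinates

print(np.allclose(T @ x_t, x))    # True: T reassembles x from its coordinates
print(np.allclose(T @ y_t, y))    # True: ~y really are the T coordinates of y
```

Note the usual numerical practice of calling solve rather than forming T^(-1) explicitly; it computes the same thing.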
Now for some reason at the end of this lecture, there's some stuff that's more elementary than all of this and should have been at the beginning. But that's okay, we'll do it now. It has to do with just a quick review of norms and inner products and things like that. Now, the Euclidean norm of a vector is actually the square root of the sum of the squares. The sum of the squares you can write as X transpose X. That's the inner product of a vector with itself. And it's the square root of the sum of the squares. And it's supposed to measure the length of a vector. And indeed it measures the length of a vector when N = 1. It's the absolute value. When N = 2 it's actually the Euclidean distance in the plane, and for three it's Euclidean distance in R3. But it makes sense to talk about the norm in R500. It makes perfect sense, and people talk about it all the time.
Now there are various things related to this, but I'll mention some of the basic ideas. So, it's supposed to measure length. It's supposed to be a generalization of absolute value. By the way, the double bars are to distinguish it from the absolute value. But presumably if you were super, super cool, and kind of just grew up from a child onward using vectors and matrices, you would probably write this with single bars, and there are people who have reached that level of development, of linear algebraic development, that they write it this way.
The same way we've got to the level where zero, we don't mark it; we don't put a doodad on it. We don't put an arrow on it, like our friends in physics. We don't make it bold or something like that. So for zero we're fully developed. Zero is totally overloaded for us. It's the same thing for the number, the complex number, matrices, vectors, everything. This is a holdover. Actually, it's 150 years old, maybe more; something like that. There's only one minor catch here. Unfortunately, there's a bunch of fields where it's useful to interpret this as the element-wise absolute value. That's the hitch there, but it's supposed to be a generalization of this.
So it satisfies a bunch of properties, these are all easy to show. This one, I believe you did show, or will be showing. Is that right? No, it's close to something you will show or will be showing, which is the Cauchy-Schwarz inequality. Is that right? On this homework? I'm always a couple of homeworks ahead, so I can't remember. Okay, this is the triangle inequality. It's kind of obvious. It says basically, if that's X and that's Y, that's X + Y, and it says that the length of this is no bigger than the length of X plus the length of Y.
The picture makes it totally obvious when they would be equal, and that's when X and Y are aligned. This is called definiteness. It says if the norm is zero, the vector is zero. That's easy to show here because if the square root of a sum of squares is zero, the sum of the squares is zero. But if a sum of numbers that are non-negative is zero, each number is zero. That means each XI squared is zero. Therefore, each XI is zero.
Now, there are some things related to norm. For example, it's extremely common in a lot of applications where N varies. By the way, if N doesn't vary in an application -- like if you're doing control or something like that, when N is fixed, it's like six and it's the three positions and the three momenta -- you don't divide by N or something like that. But in other areas, like signal processing, you get vectors of different lengths. For example, an audio clip is a vector, but N depends on what the length of the audio segment is. And you want to talk about different things like that. Or it could be any other signal.
It's extremely common to normalize the norm by square root N. That's the same as taking the sum of the squares of the elements and dividing by N, which is called the mean square value of the vector X. And the square root of that is obviously the root mean square value, that's the RMS value of a vector. And it just gives you a rough idea of the typical size of the entries. There's actually some things you could say about it. But if the RMS value of a vector is one or one point three, it means that basically if you look at a random entry it should be on the order of one point three.
By the way, if a vector has an RMS value of one point three, how big could the entries be? Could they be 1,000? Could you have an entry that's 1,000?
Student:It depends on [inaudible].
Instructor (Stephen Boyd):I'll simplify your answer. Ready? Yes. That's his answer. That's correct. You can have a vector whose RMS value is one point three and you could have an entry in it that's 1,000. That only works if you have a really big vector. You would have a lot of zeros and then a really huge one. You couldn't have a four-long vector that had an RMS value of one point three and have an entry of 1,000. There are things you can say about that, but that's the rough idea.
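The exchange above can be made concrete. This is a sketch, not from the lecture: a made-up vector of 600,000 zeros with a single entry of 1,000 happens to have an RMS value of about 1.29, which is the "really big vector" the answer requires.

```python
import numpy as np

def rms(x):
    # root mean square value: norm(x) / sqrt(n)
    return np.sqrt(np.mean(np.square(x)))

x = np.zeros(600_000)
x[0] = 1000.0             # one entry of 1,000, everything else zero
print(round(rms(x), 2))   # 1.29 -- RMS about 1.3 despite the huge entry
```

For a four-long vector, a single entry of 1,000 forces the RMS to be at least 1000 / sqrt(4) = 500, which is why the short-vector version is impossible.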
Now, norm is length from zero. Since you can take a difference, the norm of a difference gives you a distance. This is very important. This allows you to talk about the distance between two things. This is what allows you to talk about the distance between, for example, two audio signals, or two images. You can say, and in fact here you might even give an RMS value, and in both cases it would be very interesting. What would you do? You might have the original and some encoded and subsequently decoded signal, and you would say, "I was within 1 percent." The RMS error is 1 percent; that means the norm of the difference divided by the norm of, say, the original signal is about point-zero-one. That's what that would mean.
Now, the inner product, of course we've been using it all the time. There's lots of ways to write it, but one way is this. It's actually just X transpose Y, nothing else. In fact, we won't write this. Actually, it's kind of cool. There's other ways people write this. I'll show you some of them. Here's one that you see. Again, we're going toward physics now. This would be the notation there. There are others. You could write X, Y, that's one way. And I've seen this. You could also do dot, and depending on what subfield they're in, the dot is either a little one or a big thick thing. These are different notations for inner product. Does anyone know any others? Because there's probably some others too.
All right, so the inner product -- working out these properties is quite straightforward. In fact, I don't think I'll say much more about that; I'll zoom on. Because these are kind of obvious.
The Cauchy-Schwarz inequality is very important and it says this: the inner product of two vectors, in absolute value, is less than or equal to the product of the norms. From this you can immediately derive, for example, the triangle inequality right away. But that's what this says. And this is something, I guess, you show on the homework, or have shown, or will show in the next half day or something like that.
The angle between two vectors is simply the arc cosine of the inner product divided by the product of the norms. Now, this number here is between plus and minus one. And the arc cosine, let's draw the cosine, we usually take it between zero and 180, or something like that. So we don't distinguish between minus 10 degrees and plus 10 degrees. It's the unsigned angle, is the way to say it. So it's the angle between things. And this allows you to do all sorts of cool stuff. To talk about two images, and say that they have an angle of 3.7 degrees with respect to each other. It makes perfect sense. Or an image can come in and you can have a database of 10,000 images and you could ask which of the 10,000 images has the largest inner product with the one that just came in, which would mean the smallest angle. And this would make sense, and you can already start guessing these things have applications.
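The angle formula above translates directly into code. This is a sketch, not from the lecture; the clip on the cosine is a standard numerical precaution (floating point can push the ratio a hair outside [-1, 1]), and the two example vectors are made up.

```python
import numpy as np

def angle_deg(x, y):
    # unsigned angle in [0, 180] degrees: arccos of the normalized inner product
    c = (x @ y) / (np.linalg.norm(x) * np.linalg.norm(y))
    return np.degrees(np.arccos(np.clip(c, -1.0, 1.0)))

x = np.array([1.0, 0.0])
y = np.array([1.0, 1.0])
print(round(angle_deg(x, y), 6))    # 45.0

# Cauchy-Schwarz guarantees the cosine is between -1 and 1
assert abs(x @ y) <= np.linalg.norm(x) * np.linalg.norm(y)
```

The same function works unchanged for vectors of length one million, which is exactly the point of the discussion that follows.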
It's kind of weird to talk about the angle between two vectors in, for example, R one million, because that's what you're doing if you're looking at 1K by 1K images. After a while -- don't try to visualize, by the way, R one million. Or at least work your way up there, because you could be in big -- yeah. Start with R4; when you're comfortable at R4, maybe R5 or R8, something like that. There also might be some drugs you might have -- some pharmacological things are needed to really, really understand it.
Actually, Tom Cover gives a wonderful talk, basically showing beyond a shadow of a doubt -- it makes it completely clear. You know, we pretend. So people ask, "What are you doing here?" when we're talking about the angle between two images. "There's no such thing as the angle between two photos." And I go, "Yeah, sure, it's 3 degrees." Then I'd say, "You don't understand, we do this kind of applied math stuff, and the way it works is we talk about distances and angles, but in huge dimensional spaces." And they say, "Can you really visualize that?" At that point you have the choice of either lying and saying, "Oh yeah."
Or what you would really say is something like this, you'd say, "Well, no, not really." The truth is not even close, right? But you'd say, "Well, not really, but we use our pictures that we draw on two dimensional paper, or in our offices -- you get a grad student to hold your hand up here, and then you get in there and you poke around this way." You say, "We use that intuition from R3 to guide us in making image compression algorithms in R one million, right?" So you might ask, how useful is your intuition from R3?
Tom Cover has a series of examples showing that you know nothing. He'll show you all sorts of things. They'll be true for N = 3, 4, 5; from six on they're so false it's amazing. Things are like wow -- I should collect some of these from him because they're really good, and there are about four of them. One is like if you pick a point at random in a unit ball in R^20 -- it's basically, I think they call it sphere hardening -- there's something like a 99.99999 percent chance that you are within 1 percent of the boundary of the sphere. Just totally insane things, but okay, just to let you know, so that at least we were honest about it.
It's very important to understand what it means to have a positive inner product or a negative inner product or something like that. I should say this: if X and Y are aligned, that means they point in the same direction, and that says that the inner product is the product of the norms. If they're opposed, that means that the inner product is as small as it can be. If they're orthogonal, it means that there's a 90 degree angle between them; it means X transpose Y is zero.
So positive inner product basically means that the angle between them is between zero and 90. If you have two things, as you twist them apart, the angle goes toward 90, and the inner product starts out quite positive. It's as positive as it can be when they're aligned. As they twist along like this, when they get to 90 degrees, the inner product is zero, then the inner product starts getting negative, like that. So a very, very crude thing to say would be something like this. If the inner product between two vectors is positive, they point roughly in the same direction. And I mean really roughly. I mean they're not fighting each other. If you add them -- that's actually one of the characterizations, that the norm of X + Y is bigger than the minimum of the two. You make progress.
If you have a negative inner product, it means the angle is between 90 and 180. This allows us to define things like a half space. If you have the set of X's that have a negative inner product with a given vector Y, that's this. You draw the vector Y, and the vectors that have a zero inner product with Y, basically those that are orthogonal to Y, form the hyperplane with Y as a normal vector. That's here. This shaded region, these are actually vectors whose inner product with Y is negative. So they have an angle more than 90 degrees, that's sort of the picture. This is called a half space, and these things come up in all sorts of things. A half space is the solution set of a linear inequality.
I'll start in on the material of the next lecture, which you've probably seen in one or two other contexts, maybe in the context of Fourier or somewhere. So we'll talk about orthonormal sets of vectors, the Gram-Schmidt procedure, or, when you write it out as a matrix, in matrix terminology it's called the QR factorization. And then we'll talk about various things we can show with this.
We start with the idea of an orthonormal set of vectors. So a set of vectors is called normalized if the norm of each UI is one. So normalized just means the length is one. And by the way, some people call those unit vectors. I don't know why. Other people call them direction vectors -- vectors whose norm is one. So that's very common terminology for it. Unit vectors is a bit weird because for most people unit vectors means the EI, like E3 or E5, but still.
A set of vectors is orthogonal if different vectors in the set are orthogonal to each other -- another way to say this is mutually orthogonal. It says that any two different vectors in the set have a 90 degree angle between them. Okay? And it's orthonormal if both. Now, watch out because people will simply say, you will say on the streets, U1 through UK are orthonormal vectors. That is basically completely wrong. By the way, there's nothing wrong with saying U1 through UK are normalized vectors. That's cool because to be normalized is an attribute of a vector alone. To be orthogonal is not; it makes no sense to say it of one vector -- if it made sense, then you could have your set of orthogonal vectors and mine, and we could throw them together. If you have five and I have seven, now we have 12 orthogonal vectors. It doesn't work that way.
Normalized vectors works, no problem, we can combine your normalized vectors with mine, and now we have twelve normalized vectors. Orthogonality refers to a set of vectors. It is not an attribute of a vector. Okay. How do you write that in matrix notation? Very simple, if you have some vectors U1 through UK, you put them in a matrix like this. You just simply make them the columns of a matrix, call that U. And to say this is to simply say that U transpose U is I, period. That's it. Why? Because U transpose is U1 transpose stacked up to UK transpose, like this. And then you multiply by U1 through UK. And you can see now, the (I, I) entry of this matrix is basically UI transpose UI. That's one, and the off-diagonal entries are zero.
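The matrix way of saying it can be checked directly. This is a sketch, not from the lecture; it builds three orthonormal vectors in R^10 via the reduced QR factorization of a random matrix (NumPy's reduced QR returns a factor with orthonormal columns), and it also previews the trap discussed a bit further on: U transpose U is the identity, but U U transpose is not.

```python
import numpy as np

# Three orthonormal vectors in R^10, stacked as the columns of U.
n, k = 10, 3
rng = np.random.default_rng(3)
U, _ = np.linalg.qr(rng.standard_normal((n, k)))   # U is 10 x 3

print(np.allclose(U.T @ U, np.eye(k)))   # True: the columns are orthonormal
print(np.allclose(U @ U.T, np.eye(n)))   # False: U U^T is NOT the 10 x 10 identity
```

The second product has rank at most three, so it cannot equal the rank-ten identity, which is the semantic-error argument made below.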
What would be the matrix way of saying that a set of vectors is orthogonal but not necessarily normalized? How would you say it?
Instructor (Stephen Boyd):U transpose U is diagonal. That means they are -- that's a mutually orthogonal set. How do you say -- what's the matrix way of saying a set of vectors is normalized? How do you say it?
Instructor (Stephen Boyd):In terms of U transpose U, it would be that the diagonal entries of U transpose U are all ones. How about the off-diagonal entries? Who knows? That depends on the angles between them. That's what it would say.
Now, you can quickly show that orthonormal vectors are independent. How? Well, if you write zero as a linear combination this way, and you multiply it by UI transpose, you will get that the Ith coefficient is zero. And that says, if you have an orthonormal set of vectors, U1 through UK, it turns out they're a basis for the span of the U's. You have to be very, very careful here, because if I say that I have an orthonormal set of vectors, that's exactly the same as saying U transpose U is I, exactly the same. This is the matrix way to say that.
Watch out because it is not the same as that, although that will hold in some special cases. Okay? In general, this is not true. And actually, there's a quick way to remember it. If you have a bunch of vectors and you stack them up -- so here's three vectors in R10, so I write them this way, and I write this -- you get a 3 x 3 matrix here. Right? Like that. That's U transpose U equals I. Okay?
Now, suppose I got confused and switched them around and I wrote U U transpose equals I. By the way, there's no syntax crime here. None, because this would be a 10 x 10 matrix. This would be a 10 x 3 matrix. And that would be a 3 x 10 matrix, and there is no syntax crime. So a casual parsing doesn't reveal any problem with this. The parser goes right through it. Okay? But there's a quick way to know that you're in deep, deep trouble, and it's this: when you multiply two matrices together, the rank of this matrix is three, the rank of that matrix is three, and the rank of the product, therefore, is at most three. And what's the rank of the identity matrix in R^(10 x 10)? It's ten, so we have a problem here. So this is a semantic error if you do this. It's a semantic error.
By the way, we're going to find out that that 10 x 10 matrix is not the identity. What it is is really interesting, and we're going to see it later. But for the moment, just remember it is not the identity. Okay, I'll mention the geometric properties, which are actually very interesting. Suppose you have an orthonormal set of vectors, which is to say a matrix whose columns are orthonormal. Ooo, that was slang, did you hear that? But I'm trusting you that you are cool on that and aren't going to make semantic and other errors. That was slang. That's how, by the way, people would say it. Otherwise, you'd sound like a pedant, right? If you're like, "Excuse me, don't you mean the set of its columns is orthonormal?" You shouldn't talk to people like that. You should avoid them.
So suppose the columns of U are orthonormal. It says the following: if you take W = UZ, the norm of W is the norm of Z. Now note that W is possibly a different size from Z, so what this says is, when you multiply by a matrix whose columns are orthonormal, you don't change the norm. That's what it means. Another way to say it is -- people say it this way -- the mapping W = UZ is isometric. "Iso" means the same, and "metric" means that it has to do with the metric or distance properties. And it basically says if you have two points like this, and the distance is three between them, and they get mapped by U -- if that's like X and Y, the distance between UX and UY, in possibly a higher dimension, will also be three. So it preserves these things.
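The isometry property is easy to verify numerically. This is a sketch, not from the lecture: U again has orthonormal columns built via a reduced QR factorization of a made-up random matrix, and z is a random vector in R^3 mapped up into R^10.

```python
import numpy as np

n, k = 10, 3
rng = np.random.default_rng(4)
U, _ = np.linalg.qr(rng.standard_normal((n, k)))   # orthonormal columns

z = rng.standard_normal(k)
w = U @ z                 # w lives in R^10, z in R^3
# multiplying by U preserves norms (and hence distances): it is isometric
print(np.allclose(np.linalg.norm(w), np.linalg.norm(z)))   # True
```

The quick calculation behind it is the one the lecture points at: norm(Uz)^2 = z^T U^T U z = z^T z = norm(z)^2, since U^T U = I.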
It's very easy to show this. That's just a quick calculation. I think maybe this is a good time to quit for this lecture.
[End of Audio]
Duration: 76 minutes