Instructor (Stephen Boyd):Oh, looks like weíre on. Well, I have an announcement. Yesterday, Jacob, the TA Ė somewhere, lost him. Oh, well. Heís around somewhere. Jacob and SEPD put together a prototype of the publicly available lectures for 263. I mean, not that you care, but just so you know, itís on the website now. You can go take a look at it, and it should be just open to the entire world, the lectures. So we havenít figured out yet how to, like, buzz your head out if the camera, you know, decides to point to you, or if we have to Ė I suppose we do have to; weíll figure that out later. But anyway Ė

Weíd be interested if you find any problems with it, or if your friends somewhere else have any trouble with it, weíd be curious to Ė I mean, we know that it actually works from many places, but a lot of different platforms and things like that. So just to let you know, that experiment has now reached the next step. Itís up and running. Itís in Flash, so just like Ė should be not much different than YouTube. So itís up and running. We might also make available some WMV streaming, and then actually WMV download as well. Weíve already piddled with that internally and may also make that public soon. So Ė okay.

Letís continue with orthonormal sets of vectors. Does anyone have any questions about last time? If not, weíll continue. Our topic is, of course, orthonormal sets of vectors. So a set of vectors is orthonormal. So I have K vectors in RN. Theyíre orthonormal if theyíre normalized. Thatís an attribute of the vectors separately and then mutually orthogonal, and thatís an attribute of the set of vectors, and itís orthonormal if both.

Now, you can write that very compactly in matrices. Itís U transpose U = I, and U transpose U = I is nothing but the matrix statement of normalized Ė thatís the diagonal here, the ones, and the zeros on the off diagonal of I tell you that these vectors are mutually orthogonal. Thatís what this says. Okay. Now, last time we worked out several properties of these. Weíll look at the geometric properties of a matrix whose columns are orthonormal. So letís suppose we have a set of orthonormal vectors. Oops, I said it. That was slang. An orthonormal set of vectors. After awhile, in fact, some time later today Iíll just stop worrying about it.

So you U1 through UK are an orthonormal set of vectors. If W is UZ, then the norm of W is the norm of Z. In other words, if you multiply by a matrix whose columns are orthonormal, you preserve lengths; you preserve norm. Letís see how that works. It also means that people say the mapping Z to UZ is isometric. Iso means the same, and metric means that the lengths or distances are preserved because, for example, U of Ė if you have Z minus Z Ė or I should write it this way. UZ minus UZ tilde norm is equal to the norm of Z minus Z tilde.

So if two vectors in RK have a certain distance, then their images under U will have the same distance. This is very easy to show. If you look at the norm≤ of W, thatís UZ, thatís norm UZ≤, but the norm≤ of a vector is nothing but the transpose of the vector times the vector. So Iíd put UZ, and I transpose it here. I use the transpose rule to get Z transpose, U transpose U, and now I use the fact that this is the identity, and I get this, so a very quick derivation.

Now, inner products are also preserved. So if I have two vectors, Z and Z tilde, and I calculate their inner product, and then I calculate the inner product of the images under this mapping; youíll find out that theyíre the same, and itís the same argument. You simply say the inner product of W and W tilde Ė I just substitute UZ and UZ tilde, thatís UZ transpose times UZ tilde. You switch it around and you get the same thing.

Now, because norms and inner products are preserved, that means angles are preserved. In other words, if two vectors have an angle of 15 degrees, then when you multiply them by this matrix U, their images will also have an angle of 15 degrees. It means, for example, orthogonality is preserved. Having a positive inner product, that means acute angles mapped to acute angles and so on. And, in fact, it really means that itís something like Ė well, let me careful. Itís something like a rigid transformation. In R3, there are two types Ė Iíll get to that in a minute; Iíll wait on that.

It basically means that geometry, metric geometry, is preserved Ė distances, angles, inner products, lengths, all that. Now, a very important special case is when you actually have N of these, and these have a height of N. So I have N, a set of N Ė this is slang, orthonormal vectors or an orthonormal set of N vectors in RN, and they, of course, form a basis in this case because theyíre independent, and they span RN, and in this case U, the matrix is a square. Up till now, itís been skinny. It canít be fat; itís been skinny, a pot which includes a square as a special case.

Now thatís square, and in this case itís got a name, and itís called orthogonal, and, sorry, these should have been called orthonormal, but language is weird, and orthogonal is the name given, maybe sometime in the mid-nineteenth century, and it stuck, and thatís absolutely universally used, except maybe in some weird, you know, who knows. There may be people who do some weird branch of signal processing or something that people who talked about over complete bases, they might have redefined orthogonal and have some special meaning, especially they mightíve given it some other name.

So thatís an orthogonal matrix, so itís square. It satisfies U transpose U = I. Of course, in this case, this means here that U transpose is actually the inverse of U. So it means transpose and inverse coincide. Thatís what it means to be orthogonal. Now, that also means that U, U transpose is I. Now, in general, itís absolutely false that if AB = I, that implies BA = I. If AB = I, it means only that B has a left inverse which is A, and A has a right inverse which is B. This absolutely does not mean, imply, that BA = I, but if they are square, itís the same. So in this case, we include U, U transpose is I.

And thatís a very interesting formula; it says this: itís the sum of UI, UI transpose is I. Now, some people actually give a fancy name to a matrix of this form. This is often called an outer product, so thereís lots of slang for this. I shouldíve put this in the notes. One is itís called an outer product, and it makes, sort of, sense, right? Because the inner product is when you reverse the two and you get a scalar. This is an end-by-end matrix, right? When you reverse them you get a scalar. Itís called an outer product.

Another name for it, also very widely used is dyad. So this, in fact, when you have a sum of dyads, itís sometimes called a dyadic expansion. You will hear that very commonly, and thatís the high language of mathematics. Anyone would understand what youíre talk about if you say that. So this is a dyad. Oh, some people also just refer to it as a rank-1 matrix, which is actually Ė weíll see, in fact, soon, in fact, you already know, although we donít have the derivation, that this is the case, that itís not true that itís rank-1, just rank-1; Iíll get to that later, but symmetric rank-1, something like that. These are all the synonyms for an outer product like this. By the way, if you have PQ transpose where P and Q are vectors, what is the IJ entry of this? What is it?

Student:[Off mic].

Instructor (Stephen Boyd):No, itís PI, QJ. So basically what the outer product does is it takes two vectors, actually, of different sizes, possibly different sizes, and it simply forms all possible products of one entry of one with an entry of other, and it stuffs it in the matrix. So thatís what PQ transpose is, like that. Okay.

All right. Back to our main story, so our main story is that U, U transpose = I can be written this way. I mean, you could check on this. This is just one way to expand this thing is to write it out as sum UI, UI transpose. Okay. Letís see what this means. Thereís lots of ways to say this. If X is U, U transpose, X Ė well, because U, U transpose is I, we can write that this way. We can write X = sum UI, UI transpose, I = 1 to N ◊ X, and Iím gonna reassociate things here. Iím gonna put a bracket around that. Thatís a scalar.

Now, a scalar can actually move to the front, and, in fact, in some cases, I guess when we recast, at the moment, thatís formally a 1 by 1 matrix, but if I simply put in front of scalar, parentheses this, or parenthesis scalar to recast this as a scalar, it can move up front here, and then you get this formula here.

Now, this formula is actually quite pretty, and letís see what it says. It basically says X is a linear combination of the UIís. What are the coefficients? The coefficients actually are found in an incredibly simple way, by simply multiplying X by UI transpose. So people call UI transpose X, the component of X in the direction UI. Actually, sometimes people call this the coefficient, and you would distinguish Ė you would call UI transpose X the coefficient and UI transpose X ◊ UI would sometimes be called the component of X in the direction of UI.

So here, the multiplication U transpose X, which is nothing but this, thatís equal to U1 transpose X down to UN transpose X. This thing, when you make this multiplication, itís called resolving X into a vector of its UI components because thatís what this says. Itís basically calculating the coefficients here. When you form the operation X = UA, thatís actually reconstituting X from its UI components. Itís rebuilding X from the recipe; the recipe is A here.

So if you write this out, thatís called the UI expansion, and you can see that itís really two operations. When you write X = U, U transpose X, if you think of it this way as U, then U transpose X, these things have real meanings. U transpose X resolves X into its U expansion coefficients. You multiply U by this vector of coefficients; this is the recipe. That reconstitutes X. So thatís the meaning of this. I mean, you have to, kind of, go over this just to think about it, but thereís nothing really too complicated here.

Now, this identity, which is I as U, U transpose, which is the sum. Actually, thereís lot of names given to it, but in physics itís sometimes written this way, and thatís mirroring this idea here. If you take X, you multiply this on the right by X like that or something like that. So I write I, and if I put this here, then you see it, sort of, completes the inner product symbol here. So Iím just curious, is anyone Ė you want to guess what this is called in physics? Thereís people here who know.

Well, thatís called a bracket. So this is called Ė I couldnít make this up, okay? Itís called a Keptbra, although I donít know how you pronounce it because this is a bracket. Well, I got news for you, thatís about as fun as it gets in physics, so thatís the best they can Ė actually, have people heard this? Cool. So they stick to it, huh? Okay. Do they think itís funny? No, they donít.


Instructor (Stephen Boyd):No, so they didnít even get it. You said when you saw it, what, did they just smile or what?

Student:Yeah, they smiled.

Instructor (Stephen Boyd):They smiled.


Instructor (Stephen Boyd):But nothing more?


Instructor (Stephen Boyd):Okay. So letís just say they found it mildly amusing. It was, sort of, like a private joke kind of a smile or what are we Ė was it Ė what are we

Student:[Off mic].

Instructor (Stephen Boyd):Oh, other people Ė no, no, no. No, Iím not. Iím just asking. Okay. All right. Weíll move on. Okay. By the way, youíll see this in lots of context, I presume, and already have. You may have seen the Fourier Ė or will see Fourier Transform described this way, and so this should look very much like many, many things youíve seen, if you havenít seen exactly this. Okay.

Okay. Letís take a look at the geometric interpretation. If you have an orthogonal transformation, then the transformation, when you multiply it by an orthogonal matrix, it preserves a norm of vectors. So we already know that. It preserves angles, and examples would be something like this. If you rotate about an axis, so if you fix an axis somewhere and then rotate 22 degrees, thatís a linear operation, and itís given by multiplication by an orthogonal matrix.

Another example is reflection. So in reflection you specify a hyper plane, and then you say Ė and then the operation is to take something on one side of the hyper plane, basically calculate the projection onto the hyper plane, and then go the exact same distance on the other side. Thatís a reflection. Thatís also linear, and itís also given by an orthogonal transformation.

Letís look at some examples in R2. So in R2, this is, kind of, pretty straightforward. If you just want to rotate by theta degrees like this, a good way to work out the matrix is quite simple. You simply figure out the first column of that matrix is the image of E1 because we know AE1 gives you the first column. So itís gonna be this thing which is cosign theta, sign theta, and, indeed, thatís the first column, and the E2 rotates to the vector of -sign theta, and then cosign theta; itís the height, and thatís this one, and you can check. The U theta, transpose U theta is I, and itís just a trigonometric identity. The ones on the diagonals return cosign≤ + sign≤ which will be 1, and the off diagonal ones will just cancel away.

If you reflect across this line, thatís a line of theta over 2, if you reflect across it, you can work out what happens to E1 and E2, and that would give you this matrix here. It looks very similar to this one. Itís also orthogonal, and this one gives a reflection, so thatís the thing. Now, in fact, you can show the following: any orthogonal matrix, 2 by 2 matrix, is either a rotation or a reflection.

So what this tells you is that orthogonal matrices now, just from knowing what it means geometrically, itís gonna come up anywhere where you have isometric changes of coordinates. So itís gonna come up in areas like dynamics, aeronautics, anywhere where youíre changing coordinates. GPS, it would come up all the time. Itís gonna come up in navigation, and it will come up in robotics all the time.

So itíll just be matrices filled with cosigns and signs, and thatíll be the orthogonal transformations that map, you know, the coordinate system of the third link in the robot arm back to the base coordinate system, or in GPS, youíll have the world fixed Ė whatever the earth-centered, earth-fixed coordinate system and some other coordinate system, and thereíll be rotation matrices, orthogonal matrices that affect the transformation. Okay? So youíll see them in lots and lots of fields like that. Okay.

So now weíre gonna look at something called the Graham-Schmidt Procedure, and before we start, let me point out where we are in the class and where weíre going. So this may be a time to have a little road sign to say where we are and where weíre going. So far, weíve rapidly reviewed these ideas from linear algebra that you may or may not have seen before, but, at the moment, you actually have, so far, we have introduced no computation teeth into what weíre talking about here, right? So, so far, you have no idea how one would actually, sort of, compute a subspace, how would you calculate the range? So, in fact, so far itís all been talk.

Thatís gonna change in the next two lectures, and itís not a big deal, but itís gonna change in the next two lectures, and all of the stuff weíve been talking about will be given algorithmic and computational teeth. So you will actually be able to do things with all the things weíve been talking about, which actually will make the course a lot more interesting, I assure you, for me, anyway. Itís gonna be a lot more interesting and a lot more fun. So that starts now. Thatís where weíre going.

Okay. Graham-Schmidt Procedure, this you probably have seen somewhere. Itís not exactly a very complicated thing, so thatís okay. Graham-Schmidt Procedure goes like this. Itís an algorithm, and it accepts a set of independent vectors. Ooh, that was slang. I really have to work on this, right? Because independence is not a property of vectors individually; itís a property of sets of vectors. Okay. Iím gonna try it again, and then Iím just gonna say forget it. I gotta collect my thought.

Itís given an independent set of vectors. Yes, it came out. So, actually, you could help me, and whenever I slip into slang you could hiss just a little bit or something like that just to help me here, but after awhile you just get used to it. All right. So given an independent set of vectors Ė yeah, thatís good, okay.

A1 through AK and RN, Graham-Schmidt procedure actually finds a bunch of orthonormal vectors with the following property. They have the same span as the vectors you started with, and, in fact, thatís true no matter where you stop the algorithm. So if you look at the Q1 up to QR, these are orthonormal, and they all have the same span as A1 through AR. Okay.

So what this says is that Q1 to Q4 is an orthonormal basis for the span of A1 through AR, and the idea of the method is, first, what you do is you orthogonalize each vector with respect to the previous ones, and then you normalize it to have norm 1. I mean, itís not particularly complicated. So letís see how it works. So itís actually quite straightforward. The first thing you do is youíre given A1, and your task, or at least the specifications, are to produce Ė well, a Q1, which has the same span as A1 and should be orthogonal. Well, it should be a singleton which is orthonormal which is the same as saying it should have length one, okay?

Thatís easy to do. You simply take A1 and you divide it by its norm, and that gives you a unit vector that points in the same direction as A1, and weíll do that in two steps. Weíll say Q tilde 1 is A1, and then Q1 is Q1 tilde divided by Q1 tilde, norm. This is the normalization step. This is the orthogonalization step except thereís nothing to orthogonalize against.

Now it gets interesting in Step 2. You take Q2 tilde is A2 minus Q1 transpose A2 ◊ Q1. And this is what you do, now when you subtract this, you are actually Ė the name of this step, if you asked me to describe it, is this: You are orthogonalizing with respect to Q1. Thatís what youíre doing in this step. Youíre orthogonalizing with this Ė you can check that Q1, Q2 transpose, Q2 tilde is zero, and you do it by subtracting the appropriate component of Q1 Ė the appropriate multiple of Q1, okay? So thatís what this does.

This ensures that Q1 transpose, Q2 tilde is zero. Thatís what it does, and you can check because Q1 transpose, A2 is the same as this. You could work out what this is times Ė and then you have Q1 transpose. Q1 transpose, Q1 is one; itís normalized and so is zero, okay?

So this is really the orthogonal, in fact, this is called orthogonalization, okay?

Then when you orthogonalize, thereís no reason to believe that what you are left with has unit norm, so you normalize. So basically it alternates between orthogonalize, normalize, orthogonalize, normalize. Now, you always orthogonalize with respect to the stuff youíve already done, and that involves this, okay? So this is the next step is you simply orthogonalize and so on, and you keep going.

And thereís lots of questions you have to ask here. In fact, what you have to ask in an algorithm like this is if you fail, why? And youíd have to argue why you donít fail. If I failed here, if I had a divide by zero exception here, what would it mean? Suppose here we fail, as we really canít fail here. Here you can fail by a divide by zero exception. What would it mean?

Student:[Off mic].

Instructor (Stephen Boyd):A1 is 1; do you have any problem with that? It violates the contract here because we were allegedly given or passed a set Ė ooh, an independent set of vectors. Caught myself that time, okay? If A1 were zero, that contract was violated that we were not sent an independent set of vectors, okay?

Now, the second one is actually really interesting. What would happen if we Ė oh, not here, here you go. What if there is a divide by zero exception at 2B; what would it mean? It would mean that Q2 Ė well, itíd mean Q2 tilde is zero. That means A2 is equal to this thing times Q1. Q1 is also a multiple of A1. That says A2 is a multiple of A1. Once again, it means this Ė you get a divide by zero exception only if A2 is a multiple of A1. That means the pair, A2 A1, A1 A2, is not independent. It means that the preconditions didnít hold. Okay.

By the way, some people use Ė you can use a Graham-Schmidt-like procedure to check, in fact, if a set of vectors is independent. That came out right, by the way. It checks because if it has a divide by zero exception somewhere, instead of, sort of, jumping core, it can throw an exception saying Ė in fact, it can be quite explicit. It can actually say these are not independent. In fact, it can say the following: The last vector that I processed is actually the following specific linear combination of the previous ones. So it actually returns a certificate proving that the set of vectors is not independent. Everybody see what Iím saying here? So these are the methods, and weíll see how that works.

All right. So hereís the geometry of it. You start with A1, thatís over here, and thatís also Q1 tilde. The only thing here to do is to divide by the norm, so apparently thatís our unit length. The next step is you take A2, which is like this. You subtract off the Q1 component, thatís you subtract off this. This vector is this thing here, right here. I subtract that off, and what Iím left with is this. Oh, by the way, thereís a name for this. Some people call that The Innovations or something. In a statistical context, in signal processing context, itís called The Innovations, this thing.

I mean, it makes sense. This is, sort of, whatís new in A2 that you havenít seen already in A1; itís whatís new. Thereís other reasons why itís called The Innovations, but, I mean, that, sort of, makes perfect sense even in this geometric context. So thatís the innovations, and you simply normalize the innovations. So this is the picture.

Now, if you do this for awhile Ė if you run up our steps, you have the following: At each step here, I take these terms here, and I put them on the other side, and you see something very nice which is that A3 is equal to Q3, which is, in fact, nothing but Ė Q3 tilde is going to be Q3 ◊ norm Q3 tilde or something, and times a scalar, and you see that this expression here gives A3 as a linear combination of Q1, Q2, Q3.

Oh, by the way, thatís, kind of, what weíre required to do. This says that at each step, you express AI effectively as a linear combination of Q1 up to QI, like that. And weíre gonna give these names. Weíre gonna call these R1I, R2I, and RII, okay? And they just come right out of the Graham-Schmidt Procedure. And RII is not zero because RII Ė in fact, not only that, you can even say this. RII is positive because itís actually this thing, okay?

Well, if you write this set of equations out in matrix form, letís see what it says. It says that each AI is a linear combination of Q1 up to QI, and if you write that on the matrix form, itís an embarrassingly simple formula. Itís four ASCII characters; itís A = QR. So A = QR. Here A is RN by K. Thatís a matrix; youíve concatenated the vectors into a matrix. Q is of the same size; itís N by K, and R is K by K, but whatís interesting Ė oh, letís see how this works out.

This says A = Q ◊ R, and you can check that by the rules of batch, batch matrix multiplying, and if you interpret column by column, it says basically that A1 is this thing times that, and thatís Q1 ◊ R11, exactly like we said. A2 is this thing times the second row, and thatís gonna be Q2 ◊ R12 + Ė Q1 ◊ R12 or R12 + Q2 ◊ RQ2. These are scalars, so they, kind of, they rotate forward. Thatís the picture.

The lower triangle of Ė the fact that this is upper triangular here comes from the fact that when you apply this procedure, you express AI only as a linear combination of Q1 up to QI. It doesnít involve QI + 1, QI +2, and so on. Thereís, sort of, a causality here, which is, sort of, what you should think of automatically when you see an upper triangular matrix or a lower triangular. Suitably interpreted in the meaning is something like causality here.

And, in fact, I guess I can say what the causality is here. This algorithm, the QR algorithm, spits out Q7 before it has even seen A8, A9, A10. Those may not even exist. Thatís the causality, that when you calculate Q7, you only need A1 through A7 to calculate Q7, not A8, not A9, which may not even exist. So thatís the causality that you have here. Okay.

Now, itís a beautiful decomposite that says you can write a matrix as a product of a matrix whose columns are orthonormal Ė that was slang Ė times a lower triangular matrix which is actually invertible, upper triangular matrix is invertible. And this is called the QRD composition, and itís also, maybe, sometimes called the Graham-Schmidt Decomposition or something like that, but this is it. Okay.

And thatís called a decomposition or a factorization of A, and then youíll find, actually, in terms of English, there are two verbs. I donít know which is right, but you will say that the verb to do this is either to factor or to factorize, and I think it depends on whether you, your advisor, and your advisorís advisor were, maybe, educated, like, at Cambridge, or Oxford, or something like that. Iím not exactly sure, but you see it about half and half. People will say, ďYou should factorize QA to get QR or factor.Ē I think, actually, factorize is winning out now, so Ė all right. So thatís called a factorization. Weíll see several factorizations before the class is over. Oh, by the way, Iíve now shown Ė by the way, we have a big stack of unproved things, and we just popped one, for the record.

Earlier I said that if a matrix Ė well, not quite; we didnít get that. This is one of Ė this is Ė well, okay. No, we didnít. Sorry. Scratch all that. So itíll be fun. When we can actually edit the videos itís gonna be fun because Iíll start and then my head will go like this, and itíll be a little bit shorter. Itíll be good. I canít wait to do that. All right. Forget all that. Letís move on.

By the way, the method I showed here is Ė the Graham-Schmidt Procedure the way I showed it is actually Ė and obviously itís completely correct. It is not the way itís actually done in practice. In fact, people use a variation on it called Modified Graham-Schmidt Procedure, which is less sensitive to numerical rounding errors, and, in fact, they donít use the Modified Graham-Schmidt Procedure either. I mean, if you really want to know how itís done, itís done by blocks and things like that.

So itís fairly complicated, the same way, I think, at one point, when I was ranting last week, maybe, about how do you multiply two matrices. Anyway, itís not the four line C program. Itís actually done by blocks and things like that, same for QR factorizations. If you were to look at the actual code that runs, for example, in Matlab, if you do a QRD composition Ė and let me point something out.

Math Works had nothing whatsoever to do with any of the numerics. All of the numerics in Matlab Ė just to make sure this is absolutely clear because it irritates me beyond belief when people think otherwise. Every numerical method in Matlab relies on open source public domain software written by professors and graduate students, just so you know. So please never say, ďOh, the numerics in Matlab, beautiful.Ē Because you, and your friends, and your professors, and colleagues, and peers wrote it, and itís free for everybody. Sorry, just had to have that slip out. Okay. All right.

Okay. So what happens is this is actually an algorithm for computing an orthonormal basis for the range of a matrix. I mean, thatís what it does; it creates an orthonormal range for the basis. By the way, as a method right now, it doesnít really Ė youíve got a little bit of algorithmic teeth. A modified version of this would actually allow you to check if a skinny matrix is full rank. It would allow you to check if a set of vectors is independent. How would you do it?

Well, using this method, as I said, you would modify the algorithm to not dump core on a divide by zero exception, but instead, to return something saying theyíre not independent. Thatís what you would do, okay? So at least you have a little bit. You have an algorithmic method to check the rank Ė Iím sorry, to check if a matrix, so far, is full rank. Weíre gonna get to rank in a minute.

Now, the General Graham-Schmidt Procedure Ė General Graham-Schmidt procedure works like this. In the one I just described, we assumed the input set of vectors is independent. Notice thatís slang in writing, but youíre allowed, by the way, to write slang on lecture notes, not in papers, by the way Ė oh, unless itís obvious. Unless itís in a paragraph in which itís clearly, kind of, fuzzy and giving the idea, then youíre allowed to use slang, but in lecture notes Iím allowed to use slang.

Now, if theyíre dependent, whatíll happen is youíll get a divide by zero exception in the algorithm we just looked at. Now, hereís the modified algorithm. Itís really simple. It does this Ė letís go back to the algorithm and see what we do. Hereís what happens. What happens is this. You do the same thing as before. So, for example, you remove the previous Q components from the current vector, all right? Now you attempt to normalize. The only thing that can fail here is that Q3 tilde can be zero.

So you simply put a test in front that says if Q3 tilde = 0, by the way, that has a very specific meaning. That will only happen Ė it means, specifically, that A3 is a linear combination of Q1 and Q2, which is the same as saying itís a linear combination of A1 and A2. Thatís the only way this will fail. If it fails, you simply fail to do this and you actually just get Ė you say get me another column. You just say, instead of dividing, you call get call or get next vector, and you pull another vector in, and you do the same thing; you normalize. You didnít spit out a Q.

So this is the algorithm. It runs like this: U form AI tilde is this thing, so you actually orthogonalize with respect to the previous things here. If there is any innovation because A tilde you can interpret as the innovation. Itís basically, sort of, whatís new that I havenít seen already? Thatís A tilde. If this is zero, you just do nothing. You just go right back, and you get another column, and you keep going. You donít generate a QI. Otherwise, you generate a Q.

So the number of Qís you generate actually is gonna tell you something about the rank. Okay.

Now, by the way, this works perfectly nice. Letís see how this works just for fun. All right, letís do it this way. Suppose I put in vectors that look like this: 0, E1, 2E1, and E2. Actually, you know what? It doesnít matter. Iím gonna call that A1, A1, and A2, and A1 and A2 are independent. What does a Generalized Graham-Schmidt do here? Well, it checks zero. It orthogonalizes it with respect to the previous Qís of which there are none, and it attempts to normalize.

Youíre gonna have some serious trouble normalizing zero, so this doesnít happen; this fails. So you simply skip over it, and you get a new vector which is A1. You orthogonalize with respect to previous Qís; there are no previous Qís, so you are already orthogonalized, and now you normalize with respect to A1. So we divide A1 by its norm, and thatís gonna be Q1. And so notice that the span of the first two vectors here, thatís zero and A1, is now, in fact, Q1, which has dimension one, okay?

At the next step you get 2A1. You orthogonalize, and youíll get zero because 2A1 is in a linear combination of A1 Ė well, of A1. So here, once again, this fails again, and you donít spit out a Q. You pull an A2, and then you get a Q. So as you process these four vectors, you will, in fact, generate only two Qís. Reason, the rank of this is exactly equal to two, okay?

So you can actually say much more. You know, itís not here. Itís, kind of, silly, isnít it? Oh, itís not Q1 through QI; theyíre indexed by a different way. Q1 through QR is an orthonormal basis for the range of A, but notice what happened here. R can be less than K, okay? So thatís what happens. R can be less than K. Oh, thereís another name for this. This thing is sometimes called the Rank Revealing QR Factorization.

So thatís a great name for it. Rank Revealing QR, and you will hear this. People will talk about a Rank Revealing QR Algorithm. Thatís basically a Q algorithm thatís modified like this, doesnít dump core when what you pass it is not independent, but an interesting thing is when it terminates, it has actually calculated the rank; itís got the rank.

But there is one thing, by the way, that weíre not looking at and, in general, donít care about in this course. Theyíre important issues. It turns out that you have to be a little bit, you know, when you have numerical codes, this is not A tilde = = 0. In other words, itís actually exactly zero as a double vector. Now, more likely than anything, this is something like the norm is less than some tolerance which is really small, okay?

So whenever you see something like a rank revealing QR, be forewarned that there is a tolerance in there, and that you may have access to it. You may want to change it. You may want to know what it is. It is probably set very low, so low that, for engineering purposes Ė it may be inappropriately low for engineering purposes, but I mention this just because itís an issue. You really donít have to worry about it, but you should be aware that there is a tolerance in a real rank revealing QR algorithm. Okay.

Now, in matrix form, you can write it this way. You can write A = QR where Q transpose Q is I. Now, you should simply process these five ASCII characters. You donít need the R because youíre sophisticated now. You know thatís just simply the width of Q. When you see Q transpose Q = I, that should simply be an alias or a macro for the columns of QR orthonormal. So thatís what this is.

And R is an R by K, and it looks like this. The Rís get spit out this way; itís quite interesting. I think itís called Upper Staircase or thereís another one youíll hear like Upper Echelon or some Ė I donít know. Youíll hear various names for this type of factorization Ė upper staircase, upper Ė I donít know. Who knows? Reduced Row Echelon Ė I donít know what it is. Anyway, but it looks like this.

Now, the corners on the steps occurred exactly when you spit out a new Q, which occurred when the orthogonalization, the innovation, turned out to be non-zero. So for this picture, we can say a lot about the rank of various bits and pieces. So Iím gonna ask you some questions. The fact that a Q1 was generated on the first step is equivalent to what statement about the columns A1 up to AK? We generated a Q1 on the first step, why?

Student:A1ís not zero.

Instructor (Stephen Boyd):A1ís not zero. We generated a Q2 on the second step; what does it mean?


Instructor (Stephen Boyd):A1, N2, H over independent. Okay. On the third step, nothing was generated. It has a meaning; whatís it mean?


Instructor (Stephen Boyd):Okay. A3 is the linear combination of A1 and A2, exactly. Okay. So you get the idea. Okay. So actually the structure of this is quite revealing. It, kind of, tells you Ė well, it certainly tells you in ascending order, it, sort of, tells you which vectors are dependent on previous ones, and so you get a lot of information and structure here. Okay. Oh, I donít know. Letís see. Itís only count out a couple. What is the rank of A1, A2, A3, A4?


Instructor (Stephen Boyd):Itís three, and, in fact, we have an orthonormal basis for it; what is it?


Instructor (Stephen Boyd):Q1, Q2, Q3 Ė Q3, I heard Q4, but Iím just ignoring it, okay. Thatís it. Itís Q1, Q2, Q3, right. Did I do that right? Yes, I did. All right. Itís okay. Okay. Now, thereís another way to write this and another way youíll actually find this Ė this is one method. What you can do is actually you can permute the columns. Now, permuting the columns is the same as multiplying by a permutation matrix on the right.

I used to get all confused about this, and I still do, sometimes, but I just go Ė actually, the one thing you do actually if no oneís looking, you can write down a 2 by 2 matrix and do this secretly, but not if anyone is looking. Donít ever do that because itís Ė well, itís humiliating, really, if someone catches you. Iím talking about whether permuting columns or rows corresponds to permutation multiplication on the left or right.

The other way to do it is say well, look, columns are associated with inputs; you know that. So, basically, if you mess with the inputs, thatís something that should happen first before you apply the matrix. If it applies first, itís on the right. I can see this was extremely unhelpful. Well, okay, fine. Make up your own mnemonic. I donít care. Okay.

So I take this matrix here, and I apply a permutation on the right that rotates these columns forward, right? And takes all the columns when there were no innovations and rotates them to the back. If I do that, itís gonna look like this: A is Q, R tilde, S Ė thatís, kind of, whateverís left over Ė ◊ P. Now here, Q is, again, a matrix with columns that are orthonormal. R tilde is upper triangular and invertible, so now it looks like this. Itís just like the old QR factorization; thatís that. S is Ė well, who cares what S is, and it looks like that. So you have this picture. Okay? And then Q goes here, and A is the same size as Q, okay? So thatís the picture. Yeah, youíve rotated everything forward Ė or rotated the ones where you had something. Okay. So this is another form youíll see this in. Okay.

All right. Letís talk about some applications of this. Well, it directly yields a orthonormal basis for the range of A. Now, Iím gonna go back. Remember, I went on a little detour, a little ahead of time. Now itís actually time. So, now, weíre gonna actually pop something off our stack of unanswered questions, or something like that, or unproved assertions.

Earlier in the class, there was a factorization interpretation of rank, and the factorization went this. It says that if a matrix A has rank R, then you can factor it as a matrix, you know, you can make a tall, skinny factorization, okay? So you can get a tall, skinny factorization where the intermediate dimension here is the rank. I asserted that without proof, and we also had interpretations of this right then. Okay.

Now you have one because here is one, right there. So that gives you one right there. Thatís exactly the size. Hereís A is Q, thatís the first part of the factorization, thatís what I called B, and you call this thing C, and itís got exactly the right sizes, so it gives you that factorization.

By the way, this means Ė and you can do, sort of, a pseudo application. Earlier we found out that a factorization of a matrix, especially if itís a stunning factorization, like if itís 1,000 by 1,000 matrix, that, for example, has rank 15, and it factors as 1,000 by 15, and 15 by 1,000 matrix.

We found out many things you can do now. One is you can multiply by that matrix super, way faster than you could before, okay? And letís not make it 1,000. Letís make it a million, million by million matrix, okay? So if A is a million by a million, it says that you can do this Ė it says you can factor it this way. You couldnít even store it, a million by million matrix, let alone multiply by it, okay?

Factored as a million by 15, 15 by a million, itís no problem and would be shockingly fast. I can do that on my laptop. So, now, the question is youíre given such a matrix. Of course, you canít store it, so the story gets a little bit fuzzy here, and youíre asked to see if you can do such a factorization. You can apply something like this Modified Graham-Schmidt, and you can keep getting columns and stuff like that.

By the way, you donít actually have to ever store a whole lot when you do this algorithm, right? All you need is a method to get Ė you just request columns, and you check against the other one. You could basically say you know what? If the ranks more than 20, forget it; this idea is not gonna fly. So you keep processing the columns, and you see how many Qís you pick up. If, in fact, you process all the columns, and Q only got to 15 or 16, you win. You have a factorization, low rank.

That work was an investment. You can now multiply that matrix by that matrix some absurd about of time faster. If thatís a simulation, or some signal processing calculation, or something like that thatís gonna happen a lot of times, you just won big. By the way, this is not so useful because these are exact factorizations. Later in the class, weíre gonna see approximate factorizations. Thatís much more powerful, is an approximate factorization, but for now, weíll leave it this way.

Student:[Off mic].

Instructor (Stephen Boyd):No, it doesnít. No. Iíll repeat the question. In Matlab, when you multiply big matrices, does it attempt to do anything like this? And the answer is no, it doesnít. In fact, I donít really know any linear algebra system, although one could be developed, that would do cool stuff like that. That would basically flag a matrix as low rank and Ė now, ideas involving rank are widely used in numerical linear algebra, but, as far I know, thereís no general system. Itíd be quite cool to build one, actually. Okay.

So letís see. Weíve already checked a couple of things. Oh, hereís one. If you want to check if a vector is in the span of a bunch of matrices, which is basically saying, for example, suppose you want to check if B is in the range of A, no problem. All you do is you could run the Modified Graham-Schmidt on A1 up to AK and B, and what you check on is this. You only check that when B was pulled in, you generated a new vector. If you did generate a new vector, a new Q, that means B is not in the range. If you didnít Ė if on the last step, you didnít generate a new Q, then B is in the range of the previous ones. Actually, the algorithm automatically solves, by the way, AX = B in that case. It finds an X that gives AX = B.

Now, one interesting fact is that because of this causality, the Graham-Schmidt Procedure yields a QR factor is A and yields QR factors that if you stop it early, and if you send a terminate signal to the QR process, and it stops, it actually has calculated a QR factorization of a leading Ė if itís processed P columns, itís the leading sub-matrix of A, has already been processed.

So whatís, kind of, cool about that is this. It says that when you actually calculate, for example, the QR factorization of A, youíve actually simultaneously calculated the QR factorization of every sub-matrix of A. I have to say, itís not every, but itís every leading sub-matrix. So hereís A, and Iím gonna call a leading sub-matrix, what happens if I truncate this way, okay? And youíll actually have Ė if you truncate Q and R appropriately, in the same way, you get the QR factorization of a Ė oh, I donít know. What you call that; what would you call the first chunk of a matrix? What would you call it?

Student:The head.

Instructor (Stephen Boyd):The head of the matrix? Sure, yeah. Yeah Ė no, no, no, thatís fine. Okay. What do you call it when you have a string at the beginning of a string?


Instructor (Stephen Boyd):Pre Ė what, prefix? Prefix of a matrix; what do you think?


Instructor (Stephen Boyd):Head, prefix? Listen, we gotta step in. If we donít, the physicists are gonna step in, and itís gonna have some totally idiotic name, so Ė sorry. That was just for you. Yeah, I presume youíre in physics?

Student:No, I was.

Instructor (Stephen Boyd):You were.


Instructor (Stephen Boyd):Okay. What are you in now?


Instructor (Stephen Boyd):Aero, all right. I have plenty to say about those people too, sorry. You had a question? No, okay. All right. So prefix or head at the moment. Iím just saying we have to be involved because, otherwise, you wonít believe the names other people will assign to them. So all right.

Iíll mention one more thing which is the full QR factorization. Full QR factorization is this. You write A = Q and R1. Letís call that the QR factorization we just worked out, without the permutation, just assume itís full rank or whatever. It doesnít matter. Actually either way it doesnít matter. You write Q Ė here you put Q1, and what you do is you fill it out to make it square, okay? And you put a zero here, so it really, I mean, it doesnít matter, and let me explain a little bit about how you do this.

This is a very common procedure. It basically says given a, like, letís say K orthonormal columns, that was slang. Iím gonna stop saying that, but anyway. Given an orthonormal set of K columns, the question is Ė and these are in R10. Itís very standard to fill it out, which is to generate something like a Q8, Q9, and Q10 that makes the whole thing an orthogonal matrix, okay?

Now, it sounds weird. The question is how do you do it, and the way is actually quite straightforward. What you do Ė let me just show you how you would get this Q2. Itís really, kind of, dumb. It works like this. Letís feed in to our rank revealing Q4 the following: A and then the identity, okay? So thatís what weíre gonna feed in. Now, that matrix, sure, itís definitely full rank because of this I here, okay? No doubt about it, thatís full rank.

But what weíre gonna do is I just want to set a flag that the first time our QR algorithm gets a column, has to pull into the I matrix to get a new column, I want to set a flag. So hereís the way it happens. Our modified QR starts working on A, pulls in column one, processes it, two, three, four, five. Letís say it finishes up A, and it doesnít matter what the width of A is. Well, letís just put some numbers on here for fun.

Thatís ten. Letís say that A is, you know, seven vectors, and letís say Iíve run this Graham-Schmidt QR Factorization, or Graham-Schmidt, or algorithm, or whatever on this, and letís suppose the rank of A is six. So what will happen is I will have pulled in A7 and processed it, okay? And how many Qís will I have at that point? Six, Iíll have an orthonormal basis for the range of A. Okay.

Now, normally thatís where you stop. Thatís where our old factorization stops. Now weíre gonna do the following though. Iím just gonna go get E1 because thatís the next thing here, but Iím gonna set a flag, which is by Iíve overrun Ė Iím indexing out of my matrix, okay? Iím no longer processing A. Iíll take E1. Now, E1 might be in the range of A, in which case, I wonít generate a Q7, okay?

No problem if it isnít there. I pull E2 in, and I process that, okay?

Now, when I finish here, I must have Q1 through Q10. I absolutely must, why? Because the rank of this matrix is 10. So, however, the last three, Q8, Q9, Q10 were not generated from A; they were generated from this I or whatever other matrix you put here, okay? However, the Q8, Q9, Q10, theyíre all normalized. Theyíre mutually orthogonal, and theyíre orthogonal to Q1 through Q7, and Iím gonna write it this way.

So this would be that first Ė this is the chunk of Qís that I got out of A, and this is what happened when I actually Ė I called get call on A, and it returned an EOM. I guess thatís end of matrix flag, okay? So it returned an EOM, and I said no problem, and I got a column from Ė I just got E1, letís say, and kept going. So thatís what this is, okay? So this is how you do it, and this is very cool because this is finishing off, extending an orthonormal basis from a given basis out to a basis for the Ė a basis for a subspace to a basis for the whole thing.

I should add, all of this, at the moment, since weíre not looking at applications, it probably doesnít make a whole lot of sense, or, even if it made sense, is not that interesting. It will be very interesting when we actually start doing stuff. There was a question?

Student:[Off mic].

Instructor (Stephen Boyd):E2, you go to E2. Then you go to E3, but the point is, by the time you finish processing this, you have to have ten because the rank of this matrix, I guarantee you, is ten no matter what is, including A = 0. Thatís about a low as ranks go, okay?

So in A = 0, letís find out what would happen here is A would generate no Qís anywhere. So youíd stop. Then it would be correct. Itíd be, like, an empty matrix or something like that, no Qís, okay? Then the whole thing Ė then I would actually, in fact, be Q2 because theyíre already orthonormal, and in that case, this would be zero Ė not zero, sorry. No, and Q2 would be I. Does that answer your question? Okay.

So this is called extending an orthonormal basis. Now, the two subspaces, the range of Q1 and range of Q2, theyíre actually very interesting. Theyíre called Complimentary Subspaces. They do two things. First of all, theyíre orthogonal. We overload Ė right now, you know what it means to say X perp Y. It means X transpose Y is zero. They have zero inner product. Thatís what that means. We overload the perp to sets of vectors.

So you can say a set A is perpendicular to a set B, and the meaning is this. Any vector in A is orthogonal to any vector in B, okay? So here, you would say that range of Q1 and the range of Q2 like this are orthogonal like that. It means that anything in here and anything in here, and, by the way, you need to check that. You really need to check that; you can check it.

It means that anything in here is a linear combination of, letís say, Q1 has got columns Q1 through QR, and this has columns qR + 1, thatís little q, you can tell by the way I say it; itís a little bit softer, the lower case q. Itís, like, qR + 1 up to qM. This is a linear combination of the first R1ís, and itís a linear combination of the others, but theyíre all mutually orthogonal, so the inner product is zero. Okay.

Now, the other thing is we overload sum for sets. When you overload the sum for a set of vectors, it means the following. Itís a new set, and it consists of all sums, all possible sums, of a vector from this set plus a vector from that set, okay? So weíre gonna overload plus as well to sets of vectors, okay? Now, so you write this this way. If you do that, youíd say that RFQ1 + R of Q2 is equal to RN, like that, and what that says is the following. It says if you take all possible sums of a vector from range of Q1 plus all possible Ė well, if you pick a vector from here and here, add them, and then do that over all possible choices, you get RN. Yes, a question?

Student:[Off mic].

Instructor (Stephen Boyd):Whatís what?

Student:[Off mic].

Instructor (Stephen Boyd):The bar, you mean this thing?


Instructor (Stephen Boyd):Iíll tell you what that means in a minute. Okay. So this is what people would write. And, by the way, the way youíd say this in English would be something like this: oh, range of Q1 + range of Q2 is all N, or youíd say something weird like together they span RN. That would be the English phrase. Thatís how youíd say that informally.

Now, this bar, this is a symbol that says R1 Ė this terminology means R1, the range of Q1 + the range of Q2 is RN. Thatís what the plus means, and that little perp, it saves you writing this: And range of Q1 is perpendicular to range of Q2, so all the perp means is that. You will see it. Thereís no particular Ė it doesnít save you a whole lot of trouble to write out both, but you have to admit, it looks cool, and itís a very advanced notation, and if you do that, people wonít know what youíre doing, and you can say Ė well, youíll see Iíll give you some words you can say, and your roommates or whatever will be very impressed. So you asked what itís for; thatís basically what itís for. Okay. All right.

So another way to say that is this. In this case, you say that theyíre orthogonal compliments, the two subspaces, and another way to do it is the perp symbol actually acts as a post-fix superscript on a subspace, and it is, in fact, the orthogonal compliment. Itís the set of all vectors orthogonal to every vector in this set here, okay? Weíll have a homework problem or something on this soon to, kind of, help you get straight about this, and weíll look at some more of this in another context.

So now, I want to say one more thing. This is, at the moment, gonna look a bit bizarre, but later itíll make sense, I promise. If you do a QR factorization of A transpose, now A and A transpose have exactly the same rank. So I do a QR factorization of A transpose, I get something Ė sorry. If I do a QR factorization of A, which is a full one, and I transpose it, I get this. Sorry, I shouldíve said A is Q1, Q2 ◊ R1 N0, like that, okay? Thatís the full QR, and I transposed this QR factorization, and you find the following. You can now imagine what A transpose does to a vector by first operating here and then operating here.

Now, this is interesting because basically it says that when you form A transpose Z, you first for Q1 transpose Z and Q2 transpose Z. So you produce two sub-vectors. The first one Ė letís see if Iíve got this right. No, Iíve got it totally wrong, completely and totally Ė there we go. All right. So itís really this. Q1, Q2 Ė letís try that again Ė is U form R1Z and zero here. When I multiply Ė no, no. I had it completely right.

Iím wondering if this posting the lectures on the web is such a great idea. We better look into editing them pretty fast, I think. Thatís my conclusion, okay? This happened just because theyíve been posted. All my friends at MIT will be watching this, chuckling. Iíll get you, donít worry. Okay. Letís try it again.

Okay. Weíre just gonna go slow here. R1 transpose zero, okay, there we go. A transpose Z is this. There we go. So the first thing we do is we calculate Q1 transpose Z and Q2 transpose Z. Q1 transpose Z gets then multiplied by R1 transpose and actually causes something to come out. Q2 transpose Z then gets zeroed out. So you calculate Q2 transpose Z, but then nothing happens, and now you see an amazing thing. You see that basically A transpose Z = 0, actually if, and only if, Q2 transpose Z = 0, which is the same as saying that Z is in the range of Q1.

Student:[Off mic].

Instructor (Stephen Boyd):Where? Thank you. Itís just not Ė this is what happens. I knew we shouldnít have done it, see? Okay. There we go. Thank you. There we go. All right. Okay. Now, what this says is interesting. It says that the range of Q2 is the null space of A transpose. Itís the null space of Q1 transpose, which is the same as the null space of A transpose, okay?

And what you find out is that this second Ė when you fill out this second part of the QR factorization here, this Q2, what it really is is the columns are an orthonomal basis for the null space of A transpose, okay? And what you include is this, the range of A and the null space of A transpose are complimentary subspaces, okay? Indeed, one of these is the span of Q1; itís the range of Q1, and the other is the range of Q2 obtained in this full QR factorization. So they are complimentary subspaces.

That means, again, this is the method used to impress your roommates and so on. It means that they add up to all of RN, and itís says that theyíre also orthogonal subspaces. By the way, one consequence of this is the following, is that every vector in RN can be written uniquely as a sum of a vector which is in the range of A and a vector which is in the range of A transpose, and more over, those two vectors are orthogonal, okay? Thatís what it means.

And so people call this the Orthogonal Decomposition of RN induced by the matrix A. Thatís the name for this, and once you know this, and we have, now, a constructive method for doing this. In fact, we simply apply the Modified Graham-Schmidt to AI, thatís the algorithm. If you do that, you can generate this Q1 and Q2, and all of this will actually Ė and now itís constructive.

Now you can actually prove most Ė I think maybe actually all of the assertions from the linear algebra review once you know this. Itíll actually be a little bit easier in a couple of weeks when we do more material, but this is it. But if you switch A to A transpose, you get a decomposition of RK, which is the other side, and you get this, the null space of A is an orthogonal compliment of the range of A transpose. And so these are the various Ė these are, by the way, the four fundamental subspaces. Thereís whatever people glorify them as, these four.

Actually you donít have to worry about them too much. All that really matters is what they mean in the context of applications, I think. So thatís the importance, so you donít have to Ė these are just silly symbols that are interesting. Itís much more interesting when we start doing estimation, or channel decoding, or input design. Then these things are very interesting, and they have very real consequences, and thatís what weíll be doing in the next bit.

Let me show you a couple of things here. When you show your friends Ė when you assert this they say, ďWow, thatís a Ė Ē oh, by the way, some people consider Ė I think this is sometimes called Ė this is also sometimes called one of the Ė this is also one of the major ďresultsĒ in linear algebra. So itís a big deal. Itís treated with a lot of reverence. Youíre never supposed to joke about this because this is Ė I mean, it is cool, all right? But the point is itís Ė I want to point out a couple of things. Bits and pieces of it are trivial, but one-half of it is not. But want to see people look at that and go like, ďWow, I need to really go to a quiet place, and sit, and actually think about what that means.Ē Thatís pretty serious stuff, but I want to show you one thing. I hope this is gonna work out. Weíre gonna try.

Iím gonna at least Ė why donít I show you this part? Range of A is orthogonal to the null space of A transpose, right? Thatís one-half to the assertion. Thatís why the little perp is sitting above the plus, thatís what you asked. So letís see. Letís show that. What do you have to show here? Well, really, I have to show that any element of this thing is orthogonal to any element of this one. I hope this works, by the way, but considering that we just posted these things online, itís almost certain Iím gonna end in a deep, deep hole, but letís just see what happens, all right?

So a general element of the range of A is a vector of the form, letís say, AX, and then Iíll put a question mark here because this is what we have to show. Letís call a general element of null space of A transpose, Iím gonna call it W, but we put a little note, which is that A transpose W = 0. Oh, yeah. Itís gonna work, okay? So this is what we, kind of, have to show here, and letís see. Well, I mean, to check orthogonality, you write this. The question mark means itís open; itís not shown yet. Okay.

Well, thatís X transpose, A transpose W, and if you donít mind, Iím gonna just reassociate right now and skip a step, and Iím gonna look at that and put a question mark there, and I can erase the question mark because A transpose W is zero, okay? So now, the way you do that Ė this is the way you derive it. The way you hand in your homework or make your argument is the other way around.

You start and you say well, letís take W to be in the null space, that means this, and then you say letís let Ė hereís the general element of range of A, itís AX. Then you say, well, letís just form the inner product this way, and you say letís start by writing this down. You first say to someone, ďYouíd agree with that, wouldnít you?Ē And theyíd say, ďYes.Ē And you go, ďWell, no problem, but A transpose W is zero, right?Ē ďYes.Ē ďSo I can then write this. Youíre not putting the question marks there.Ē And theyíd say, ďI guess.Ē And then you go, ďCool, then I can reassociate it and rewrite that that way.Ē And theyíd say, ďYes.Ē And then youíd say, ďOh, Iím done.Ē So thatís the way that works. Anyway, what I wanted to point out is that in all of these things, some of them are very easy, and the others are not that easy. Okay. All right.

So I will start saying a few things about the next topic because itís the first one that is actually real and useful. Itís least squares. Iíll say a little bit about it, maybe just now because weíre just finishing up, Iíll just say a few things about it. Itís not a new topic. It goes back to Gause who wrote, actually, a book about it and may or not have been the inventor of it, but as far as most people know, he was or something like that, and he wrote a book about it in Latin, which was actually just translated, which is, kind of, cool, and you can get the book. I mean, not that youíre required to do so, but you get the book, and then on the left is the Latin, and on the right somebody translated it, so you get to see it.

And, by the way, if you read the book, youíll be very, very grateful you were born in a time when we actually, A.) Had notation because imagine what happens when you have to write out all of this stuff. Y = AX, thatís, like, a page of Latin, okay? The transpose goes on, and on, and on. I mean, this is not simple stuff. Thatís No. 1, so thatís the first thing you can be grateful for. Also, that you donít have to read it and the lecture notes for the class are not in Latin, another thing you could be grateful for.

The other one, actually, is that you were born in a time when computers will actually do all this stuff for you, and you can actually use this stuff, and embed it, and things like that. So anyway. All right, thatís just history. Iím just telling you that least squares is not exactly a new topic. Okay. But Iíll tell you what itís about. Itís about approximate solution of over-determined equations. So let me say just a little bit about that so that Thursday we can do this.

Letís suppose you have Y = AX where A is skinny, okay? So that means something like you have more equations than unknowns. It depends on the context of what exactly it means. You have more specifications than you have designed degrees of freedom, lots of ways to say it. You have more measurements than things you have to, you know, itís a redundant measurement system, or at least, I should say, itís a dimensionally redundant measurement system. It would appear, based on the sizes that you have more measurements than you need.

So people call this an over-determined set of linear equations, and basically for most, Y you canít solve that equation, obviously, because the range of A is a subspace of dimension M in RN. Thatís very small. In fact, even if M is N minus 1, in some, kind of, uniform distribution Ė with any probability density, if you draw Y from it, the probability that you can solve Y = AX is zero, right? Because a subspace, any strict subspace will have probability zero under any probability distribution, if you know about that kind of stuff. Okay.

So thereís a compromise, and the idea is to not solve it but to approximately solve it. So you approximately Ė now, what does it mean to approximately solve? It means, instead of finding an X for which AX = Y, which is impossible, weíre gonna find an X for which AX is approximately Y, and weíll measure proximity by the norm of the error. So thatís the basic idea, and I think we will quit here.

[End of Audio]

Duration: 77 minutes