Instructor (Stephen Boyd):Welcome back. Iíve got to say, the schedule is a bit bizarre where youíre off for a week and then come back for a week and a half, basically.
Let me go down to the pad here. Iíll start by making just a couple of announcements. You might want to zoom in a little bit here. The first one is, just to remind you, the final is going be next Friday to Saturday or Saturday to Sunday. And as always, if for some reason these donít work for you, just let us know and weíll figure out an alternative time for you to take it. Other administrative things are homework, so I guess homework eight, which you have now, right, and thatís due this Thursday, I think. Is that right? Male Speaker:
Instructor (Stephen Boyd):So thatís due Thursday. But weíre going to pipeline a little bit and weíre going to launch homework nine later today. Thatís after we figure out what it is and all that sort of stuff. So weíll launch homework nine later today and thatís due next Thursday, thatís obviously, the last one. Except, of course, with you final youíll be given a homework ten, which you can turn in during the first week of January.
Okay. Letís, if anyone remembers what we were doing, it seems, this is the topic we were doing before our quarter was interrupted by the schedule. We were talking about the idea of the gain of a matrix in a direction. So this is actually very, very important. This is sort of at the heart of the overloading Y equals AX. If you have Y equals AX where A, Y, and X are scalars, itís, you donít, I mean, itís very easy, thatís simply multiplication. When you have a matrix here, itís much more interesting. Because in fact, the amount by which the size of Y divided by the size of X, thatís the gain of the matrix A in the direction X, this varies with direction. Thatís exactly what makes it interesting.
Now there are cases when it doesnít vary with direction. Youíve seen one. One is when A is orthogonal. So if A is square and its columns are orthonormal, then Y equals AX, the norm of Y is the norm of X, the gain is one in all directions. So watch, youíre going to get is the complete picture today, on that.
So last time we asked the question, what is the maximum gain of a matrix. And we answered it. And the answer was well, itís, first of it itís got a name we gave it a name. The name is the norm of a matrix. And it is given by the square root of the largest, whoops, there we go, there it is the square root of the largest eigen value of A transpose A. Now, you have to be very, very careful here, because A is not necessarily symmetric. In fact, itís not even necessarily square. A transfers of A, of course, is always square, however, and positive semi definite. Itís nothing norm of a matrix and that tells you exactly what the maximum amplification factor of a matrix is when you consider it the mapping of Y equals AX. But we know more. We know things like this, the maximum gain input direction is the eigen vector associated of A transpose A, associated with [inaudible]. So that would be the input, which is amplified by the largest factor. The input that is amplified by the smallest factor, input direction is QN. Thatís the eigen vector of A transpose A associated with [inaudible]. Okay. And of course, this can be zero, which is just a longwinded way of saying A has a null space. Because if youíre in the null space, it means youíve come in non zero, you come out zero; it means the gain is zero. So this is a quantitative idea that generalizes the qualitative of null space. Okay. This brings us to the whole story. And the whole story is the singular value of decomposition, otherwise known as the SVD singular value of decomposition. Now, itís been a while. Historically itís been around for quite a while, maybe 100 years, thatís certainly 100 years. It was certainly well known in the Ď20s and Ď30s. Itís been used in statistics. Youíll hear other names for it. The one youíll hear most is PCA, which is principle component analysis. And thereís probably a couple other words for it in other strange fields. That came out the wrong way. That was not to imply that statistics is a strange filed, although, it is a bit odd, actually. So but anyway, so youíll hear other names for it. Certainly, in some areas of physics itís got some name, and I guess, in other contexts itís called things like the carbon and low ebb expansion, thatís something else youíll hear about the KLX. Youíll hear all sorts of names. But the main ones are singular value of decomposition and principle component analysis. Okay. And the singular value of decomposition is a triple matrix decomposition. By the way, weíve seen a bunch of them by now. Well, not that many, actually, but weíve seen a bunch, and it is very important to keep your matrix decompositions clear in your mind. Weíve seen diagonalization, thatís something like A equals T, land to T inverse. Weíve seen QR factorization. We have seen an orthogonal eigen decomposition. Thatís like A equals Q land duct Q transpose, and this is yet, another. By the way, thereís lots of others as well, maybe not a lots but five or so other common ones. So here it is. It says that any matrix can be written out this way. Itís A is U sigma V transpose where A is end by end with rank R. This intermediate matrix sigma is diagonal and its size is R by R. So itís exactly equal to the rank its size. And the singular values, thatís the sigma one through sigma R, these are ordered, always, from largest to smallest. So sigma one is the largest and sigma R is the smallest here and these are positive. U and V are both matrices with orthonormal columns. So U transpose U is I and V transpose V is I. So thatís what these are. So it looks like this, U and then sigma, which is diagonal, and then V transpose can look any size, like that, for example look like, for example. Okay. That would be a typical way this might look. Okay. So this intermediate dimension is the rank of the matrix. Now, you can write this many other ways. If you write the columns of U as U one through UR and the columns of V as V one through VR, then you can light this new sigma transpose out as what people call as a dyadic expansion. So thatís a dyadic expansion, which of course, means youíre writing the matrix as a linear combination of dyads and the dyads are UYVI transpose. Dyad is just a word for rank one matrix, also called an outer product, though, in other fields. Actually, there are no other fields. This is math. So Ė So sigma I here, these are called the singular Ė these are the non zero singular values of A. So thereís actually a bit of confusion and weíll get to that in a bit, and Iíll warn you about that, as to whether singular values can be zero. In the definition Iíve just given you, the singular values are all positive period. So it doesnít make any sense to say sigma five is zero. But weíll see that, in fact, thereís a lot of confusion on the streets about this and people do refer to zero singular values. But weíll get to that. Okay. Now, these VIs are called the right or input singular vectors of A, weíre going to see Y in a minute, and the UI are the left for output singular vectors. And it sort of makes sense. If A is U sigma V transpose, you can see that when you sort of operate, but when you form AX, the first thing you do is multiply it by V transpose X. But you know what that is. Thatís expanding V in the VI basis. Thatís the first thing you do. Then you scale and then you send the output along the UI basis. So these are really the basis sort of for the input that actually comes through the system and the output. So weíll get more into that in a minute. But itís clear V is associated with input and U with output. Thatís clear. Okay. Well, these things are, right at the moment, theyíre quite mysterious but I think we can clear up the mystery very quickly. If you form A transpose A and A is U sigma V transpose times U sigma V transpose, if you transpose this you get V, sigma is diagonal and therefore symmetric, so sigma transpose is sigma, then you get U transpose U sigma V transpose, that goes away, and you get V sigma squared V transpose over here. And what that says is the following: It says that V sigma square V transpose is an eigen vector decomposition of A transpose A. So A transpose A is a symmetric positive semi definite matrix, if you write out itís eigen vector in respect to decomposition, you get V sigma squared V transpose, thatís this thing, and so those are the input singular directions. And that says that these are eigen vectors of A transpose A. It also tells you exactly whatónow we know what the singular values are. Theyíre no longer mysteries. The singular values are the square roots, sigma I is the square root of the I eigen value of A transpose A. Okay. And thatís for I equals one to R. Those are only eigen values of A transpose A that are positive, then they become zero. And in particular, sigma one is nothing but the norm of A. Itís the square root of the largest eigen value of A transpose A. So sigma one is the maximum gain. That sort of makes sense. Weíll see why thatís the case in a minute. We can actually see it right now, but Iíll wait a minute. Now, if you multiple A by A transpose in the other direction, you get AA transpose, then you just multiply this out. This time the V goes away and you get U sigma square U transpose and what you find are the UI are eigen vectors of AA transpose. So thatís what they are. These are the output directions. And the sigma IR, again, the square roots of the eigen values of AA transpose, and here a question comes up, and now you have two formulas of the sigma I, which is the right one? Well, theyíre both correct because it turns out that the eigen values of, letís see, CD are the same as the non zero eigen values of CD, are the same as the eigen values of DC, whenever these are both square matrices. Was there a homework problem on that or something? I have a vague memory of this. Yes. Okay. There was. Good. So in fact, AA transpose and A transpose A, which by the way, generally have different sizes, but the non zero eigen values are the same. By the way, this is super useful. I can give an example. If I ask you what are the eigen values of PP transpose, or something like that, thatís a symmetric Ė I donít even need that, I can just do it the other way around. Letís make a general thing PQ transpose, there you go. So PQ transpose is a rank one matrix and I asked what are the eigen values of PQ transpose. You would use this theorem to answer it. Where youíd say, well, theyíre the same as the non zero eigen values correspond to the eigen values of that. Thatís a one by one matrix. I know what the eigen values of a one by one matrix are. Itís just the number. So that says that the eigen values of PQ transpose are Q transpose P and N minus zeros if that M by M. Okay. So Iím just giving you an example but anyway, that was just a little aside. Okay. Now, we can actually see what these Us are. Now, obviously, if you write A is U sigma V transpose, if you plug anything into A like AX and some vector here, and it gets multiplied by U. So anything of the form AX is a linear combination of the columns of U. Therefore, the columns of U, actually certainly, are a subspace of the range of A but in fact, thereí exactly equal to the range of A, because I can make this thing be any vector I like, including like E one through ER here. Okay. So U one through UR are an orthonormal basis for the range of A. Now, V one through VR are an orthonormal basis for the null space of V orthonormal complement, not the null space of A. In fact, itís sort of the opposite of null space of A. These are the input directions that basically are not in the null space. Thatís one way to say it. So if youíre orthogonal to V one through VR youíre in the null space of A. So thatís what this means. So this connects, once again, the all sorts of things youíve seen before. So letís get some interpretations now of what this means, A equals U sigma V transpose. Well, it means this, you can write it out as a dyadic expansion or you can do it as a block diagram. As a block diagram, it works like this. It says take a vector N and the first thing you do is you partially expand it. Of course, the Vs are not a basis, because thereís only R of them. But it says you partially expand V, you calculate the V coefficients where I equals one to R, thatís V transpose X, thatís an R vector, this is R vector with the coefficients of X and a VI expansion, then you scale those by positive numbers. You scale this by positive numbers. Thatís all you do. Thatís diagonal, no interaction. Then you take those positive numbers and you reconstitute the output by multiplying by this matrixís columns orthogonal. Now, by the way, when you move from here to here, thatís an isometric mapping. In other words, distances going from here, this is distance in RR, that was big R, big bold R, with a superscript little r, and then this is in our, I canít remember now, M, is that right? Yes. This is NRM. So this mapping is isometric. The distances are preserved exactly, angles preserved, everything here, norms, everything. Okay. So this is kind of the idea. Now, this looks exactly the same as if A is square it looks just like an eigen value decomposition for symmetric A. Actually, itís one difference. It is symmetric A, this is diagonal and real, but they can be negative, because you can have negative eigen values. And in that case, U and V are the same. The input and output directions are the same. Part of that sort of makes sense. Thatís what symmetric means. Symmetric means something like you can swap the input and output and it looks the same. Because when you transpose a matrix thatís what youíre doing. Youíre switching the roles of the input and output. So this kind of makes sense. So this is the picture. By the way, you can see all sorts of things now about you get a completed picture of the gain as a function of direction now. And letís see how that works. If you want to find an input here for which the output is largest possible, thatís here, thatíd be the maximum gain input direction, you would do this. Youíd say, well, look, norm here is the same as the norm here. So if you want to maximize the norm here, you donít even have to worry about the U, just make this as big as it can be. So it says what you really want is you want this vector, then scaled by these numbers sigma one through sigma R, to be as big as possible. Now, when you put in a vector here of unit norm, here youíre going to get a V transpose X is a set of numbers whose norm is less than one. I guess thatís vessels theorem, or whatever, and I believe you had a homework problem on that, too. So because the partial expansion and the VI coordinates. How would you make this come out so that Ė whatís going to happen to this vector is this vectorís going to come out, itís going to be a bunch of numbers whose squares are going to add up to less than one, less than equal to one, and then theyíre going to be scaled by positive numbers. If you want that to come out as big as possible, you have to put all, all of this into the first component because it is going to get the greatest amplification factor. And that means you want V transpose X to be E one. And thereís only one way to do that. And thatís to chose X equals V 1. Thatís the first column of V. You can check this, you can verify this many other ways. Iím just explaining how this works. Okay. So thatís the idea. U one is the highest gain output direction. And of course, you have AB 1 and sigma one and U one. In fact, of course, you have AVI equals sigma I times UI. So it basically says that the Iís singular input direction is amplified by gain factor sigma I. The sigma Iís are gain factors. So letís see, actually, if we can figure out how this works. Am I missing something here, Page 30? No, Iím not. Here we go. All right, here we go. So letís take this four by four matrix. Obviously, thatís so small you can figure out what happens yourself. Actually, thatís on the boundary, to tell you the truth. Even a four by four matrix, you canít look at it, no person can look at a four by four matrix, except in totally obvious cases like when itís diagonal, and say something intelligent about the gain varies. I mean, you can say a few things, you can mumble a few things and wave your arms, but you really canít say much more. So you can imagine what happens when you have a 500 by 500 matrix. But anyway, letís take A is in our four by four, and the singular values are 10, 7, .1, and .05. So the first thing, we can all ask sorts of questions about what does that mean about the matrix and its gain properties. Well, the first is this, the rank of A is four, because you have four singular values. So A is U sigma V transpose, U, V, and sigma are all four by four in this case. And what this says is that the gain varies from as small as .05 to it as big as 10. So that is a 200 to 1 ratio of gain. So if you want to know how isotropic is this mapping, in the sense of how much can the gain vary with direction, the answerís 200 to 1. Weíll talk, actually later this lecture, about what that isotropy means and isotropy means, lack of isotropy. So the gain varies from .05 so it has no null space. However, there is an input direction, which is scrunched by factor of 20 to 1. And thereís another input direction, which is amplified by a factor of 20 to 1. Okay. So we can be much more specific about it. We could actually say that if you have an input, if X, mostly, is in the plane given span by V 1 and V two, these are orthogonal, thatís the two most sensitive input directions, if youíre in that plane, then you get amplified by somewhere between seven and ten. Thatís what it means. Okay. So thereís one plane of input directions for which this system roughly has a gain on the order of ten, you know, between seven and ten. By the way, the outputs come out along the directions span by U one and U 2. Those are the two most sensitive output directions, or the highest gain output directions. Now, at the other side, there is another plane, by the way, orthogonal to the original plane that requires, by the way, some serious visualization skills, advanced visualization skills, since this is an R 4. But you have another plane orthogonal to the original high gain plane, now you have a low gain plane inputs that along in this low gain plane they get scrunched by factor between zero, 5, and .1. Okay. And theyíre going to come out, well, theyíll hardly come out, but when they come out, I mean, theyíll be greatly attenuated, but they will come out along U three and U four span by that low gain [inaudible] plane. Thatís the picture. Okay. Now, depending on the application, you might say that A is rank two. Thatís going to depend entirely on the applicant. Obviously, itís not rank two. Mathematically itís rank four, right there. However, in some cases you might say that A is rank two. As someone said, itís sort of the gain, if this is some kind of communication channel or wireless system, thatís a gain of plus 20 DB, thatís like minus 26 or something like that. You might say, you know what, thatís below the noise floor. Thatís worth nothing. Okay. Letís look at another sort of baby example here. Letís take a two by two matrix. Now, for a two by two matrix you do not need singular value composition to understand what it does, thatís for sure. That much is for sure. Okay. With the singular value of 1 and .5, basically what it say is youíd take something like X here, thereíd be an input coordinate system, thatís V one and V two, and you would resolve X into its components. That would be this and this. This V one gets multiplied by sigma one and then comes out along U one. So whatever that is, if thatís V one, letís say that about .55. Then youíve got .55 times U one, whereís U one, thatís over here. This should be bigger, shouldnít it? A little bit. I didnít draw it right, I think. Well, anyway, itís about right. That should come out about here and then for U two you get, maybe, I donít know, .7 V two, .7 V two should get scrunched by .5 and you should get .35 V two, and indeed, thatís, at least visually, a third. So thatís about right and thatís the output. Thatís the picture. Okay. Thatís the singular value decomposition. It has spread, in fact, to almost all fields. Then any field that uses any math, any field that does any computation. So you will hear about it in different contexts, youíll hear about it in the statistics, you hear about it in the finance, youíll hear about it in basically anything. Signal processing control, wireless stuff, networking it goes on, and on, and on. So youíll hear all about it. So itís actually a very good thing to know. Itís probably itís the last real topic for this class and itís very important and youíll be seeing it unless you Ė well, anyway, youíll be seeing it in many, many contexts. Thereís actually still some weird backward fields where it hasnít hit yet. Itís coming to those fields. So thereís a bunch more. I think, actually, letís see, the last couple of years itís sort of hitting fluid mechanics. So thatís the one I know about most recently. Which is amazing, because itís this incredibly complicated very sophisticated field and they just didnít know about this and itís a big deal now because it explains all sorts of stuff there. Anyway, itís also used in tons and tons of other things. Itís used in, letís see, itís used in the current standard for DSL, is basically straight out of the SVD, so all multi-input, multi-output wireless systems straight from SVD. Itís a lot of methods for compression and things like that, straight out of the SVD. Another one, and this is Ė itís only a small stretch, but it turns out, in fact, things like Googleís page rank. You may have heard this, straight from the SVD, the zero of order of the page rank, SVD. Itís nothing else. So anyway, I just wanted to point out that this is actually an incredibly important topic. I want to say that because others, obviously, are much less important, for example, our friend, the Jordan canonical form. But we teach that just so youíd know about those things and can defend yourself, if needed. If youíre caught in a dark alley late at night with a bunch of mathematicians. So, okay, who are pissed off. Okay. So letís do a couple of applications, but like I say, you go to Google and type this or some of itís synonyms like PCA and thereíll be literally millions and millions of hits. The whole fieldís based on this so youíll be seeing a lot of this. The first thing is, actually, I can now tell you the whole story on the pseudo-inverse. This is for any A not equal to zero. Iíve already fixed the notes, I just havenít posted it, but I just noticed this mistake this morning. So for any A non zero has a singular value decomposition so people donít really say that the zero matrix doesnít have a SVD or something, or it gives you a headache, I donít know, something like that. I mean, V and U would be zero by, theyíd have to be zero by something. So any non zero matrix has a SVD. So if you have a non zero matrix and its SVD is U sigma V transpose, then the Moore-Penrose inverse is V sigma inverse U transpose. Thatís it. And you can check this agrees exactly with the two formulas for the Moore-Penrose inverse or pseudo-inverse youíve seen so far. So youíve seen when A is skinny and full rank thereís this formula, and youíve seen when A is fat and full rank thereís this formula. But this formula, here, works for any matrix that is non zero, any at all, and it basically does what U think U do. If you want to sort of invert this, youíd sort of make a V transpose over there. Sigma square invertible diagonal can be, of course, inverted, and then you do something like inverting U, it would be inverting U, if U were square, youíd make it U transpose and you get this. You switch the role of the input and output directions, invert the gains along these directions and thatís what you get. Okay. So thatís the general Moore-Penrose inverse. It coincides with these, in these two cases, but itís now generalized to other situations. And in fact, in the general case hereís the meaning of it. So letís look at the general case when A is not full rank. Okay. So A is not full rank. If A is not full rank and you want to do least squares, least norm things get tricky. So letís see how that works. If A is not full rank the first thing you do is you say, well, how close can you get to Y with something of the form AW. This a least squares problem except now A is not full rank. And so up until now, you actually did not know how to solve this. You inform your trusted A transpose, A inverse and right at that point you have a problem because A transpose A is not invertible if A is not full rank. Okay. However, if you want to minimize this, there are still Ė you can form this least squares problem and ask how close can you get? Now, because A is not full rank the answerís not going to be a single point, itís going to be actually the whole set of points an affine set. So itís going to be the set of points for which AZ minus Y, the deviation is as small as it can be. Thatís an affine set. Itís a set of least squares solutions. Now, in that set, you can then ask for the smaller, the minimum norm least squares approximately solution. And thatís exactly what A dagger Y gives you. So this is quite complicated but itís worth thinking about and you should be able to sort of see what it does now. So this is the picture. So X zero inverse is the minimum norm least squares, and I really should put approximate here because I donít like writing, I donít even like, Iím going to fix that. I didnít even like saying least squares solution. Because the whole point is itís not a solution. I know that thatís what everyone says, but it irritates me. So Iím going to change it. Itís an approximate solution. Okay. Now, we can also do the pseudo-inverse of the regularization. So letís see how that works. So letís see how thatís going to work. So letís let U be positive and then this is the regularized, this is ticking of regularization, or ridge regression you call it in statistics, or something like that, and here you have two things. You have sort of your mismatch and then you have something that penalizes the size of X. By the way, itís clear, this is going to be related to pseudo-inverse, because pseudo-inverse finds among the Xs that minimizes this, it finds the one of smallest norm. So itís clearly related, these are clearly very close. Now, this is a perfectly good formula. Itís simply A transpose A plus Mu I inverse, A transpose Y. This inverse is valid, period. A can be fat, skinny, full rank, it donít matter, it can be zero here, itís fine, no problem, okay, because the Mu I takes care of everything and this is always positive definite. Okay. Now, as Mu goes to zero this limit gives you this pseudo-inverse. And in fact, it turns out if you want to get the pseudo-inverse this is another formula for it. Itís the limit of A transpose A plus Mu I inverse times A transpose. So itís the limit. Another way to understand what the pseudo-inverse is, if you donít like the SVD, you can think of it this way, itís the limit of regularized least squares in the limit as Mu goes to zero. Now, in some cases this is super simple. If A is skinny and full rank, then this makes sense when Mu equals zero and it just converges to our old friend, this thing. But this formulaís quite different in other cases. If A in fat and full rank, A transpose A is definitely not invertible. Okay. But A transpose A plus Mu Y, as long as Mu is positive, is invertible. And this inverse makes sense and this will actually converge to this A value. In fact, in that case it would have converged, of course, to A transpose quality inverse, which is the other formula for it. But the other cool thing is that even if A is not full rank, this converges to that, to the pseudo-inverse. Thatís what it is. So the pseudo-inverse is something like the limit of infinitely small regularization or something like that. As usual, the important part to know what it means not to know simply that it exists and so on. Okay. So now you know the full story on the pseudo-inverse. Now, like a lot of other factorizations, like the QR and I think weíve seen a couple others, maybe just QR, you can extend, in factorization, by zero-padding matrices on the left and right. You do that to make certain a zero-padding some and then padding out other things in convenient ways. One way is to like, for example, fill out an orthonormal basis. So you can do that with SVD, too. So if you take Ė thatís the original SVD, like this, so R is the rank, these are all positive numbers. What you do is you find a matrix U two, which is complimentary to U one, and you find a complimentary matrix to V one. So thatís V two. Complimentary means that the columns of the V two are together with the columns of V one, form an orthonormal basis. And you know lots of ways to do that. One way is to run QR by appending, for example, the identity matrix after U one through UR that would do. So what youíre going to get now is U one U two is actually now a square matrix and itís the output size M by M. And youíll get a matrix V one V two. By the way, these look, when written like this, theyíre fat, theyíre actually square because U is tall here. Right, so this is actually a square matrix, although visually when printed it doesnít look like it. Thatís square as well. Thatís the input direction and both are orthogonal. So now what you do is to make up for the fact that youíve just added a bunch of extra columns on U and V, no problem, you zero pad sigma and you get something like this. Now sigma is no longer square. Itís diagonal in the weird sense that off the diagonal N equals zero. But generally people donít talk about diagonal matrices that are not square. I mean, like, I donít know why they donít, but they generally donít. And then these are just the zero padding the square. By the way, some of these could be zero, depending on A. So for example, if A is skinny and full rank, like this, it means itís rank R and that says that when you write it as U sigma and V transpose, one of these is going to be square. I guess this one. Thatís your V transpose. So in this case, if A is skinny and full rank, you donít have to actually add anything to V. For example, if A were fat and full rank, you wouldnít have to add anything to U. Okay. I hope I got those right. I might not of, but too bad. Okay. So you zero pad like this and now you can write, once again, A is U sigma V transpose. So you have A as this thing and it works out this way. And people call this the full SVD of the matrix. In this form what happens is these are actually orthogonal matrices U and V. But sigma now has exactly the same size, as the original matrix A is diagonal, I mean, in the sense thatís the only place where it can have Ė but it can also have zero entries. And the one we looked up before, if you want to distinguish it from this, is called the compact or the economy SVD. Oh, I should mention this, itís actually quite important in matlab, for example, if you just do this I think you get the full thing. Although I donít remember. Is that correct? Okay. And if you want the compact one, I think you write something like this. Thatís econ, the string econ, I think. Anyone have a laptop with them? I know Jacob was, heís not here, so anyway. Okay. By the way, this is quite important. Let me mention a couple of reasons this is important. You donít want this Ė if A is, for example, I donít know, 100 by 10,000 then just forget the computation, letís talk storage, here. So you can be in big trouble if you type this. If you type this itís going to return a V, which is 10,000 by 10,000. Okay. And if thatís not big enough of a dense maker to make trouble, you know, make this 100,000. And now, for sure, youíre in trouble. So unless you want the big one, you could actually be in big trouble. Here, this one would do the right thing. I would return something that was, if this was full rank, it would return 100 by 100 matrix, no problem, 100 by 100 diagonal matrix, definitely no problem, and in this case 100,000 by 100 matrix, again, that would be no problem. Okay. But if you try this one it wonít work. Okay. So I just mention this. But it is important. Okay. Now, here, by the way, is where the confusion on zero singular values comes in. So itís up there and let me explain it in this context. Now, letís just take a matrix A thatís in R like seven by seven. And letís say itís rank six. Okay. So the SVD, or the compact SVD, or economy SVD looks like this. Itís U, thatís six Y, and you get a six by six diagonal, and then you get a V transpose, which is going to be six by seven. Okay. V is seven by six. Okay. Now, this is what happens in that case. And the singular values are sigma one through sigma six. However, when a matrix is square a lot of people will talk about sigma seven, in this case, and theyíll say itís zero. And there are actually applications where you want to talk about that. Because, in fact, what you might be interested in is what the minimum gain of this matrix. Well, itís rank six. Itís got a null space. The minimum gain is zero. And then people will talk about V seven. V seven is, of course, nothing but, in this case, describes null space of A. So it turns out while sigma one, V one, U one, these things are sort of unambiguous. But if someone writes something like sigma min, you have to be kind of careful. You mean the minimum of the positive ones or you mean the actual minimum, in which case it would be zero or something like this. So and things like this you just have to ask because it just depends on the context as to whether a personís referring to the smallest positive one, or the smallest one period, one zero. Of course, if things are full rank or whatever thereís no issue here. Thereís no ambiguity. Okay. So letís look at the geometric interpretation of how this works. So this is a full SVD. Thatís to make it simple to do the interpretation. You have A equals U, sigma V transpose. We know exactly what orthogonal matrices do geometrically. They either rotate, or they reflect, or something more complicated, but equivalent. They preserve angles, they preserve distances, they preserve norms. Right. So itís an isometry. Anything related to distances, or angles, or whatever, is preserved by V and U. Now, like rotations. Okay, so this says to operate by A on a vector you carry out three operations. First, you multiply by V transpose, then sigma, then U. So the first one is something like Ė and I mean rotate loosely in the sense of Ė of course, a rotation is given by an orthogonal matrix, but it could also be a reflection or some combination of the two. Okay. So the first thing you do is you rotate. Then you stretch along the axis by sigma I. Now, you put in the differing gains. So far thereís been no difference in gains anywhere because if you multiple by an orthogonal matrix itís actually complete isotropic in the sense that the gain is one all directions. Now you put in the gain and isotropy. So you put in the unevenness in the gains now. By the way, some people call this, Iíve heard dilate and, I donít even know how to pronounce this, but it might be dilatate. Does anyone actually know how to pronounce that? For a long time I thought it wasnít a word. But then I found it in enough books that I realized it was a word, and then I thought it was like a weird variation. Maybe something like from the U.K. or something like that. You know just some weird thing. But it turns out no, it appears to be a word. I donít know if thereíre exact synonyms or not, but anyway. But youíll hear this operation when you stretch or scrunch something along the axis is a dilation. That makes sense to me. Dilatation, actually, I heard people say it. Even people who seem to know what theyíre talking about say it, so I suppose it is a word. Okay. You also zero pad or truncate, because when this middle action, here, actually is going to map something of different Ė itís going to map of different sizes. You zero pad or truncate, depending on whether the output is bigger than then input or smaller. So thatís what you do. And the final thing is you rotate by U. So this means that we can calculate what happens if you have a unit ball. You can calculate exactly what happens. Take a unit ball. First, multiply a unit ball by V transpose. If you take a unit ball and you apply an orthogonal matrix to it, nothing happens. I mean, the individual vectors move. This point might move over here. But the ball is invariant. The ball is the same. So anything thatís in the unit ball, if the image of it is the same unit ball. Not at the detailed level, not on a vector-by-vector basis, but on the set basis, itís the same. Okay. Now, you take this ball and you apply a dilation to it. And in this case, itís a dilation along the axis because you multiply one here, thereís a singular value of two, and a singular value of .5. You take the two, thatís a long the X one axis, or whatever, and you stretch it by two, and you scrunch it by a factor of two and X two. Okay. Now you take this thing and you rotate by this output, these output directions, and you get this. And you can see all sorts of cool stuff here. You can see, for example, that, indeed, you won with the largest gain output direction because the unit ball and all the things that went in here, were things of norm one. The biggest that any of them ever came out is that guy and that guy, by the way. So those are the two points that came out with the largest possible gain. And look at that, sure enough, theyíre aligned with U one. And what input direction, by the way, ended up here? What input direction was it?
Instructor (Stephen Boyd):V one, precisely. So V one is some vector, here. Say that might be V one. V one got rotated to here or here, by the way. It wouldnít matter, it got rotated either to here or to here. And those are the two points, on this ball, that suffer, or on this sphere, that suffered the largest expansion here, and ended up here. Then it got rotated over here or here. So thatís it. So it says that the image of the unit ball, image means you overload matrix vector multiply to apply to sets. You multiply a matrix by a set of vectors. You just multiple all the vectors and you get a new set. Thatís what this is. And it says that the image of the unit ball is an ellipsoid and the principle axisís are exactly sigma IUY. Okay.
Now this allows you to understand exactly the game properties. And letís do an example. For fun letís do a couple of examples. In fact, letís start with this. Suppose the singular values are 5, 4, and 0.01, thereís your singular values. What does the image of the unit ball look like? What does it look like?
Instructor (Stephen Boyd):Itís a pancake. Exactly! Slightly oblong pancake, but itís a pancake for sure. Everybody agree with that? And this says something like this. There is an input plane, letís say itís a mapping from R three to R three, it says thereís an input plane, which is a plane where this thing kind of has a high-gain, like between four and five, and another gain, you might even call V three, you might even call that like an almost null space, because while AV three is not zero, that would qualify for full no space membership. It is not zero. But it comes out attenuated by a factor of 100. Okay. So itís something like almost a null space. In fact, this is exactly the kind of ideas you want to have when you do this. In fact, youíll find that all of these ideas about null space, range, all that stuff, itís wonderful, you have to understand all of it. However, in practice, these things are quite useless and quite silly, very silly, and very useless. What the SVD will do is it will give you quantitative ways of talking intelligently in the context of some application about things like null spaces, and ranges, and things like that. So here the range of this matrix is R three. Itís on two period. Up until, maybe, the third week of the class itís on two period. Now, in this context, depending on you know the details of the application, you could say that matrix actually has a range which is two dimensional. Itís a lie. It has a three-dimensional range. But itís third direction is so feeble, the gain is so feeble, that, again, depending on the context, you might say for practical purposes that matrix is rank two. Okay. So this is what this is going to allow you to do. Weíll see a lot of examples of this so this will become clear. Okay. So thatís the pancake. Whatís this? Whatís the image? If the singular values were 5, 4, 3.9, what would the image look like? Just roughly.
Instructor (Stephen Boyd):Egg, sure. Yeah. Maybe not even quite an egg. No, maybe an egg, I donít know. Something like an egg, right. So basically it says that thereís one cross section that looks kind of like an egg. Maybe not even as much as a football or something like that, but an egg. Thatís that. Okay. How about this?
Instructor (Stephen Boyd):Itís what?
Instructor (Stephen Boyd):A cigarette toothpick. Okay. Iíll take toothpick. Maybe I miss heard that. Did someone say cigarette? Okay. Yeah. No. Iíll take toothpick. Yeah. So itís a toothpick. Thatís what that is. And basically, actually, what you can say is, weíll see this quite precisely soon, you can actually say in this case that matrix is nearly rank one, depending on the application. I want to stress that. Matrix is nearly rank one depending on the application. So I want to stress that. So thatís the idea. Okay.
So letís see how the SVD comes up in estimation and inversion. Y equals X plus V, where V is the measurement noise, like this, and X is something you want to estimate. Now, probably the best way to really get more deeply than weíve gotten into this, although, to tell you the truth, the stuff you know is going to work quite well in practice and go quite far. So what you would do is you might do something like this. You might do least squares, here. You might take Y, choose to estimate X, you would take something a dagger Y, okay, which would be A transpose A inverts A transpose Y, and you would say that thatís the X that minimizes the discrepancy between what you actually measured and what you would have measured had the perimeter been X had, or something like that. Better models, well, maybe not better, I donít know, different models are obtained by looking at probability and statistics. So here you assume V has some distribution. But thereís another model, and the other model says this, that this noise, I donít know anything about this noise, but Iím willing to give you a bound on its norm. Okay. So some people call that a norm bound model or something like that, and some people call it a worse case model or something, and itís a lot more forgiving than a statistical model. Weíre not looking at statistical models so it doesnít matter, but a statistical model basically says V is random. So V is not out to get you in a statistical model. In this one, V can be out to get you. V can include someone intentionally trying to mess up your measurement. So V can actually include things like somebody intentionally trying to jam you. It can include something related to X. For example, V could include quantization error, in this case. So itíd be another model for quantization error. Something like that. But anyway, thatís the model where you know nothing about the noise except you have a norm bound on it. Okay. Now, letís put an estimator in, a linear estimator, and letís make it unbiased, that mean BA equals I, and you know what that means, it means that if you form X minus BY, if there were no noise, youíd get prefect reconstruction no matter what X is. So this means itís perfect reconstruction with no noise. Now, the estimation error is this. If you form X hat, thatís what you guess, minus what X really is, itís BV. Now this makes sense. Weíve seen this before. The key now, is to do this, among left inverses of A. What you want is a small one. Why does it want to be small? It wants to be small because B is what amplifies the noise. So B does two things. It maps a measured vector into an estimate, but it also maps measurement noise or error into estimation error. So you want B small. B canít be that small because, after all, BA is I. So for example, you canít take B as zero, as an extreme case, obviously. Okay. Now, we can then analyze, using this model, the set of possible estimation errors and thatís an ellipsoid. Actually, thereís a more refined ellipsoid you can get thatís smaller but weíll just do this one. So here you can get a more refined ellipsoid but for now, weíll just this. Weíll just say that if you get BV and V ranges in some unit ball, then in fact, thereís an ellipsoid, which is BV, as V ranges over all vectors of norm less than alpha. This defines an ellipsoid, and weíre going to call that B uncertainty ellipsoid. Note that has nothing to do with Y. Actually, thatís the key to the more refined estimate. But this is fine for now to give you a rough idea. Then it says that this is where the estimation error lies, but an ellipsoidís original model is symmetric, so you can say X is X hat minus X utility, thatís X hat minus, itís N, X hat minus the uncertainty ellipsoid, but minus and plus ellipsoid are the same because an ellipsoid is invariant under negation. And that says that the true X has to align an uncertainty ellipsoid, this E uncertainty, itís centered at the estimate X hat, and you can say exactly what the insert into the ellipsoid is. B here, at the singular values of B gives you the semi-axis of uncertainty ellipsoid. So you want B, if, for example, these were the semi-axis of B, then we can talk about what it means. In this case, if these were the singular values of B, therefore, these are the semi-axis lengths, when you multiple them by alpha of the uncertainty ellipsoid, in this case, very different. This one says your ignorance is pretty much isotropic. It means that when you guess X hat, where X can be compared to where X hat is, itís uniform, itís kind of your ignorance in this direction is kind of, you know, itís not too much different from your ignorance in that direction. And if you want, you can imagine a GPS system or something like that where youíre estimating position. And it basically says, I guess if someone forced me to guess where I am, Iíd say Iím here. But if someone says how far off are you, you say, well, I can be, you know, plus/minus three millimeters this way, itís about equal in all directions. Everybody see this? If, however, this is the story, it gets really much more interesting. This says well, in fact, this is in an estimation context this is singular values of B remember. This says, in that context you would say, well, in two directions I have uncertainty on the order of plus/minus four millimeters, letís say. In another direction Iíve nailed it. I have way, way better. Better, Iím much, much better, estimate of the uncertainty. So in this case the uncertainty ellipsoid is a pancake. Remember for an uncertainty ellipsoid small is good. This is better still. This says, in two orthogonal directions I have a very, very high confidence in my estimate. But in one direction, itís really way, way worse. So thatís what is says. Okay. So thatís the picture. Okay. So thatís the idea. Okay. So you can look out, for example, the worse error is the norm of B, for example, and it turns out, in fact, that the best thing, again, by this measure, is actually suing the least squares estimators. So if you use least squares estimator here, in fact, youíre uncertainty ellipsoid will be given by this formula will be as small as it can possibly be. So if what you care about is uncertainty ellipsoid, or if you believe this analysis and so on, then once again, our friend least squares is the best estimate you can come up with. In other words, if someone says, no, Iím not going to use least squares Iím going to use some other fancy thing, youíll get an uncertainty ellipsoid with some other left inverse of A. Youíll get an uncertainty ellipsoid thatís actually bigger in all directions. Thatís what will happen. And that comes out this way, thatís B least squares, B least squares transpose less BV transpose. And thatís that. So once again, we see that least squares is actually a good thing to do. So letís look at this example, our navigation using range measurements. So here I have four measurements that estimate two positions and we look at two positions. One was the just enough measurements method where you take the first two measurements, you say, look, Iíve got two unknowns, all I need to two measurements, Iíll use the first two, you donít even need the last two range estimates, donít even need it. So you form this left inverse like this. Itís got a two by two block and a block of zeros. And then weíll compare that to least squares, which is X hat is A dagger least squares. Thatís a two by four matrix. A dagger is this matrix that blends your four ranges into your position of your X and your Y position. Okay. And from the same example, hereís what we get, this is where alpha equals one. In the first case, you get this. And in the second case, you get this. Now, what is correct here, is that if you were to translate this over so they have the same center, theyíd lie on top of each other, in fact. Oh, sorry. This one completely covers that one. Okay. So thatís the picture. Now, there is something a little bit confusing about this and Iíll say what it is and it has to do with the fact that our analysis here can be sharpened a little bit once you have Y. This idea that this set, this uncertainty ellipsoid, the idea that this set contains X minus X utility, thatís completely correct. However, this does not depend on Y. Note itís independent of the actual measurement you made. It turns out, once you have the measurement you can actually get a smaller uncertainty ellipsoid. And itís not very hard to show. I wonít do it now. But now weíre going to explain why, for example, if one estimator says you have to be in the shaded set, another one says you have to be in this shaded set, then for sure you have to be in the intersection. And the intersection of those two is some weird thing that looks like this. Itís this little guy here. Everybody see that? And then youíd say, wow, so although this was not a good method, it actually did give us information because it ruled out, for example, all these points over here that this model allowed you to keep. It turns out that if you do the more subtle analysis of uncertainty ellipsoid, you get something thatís the same shape as this but itís smaller, like itís scaled down. Something like that. So, okay. I just mention that because itís, well, I confused myself this morning when I looked at this. Thatís what it is. This doesnít matter. The main point is you can understand uncertainty and how a measurement system works now by looking at the singular values of B. Thatís what you can do. Okay. I think what Iíll do is I might actually, Iíll skip this thing, which is just a completion of squares argument, that is a method to show, and if you have any left inverse itís going to end up with BV transposed bigger than the B least squares B least squares transpose. So Iím just going to skip that because itís just a completion of squares argument, because I want to get on to one other topic. Now, what weíre going to do is Ė I guess, I can make some other comments about, Iíll make just a couple more points about estimation and inversion. So in estimation and inversion, letís go back to that and look how that works. You know Y equals AX say plus V, like that. Now in some cases, in fact, Iíll just tell you in a case where, I donít know, that I was involved in, and it was geophysical inversion. So X had a dimension around 10,000, something like that, and Y was a bunch of magnetic anomaly data, and it was something like 300,000 or something, if these numbers are not right, the storyís going to make the right point. So 10,000 variables, 300,000 measurements like, no problem. In fact, that sounded like a 30 to 1 ratio of measurements to perimeters you want to estimate. Thatís a pretty good ratio, right, 1.1 to 1 is not so great, 30 to 1, thatís sounding pretty good. Youíve got about 30 times more measurements than you need. Everybody cool on this? Okay. So theyíd say, no problem. A is full rank, by the way. And so you would imagine you would form this. You would form that. Something like that. Okay. And in fact, you can do this and you would get things. Things would come out and it would be shown in beautiful color that would show you the substructure, the presumed substructure and things like that. So that would be it. Okay. No problem. Now, this matrix A, I can tell you what itsí singular values were. Okay, itsí singular values were this. Actually, how many singular values does it have if itís 300k by 10k and itís full rank? So how many singular values does it have?
Instructor (Stephen Boyd):Itís 10k, 10,000. So Iíll just jump into the story. It had about 20 significant singular values. After that they dropped off precipitously and were way small. They were like below the noise floor. Okay. Everybody see what Iím saying? So the sad thing, at that point, people had to accept the following: They paid a lot of money to fly an airplane with this squid, you know, magnetic anomaly detector around. They got 300,000 measurements, 30 times more measurements than they had on those, 30 times. Which you would think would be way, thatís a lot of measurement redundancy, 30 to 1. And it was very, very sad, because they thought they took 300,000 measurements and in fact, they didnít know it, but they actually took only 20. So they thought they were in the 30 to 1 measurement to data position, but in fact, they were in something like a 1 to 500. So what it meant was this X hat, though very pretty, very nice when rendered in color, basically, was completely meaningless. Now, you can estimate about 20 perimeters in X if you wanted, but certainly not 10,000. Okay. So this is an example of the kind of reasoning or understanding you would have when you fully absorb the idea of SVD. And itís sort of this step beyond what happens when you first just sit around and just talk about rank, and range, and null space, and things like this. This matrix has no null space. There was no arrangement of subsurface, thereís no X for which AX is zero. None. None. So in fact, without the noise you can get X exactly, in fact, by many methods. Because thereís lots of left inverses of A. This is your understanding as of week 2.5. Okay. Now, itís much more subtle and it basically says for all practical purposes you found A was full rank, it was actually like rank 20. You have a question?
Instructor (Stephen Boyd):Iíd like to say yes, but the answer is no, I donít know why. You know, I can make up a story though, about it. Itíll be pretty good, too. Yeah, I could make up a story. I could probably even get the number 20, I mean, with a long enough story. Right. But no, I donít know why that is. It just is. So thatís a very good question. Actually, itís because of that that youíd want to then Ė once youíve come to the realization that although you have 300,000 pieces of data, you actually only have 20 measurements, independent measurements.
The next question is what are you going to augment it with? Do you want to drill a couple of exploratory wells, you want to do magnetic anomaly, gravitation anomaly, or do you want to set up some explosives. These are all getting other different measurements. And it would be interesting there to find out which one of those would augment these to give you. And all you do, completely trivial, is you add some more rows to A. Any measurement is a row to A. You add more rows to A and find out what happens to your singular values. And if you go from 20 non singular, you know, significant ones to 50, you just added 30 more measurements.
Instructor (Stephen Boyd):Oh, absolutely! Yeah. The additional measurements, beyond the number of the perimeter youíre estimating, that goes into smoothing, blending, averaging, this kind of thing. I mean, that is, the matrix we just looked at, I donít know if you remember it, I remember it, the two by four matrix, I mean, that kind of says it all. You should think of a left inverse in that case and actually, I love the names, sensor fusion is one, itís a great name, sensor blending. And basically, itís a fat matrix kind of with all relatively small numbers. It takes all of those measurements in, scrunches them together, and gets very good measurements. So beyond any measurement beyond 10k, would have gone into averaging out the noise and stuff like that. I donít know if I answered your question. As a polite way, he said, no, thatís what he said, but oh well, that happens. Okay. Our next topic is, also, a topic on which there are entire classes, like whole ten week classes, and itís a huge area. Itís the idea of a sensitivity of linear equations to data error. So Iíll say a little bit about that. Itís not something weíve looked at in this class. In this class, when you use metlab, for example, which in turn uses LA pat, metlab does nothing, right. When you type A backslash V, it doesnít actually matter whatís happening, but something like the equivalent of forming A transpose A inverse A transpose times B is happening. And you know that thatís not done in perfect precision. Itís done with floating point numbers or whatever, and that means that the numbers that come out might not be exactly right. And youíre use to that because, I guess, when you solve some problem and you look at the X is supposed to be equal to Y, youíd take the norm. Youíd look at X minus Y and then entries would be on the order of A minus 15, and thatíd be your hint that what youíre seeing is just the artifacts of round off errors and stuff like that. Thatís a huge filed. Itís call numerical analysis. And weíll see a little bit of it now, about how this works. But singular value becomes positional central to the whole thing. So letís just do a square matrix, just for fun. Y equals AX. Well, of course, that means X equals A inversed Y, end of story. Not much more to say. Actually, thereís going to be more to say, and itís actually quite interesting. Now, suppose you have a noise or an error in Y. For example, Y equals Y plus delta Y. Now, in numerical analysis, delta Y is like a floating point round off error, for example. So there itís really small, right, it could be on the order of, you know, 1E minus 10, or, I mean, it depends on the context, but weíre talking errors out there in the 10th, 15th, 20th digit. Right. Actually, in an engineering context, you want to imagine delta Y, itís generally much bigger. If Y is a measurement, for example, and you use a 12 bit A to D, thatís only 4,000 numbers. So what that says is, the fourth digit, did I get that right, 4,000, yes, the fourth digit is absolute nonsense, if I have a 12 bit A to D. If I measure something with 12 bits, anything, it doesnít matter what it is, and then it says basically that whatever the true Y is, I donít know, but a delta Y was added to it that was on the order of one in 4,000. Okay. By the way, thatís pretty good. A lot of high-speed stuff operates with much smaller precisions like 8-bits and things like that. Okay. So when youíre doing engineering, if youíre doing economics delta Y is probably on the order of Y, basically. I mean, itís like friends in economics say, but never on record, you know, when we talk about accuracy of things, theyíll say things like, well, Iíll tell you the truth, if we get this sign right, weíre happy. But theyíll never say that on the record. I donít know why. It has to do with the sociology of the field or something. Anyway, so theyíre happy of they get the sign right. That means that this delta Y is on the order of Y. But anyway, all right. Well, if Y becomes Y plus delta Y, X, obviously, becomes X plus delta X, where delta X is A inverse delta Y. Thatís simple enough. And thatís says that how much X has changed, because Y changed, is simply less Ė the worse case change is norm A inverse times norm delta Y. Thatís it. And what that says is the following, if the inverse of a matrix is large, and I mean large in norm, then itís says small errors in Y can lead to large errors in X. By the way, it does not say does lead. I guess, do would be English, actually, but anyway. It does not say it must lead. Thatís not the case. For you to actually suffer the largest increases here, delta Y has to be in the largest gain input direction of A inverse. Thatís what has to happen. Okay. But unless you can rule that out, itís possible, and it says you can get larger. And now, this says, basically, you really canít solve for X given Y with small errors. And that means A can be considered singular in practice. Okay. So I mean, we can do a quick example here. And by the way, this clears up something a bit odd, which I will explain now. Take a matrix thatís singular. So itís N by matrix and itís singular. Then you ask someone hereís a measurement, Y equals AX. Iíve measured it. You know, this Y equals AX, and X and Y plus V, so thereís a small noise. V is delta Y here. Thatís it. So thatís your measurement system. And then you say, please estimate X. And someone whoís taking the first two weeks of [inaudible] will say, you canít, your A is singular. Of course, I canít estimate X, itís ridiculous. In fact, you know if itís rank N minus one, thereís a whole line of things with no noise, give you the exact same answer. Itís impossible, so forget it. Get yourself a better measurement setup. Okay. So you go back into the lab, and itís a fact, actually, that if youíd take a matrix A and you add, if you pick sort of entry at random or if you add a small number to any random number to the entries, with probability of one, the matrix will be non singular. So you go to the lab. All you have to do is find some physical measurement thing, tap on it with a hammer. I guarantee you A is now non singular. So a very gentle tap, just tap it once, Iíd say, itís taken care of, no problem. And they look at A, A changed in the fifth digit. And I say, check it, itís non singular, and it will be. By the way, it will absolutely be non singular. Okay. Now, and Iíd say, go ahead, and then Iíd walk away. But actually I should walk away very quickly because itís obviously not going to work. A inverse, A had a singular value just exactly zero. After I tapped the measurement system with a hammer, A had a singular value that was one E minus eight. Okay. A inverse now has a singular value thatís ten to the eight. The norm of A inverse ten to the eight and that says that, if you go down the pad here, it says that small errors in Y can be amplified by as much as a factor of ten to the eight. Okay. So if you wanted say, a couple of digits of that, you know, if you wanted X to be accurate to like, you know, to .01, no problem, Y has to be accurate to ten to the minus ten. Okay. Which is not with a 16-bit A to D is going to give you or anything like that. Okay. So I mean, this all completely obvious. You know, it would be idiotic if taking a little rubber hammer and tapping on a measurement setup, you know, or if I told the pilot in the magnetic anomaly thing to just do another thing and Iíd pushed on the control stick or something like that, and that made it non singular. Anyway, it was non singular there, anyway. But that doesnít affect anything. In fact, itís singular value analysis that tells you this. Okay. So what this says is that a matrix can be invertible, metlab, which is to say LA pack, will happily invert the matrix for you. By the way, if it gets too close to singular, then LA pack will issue a warning and say something like R con warning matrix is singular to working precision. At that point, youíre so far out of it, it means that basically, with double precisions itís finding out. But the bad part for engineering, in fact, for all applications, is Ė I guess itís the good part, is that the really evil ones are when you invert it, you see entries like ten to the nine, ten to the twenty, you know somethingís fishy. But the worse part is when itís kind of plausible and itís giving answers thatís plausible but completely wrong. That can happen, actually. Okay. So Ė all right. Now, I think Iíll mention one more thing and then weíll quit for today. What you really want to do is it doesnít really make an sense to say that the norm of delta Y is Ė I mean, if I ask you, if norm delta X is .1, is big or small, well, it depends on what X is. If the norm of X is 100 then the norm of delta X is .1, itís pretty good. Thatís like a .1 percent error. If the norm of X is .01 and norm delta X is .1, thatís terrible. That means that your delta X is ten times bigger than the X youíre interested in. So you should really be looking at relative things that develop errors. Now to do that you do this, you say, if Y is X and norm Y is less than norm A, norm X, and therefore, if you work out delta X over X this thing is the relative change in X. Thatís less than equal to the norm of A times the norm of A inverse times the relative change in Y. And this number comes up all the time. The product of the norm of A and the norm of A inverse, thatís called the condition number and itís traditionally called kappa of A. And itís the maximum singular value of A divided by the minimum, always bigger than one. And this is read informally something like this. It says that the relative error in solution X is less than the relative error in the input data times the condition number. So the condition number is what amplifies or can amplify relative error. So thatís the picture. Or, again, thereís a misspelling here. But in terms of, if you take the log two of this, and again, very roughly, youíd say something like this. The number of bits of accuracy in the solution is about equal to the number of bits of accuracy in the data, thatís the long two of this thing, minus log two kappa. So when you solve Y equals AX for X given Y, you lose accuracy. By the way, if the condition number is one you lose no accuracy. Matrices that have condition number one, Iíll give you an example, would be an orthogonal matrix. They have condition number one. All singular values are the same, gain factor all directions equal. Thatís why people who do numerical work love them. And by the way, itís also why a lot of transforms youíll see, like DCT, FFT, or DFT, a lot of transforms youíll see, actually, are orthogonal. I mean, thereís many reasons for it, but one nice part about it is when you use transform like that, and then the inverse, and stuff like that, you actually are not losing bits of accuracy. Okay. So this is the picture. So weíll continue from here next time.
[End of Audio]
Duration: 78 minutes