Instructor (Stephen Boyd):Are we on? Okay, hey, I guess weíve started. Any questions about last time Ė if not, weíll continue. I think theyíre Ė I wonder if thereís some announcement. Letís see, one is that Homework 1 will be due Thursday. Weíre probably gonna post Homework 2 later today. So I know some of you are very excited about that. The reason is we had at least a couple of people who are gonna be gone next weekend and asked us to figure out Homework 2. So we did it, and itís figured out.
So weíll post that. Of course, itíll be due a week from Thursday. Thatís how thatís gonna work. I donít think there are any more announcements. Oh, there is one more Ė we now have a third TA, Thomas, who has a class at this time, so heís not here, but he sneaks in for the last half hour. So maybe if I remember, Iíll point to him and he can wave his arms when he comes in or at the end of the class, something like that.
Okay, well, letís just continue. I need to go down to the pad. Last time we looked at linearization as a source of lots and lots of linear equations. So linearization is you have a non-linear function that maps RN into RM. And you approximate it by an affine function.
Affine means linear Ė sorry, thatís not linear. There we go Ė thatís linear plus a constant, so thatís an affine function. You approximate it this way. In the context of calculus, people often talk about a linear approximation and what they really mean is an affine approximation. But after awhile, you get used to this.
So linearization is very simple. You simply work out the Jacobean or the derivative matrix, and what it does is it gives you an extremely good approximation of how the output varies if the input varies a little bit from some standard point, X0. Thatís the idea.
And in fact, in terms of the differences or variations measured from these standard point, X0 and F(X)0 thatís Y0, this relation is linear, so the small variations are linearly related.
Okay, so letís just work a specific example of that. Itís an interesting one; a very important one, too. Itís navigation by range measurement and of course, this is roughly to give you a rough idea or is actually how, itís part of how GPS works.
Weíll get more into detail. Weíll see this example will come up several times during the course. So here you have X and Y, two variables unknown coordinates in the plane and we have a bunch of beacons at locations PIQI, so these are X and Y coordinates of the beacons. And what we measure is a range. And a range, because the beacons can only measure range, ranges to this point, it could be of course, be the other way around, that the point can measure its distance to the range.
But for now, weíll assume everybody has all the information, so here the beacons get the range to this point and thatís nothing but the distance and so you have a bunch of points here and each one has a range and its not hard to figure out, for example, the ranges you could figure out where the point is.
In fact, if you know the range from a beacon it means that the point lies on a circle of a fixed radius and if you have two of those, it means itís in the intersection, now itís ambiguous; itís down to two points that are possible, and if you have a third beacon itís completely determined. This is in the plane.
So, letís see how this works Ė letís see how linearization works. Well, here the mapping, here we have four ranges and itís a non-linear function of the point, XY of the location of the point, and of course, itís the square root of XI minus pi squared. Itís the Euclidian distance.
So itís this function, which is of course, non-linear in X and Y. Letís linearize around the point XY0 or in this case, you got the change in the range, thatís a four vector is equal to the change in X and, Y, multiplied by Matrix A. The entries of the matrix A actually can give you work out geometrically exactly what they are, they are unit vectors. Theyíre unit vectors Ė the row is a unit vector pointing from this point to the beacon, like that.
So basically it says actually itís the other way around, itís the negative of that, so, sorry. So if you step in this direction, you would reduce the range. If you step in a direction orthogonal to a beacon, actually to first order the range doesnít change at all. And if you step opposite direction, the range increases. So what that says is that this Matrix A which is 4 by 2 in this example, that these entries tell you how much small changes in position translate into small changes in the four ranges like that.
And so thatís the Ė and in many, many applications of this, for example, if you know where X and Y is at one step, itís now a little bit later, so it may have moved, but it hasnít moved far in a linearized model would give you an excellent approximation for how the ranges are gonna change, okay?
So this is sort of the picture. So thatís an example, and youíll I guess in Homework or something, youíll see plenty of examples sort of like this.
Letís just work out and actually just do this intuitively for this case. Actually for the specific case we have here, letís estimate, just by eye, some of the entries of A. So in fact, letís figure out for what I have right here, I would like to know what the second row of A looks like. So Iíll start with that entry. Whatís the number; whatís it look like?
Instructor (Stephen Boyd):What is it?
Instructor (Stephen Boyd):I'm not hearing anything clear, so I need someone to Ė
Instructor (Stephen Boyd):Here, I'm only asking you to do a thought experiment with your eye. The thought experiment with your eye is to move X in here, X and Y a little bit and to see what happens to the range. If you move X a little bit, suppose you increase it. Iíll take my coordinate system for where it goes like that, if I increase X a little bit, what happens to the range to Beacon 2? It goes down ĖOkay.
And it goes down Ė if that were actually horizontal, it were exactly horizontal, if it moves to the right a meter, letís say, the range is gonna go down by a meter. Letís assume thatís 20,000 kilometers or something like 20,000 meters, okay, as it might be in GPS.
So if it goes down Ė so this something like about minus 1, okay? And now tell me this entry. Itís about 0, okay, and that has a meaning here. And so thatís just a quick exercise just to do it, and of course, you should always be doing this and then of course, checking it against the formulas, just checks to make sure what you think and what the formula thinks is right. Okay.
Okay, thatís a whirlwind tour of various applications, and now I want to talk a little bit of one level above all these applications, kind of organize them. So there are a lot of applications or models or categories of Y equals AX and in fact, youíre gonna find out that much to use, you know, the material of the first four weeks of the class, everything is gonna hinge on actually hammering or pounding with more or less, you know, sometimes with little violence, sometimes with lots of violence, hammering your problem into a form that looks like Y equals AX, okay.
So itís gonna be that; thatís what youíre gonna be doing. And of course, in a real application it wonít look pretty like this. No one will ever walk up and say Y equals AX. Theyíll actually walk up to you and theyíll go on for hours and hours about their particular application and theyíll tell you about the stein model of the electron density and the troposphere. Theyíll go on; theyíll be cosines and sines flying around. There may be some PDEs and things like this. Thatís because theyíre in that field.
Itíll be your job to step back, ignore all of that and calmly step forward later and write it a Y equals AX. So thatís the way this is gonna work, just to give you a rough idea.
Now, what does Y equals AX mean Ė well, one area is estimation or inversion. Iíve already talked about that and when we looked at other examples, well when we actually get into methods, weíll see more.
In these ideals, X is something you donít know. Itís something you want to guess or find out or something like that. These could be parameters in a model. They could be a transmitted signal in communications, something like that. It could be something that you really want to know, but canít go and directly measure.
Y represents what you measure. So Y is something you do know or you can measure or something like that and from that you want to deduce X. That would be the type of thing you would want to do. A, in this case, is represents your measurement set up or in the communications context, itís your channel. So itís what maps, whatís transmitted to whatís received. That's what A is in that case.
All right, in a design problem, X actually is, in fact, itís the opposite. X is something is what we can control. X are the knobs we can turn; itís the design parameters; itís the thrust; itís the, that we can command an engine to give. It is control surface deflections. Itís things we can mess with, we have control over.
Y is an outcome. So here, in this interpretation, so Y is a vector of outcomes and in a case like this, you know, the problems are obvious. You would want to do things like this. Please achieve this desired outcome.
That means find an X for which Y equals AX. If you canít achieve that desired outcome, come as close as you can. These are the types of things youíll do. Youíll say, ďWhat outcomes can I achieve?Ē These are the types of questions in this interpretation weíll see.
Now, in another interpretation, Y equals AX will be simply a mapping or transformation, so it might be a geometric transformation or just some transformation. It might be a coding or a decoding or something like that and you think of X as an input, Y as an output. That simple.
It could be geometric; could be anything. Now, there actually even are combinations of all of these things, where itís a problem that itís partly inversion and partly involves something in design or something like that. You can have weird applications where some components of the X are actually primers you want to know. Other components of X are things you can mess with, okay.
That's every common, for example, when you have a system and there are two things affecting the outcome. First of all, when what you do; thatís the part you can mess with, and the other part is what noise or other people of interference does.
So you get all sorts of variations on this, but weíll come back to these models many, many times. Okay. So letís talk about estimation or inversion. So here YI is interpreted as the ith measurement or sensor reading, which you know. Thatís the idea.
XJ is the jth parameter or to be estimated or determined and AIJ now has a very specific meaning. It is the sensitivity of the ith sensor to the jth parameter, okay. So thatís the meaning of this.
A as a matrix describes the measurement setup, or if you like to think of this as a communications problem, itís the communication channel. Hereís some sample problems. The most basic one is this Ė given a set of measurements, find X. Thatís the most obvious thing you could ask. Then you could be more subtle and you could say, ďYou know what, actually find all Xs that result in Y,Ē and that means find the all parameter vectors consistent with the measurements.
Thatís a very interesting thing to do. For example, suppose itís empty. There are none, right. The answer is there are no Xs for which Y equals AX. Then it says this model is wrong and that could be for various reasons. It might be that in fact, the model is not this, but actually thereís a bit of noise added, all right.
But in any case, itís better to know that thereís some noise there than not, and the other option is that the noise really is insignificant, but a sensor has failed. Thatís another option, in which case this would be a very important thing to know that no X is consistent with the measurement you just make.
That means something is wrong with the measurements or with the model. And that could mean one or more sensors has failed, for example. And thatís a whole area that is widely used, fielded and so on. Health monitoring, sometimes called Ė okay.
Now, if there is no X that gives you Y equals AX, and maybe that's because of noise and not sensor failure, you might say, ďFind me an X for which the outcome,í if, in fact, the parameter had been X and you believe the model, youíd get AX and youíd like that to match very closely what you observe, okay.
So that says find a set of parameter vectors, which is almost consistent with what youíve measured, okay. So weíll spend a lot of time looking at that. These are the types of problems weíll look at.
Now, when you flip it around and look at the control or design problem, itís quite different. Here X is a vector of design parameters and inputs. For example Ė an [inaudible] example would be the mass force example where X is a vector of forces that give you a whole program of what youíre gonna do to [a mass, for example.
It could be the input [inaudible] a vehicle over time. Here Y is a vector of results or outcomes and the Matrix A describes how the input choices affect results. So hereís some sample problems. Find an X for which Y equals Y desired. In other words, hereís desired outcome, please achieve it. That would be one example.
The next would be find all Xs that give a desired outcome. Thatís find all designs that meet the specifications. Now, I want to point something out. If youíre doing estimation and the answer to find all Xs that result in Y is a big set. Suppose thereís a lot of Xs that give you Y.
In an estimation problem, thatís not good, obviously, because it basically says you cannot say what X is given the measurements. You canít. If that set is big, it says thereís a lot of ambiguity in the measurement system, okay. Now, on the other hand, mathematically itís the same question. Itís identical. Itís find all solutions of A equals Y.
If in fact, in the controller design case there are lots of Xs that reproduce Y. Thatís good. Why? What makes that Ė what is that good? Why is it bad for estimation and good for design?
Instructor (Stephen Boyd):Exactly. It means that lots of designs to choose from. Many designs meet the specification, achieve the desired result and it means you can now start optimizing over some other objective. For example, some costs, or the size of that. You could say, ďFind me a small X,Ē or if thereís an additional cost, you would say, ďFind me a cheap X that achieves my goal.Ē So in fact, in this case, ambiguity is good. In fact, ambiguity in this case is design choice, is what it really is. Okay.
And the example here would be among the Xs that satisfy Y equals Y desired if thereís many. Find a small one. Thatís something that generally speaking has the interpretation of an efficient choice of X, okay? By the way, what I'm saying, thereís nothing to it, so donít imagine that what I'm saying is deeper than the say it sounds. Itís actually less deep than what I'm saying. I just want to point these things out explicitly once.
The third broad category is mapping your transformation. Now you think of this as a mapping and you ask questions like this, ďIs there an X that maps to a given Y?Ē Find an X that maps to Y or you might say, ďFind all Xs that map to a given Y.Ē And you might say something like this, ďIf thereís only one X that maps to Y, find it.Ē That means a de-code or undo the transformation. So these are kind of ideas.
Okay, so thatís just to summarize the broad categories. You can go back and look at the examples we did and ask yourself in each case which of these three it is. By the say, some of them are not any of these three. Theyíre just Ė they donít categorize neatly.
Okay. I want to do a little bit of review of matrix stuff. I want to remind you to please read all of this and you can scan it very quickly, all of the notes on sort of matrix, matrix arithmetic and things like that, matrix using vectors on the website. Youíre supposed to know this material. This is just to make sure the Ė this is to make absolutely certain that you know the notation and things like that.
But here Iíll review some of it very quickly. So weíll look at matrix multiplication. Then weíll interpret it out of any practical context and weíll look at it just as ways to interpret Y equals AX, so one way to interpret Y equals AX is this; when you multiply AX, one way to think of it is you are forming a linear combination of the columns of A; thatís what youíre doing.
Because if A is written as A1 up to AN and these are the columns, then AX which is Y is nothing more than X1A1 plus X2A2. And so in Y equals AX in this interpretation, this is Ėthereís many ways to say it. Itís basically your formula, linear combination of the columns of A. X is the mixture parameters. It tells you how much of each column to mix in.
So if X4 is 0, it says donít throw in any 4 in any A4 into the recipe. Okay. So thatís sort of the picture there.
Now, here, this is quite useful. You should know this. If you have X is EJ, the jth unit vector, the jth unit vector is a vector with 0s everywhere and a 1 in the jth position, I do want to warn you about one thing. In some fields, I wonít even name them, the vector e without a subscript is meant to be the vectors of all 1s.
I think thatís weird and sick. Thereís probably some people here in those fields. I'm not gonna name those fields. Weíre gonna use the vector, which is a bold 1, like that, and thatís gonna mean all 1s.
Now, this is sort of a transition as we move towards more overloading. After all, we donít write 0 and then make it bold when itís a vector because weíve moved to the next stage of overloading where itís not a big deal. So I presume in five years, Iíll come along and these bold 1s will become just ordinary 1s. Weíll see how that works. Hopefully, the context will disambiguate it, but for right now, thatís that.
I just mention this because there are places where E is used to represent these vectors of 1s. Okay. But EJ, I think everyone kind of knows what that means. I think thatís quite standard. These are the uni-vectors. If you multiply the jth unit vector by A, if you take the column interpretation itís absolutely clear what it means. It means you are making a mixture of the columns, and what you should use is no column except the jth columns and you should use the unit amount.
So AEJ gives you the jth column of the matrix. Surely all of you have started working on Homework 1, I guess. Anyway, but Iíll just note that this is relevant to a problem, I think. Did we assign that problem? Okay, we did, yeah. Good, all right, it is relevant to a problem. Fine.
Okay, so it turns out thereís a duel interpretation, a row-wise interpretation. The row-wise interpretation goes like this: When you multiply a Matrix A by a vector, you actually write the Matrix A as rows and now when you multiply that by a vector X, what youíre really doing is you are taking the inner product of each row of the matrix with the vector X, okay.
By the way these have different interpretations if you go back to our, like, you know, control or estimation or something like that, this is basically saying that these As for example, in a measurement set up, each A is actually the sensitivity pattern of 1 Ė letís see if I can get it right, or an input, itís not the sensitivity pattern; itís the effect on all the sensors of an input at jth input. Thatís what this is.
And this says the overall input you see is a blend of this. In fact, you know what, these would be called signatures sometimes. So this would be the signature of the jth input. Itís the pattern of sensor readings you see due to that input, and this says that the overall output is nothing but the sum of the weighted signatures according for each of the inputs, okay.
Here you focus on a row, a row of a matrix in a measurement set up, a row focuses actually on a sensor and so basically this says, this explains it by saying you actually for each sensor, you calculate the inner product, this gives you the sensitivity of a sensor.
So this is just this way, so itís a way to calculate in batch a pile of inner products, so thatís another way to interpret matrix multiplication. Now the geometric interpretation of this again, this should be review, is something like this. If my row is AI tilde here, then these dashed lines give you the level sets of the inner product with AI, okay.
So for example, this line gives you that line gives you the set of all vectors, orthogonal to AI tilde. Theyíre the ones that have 0 angle with AI tilde. All the ones that have an inner product of 1 are here Ė thatís 2; thatís 3 and in the general case, these are not lines; theyíre actually hyper planes. So for example in R3, these are planes and you can actually think of the planes as stacked up parallel and the normal to the plane is AI here.
So in this case, what it means to say if I tell you, for example, if I tell you that YI is 1, it tells you about X, that it lies on this line. Thatís the picture. Now, if I had another column, another row pointed in another direction, that would give me another plane and I could get the intersection and so on.
So this will be useful. Now, the stuff I'm pointing out, now these are all totally obvious things; however, youíll see that soon youíll start putting together obvious things, the chain of 2 or 3 or 5 or 6 and you start getting things that are then not obvious. In fact, itís shocking how few obvious things you can chain together and get something that is not immediately obvious.
Okay. Now, another very useful way to think of a matrix multiplication, matrix vector multiplication is this: when you multiply a matrix by a vector, you think of a transformation and you could write down something called the signal flow graph or block diagram. Now, hereís an example with the 2 by 2 matrix. You could think of it as a transformation, two input signals produce two output signals.
Notice it says you get a mixture here and what you do it you write it this way. Now the semantics of a signal flow graph, you donít have to know it, but itís actually quite straightforward. Each node is a value; in this case itís an input value and this is delivered to something that scales by A11 and A12, and a node like this when two things come into a node and the semantics are that you should add the signals, okay.
So thatís what this is, and now you can interpret all sorts of things. You can think of like A11, well indeed, thatís the game from X1 to Y1. A21, you can interpret as sort of a cross game. Itís how much the input X1 affects Y2. If, for example, and it might have an interpretation, might Ė depends on the application of an interference term.
Right, if for example, these are knobs and these are outcomes, if this were diagonal or nearly diagonal, it means that Knob 1 affects Output 1 mostly and Knob 2 affects Output 2. By the way if itís a measurement system thatís also good because it means the first thing youíre trying to measure mostly affects the first measurement Ė that kind of thing.
So sometimes, not always, the cross terms have an interpretation of interference or something like that, but certainly not always. Now itís not interesting in general to write this out, but it becomes interesting when the matrix actually has a space, non-trivial sparcity pattern. So a sparcity pattern is of course, a substantial set of entries that are 0 as a meaning.
By the way itís also interesting Ė itís traditional when they gain is 1, to not draw it and to simply to draw a straight line Ė makes sense, in fact. So hereís an example. A matrix is block upper triangular, so it looks like that. What does it mean? Actually I'm hoping sort of after this lecture or actually maybe by the end of this week, you will never, ever look at a matrix as a spark without Ė youíll have an autonomous reaction that you cannot help.
When you see a matrix with a non-trivial sparcity pattern, youíll know what it means. You wonít just look at it and say, ďOh, yeah, sure, whatever,Ē or something like that. Youíll look at it. This has a meaning. This 0 has a very specific meaning. Well, itís very easy to write it out. If you write it out in block form, it means this and what this means if you stare at it, whatís missing here is the A21 term and itís missing right over here.
And what it says is that Y2 depends only on X2, so the second block of outputs donít depend on the first block of inputs; however, the second block of inputs do affect the first block of outputs and thatís what that means and you should just Ė when you see that, thatís what it should mean to you. Theyíre the same thing.
Now, when you draw the block diagram, well itís kind of obvious, you get this. And you know, you look at this and your eye is now telling you what you knew anyway which is the following, X1 doesnítí affect Y2. Why? Thereís no path in the signal flow diagram from X1 to Y2. So thatís what it means, okay?
Oh, these are kind of obvious things. Okay. Letís look at matrix multiplication. You know what matrix multiplication is; if you have an M by N matrix and an N by P matrix, the inner dimensions have to agree. You can multiply them and youíll get a matrix which is M by P and the formula is this; itís CIJ is the sum over K, intermediate variable AIKBKJ, like that.
Now, matrix multiplication comes up a lot. It has lots of interpretations. Weíve been looking at a special case where B is N by 1. So matrix multiplication, though, has lots of interpretations. Thatís one of them. Now, 1 is the composition interpretation. Supposed you have Y equals CZ where C is AB. What this really means is something like this, Y equals AX and X equals BZ.
So letís see Y equals Ė did I get this? Y equals AX and should I put a Z in there? No, no, I have it right. Sorry, Z is the input. Sorry. I confused myself. It represents a composition and in terms of all-block diagram, you write it this way, that Z gets mapped, first it goes into B; that produces BZ, which is X. Thatís here. And Iíve noted the dimensions of these signals here. X then goes into A and it comes out as Y, like this.
Now, the one thing youíll notice immediately is that block diagrams in algebra are backwards. Sorry; it just worked out that way. Somewhere around 1850, somebody made the terrible mistake of ordering the indices the wrong way or writing from left to right on paper or whatever. Anyway, it just didnít work out. You know, itís kind of like driving on the left or right, I guess except thereís no intrinsic difference there.
Here, it just sad that algebra Ė it goes backwards from block diagrams, but after awhile you get used to it and in fact, the way you can really make this really understand it is when someone walks up to you and you think of AIJ, the J indexes the input and I the output, so you can think of it this way, as input to output, so A32 is the gain from the second input to the third output, whereas had this all been done right 150 years ago, everything would have worked out.
Now, by the way, thereís weird subfields that have tried to do something about this. A friend of mine at Cal Tech decided about ten years ago, he had some kind of religious conversion he decided that he would henceforth write all block diagrams going from right to left. He stopped after a year or two, but anyway, it just Ė I donít know. It was fine, but I mean so what? So he did that.
There are fields, probability is one of them, where they decided instead of using column vectors as your basic vectors, youíd use row vectors, and then everything works out again, because vectors are row vectors and you multiply and block Ė all your intuition works there the right way.
So and thatís even arguably kind of right, but itís not the way other people do it, so sorry. So anyway, you just remember this picture. Just block diagrams and indexing in algebra as practiced by the vast majority of people who do algebra sadly are backwards. Okay.
Now, thereís lot of other interpretations of C equals AB. Iíll show you some. Hereís one Ė you can multiply AB. Itís actually a batch matrix vector multiply. So basically itís this; you take B and you write out its columns, and then itís actually A operating on the first column and AB. This is very useful. It means that if you ever need to multiply a matrix by a whole bunch of vectors, you can put those vectors into a matrix and do matrix multiplication.
I'm going to say something about that in a minute. Itís not obvious and there might only be a handful of people here who actually know what I'm gonna tell you.
Okay, now you can also write it out this way and itís block, things like that and Iíll give one more interpretation which is an inner product interpretation and here it is. The inner product interpretation says that when you multiply two matrices, the IJ entry is in fact the inner product of the ith row of A with the jth column of B. Thatís what it is.
And you know this because that is after all, how you think about when you multiply two matrices and you want to get some entry, you go across the ith row, like this or I donít know how you do it, but this is how I do it and you go down the jth column like that and you go down and youíre doing a cumulative sum.
Or if you like, you just think of that as an inner product of these two vectors; okay? So that's it, and you get all sorts of things. If you have a bunch of vectors, this thing called the grand matrix, that calculates actually all the inner products of a set of vectors, okay? Sop thatís the grand matrix.
So let me ask you a quick question on this right away. Suppose that A and B are square matrices and AB equals I. Now, you know what that means; it means that B is A inverse and it also means that A is B inverse. A and B are square, okay? Now, I want to ask you about this. Tell me, what is the inner product of, letís let AI tilde be the rows of A and letís let BJ be the jth column of B and I would like to know, what does that equal to?
Instructor (Stephen Boyd):Zero Ė provided I is not equal to J, okay? So youíve just deduced the following; the Ė letís see if I can get it right. The ith row of a matrix is orthogonal to the jth column of its inverse provided I is not equal to J. Whatís the inner product if I equals J Ė 1 exactly.
By the way, this fact will come up in lots lf cases and it will look totally magical and it will look like it came out of the blue and itís totally elementary. Okay. Let me say something about matrix multiplication via paths. You can Ė if you understand the block diagram B composition. AB is actually, and the way to think of it, of course, is this, is when you see Y equals ABX here, you should think of it, you should immediately write that down.
In fact, of course you can associate it anyway you would like. But this is the way, as an operator, you should interpret it first and what this means is that B is first, even those B is on the right and thatís why this diagram goes over here, like that, okay?
So this is AB and whatís very interesting here is this term, AIKVKJ; thatís the gain of a path from X1 to Y2, but itís the path that goes via Z2. And you simply multiply this gain and this gain, okay?
Thereís one other path, by the way. Thatís this one and if you add these two path gains, you will get exactly the, now let me get it right, the 2-1 entry of C, which is the product. So again, what Iím saying is obvious, but you put a bunch of these things together and all of the sudden, things are not totally obvious or something like that.
But in generally CIJ is the sum of the gains over all paths from Input J to Output I and in fact, specifically, when you sum over the K there, the K, if you wanted to put a comment in your code or whatever, K has a meaning. K is the intermediary node.
In fact, you would even literally say itís the sum over all paths from Input J to Output I via Node K. Thatís exactly what it means. So things like this should not be just definitions. They have a meaning, and this is the meaning. Okay, now I'm gonna say something; maybe some of you know this, maybe not, though, because they donít really teach this.
Supposed you have [inaudible] matrices, all right. Everybody knows the formula, CIJ is sum on K. AIKBKJ Ė here we go; thereís the formula. All right, now supposed you were gonna write some code for that. Doesnít look too hard. Looks to me like three loops or something like that for, your loops on I, you loop on J and then you run a sum on K. So itís like three lines of C or something like that, maybe four.
Okay, but whatever language you like, itís like four. Everybody understand that? And the number of operations you do, is gonna be on the order of, letís just make them square so I donít have to think about it, so theyíre all N by N and the number of operations youíre gonna do [inaudible] is on the order of N cubed, okay? Okay.
Now, I'm gonna ask you some cool questions Ė so hereís one. Could you do it faster Ė maybe some of you know. Does anyone know? Yeah.
Instructor (Stephen Boyd):You got it. Okay, this is totally bizarre. Thereís actually an algorithm which is order ended at 2.8, whatever and Jacob you could type in or you have a browser and find out what the current number is and the number, you know, varies. It goes down and it goes down by small amounts. You know, itís a huge big deal if you make a 2.799, or I donít even know what the number is.
It means nothing to you, I can tell. This is a hard Ė you know, I remembered that from the first day. It was like we were estimating things with the three big quantize and we were getting like, you know, six-bid accuracy and you were like, ďYeah, thatís cool.Ē
All right, so there is in fact, an algorithm that will multiply this out and itís something. I'm gonna leave it there because.
Instructor (Stephen Boyd):Itís 2.41? Wow, okay, wow Ė in progress. Okay, 2.41. All right, by the way this method actually at the moment has no practical use whatsoever. Now, itís an extremely interesting topic in computer science. It doesnít have, the reason is that the constant in front is absolutely enormous and youíd have to have matrices so big that you canít store them all that kind of stuff but you never know, some day it might actually come up.
Iíll tell you a little bit in a minute how itís done, although itís related to the Ė now letís go. Thatís the theory; letís talk about the practice. So hereís my Ė I have a practical question for you and itís this. Everyone here could write a four-line C or whatever program to multiply two matrices. Letís say thousand by thousand. Everyone here could do that and it would compile and it would work fine and it would produce the product.
It would produce C, given A and B, right? Now, the question would be, oh and everyone would be doing because no one here is gonna write the super-fancy stuff that does this while doing that formula right there. Thatís what weíre implementing. Weíd all be doing the same amount of floating point operations and the question would be could someone else, me or someone else or something like that, write a function that multiplies two 1,000 by 1,000 matrices faster than you? Thatís the question.
What do you think? This is not exactly a complicated Ė it doesnít look like a complicated thing to do, right? It doesnít look to me like thereís a lot of room for programming creativity here. Anyone know the answer? Whatís the answer?
Instructor (Stephen Boyd):There we go. Okay, perfect Ė so I told you; I told you a few people in here would know. Most of the people donít. Guess what? Someone who knows what he or she is doing will beat you so badly you wonít even know what happened in a thing like this. It could be if youíre lucky, you would be slow only by a factor of 4. If youíre unlucky, youíd be slow by a factor of 100 or more, okay?
So, now thatís pretty bizarre. You are doing exactly the same floating point, exactly the same calculations. Youíre doing 1 billion calculations, so thatís not the difference. The difference has to do with memory and the cache hits and misses and things like that. Let me say how thatís done, just for fun. How many people knew this or knew about this Ė so certainly some.
Okay, so by the way, do you have to write this code? No, itís all automated; itís all open source. Public domain has been used for 15 years and in fact, when you go to [inaudible] lab and you type in A times B, guess what happens? It calls this open source library; itís called LA path, okay?
So let me tell you how that works. The way it works it this. Itís another interpretation of matrix multiplication. Iíll just say a little about this because itís fun and the stuff weíre doing now is so boring, that it seems like I should introduce things so people donít know. So what you do is this; suppose your matrix is, I donít know, 1,000 by 1,000 and I block it up into, I'm gonna make it actually 1,200 by 1,200 and I block it up into little chunks of 400 by 400 matrices, okay/
If I multiply them, the same formula works. In other words, I multiply this matrix times that one plus this matrix times that one, plus this matrix times that one and that gives me the one-one block of the product. So you can multiply matrices in a block way. Okay? Everything works. You can just; you can multiply matrices. Now, if you add up the number of operations you just did, congratulations. Itís exactly the same, so that didnít change and you know, how could it have changed. Thatís kind of stupid.
Actually, if you want to know where does that 2.4 comes from, it comes from this. Itís the observation that when you multiply two 2 by 2 matrices, there is a way to multiply them. This is the most basic one to get less than 3 is when you multiply them, thereís some weird way to do it like you know, with 7 floating point operations, as opposed to 8.
Now, you [inaudible] that, so you take a big matrix, you divide it by block 2 by 2 and you do the 7, you do that smaller and smaller and youíre gonna actually find out you get M to the 2. something; however, itís the same idea that works in practice. So the way this works in practice is why might it be faster to block a matrix than multiply it? I know thereís ten people in here who know. How about some of the others?
Instructor (Stephen Boyd):What?
Instructor (Stephen Boyd):Precisely, exactly. Right, so if for example, one of these guys fits in L1, registers or something like that, itís gonna be way fast. The whole thingís gonna get pulled into L1. All the operations will be L1 and you wonít have to wait for that, or if this is really big, disc stuff which is like an eternity.
Okay, so actually with the memory locations are adjacent. You have a very nice memory flow and data flow and stuff like that and something like this. So, the way this really works is this, on a processor, you run something thatís called ATLAS, which is Automatically Tuned Linear Algebra Software. It runs on your machine and it determines optimal cache sizes for your processor, your L1, your L2 and things like that.
And when you then say in effect, in that lab when you multiply two 2,000 by 2,000 matrices, you will be using an optimized BLAST. BLAST is Basic Linear Algebra Software and it will block it up into, I donít know, 30 by 30 and then below that, something smaller and I donít know. But the point is, it will be way, way faster than if you wrote out in your four-line C program.
Okay, so, I said something that now, especially those who have a strong mathematical background, and are currently bored out of their mind, didnít know, for the record. How many people care? Oh, well, thatís another story.
Okay, what Iíll do now is, this is gonna be on a review, this lecture. Weíre gonna cover a lot of stuff that you should have seen before, but every year 20% of the people in the class actually havenít seen it before and they generally do just fine.
So Iíll define things like vector space and sub spaces, but I'm not gonna go into the horrendous detail that you would in a normal linear algebra class. So a vector space Ė thatís sometimes also called a linear space over the reels, although occasionally we will look at vector spaces over other fields like the complex numbers.
We wonít look at it, but for example, bouillons are finite fields like that. And it consists of a set of the points and theyíre called points or vectors. You have a vector sum and thatís a function that takes this argument two vectors and returns another vector. And you have a scalar multiplication, but the vector sum is denoted Ė you donít write it this way, you know. You donít write it as V-sum XY, although you could.
Itís actually traditional to write it this way. But in fact, itís really that. And thereís a scalar multiplication that takes the scalar argument in a vector and returns a vector and again you know, you donít write it this way, following this thing, alpha X. You simply write it as alpha X for compact notation.
And you have a distinguished element 0 in the vector space. Thatís not the 0 number and in fact, 15 years ago, you might have made that 0 bold or something like that to sort of give you a hint that youíre overloading it and itís not the number 0. Now, these have to satisfy a list of properties. They are Ė and I wonít go over, you know, your model for vector space is something like M vectors, and I wonít go over this in too much detail.
Has to be commutative. Has to be associative and that means that when you add them, you can add them in any order. Zero has to be an additive identity. There has to exist a negative for each vector. Scalar association is associative; thereís a right distributive rule, a left distributive rule and the scalar 1 multiplies by X has to give you X.
So it has to satisfy these properties. And let me ask you a couple of questions, how about doubles in a, if you write just the arithmetic of I triple e floating point standard, letís say doubles. Is that commutative?
Everyone know what I'm talking about? I'm talking about writing a one-line program where you add two numbers.
Instructor (Stephen Boyd):Okay, this is associative?
Instructor (Stephen Boyd):No, exactly. It is not associative. Itís awfully close, but it is not associative. Okay, so you can get round-off error easily in this case, okay. Itís not exactly, so itís very, very close, but itís not exactly. Okay. I wonít go into any of the others? How about Ė is scalar multiplication associative? Just for fun.
Instructor (Stephen Boyd):No, itís not because you could underflow or overflow something with these things. There are things here that you really donít have to worry about too much, but itís good to know that in fact, these are already not true for doubles on a computer. These would be true.
You can Ė there actually are systems for algebra and other things, which are infinite precision, in which case, all these things hold exactly. So for example, if you store numbers, as rationals as ratios of integers, then you can arrange for all of these things to hold exactly. And there are systems that do that.
Okay, letís look at some examples. Well, the [inaudible] example is just RN, so thatís the set of N vectors plus the column vectors and you have standard component wise vector addition and scalar multiplication, so thatís the standard. Hereís another one; itís a big silly.
Itís a single vector in RN that has all 0s in it. Hereís one. Itís the span of some vectors where the span of some vectors is the set of all linear combinations of them. And these are some particular vectors in RN, and so thatís Ė now you have to check that these are, in fact, that these are actually vector spaces and you have to check various things.
The way people, and the way you think of this checking process is something like this. You check that for example V3 is closed under-scalar multiplication and you check whether itís closed under addition; that would be the thing you do, so youíd ask yourself if a vector is a linear combination of V1 through VK and another one is as well and you add those vectors, is that a linear combination?
The answer is yes it is and you can construct the linear combination from the other two linear [inaudible] by adding the corresponding coefficients, okay?
Now, a sub space is a subset of a vector space that is itself a vector space. So thatís a sub space and it means something like, you know that itís close under vector addition and scale in multiplication and these examples that we just looked at were sub spaces and again, this should be review.
And let me mention a few, something that you may not have seen, which are infinite dimensional vector spaces, so letís look at hereís a vector space. Itís gonna be this. Itís the set of all functions that map R plus, thatís the interval of 0 infinity into RN for which X is differentiable, okay?
Now, this is quite complicated because points in V4 are themselves functions. So actually if you like, you could think of things like a short, but complicated C declaration or something like, for something like V4, so itís elements are themselves functions that take as argument, a non-negative real number and return a vector.
So by the way, you can think of these as trajectories if you like of something, like a trajectory of a vehicle where X1, X2, and X3 are the positions and something like that and X4, X5, X6 are the velocities or something like that; it could be anything. Itís a trajectory.
Now, you have to Ė you canít just say, ďHereís the set.Ē You have to say what the sum and what the scalar multiplication is. So how do you add two functions? Well, we have to define that and the way we define it is this, so this plus is what plus? Is that the plus of two scalars, two vectors, RN?
Thatís not RN. That plus is what plus. When this is resolved by context what plus function actually does this refer to? Is the plus in V4; itís the plus that adds two functions because the data type of X is a point in V4 which is a function. That is function addition. So whatís the data type of X plus Z?
Itís a function. Itís a function taking as argument a non-negative element and returning a vector. You then call it with the arc T and what is the data type of that? That's RN because thatís the return type of X plus Z. Thatís RN, okay.
That is also RN, that is RN, what plus is that?
Instructor (Stephen Boyd):Thatís the plus in RN and that equals what is the equals in RN, exactly. Okay, so I wonít do that much more, but the problem is when you see things like this, you know, in fact you look at it and it just looks so innocent. You just rearrange the same stupid, or whatever; itís like 8 ASKII or 7 ASKII characters or 9 or whatever it is, you just rearrange these characters a little bit.
So watch out because these little rearrangements of 9 ASKII characters on the left and right-hand side to which itís very easy to look at and go, yeah, yeah. It can mean a lot because of overloading, so watch out.
Okay, scalar multiplication is you have to say what it means to multiply a scalar by a function, and thatís defined this way, okay. Hereís a sub space. If the set of differentiable functions or trajectories here that satisfied X. equals AS; thatís a linear system, so you now know that the set of linear, the solutions of an autonomous linear dynamic system is the sub space, okay?
Okay. Thereís the concept of an independent set of vectors. You say that a set of vectors is independent if the only way to make a linear combination of 0 is by putting in all 0 in the recipe. Thatís when it needs to be independent. Now, what I'm saying is very, very important here.
Independence is an attribute of a set of vectors. It is not an attribute of a vector. Okay. So it makes no sense to say, let me tell you the slang. On the streets, people say this is a set of independent vectors. Anyone would understand what you just said if you said itís a set of independent vectors.
But thereís a big difference between Ė youíd say that the same way youíd say this is a set of non-zero vectors. When I say this is a set of non-zero vectors, it means itís a set of vectors, each of which is non-zero.
If you use the same expansion, for this is a set of independent vectors, you mean something like itís a set of vectors, each of which is independent. I mean that has a meaning and it is absolutely not the same as what the person meant.
So weíll try, I will slip into it. After a few weeks, I will talk about an independent, so letís say it right. The correct way to say it, maybe weíll just practice it all together is to say, to say, I canít even get it Ė let me try. Iíll try very hard. Youíre talking about an independent set of vectors. That was correct. You donít talk about a set of independent vectors.
Now having said that on the streets, youíd talk about a set of independent vectors. But I want to make very clear, it makes no sense, the argument of independent is a set of vectors; itís not a vector.
Okay, so what does this mean? To say that the only way you can add some vectors up and make 0 is by plugging in 0 coefficients. It says basically that the coefficients are uniquely determined. Thatís very interesting. It says that basically if I make a vector with one mixture of the Vs, and if I make it up with another mixture of the Vs, it turns out the recipes are identical.
Thatís what it says. How do you show that? Well, you subtract this from this and you get sum alpha I minus beta I times a VI equals 0 and then if theyíre independent, you conclude immediately by this that alpha I minus beta I is 0, so the recipes are the same, okay.
Hereís another way to say it, no vector, if you have an independent set of vectors, itís like if you can watch me, when I slip back into the slang you can start complaining or something. I really should just should always say independent set of vectors. So, all right, so no vector in an independent set of vectors, can be expressed as a linear combination of the others and thatís if thatís true itís an independent set of vectors.
Okay, and you can sort of check. Iíll let you work out to convince yourself of these things, okay. Basis and dimension Ė well you say that a set of vectors is a basis. Basis is like independence. Itís argument, itís a set of vectors; it is not an attribute of a single, of the vectors individually, like for example, non-zero or normalized which means the norm is 1.
So you have a set of vectors is a basis for a vector space if they span the vector space, and what that means is anything in the vector space can be written as a linear combination of these vectors and if theyíre independent and you know what that means. It means every element in the vector space can be uniquely expressed as a linear combination of these basis vectors. Okay, so thatís what it means to be a basis.
Weíre gonna look at a lot of examples and things like that, so if this is going by too fast, basically I'm going fast because first of all, itís review and second at this level, itís kind of boring. Okay, so hereís a very basic fact. If I have a vector space, there are many bases for a vector space Ė many.
But it turns out thereís an invariant among those bases, so in other words, thereís one attribute of a basis that doesnít change, and that's its cardinality. So the number of elements in a vector space, never Ė itís not the same. You canít find a basis with six elements where you found one with seven. Itís impossible.
If one basis has six elements, all bases have six elements and thatís called the dimension, that unique number. Now if thereís no basis, then we say the dimension is infinite, okay. And I want to mention something here thatís something a bit different. You will hear the word basis used in other contexts. For example, you might be taking a course in Fourier transforms or something like that.
And people will talk about something like this. Theyíll say that sine, you know, KT and cosine KT where K equals you know, 0, 1, blah, blah, blah. Thatís an infinite number of periodic functions, but these form a basis for sort of periodic functions, 2 pi periodic functions. So youíll hear people talk about that. Thatís actually not a basis in this algebraic sense.
Itís a different concept I just mentioned. So thereís a question?
Instructor (Stephen Boyd):Thatís right.
Instructor (Stephen Boyd):Thatís correct. This is the definition of basis. In other contexts, for example, in some infinite dimensional spaces, actually I'm not even sure they use the word basis there. I actually have to go back and check. They might hedge. So what this is is, I guess a basis there means that any element can be written as an infinite sum. For us thatís not the case. Thatís not what a basis means. It means a finite sum.
And actually for someone doing just linear algebra, it means finite sum Ė actually just algebra, not linear algebra, just algebra. It means a finite sum. Just to warn you, you hear basis in different contexts. Another weird one is this, which I find very strange, but this comes up in signal processing now. I donít know how they got away with this and even statistics. Now, let me say what it is.
Suppose I have a vector space of Dimension N. Letís say itís R10, okay. So the dimension is 10. And letís say Ė if I come up with, you know, a basis V1 up to V10, just make these the unit vectors. So there, E1 through E10. That is a basis for R10, okay?
Now, suppose you add some other weird stuff. Suppose someone walks up to you and says, ďHereís my basis, V55,Ē okay. If I say, ďHereís my basis for R10.Ē You would say, ďNo, no, no, come on. Thatís not a basis. Youíve got 55 elements and the dimension is 10; itís not a basis.Ē And they go, ďOh, youíre still stuck in that old idea of basis.Ē
This is, they call this an over-complete basis. I'm not kidding; this is not a joke. This is actually intelligent people on this campus. They say stuff like this. I mean itís weird. Thereís a name Ė and you say, ďWhat the hell does that mean, over complete basis?Ē
For hundreds of years a basis has meant independence and spans, and they go, ďOh, yeah, thatís what it is, except theyíre not independent.Ē They go, ďOkay, but we have a name for that; thatís called expand the vector space.Ē Thatís what weíve called it the last 300 years.
And theyíre like, ďOh, oh, no, this is like wavelets and blah, blah, blah.Ē And you know, this is why image processing and modern this and that, and it just, anyway makes no sense. But thatís fine; they can do whatever they like. They can take a term thatís been used with no ambiguity for several hundred years, turn it around and use it some sick way for their stupid sub field, thatís fine with me.
So you will hear things like that, over complete basis, which is just like a joke, I donít know what it would be. It would be something like a non-true theorem. Itís a new concept really and it describes a theorem. Itís like a theorem; it makes a specific statement, but itís false. Or itís not always true and so that would be a new thing, and weíd call it a theorem, but itís like a theorem, but itís just not always true, you see.
So but anyway, youíll hear it there and youíll hear it in other places, too, but I think they call it frames. Thatís another name for it in signal processing, okay.
But all of these things youíll find a local dialect will emerge when they start using these terms and then the local dialect, it depends on how isolated it is from the rest. You know itís kind of like evolution in Australia or something like that. And some of these fields are like pretty weird, right and they go out there and they go off and they have a whole theory and they make whole new names for things like basis, independent and so on.
All right, so weíre now gonna get to Ė weíre gonna keep going on this review of linear algebra. And this is the part in linear algebra that really I mean I guess most people when I think back, when I talk to people and I say, ďWhat do you remember of your linear algebra class,Ē most people say as little as possible.
But when, in fact, the memories come back, they remember somebody at the board droning on and on about the four fundamental sub spaces. And Iíd say, ďWell, what are those,Ē and theyíd say, ďI donít know.Ē Or theyíd remember some of these things; so thatís what weíre gonna talk about now.
A quick overview Ė I know a lot of you have seen this; this is review. So I'm gonna emphasize is not actually the technical aspects which I presume was covered in linear algebra class you saw. I'm actually gonna concentrate on what the meaning is.
So the null space of a matrix, of an MIN matrix is defined as this. Itís the set of points in RN, thatís sort of the Ė I think of that as the input space that gets mapped to 0 by the matrix. Thatís the null space, okay?
So null space is actually itself an operator; itís a complicated one. It takes its argument and M by N matrix and it returns, the data structure, the data type it returns is a set. In fact, itís a sub space Ė thus its name.
It returns a sub space of the input space, okay. So itís easy to just look at this, but these are very complicated operators and thatís what the null space does. It takes its argument a matrix and it returns a sub space of the dimension of the input of that matrix, if you want to call it the input.
Well, itís obviously, itís a set of vectors thatís mapped by 0 by Y equals AX and is also interesting. Itís also the set of vectors that are orthogonal to all the rows of the matrix A. Why Ė by the row interpretation of the matrix vectors multiplication. Because to say AX equals 0 says the first row of A in a product with the vector, thatís gonna come up backwards for you, but youíll Ė I should learn to do that backwards, but I probably canít so is 0, and that would be true for all of them.
Now, the meaning of it is this. It gives exactly the ambiguity in Y equals AX. The ambiguity in X in Y equals AX, because it says this. If you have Y equals AX and you have any element of this null space and you add that element to X, then A multiplied by X plus Z gives you, if you expend it out, AX plus AZ, but AZ is 0 and thatís AX, so you get another solution, okay?
That means if you have a solution to Y equals AX and you add to X anything in the null space, youíre gonna get another solution. By the way, there is, the null space is never empty. It always has the 0 vector in it, okay, because the 0 vectors is always mapped to 0, okay?
All right, so thatís the null Ė so it says that if thereís more than a 0 vectors in the null space, it basically says, thereís gonna be ambiguity in X if you have Y equals AX, all right. Now, turns out it exactly characterizes it. And that means this Ė suppose you have two solutions, Y equals AX and Y equals AX tilde, then it turns out that in fact, X and X tilde are related because theyíre difference is in the null space.
So it basically says this idea of adding something from the null space into a vector to give you a new solution thatís completely general. All solutions come about that way, okay. And thatís easy to show.
But letís think of some examples, just real quickly. Hereís one. Tell me this. Letís think of the forced mass example. Letís make for ten steps. So X is the forces you apply over ten seconds, each for one second and YAX is gonna be the final position in velocity. Thatís a 2 by 10 matrix A, okay and what I would like to know, is tell me about the null space, I mean not a lot, but what does it mean?
First of all, is there any element in the null space other than 0 and what would it mean? Whatís the answer? What is the meaning Ė what have I said? This is in the null space and I told you [inaudible]. Thereís like 1 minus 3 plus 5 minus 8, 0, 0 plus 4; how would you check me?
Instructor (Stephen Boyd):Yep. You would think of the product with both rows, which means you would actually multiply A by Ė youíd take my F; youíd multiply it by A and youíd get 00. If you didnít get 00, youíd say, ďItís not an element of the null space.Ē If you get 00, itís an element of the null space.
Now, my question to you is what does such a forced program mean?
Instructor (Stephen Boyd):Okay, so I would say that my interpretation of such a forced vector is itís a joy ride. It means Ė no, the condition is this. Itís any vector of forces that has the following property. It acts on the mass, okay. It can shoot off to the right. It can shoot farther to the right. It can shoot off to the left and go back and forth, but after ten steps, it must land exactly at the origin and with 0 velocity.
In other words, itís a set of forces that ultimately does nothing, okay? Now, by the way, is that a good thing or a bad thing? Well, if youíre paying a bill for the fuel, thatís probably a bad thing, but it may not be a bad thing. I mean itís neutral. Okay, so now the question is somebody give me an element of null space.
Instructor (Stephen Boyd):Zero Ė that was good. That was, okay. Somebody give me a non-zero element of a null space. That was very good; that was very quick, very good, and it was correct. How about a non-zero element of a null space?
Instructor (Stephen Boyd):So youíre forced, you wanna go 1 minus Ė 1, and what 0s, after that?
Instructor (Stephen Boyd):I donít believe you; you know why? Your final velocity will be 0, but your position will be 1, so thatís not gonna work.
Instructor (Stephen Boyd):Another minus 1? No, no, no, no, then your velocity Ė how about this, minus 1, 1, and then how about that. Does that work? That works. This moves you, so youíre now stationary after two steps youíre stationary, one meter to the right and then this undoes it and moves back. Thereís an element in null space, great. Any others Ė or is that it?
Instructor (Stephen Boyd):5 minus 1s?
Instructor (Stephen Boyd):And five 1s. No, no, I donít think so. No, no, no, no. Thatís Ė your velocity will be 0l but youíre gonna be way, way off to the left. So I donít like that one. Maybe we could shift this one. We could put Ė we could shift this to the right and generate a lot, because the point is all this discussion suggests the following. Thereís lots of elements in the null space, okay?
And actually it already means something interesting. It means this. It says when someone says you have ten forces to apply to the mass, I want you to be at a certain position in velocity at the end of the this ten seconds, thereís a giant pile of force programs you can use to affect this; thatís what it says.
Theyíre characterized in fact, by the null space, along with a particular solution. So thatís what it means, Okay. That was just to kind of see what this looks like. Thatís what the null space means. Okay. All right, so if a matrix has only the 0 vector. If the null space, right, so that means the null space is 0, by the way thatís the set consisting of the 0 element. I probably donít have to say that. That is not this and it is not this, okay?
This makes no sense. This one makes sense, but is impossible and wrong, okay? So this is, I guess we would call this a semantic crime. This would be a syntax crime, right, because it doesnít even make sense. Thatís a set and thatís a single element. So I mention this, right. Okay. So thatís just a little, a small syntax rant there.
So if the null space of A consists Ė is the set consisting of one vector. Now, by the way, the way you say this on the streets, is you say 0 null space or the null space is 0. Okay, thatís the correct way to way it, just so you know. If youíre deposed or thereís lawyers present, thatís what you say.
But otherwise, you know, informally, you say, ďYeah, a null space is 0 or something like that. Thatís fine; but you would never write that. Now, this has a huge meaning. It says this. It says the vector X can always be uniquely determined from Y equals AX. Now thatís very interesting.
It says basically that if you think of this as a transformation, no information is lost. It means we can undo it. By the way we do not get no how to undo it. Weíll get there, but the point is it means at least in principle, no information is lost. It can be undone. If I give you Y, I can reconstruct X. We will very shortly see how to reconstruct X.
Okay, so you donít lose information. By the way in an estimation problem, this or reconstruction problem, this would be very good. It means your sensor sweet or the measurements you proposed to make are good enough that collectively from your measurements you can deduce the parameters you want to deduce.
Okay, thatís what it means. Itís a good thing in that context. In the context of design, itís maybe a bad thing, possibly. Maybe not Ė in terms of your leisure time itís good because basically it says there is no design problem. Once someone specifies why, thereís only one X; it satisfies it. Thereís nothing for you to do. So thatís bad, by the way, from an employment perspective.
Now, this also means the mapping from X to AX is 1 to 1. It means basically that if you draw one of these pictures like this, and you show elements here in RN that get mapped to RN, it says that different things go to different ones and if you donít have this, you canít have two things landing on the same element.
Or another way to say it in a context like one of the things we look at, it would basically say something like this. You canít have Ė letís go to geophysics, you canít have two subterranean patterns of densities that produce the exact same gradiometer readings above earth. That would be great if it were true. Itís not, by the way.
But if it were true, it would be great. It would mean in principle, you know, using some big computers you could reconstruct exactly what was below you, okay.
Iíll mention just a few things and then weíll quit for today. Another way to say that a matrix is 1 to 1 Ė thatís also slang for it, that the Ė itís not slang; itís okay. Itís 0 null space or 1 to 1. It says that the columns of A are independent, so it just rewording the same thing.
Why? It basically says to say that the columns of A are independent, says that whenever you make a linear combination of A of the columns and it adds up to 0, then the linear combination of the columns, thatís linear vector multiplying. So whenever you matrix vector multiply A by a vector and you get 0, the only possible way that could have happened is if that vector was 0. Thatís the vector of coefficient. Thatís just a restatement of the same thing.
Weíll start on this next time; this is kind of where it gets interesting. And this is not obvious, this one. Not remotely Ė this says that if a matrix is 1 to 1, it has a left inverse. Thatís really cool. Thatís a matrix. You multiply on the left by the Matrix A and you get the identity. Okay, so thatís the picture.
By the way, this is extremely important. Letís think about what it does. If you have Y equals AX and let me multiply both sides of this equation by Y, I get BY equals BAX which is IX which is X, okay?
That says if you have a left inverse of a matrix you should think of it Ė what it is is itís a decoder. It is a perfect decoder, a left inverse is what it is. If this is a channel in a communication system, this is a perfect equalizer.
If this is some kind of measurement Ė if this is a measurement system, this B is the matrix that processes your measurements and gives you what the parameters are. Itís a perfect thing that undoes A. Itís also linear anyway. So weíll continue with this next time.
We havenít said how to find a left inverse, but you will know very shortly how to do that.
[End of Audio]
Duration: 80 minutes