Instructor (Stephen Boyd): So, let's start with a couple of announcements. I think we're a little bit behind on updating the website with the readings, so it should be kind of obvious: you should read the chapter on geometrical problems. That's kind of obvious, I guess, because we're going to start that today. The other is you should read the appendix on numerical linear algebra. It's not very long; it's like 15 pages. We're gonna hit this material next week, and, what's that?

Instructor (Stephen Boyd): Oh, it's more like 15, yeah, not 50. You wanna read this because the fact is you don't want to hear all this material for the absolute first time when I cover it, like next Tuesday or something. So that's just a hint that we're not gonna be moving slowly on this material.
Okay, I want to finish up something about experiment design, which actually relates to our next topic, geometrical optimization problems. So experiment design, you recall from last time, is this: you're gonna make a number of measurements, and you have a palette of possible measurements you're allowed to take.

That's v1 through vp; those parameterize these things. And you're given a budget, for example, just a total number. You're just told, you know, make 40 measurements out of this palette of 50 possible measurements. Which 40 would you take?

And we talked about that last time. When you commit to a set of 40, which is actually nothing but a set of integers, you know, m1 through mp, that add up to 40 - when you decide on your allocation, you'll be scored on a covariance matrix. That's your error covariance matrix.
Now, that's obviously a vector optimization problem, because one person's error covariance matrix could be very small in one direction and big in another, and then you could have the opposite, so that's kind of obvious.

The first thing is you have to scalarize. Then the second is there's this issue - it's a smaller issue - of the relaxation from an integer problem to a continuous problem, and we talked about that last time.
So the most common scalarization is D-optimal experiment design. In D-optimal experiment design, you minimize the log determinant of the error covariance matrix. This is, of course, convex, clearly, in the [inaudible] here. And this corresponds geometrically to minimizing the volume of the corresponding confidence ellipsoid.

This is sort of like, if you're doing a problem involving positive definite matrices where you want it small, log determinant or determinant is something like least squares there. It's kind of like, if you can't come up with and defend another measure, then you might as well just go with log volume or something like that. So it's something like least squares.
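For reference (this matches the textbook's formulation of the relaxed problem, not an exact transcription of the slide), the relaxed D-optimal design problem is:

```latex
\begin{array}{ll}
\mbox{minimize}   & \log \det \Bigl( \textstyle\sum_{k=1}^{p} \lambda_k v_k v_k^T \Bigr)^{-1} \\[1ex]
\mbox{subject to} & \lambda \succeq 0, \quad \mathbf{1}^T \lambda = 1,
\end{array}
```

where the weight λk is the fraction of the measurement budget spent on experiment vk.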
Okay. So this is D-optimal experiment design. Now, one of the things is, when we start doing applications, it gets extremely interesting to do things like taking a problem that comes up as an application - this is experiment design, this is D-optimal experiment design - and actually working out a dual for the problem.

What's often the case is that a dual of a practical problem will actually turn out to have an interpretation which is another practical problem, and it can be one from a totally different field or something like that. After the fact, everything will make sense, but it will not be remotely obvious at first that the two things are connected.
So in this case, a dual, which I'll work out shortly, is this. So these two are duals, and of course, what I mean by that in the loose sense is that I transformed this slightly, formed a Lagrange dual, and then made a few more small transformations of that one.

Maybe I solved for a [inaudible] variable; maybe we'll get to that. So these are duals in the loose sense. However, they are duals in the following sense: any feasible point here provides a lower bound for this problem, period.

Okay, and there's an easy transformation: if you solve this problem, you can solve that one and vice versa. There's zero duality gap here, absolutely zero in this case, because the weak form of Slater's condition holds. Slater's condition says you have to have a point that's strictly feasible.

Well, sorry, the strong form of Slater's condition holds here, because I'll just take all the [inaudible] to be one over p, okay? And then the strong form holds. But anyway, zero duality gap between these.
Now, this problem actually has a very simple interpretation. [Inaudible] is a constant, so you're maximizing log det W, or if you like, you're minimizing, say, log det W inverse. That's the log of the volume, with a constant multiplier of a half or something like that.

And then also an additive constant - an additive constant and a factor of a half or something like that - that's the log of the volume of an ellipsoid. And the constraint here is vk transpose W vk less than one, so this is the problem of computing the minimum volume ellipsoid, centered at the origin, that covers the points vk.
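Written out (in the textbook's notation, up to scaling conventions), the dual problem being interpreted here is:

```latex
\begin{array}{ll}
\mbox{maximize}   & \log \det W \\
\mbox{subject to} & v_k^T W v_k \le 1, \quad k = 1, \ldots, p,
\end{array}
```

with variable W symmetric positive definite. Each feasible W gives a centered ellipsoid {x : x^T W x ≤ 1} containing all the vk, and maximizing log det W minimizes its volume.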
Okay. By the way, this problem comes up a lot as well. In fact, I'll mention a statistical application of this right off the bat - actually, we'll get to some of that in the next lecture, so maybe I'll wait. So, you can work out what complementary slackness is. Complementary slackness says that if [inaudible] is positive - that means you actually carry out an experiment of that type; if [inaudible] three is positive, that means you'll actually use v3,

you'll actually do a positive number of those experiments - then you must have one minus vk transpose W vk equal to zero. That means vk is on the boundary of the smallest enclosing, the minimum volume enclosing, ellipsoid. So the picture is this - a very simple one. We looked at this last time, but very briefly. You have something like 20 choices of experiments to carry out here.
Okay, now, these experiments are pretty much the same as these - I mean, there's a negative sign; it doesn't matter - but the vector v is smaller, and that corresponds to, you know, these measurements being the same as these but with twice the signal-to-noise ratio.

So it would be foolish to choose any of these, and indeed, none are chosen. Now, over here, all the weight is put on these two, and this sort of makes sense, because the only difference among these is the angle, and given the choice of measurements, you would take measurements that are, to the extent possible, orthogonal. Okay.

So it's actually sort of chosen this point and this point, and it's actually put a weight of 0.5 on each, meaning if you were to be given a budget of experiments, half should be here and half should be here.

That's what it means, and it kind of makes sense. There would be no reason to skew them. They're approximately - they're 30 degrees apart, or 35 degrees. I mean, they're as close as you're going to get to orthogonal. They have the largest angle, and so they give the most mutual information or whatever - not mutual information [inaudible], but they're maximally mutually informative if you take ones that have a high angle. So that's what these are.
Now, the interesting thing is the dual has a perfect interpretation. You take these 20 points and you compute the minimum volume ellipsoid, centered at zero, that covers them, and that is, in fact, the one drawn here. And sure enough, it touches two points - these two here - and these two are, in fact, the ones you use in the experiment design. So that's another way to understand it: in the experiment design, you're choosing points that are far out.

Now, it's not as simple as choosing the points that are farthest. That would be a greedy scheme: choose the experiments that have the highest signal-to-noise ratio. That's not what happens, because in that case, if one sensor had a very high signal-to-noise ratio, you'd basically only take that one. You actually compute the minimum volume ellipsoid that covers these points, and then that gives you the other ones that fill in other directions and so on.
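To make the primal-dual picture concrete, here is a small numerical sketch. It is not from the lecture: it uses the classic Fedorov-Wynn (Frank-Wolfe) exchange iteration for the relaxed D-optimal problem, on a made-up palette of 20 measurement directions, and then checks the dual interpretation: all points lie in the ellipsoid {x : x^T W x <= 1} with W = M^{-1}/n, and the points that carry weight sit (numerically) on its boundary. The tolerances are mine.

```python
import numpy as np

def d_optimal_design(V, iters=5000):
    """Fedorov-Wynn exchange iteration for the relaxed D-optimal design:
    maximize log det sum_k lam_k v_k v_k^T over the probability simplex."""
    n, p = V.shape
    lam = np.full(p, 1.0 / p)                  # start from uniform weights
    for _ in range(iters):
        M = (V * lam) @ V.T                    # information matrix
        g = np.sum(V * (np.linalg.inv(M) @ V), axis=0)  # v_k^T M^-1 v_k
        k = np.argmax(g)
        alpha = (g[k] - n) / (n * (g[k] - 1.0))  # exact line search step
        lam *= 1.0 - alpha
        lam[k] += alpha
    return lam

rng = np.random.default_rng(0)
theta = rng.uniform(0.0, np.pi, 20)
V = np.vstack([np.cos(theta), np.sin(theta)])  # columns are the v_k

lam = d_optimal_design(V)
M = (V * lam) @ V.T
W = np.linalg.inv(M) / 2.0                     # dual ellipsoid matrix, n = 2

vals = np.sum(V * (W @ V), axis=0)             # v_k^T W v_k
assert np.all(vals <= 1.0 + 1e-2)              # every point is covered
assert np.all(vals[lam > 0.05] > 1.0 - 1e-2)   # weighted points touch the boundary
```

The check at the end is exactly the complementary slackness statement from the lecture: positive weight implies the point is on the boundary of the minimum volume centered ellipsoid.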
Okay. So that's that. And I think I'll skip the derivation of the dual, other than just to mention a few things about it. It should be straightforward for you now. There's no way you could look at this and just say, "Oh, okay." But, you know, it's something that in a few minutes you should be able to trace through. It's quite straightforward.

The only interesting thing here - I don't know if we've done enough problems on this, or gone over this enough - is that when you have a matrix problem like this, you actually have to calculate the gradient of the log determinant of the inverse of a matrix.

That's complicated. The gradient is a linear function on matrices, and you have to - I mean, don't just take the formula, which is basically minus X inverse. You have to be very, very careful to understand what that means.

That's actually covered in an appendix in the book, so I'd recommend reading that. In fact, let's add that to our list; I don't remember which appendix it is. Okay, so it's the one that covers, let's say, what that means and so on and so forth.
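Since the gradient formula is easy to misuse, here is a quick numerical sanity check (my own, not from the lecture): for f(X) = log det X^{-1}, the directional derivative along a symmetric direction D should equal the matrix inner product of -X^{-1} with D.

```python
import numpy as np

rng = np.random.default_rng(1)
G = rng.standard_normal((4, 4))
X = G @ G.T + 4.0 * np.eye(4)      # a random symmetric positive definite X

def f(X):
    # f(X) = log det X^{-1} = -log det X
    return -np.linalg.slogdet(X)[1]

D = rng.standard_normal((4, 4))
D = D + D.T                        # symmetric perturbation direction
t = 1e-6

numeric = (f(X + t * D) - f(X - t * D)) / (2.0 * t)
analytic = np.trace(-np.linalg.inv(X) @ D)   # <grad f, D>, grad = -X^{-1}
assert abs(numeric - analytic) < 1e-4 * max(1.0, abs(analytic))
```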
Okay. All right, so when you reformulate it this way and have a matrix constraint here, it's quite straightforward to introduce a Lagrange multiplier. Normally, it would be nu transpose times, you know, Ax minus b. Here it's a matrix equality. The transpose there is an inner product, and the inner product for matrices is, in general, trace X transpose Y; if these are symmetric matrices, it's trace XY. So you write this as trace of Z times the matrix residual here.

Okay, so that's the origin of that. Otherwise, everything else kind of works out the same, and I'll skip the details. The bigger picture is basically this: for any problem you look at and have time to actually think about carefully, it's worth your while to work out the dual, because the dual often has a very interesting interpretation and gives you a better idea about how the problem works and all that sort of stuff.
Okay. We'll move on to our last sort of generic family of applications, and these are in geometry, so geometric problems. These come up all the time, and we'll start with some of the ones that are least obvious, the extremal volume ellipsoids. These are some of the most interesting and least obvious applications. In fact, it's least obvious what you can do in these cases.

So the Löwner-John ellipsoid of a set C, which need not be convex at all, is the minimum volume ellipsoid that covers the set. Okay, so that's the definition of the Löwner-John ellipsoid. Actually, it's unique; by formulating it as a convex problem, we'll prove that right away.
So here it is. We'll parameterize the ellipsoid as the inverse image of the unit ball under an affine mapping, okay? That's one parameterization. I should point out there are at least four or five totally different parameterizations of an ellipsoid. You can write it as the forward image of the unit ball under an affine mapping; you can write it as the sublevel set of a quadratic function; you can write it all sorts of different ways.

Which of these you use to describe the ellipsoid, to parameterize it, will completely change the convexity properties, and so for different problems, you're gonna have to choose different ones. So this would not work out if we parameterized it, for example, this way: the set of x such that

x minus xc transpose, say, P inverse, times x minus xc is less than one. That's another parameterization of an ellipsoid, and the data structure describing this ellipsoid is a center point xc and a positive definite matrix P. Now, it's easy to go back and forth between this one and that one - it's just linear algebra, okay - however, if you formulate the problem this way, you'll end up with a non-convex problem for the Löwner-John problem.

This way it's gonna end up being convex, so you have to choose the right parameterization. It makes a great deal of difference which one you choose. So this is the inverse image of the unit ball under an affine mapping; it's an ellipsoid. And, you know, it's not hard to go from this to this; it's not a big deal. You know, A transpose A is P inverse, or something.
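The back-and-forth between the two parameterizations is indeed just linear algebra. This sketch (my notation, not the lecture's) converts {x : ||Ax + b|| <= 1} into the quadratic form description, with center xc = -A^{-1}b and P^{-1} = A^T A, and checks that both membership tests agree on sampled points.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 3
G = rng.standard_normal((n, n))
A = G @ G.T + n * np.eye(n)            # symmetric positive definite A
b = rng.standard_normal(n)             # ellipsoid {x : ||A x + b|| <= 1}

xc = -np.linalg.solve(A, b)            # center of the ellipsoid
P_inv = A.T @ A                        # (x - xc)^T P^{-1} (x - xc) <= 1

for _ in range(100):
    # sample points at controlled distances so some land inside, some outside
    u = rng.standard_normal(n)
    u *= rng.uniform(0.0, 2.0) / np.linalg.norm(u)
    x = xc + np.linalg.solve(A, u)     # so that ||A(x - xc)|| = ||u||
    in_norm_form = np.linalg.norm(A @ x + b) <= 1.0
    in_quad_form = (x - xc) @ P_inv @ (x - xc) <= 1.0
    assert in_norm_form == in_quad_form
```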
Now, in this problem here, it turns out you can assume without loss of generality that A is positive definite, and that's not at all hard to show. I mean, one thing is that A has got to be non-singular. If A were singular here, then it would, of course, have a null space, and this would be what's called a generalized ellipsoid, or a cylinder, because along anything in the null space it would be infinite.

It would have a direction along which it was infinite; it wouldn't be bounded in that case. Nothing wrong with that, by the way - sometimes that comes up. That corresponds to A being positive semi-definite and not positive definite.

Now, the question is: why can we assume A is positive definite? Does someone have any suggestions as to why? If someone gave you an A here that was not symmetric, not positive definite, how can I determine the same ellipsoid with an A that is symmetric and positive definite? How do you do it?
Student: You just flip the sign of the eigenvalues.

Instructor (Stephen Boyd): You could flip the sign of the eigenvalues. A might not even be symmetric, but you're on the right track. What would you do? So let's write the SVD of A. Let's write A here as U Sigma V transpose, so this is U Sigma V transpose times v plus b, and this is less than one, all right?

So this is the set of v for which this is true, right? And I need a suggestion. This is A, and I want to write this in a new form where, sort of, the new A and the new b - the new A is symmetric positive definite.

So there, I've written the skeleton. I need a suggestion.
Instructor (Stephen Boyd): What's that?

Instructor (Stephen Boyd): You're gonna choose what?

Student: U equals V.

Instructor (Stephen Boyd): Okay, choose U equals V. Yeah, I mean, that's kind of what we want to do, but you can't say that. Somebody just gave us this description with A and b; their A is not symmetric. So we don't have the right to say choose U equals V. They said, "No, no, no, this is the A I'm giving you."
So one thing you can do is this. This is a vector, and if I multiply it on the left by any orthogonal matrix, it doesn't change the norm; that's the same. So I could multiply - let's just do this in our heads - let's multiply this vector on the left first by U transpose.

Instructor (Stephen Boyd): And then by V. And because I'm multiplying by orthogonal matrices, we're gonna write it this way: V U transpose times the quantity U Sigma V transpose v plus b. Everybody cool with that? I mean, that's identical; this is equal to that because that's an orthogonal matrix. Everybody cool with that? Okay, and what do you have now?

Now we're done.

Student: You have V Sigma V transpose.

Instructor (Stephen Boyd): Yep, now you have this, and everything's cool. This is V Sigma V transpose v, plus then something here, which is V U transpose b -
Instructor (Stephen Boyd): Yeah, I did, yeah - somewhere; where, here?

Instructor (Stephen Boyd): There we go. Thank you. Okay. And I'm gonna call this A tilde, and I'm gonna call that b tilde, and I'm done, because I just took the original ellipsoid. I mean, this would be the code you'd write at the top of something, where you've decided, in a group of people doing whatever development, that the data structure for an ellipsoid is the inverse image of the unit ball under an affine mapping, okay?

However, you don't put in the specification that the matrix should be positive definite. Okay. Somebody passes in this, but for your method, you need this. This is the stuff you write at the top that translates it to an equivalent one with positive definite A. Okay. Everybody okay on this?
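The symmetrization step on the board is easy to check numerically. This sketch (a hypothetical 3-by-3 example) forms A tilde = V Sigma V^T and b tilde = V U^T b from the SVD A = U Sigma V^T, and verifies that the two descriptions cut out the same set, since multiplying A v + b by the orthogonal matrix V U^T preserves norms.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 3
A = rng.standard_normal((n, n))        # a non-symmetric, non-singular A
b = rng.standard_normal(n)

U, s, Vt = np.linalg.svd(A)            # A = U Sigma V^T
A_t = Vt.T @ np.diag(s) @ Vt           # A tilde = V Sigma V^T
b_t = Vt.T @ U.T @ b                   # b tilde = V U^T b

# ||V U^T (A v + b)|| = ||A v + b||, and V U^T (A v + b) = A_t v + b_t,
# so {v : ||A v + b|| <= 1} and {v : ||A_t v + b_t|| <= 1} coincide
for _ in range(100):
    v = rng.standard_normal(n)
    assert np.isclose(np.linalg.norm(A @ v + b),
                      np.linalg.norm(A_t @ v + b_t))

# and A tilde really is symmetric positive definite
assert np.allclose(A_t, A_t.T)
assert np.all(np.linalg.eigvalsh(A_t) > 0)
```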
Student: You could have [inaudible], right?

Instructor (Stephen Boyd): No, it's a singular value decomposition - it's not an eigenvalue decomposition, it's a singular value decomposition - and the original matrix A is non-singular, so they're all positive, so we're cool.

Student: [Inaudible] A is not singular, then A tilde is. If A is singular -

Instructor (Stephen Boyd): Oh, pardon me, right: if A is square and non-singular, then U and V are square orthogonal - they're orthogonal matrices.
Student: The whole point of this was that A was not square.

Instructor (Stephen Boyd): No, no, no, no. The whole point - A is square. Good question. Actually, you can do this, by the way, if A is not square, too, but you'd have to - you can do this in that case, too.

What A has to be in that case is, well, let's see - it has to be skinny and full rank, I think. Sorry, skinny and full rank is the most general. If you want to allow people to describe an ellipsoid in this way in the most general way, A, I believe, can be skinny and full rank. That'll do the trick.

Student: You're not losing information [inaudible].

Instructor (Stephen Boyd): Yeah, exactly, so in that case we're cool, because everything I did here was kosher. So, you know, in that case you have to go back and check that U has the right - well, I guess U transpose U was always cool. So, okay. Question?
Student: Why did we embark on this journey in the first place?

Instructor (Stephen Boyd): Well, I haven't said yet.

Student: Not the journey of the ellipsoid - the journey of symmetrizing the matrix?

Instructor (Stephen Boyd): Now, would we embark on a journey - well, we shouldn't answer that. I was gonna say, "Would we embark on a journey that was not needed?" The answer is yes; we do that every day, but they're kind of weird things. I don't usually have it in writing, those weird little journeys and side trips. So obviously, this is gonna come up, right? Indeed it will, in two lines. But thank you for posting the sign there, which is, you know, why are we doing this? Okay, so let's move on.
Student: Why does it have to be - so you're assuming A is not singular?

Instructor (Stephen Boyd): That's right - oh, why? If A is singular, this doesn't describe an ellipsoid; it describes something called a generalized ellipsoid. It's called a cylinder, in fact, because it will be infinite in some directions, specifically in the null space of A. Anything along the null space of A can get infinitely big and still satisfy this, and everything's cool, right?

So in that case, that's not an ellipsoid. By the way, such a thing has infinite volume, and if we're minimizing volume, it's sort of infinitely bad; it's not gonna be of interest. So that's why.
Okay. Now, I'm gonna use this parameterization, and yes, I wanted A symmetric positive definite for a reason, and now you know why. Here it is. By the way, you see this problem here: this problem is actually correctly the Löwner-John ellipsoid problem even when A is not symmetric, okay? It is, because I assure you, det A inverse, multiplied by the volume of the unit ball in that space, is gonna be the volume of this ellipsoid, period - or one over it is, all right.

However, the log of the determinant of some non-symmetric matrix is some insane function that is not in general convex - in fact, it's never convex, just not. However, if I assume A is symmetric positive definite, then this problem here is convex, and that means we can solve it.

So that's it. So here I minimize log det A inverse, and you have to check about the volume and things like that, but the volume does transform by the factor det A inverse here.
Now, I can rewrite this constraint; I don't have to write it this way. I can write it like this, for all v in C. This just basically says that every point in C is in the ellipsoid: you're in the ellipsoid if and only if Av plus b, when you take the norm, is less than one.

To say that the ellipsoid covers C is to say that for any point v in C, the norm of Av plus b is less than or equal to one. So that's this statement. Now, by the way, we know immediately that it's a convex problem.
Some people, by the way, would glorify it and make it very complicated and call this a semi-infinite problem. Why? Because that is a convex constraint - in fact, it's a two-norm constraint on the variables. By the way, the variables here are A and b. So actually, reading this kind of stuff requires serious concentration, because it's a very standard convention that symbols like A and b and C and alpha and beta are parameters,

and variables are things at the end of the alphabet, like u, v, w, x, y. So this is totally turned around: the variables are A and b, and the parameters - well, v is a dummy variable here, but nevertheless.

Okay, so you have to read this correctly. That is a two-norm constraint, nothing more, but it's one constraint for every point in C. Oh, and if C is infinite, this is a so-called semi-infinite problem. Okay. Now, you may not be able to get a tractable representation of this semi-infinite constraint; in some cases you can. One case is when C is a finite set, so here I give you a bunch of points and you have to calculate the minimum volume ellipsoid that covers them.
Now, in this case, it's easy to write this down, because you just write that. This is a convex problem. So now you know something. And of course, it's not just the Löwner-John ellipsoid of the set consisting of these points; it's in fact the Löwner-John ellipsoid of the convex hull of those points, and that's a polyhedron described by vertices.

So you just learned something which is by no means obvious, and that is: if someone gives you a polyhedron defined by its vertices, then computing the minimum volume ellipsoid that covers it is a completely tractable problem. Okay. Everybody got that?
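For the finite-point case, one classical way to actually compute the covering ellipsoid without a general-purpose solver is Khachiyan's algorithm. This is a sketch of it (my implementation, with made-up vertices), returning the ellipsoid {x : (x - c)^T A (x - c) <= 1}.

```python
import numpy as np

def min_volume_ellipsoid(P, tol=1e-3, max_iter=50000):
    """Khachiyan's algorithm for the minimum volume (Loewner-John)
    ellipsoid covering the columns of P."""
    d, m = P.shape
    Q = np.vstack([P, np.ones(m)])         # lift points to homogeneous coords
    u = np.full(m, 1.0 / m)
    for _ in range(max_iter):
        X = (Q * u) @ Q.T
        g = np.sum(Q * (np.linalg.inv(X) @ Q), axis=0)  # q_k^T X^-1 q_k
        j = np.argmax(g)
        if g[j] <= (d + 1) * (1.0 + tol):  # near-optimal: stop
            break
        step = (g[j] - d - 1.0) / ((d + 1.0) * (g[j] - 1.0))
        u *= 1.0 - step
        u[j] += step
    c = P @ u                              # center of the ellipsoid
    A = np.linalg.inv((P * u) @ P.T - np.outer(c, c)) / d
    return c, A

# vertices of a hypothetical polyhedron in R^2
P = np.array([[0.0, 4.0, 4.0, 0.0, 2.0],
              [0.0, 0.0, 2.0, 2.0, 3.0]])
c, A = min_volume_ellipsoid(P)
vals = np.array([(p - c) @ A @ (p - c) for p in P.T])
assert np.all(vals <= 1.05)                # all vertices covered, to tolerance
```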
And in case you think, you know, that this is sort of obvious and everything works, watch out. If I give you a bunch of points like this - a polyhedron described by its vertices - and I ask you, for example, to solve a variation on this, which is: please compute the maximum volume ellipsoid that fits inside it, that's actually NP-hard, okay?

So I'll focus, of course, on the ones we actually can do, but watch out: one step off this path and things get very complicated, and these are not obvious facts. They're just not obvious, period. Okay. Let me mention a couple of things about this one. This has huge applications, this sort of choosing a minimum volume ellipsoid around some points.
And let me just give you one or two; they're beautiful applications. Here's a really very nice one. Suppose you simply have v1 up to, doesn't really matter, v10,000. These are measurements - they're vector measurements; they're in R10.

That's it; they're vector measurements. And what you want to do is check: are there any measurements that, like, stand way out? In other words, are there outliers? That's the question here. Are there some measurements here that are big and somehow don't fit with the set, okay?

So how would you solve a problem like that? It's vague; it comes up all the time. I want to identify outliers in a bunch of samples. What would I do? Make some suggestions, actually. Let's start with unsophisticated ones. What would be the first thing you might do?
Instructor (Stephen Boyd): Sure, yeah, absolutely. So the simplest thing is, let's subtract the mean of these from each one, so now they're sort of centered roughly around zero. That would be reasonable. Then, at that point, you might just look at the norms of all these things, and if all the norms were on the order of, you know, a half, one, two, and a handful of them have a norm of 50, then there's your outlier problem, solved, right?

So basically you have a cloud of points like this and a couple just way out there, okay? Everybody agree? So that would be the first thing you might do. Then you might say, well, no, no, that's not quite right. What I'll do is I'll take these points, subtract the mean - that will center them - and now I'll calculate the empirical covariance matrix. That gives you the ellipsoid that kind of covers the cloud the best. Everybody cool on that?

Then I might even change coordinates so that the ellipsoid was uniform, was a ball. That's the change of coordinates I mentioned before; I don't recommend it without a lot of practice.

So you change coordinates and the ellipsoid becomes a ball. And now it means, roughly speaking, the data varies equally in all directions, and if you start seeing points that are way out, those could be your outliers. Everybody okay on that?
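Here is the unsophisticated version of this pipeline, as a sketch on made-up data: subtract the mean, form the empirical covariance, and flag the points that sit far outside the resulting ellipsoid (i.e., have large Mahalanobis distance).

```python
import numpy as np

rng = np.random.default_rng(4)
# a synthetic cloud of 500 inliers plus two planted outliers
X = rng.standard_normal((500, 2)) * np.array([3.0, 0.3])
X = np.vstack([X, [[0.0, 5.0], [1.0, -6.0]]])

mu = X.mean(axis=0)                    # subtract the mean ...
Z = X - mu
S = Z.T @ Z / len(Z)                   # ... empirical covariance matrix
d2 = np.sum(Z * (Z @ np.linalg.inv(S)), axis=1)   # squared Mahalanobis dist.

flagged = np.argsort(d2)[-2:]          # the two most extreme points
assert set(int(i) for i in flagged) == {500, 501}   # the planted outliers
```

Measuring distance in the metric of the covariance (rather than the plain norm) is exactly the "change coordinates so the ellipsoid becomes a ball" step, done implicitly.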
Now, the problem with that is - well, actually, that would probably work okay. It turns out a much more sophisticated method is to compute the minimum volume ellipsoid that covers these points, okay?

The points that are on the surface of that ellipsoid are the ones which you would declare as candidates for outliers, okay? And in fact, the way this often works is: you take those points, you remove them, and then you redo whatever you were doing - some least squares problem, some singular value [inaudible], it really doesn't matter; you do the processing.

What you're looking for is something like this: you want to remove a small number of points, do whatever signal processing or statistics you were doing before, and have the answer all of a sudden get way better, right?

So you want to remove eight points from the set of 10,000, do some signal processing, and all of a sudden have, like, an excellent fit. This is a hint that those eight included the ones that were messing you up before. Everybody see what I'm saying? That would be outlier [inaudible].

Okay, so this method is called, in statistics, ellipsoidal peeling. It's the greatest phrase, right? You have a giant pile of data; you stick an ellipsoid around it and you remove those points. Once you remove those points, by the way, if your quality of estimation, or whatever signal processing or statistics you're doing, didn't get, like, really good, you do it again.

You've removed those points, the ellipsoid shrinks down to fit the next ones, and you do this in waves, or whatever, until you like what you get - or until you have no data points anymore, in which case you went too far. Yeah?
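And a sketch of the peeling loop itself. One caveat: the lecture peels the minimum volume ellipsoid; to keep this short, the empirical covariance ellipsoid stands in for it here (so this is covariance-based peeling, a stand-in for true ellipsoidal peeling), on made-up data with two gross outliers planted.

```python
import numpy as np

def peel(X, rounds=2, frac=0.005):
    """Remove the outermost shell of points in a few waves. True ellipsoidal
    peeling would use the minimum volume covering ellipsoid; the empirical
    covariance ellipsoid is substituted here to keep the sketch short."""
    for _ in range(rounds):
        Z = X - X.mean(axis=0)
        S = Z.T @ Z / len(Z)
        d2 = np.sum(Z * (Z @ np.linalg.inv(S)), axis=1)
        k = max(1, int(frac * len(X)))     # peel the k most extreme points
        X = X[np.argsort(d2)[:-k]]
    return X

rng = np.random.default_rng(5)
X = np.vstack([rng.standard_normal((1000, 2)),      # inliers
               [[30.0, 0.0], [0.0, 25.0]]])         # two gross outliers
X_clean = peel(X)
assert np.abs(X_clean).max() < 10.0    # spread is back to the inlier scale
```

If the fit does not improve after a couple of waves, the lecture's advice applies: stop peeling.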
Student: I was gonna ask a question. Let's say you do it - you peel, like, two or three layers and you don't see any substantial change. Does that tell you that maybe, you know, you didn't have any real outliers to begin with?

Instructor (Stephen Boyd): Yes, it might tell you that, yeah. Generally, when that happens - when you do a couple of cycles of ellipsoidal peeling and it doesn't get better - generally, you don't admit to people that you ever did it.

So yeah, you just go back and say, "It's not working." That's what you do, although you're prepared if someone says, "How do you know that there's not - maybe there's just, like, five or six outliers, just, like, bad data points that are completely messing up your, you know, PCA or whatever you're doing, or your image reconstruction or whatever it is?"

That's when you'd say, "Well, you know, I did do some ellipsoidal peeling, and nothing got better." By the way, we already talked about it last time, but what's another method for dealing with outliers - I mean, just grossly? What?
Student: [Inaudible] different norm?

Instructor (Stephen Boyd): Yeah, you'd go to an L1-type norm, or a Huber norm or something like that - Huber is not a norm, but you'd go to a robust penalty. By the way, you could do the same thing there, the peeling thing.
Instructor (Stephen Boyd): Take all of them out. You take all the ones that are on the boundary. If you don't want to take as many out - if this is 10,000 points and these are in R10, I forget how many are gonna be on the boundary, but it's generically something like 50, so 50 out of 10,000.

Student: [Inaudible] do it again.

Instructor (Stephen Boyd): Yeah, I do the minimum volume ellipsoid again. I think generically about 50 or 55 or something like that would be on the boundary. By the way, if you take out the 55 and all of a sudden, like, your image appears - and that's, like, somebody's head, and you're doing medical imaging - then you know the following: you know that, No. 1, there were some outliers, and No. 2, they were within the 55 you removed.

Now, if you care, you can go back and start putting points in one at a time until the head goes away, okay? So you can do that if you care. On the other hand, if you have 10,000 measurements and you throw away 55, it wouldn't seem that unusual. Suppose, instead, that you threw away 22 outliers and, whatever, 33 valid data points along with them.

The loss of 33 points - I mean, this is very hand-waving, but the loss of 33 points out of 10,000, or I should say 9,950, is not gonna really make a big difference. So that's what you do; you can add them back in or whatever you like.
These are actually fun methods; they can be shockingly effective. And I should make one comment on the whole issue of outlier detection. When an outlier is, like, totally off scale, anybody can do it. So if you're getting a bunch of measurements, you know, and a bunch of them are, like, you know, 1, 2, 3, minus 5, minus 6, and all of a sudden one is, like, 1e plus 37, okay - you don't need advanced methods to detect that as an outlier.

That's because a bit flipped in a floating-point representation or something, you know, in the exponent. You don't need this for that, okay? On the other hand, if an outlier is just off a little tiny bit - it's just, like, some 1 over [inaudible], and, you know, it's not Gaussian, but it's not huge, not big enough to mess up your algorithm, or you have enough data that it gets averaged out nicely - then you don't have to remove it.

But there's a band of situations right in the middle where being smart actually pays off. It's that band in the middle: when the outliers are small enough, they don't really hurt anybody; if they're big enough, anybody can spot them; but there's a pretty big band right in the middle where being smart is gonna make the difference between successful signal processing, statistics, or whatever the application is, and not. So that's just my comment on this.

I guess this is silly - that's the case for all problems, this band. Yes?
Student: [Inaudible] continue all the measurements. It's just to [inaudible]?

Instructor (Stephen Boyd): Yeah, so you're removing outliers. So first I'll tell you the truth. The truth is, it's a heuristic, so it cannot be defended. However, you can say the following in its defense: if I simply removed the values of v with large norm, that would be something that is not scale invariant.

If you were to rescale the data or transform the data, I would remove a different set, because I can make a different set of points look big. Okay, the interesting thing is, if you do the minimum volume ellipsoid, that's actually affine invariant.

You'll select the same points, because - think about it - if you have a bunch of points and you transform them, you know, by any invertible affine mapping, you wiggle them all around and you recalculate the minimum volume ellipsoid, you get a commutative diagram. So the minimum volume ellipsoid that covers a set of points is affinely invariant; it commutes with an affine change of coordinates.

Therefore, if someone says, why are you going to all this trouble, why not just take the norms and remove the ones with the largest norm, you would say: I'm doing something a little more sophisticated. I'm removing the ones that are large, but my measure of large doesn't depend on the coordinates used or the scaling used.

Funny - if you draw that story out long enough, people won't notice that it actually doesn't answer the question, which is why. It's merely a justification. Okay.
All right, so that's an application of this. There's sort of a dual problem, the maximum volume inscribed ellipsoid. Here we're gonna describe the ellipsoid, by the way, as the forward image, not the backward image: it's the forward image of a unit ball under an affine mapping.
And again, I can assume, I won't go through the details, that B is positive definite here. Now the volume is proportional to det B, and the problem is to maximize log det B subject to this.
Another way to say this: it basically says that the image of the unit norm ball under u maps to Bu plus d is contained in C. If you want the semi-infinite representation, you would write Bu plus d is in C for all norm u less than 1.
Now we know, by the way, that that is a convex constraint on B and d. Why? Because for each u in the unit ball, this is a convex constraint. And remember what the variables are here: it's B and d. This is not obvious. u is fixed. That's an affine expression; it's affine in B and d. Actually, it's linear in B and d, in fact.
And so you have the constraint that a linear function should be in a set. Sorry, this is the case where C is convex. If C is convex, this is a convex constraint. Okay.
Now, you still have to be able to evaluate this to make this interesting, and there's actually a very simple case when you can, and that's when the set C is a polyhedron, because I can tell you whether an ellipsoid described this way lies inside a halfspace.
That's easy, and the way you see it is an easy calculation. We want to know whether Bu plus d lies in a halfspace, so let's write the halfspace as the set of x such that, let's say, f transpose x is less than or equal to g, okay. That's our halfspace, and I want to know: is it true that Bu plus d is in H for all u with norm less than 1? That's the question, okay.
And the answer is easy; we can calculate that. The question is whether f transpose times Bu plus d is less than or equal to g, with a question mark like that, for all norm u less than 1. Everyone agree?
Now, by the way, we're focusing on this as a function of B and d. We're getting there. The second term is f transpose d; it has nothing to do with u. And the first term I'm gonna write in a different way: I'm gonna write it as B transpose f, transposed, times u. And this should be true for all norm u less than 1.
Now, if someone walks up to you with the inner product of a fixed vector with u, where u ranges over the unit ball, we know exactly what those numbers range over. They range between plus and minus the norm of B transpose f.
Okay, in fact to be quite precise, that's for the Euclidean norm; in the general case, with a general norm, it's plus or minus the dual norm of B transpose f. So the maximum value this can have is this, period, and that's achieved with the choice u equal to B transpose f divided by norm of B transpose f. That's the worst thing that can happen. So these are all if-and-only-ifs. I haven't drawn them, but that's what they are. Okay.
And then you have that, and that's it; we're done, because now this is tractable. The norm of B transpose f is a convex function of B; f transpose d is a linear function of d. The whole thing is clearly convex in B comma d. Okay, so that's the condition, and that's what we have here.
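The condition just derived (the ellipsoid {Bu + d : norm u <= 1} lies in {x : f'x <= g} exactly when norm(B'f) + f'd <= g) is easy to sanity-check numerically. Here is a small sketch with made-up values of B, d, f and g, comparing the analytic supremum against brute-force sampling of the unit ball:

```python
import numpy as np

rng = np.random.default_rng(0)

# A symmetric positive definite B and a center d (values made up for illustration).
B = np.array([[2.0, 0.5],
              [0.5, 1.0]])
d = np.array([0.3, -0.2])

# Halfspace {x : f^T x <= g} (also made up).
f = np.array([1.0, 2.0])
g = 5.0

# Derived condition: sup over ||u|| <= 1 of f^T (B u + d) = ||B^T f|| + f^T d.
worst_case = np.linalg.norm(B.T @ f) + f @ d
condition_holds = worst_case <= g

# Brute force: sample many u in the unit ball and verify f^T (B u + d) <= g,
# and that the largest sampled value approaches the analytic supremum.
U = rng.normal(size=(100000, 2))
U /= np.maximum(np.linalg.norm(U, axis=1, keepdims=True), 1.0)  # clip into unit ball
vals = (U @ B.T + d) @ f
print(condition_holds, vals.max() <= g)
```

The worst-case u is B'f / norm(B'f), so the sampled maximum should sit just below `worst_case`.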
Maybe I have it wrong here, or did I assume B is symmetric? Yeah, sorry, I assumed B is symmetric, so the transpose isn't needed here. Okay, so I get that. By the way, just let me point out, you're used to it now; we're six weeks into the class. It's not a big deal to look at that and say that's a convex problem in B and d.
These are things that, if you don't know this material, look hopelessly complicated, just because, you know, it's got the log of the determinant and all that, and it's non-differentiable; that's not norm squared there, that's the norm, right. But you look at that and say that's like a second-order cone constraint. It's nothing.
So, all right, this allows you to compute efficiently the following. If you have a polyhedron described this time not by the vertices but by the inequalities, so basically you have these a_i transpose x less than b_i, you can now calculate efficiently the maximum volume ellipsoid that sits inside it; okay.
And that you can do efficiently. Now, interestingly, if I take a polyhedron described by inequalities and I ask you to calculate the minimum volume ellipsoid that covers it, that's NP-hard, again. So you have to watch out with these things; I'm just casually mentioning them here, but this is not simple. There are easy problems where minor variations on them are very hard.
Student:Is the NP-hardness because you're using the representation of the polyhedron, which doesn't actively constrain?
Instructor (Stephen Boyd):Something like that.
Instructor (Stephen Boyd):Yeah, the NP-hardness is actually very easy to see. You wanna hear something hard? It's just the following. Forget finding the minimum volume ellipsoid that covers a polyhedron described by inequalities. Take this much easier problem. Suppose someone just asked you the following.
Here is a specific ellipsoid. Can you please tell me if it covers my polyhedron? That's not even a feasibility problem; it's not finding an ellipsoid that works. It's basically checking whether one specific ellipsoid covers the polyhedron.
That's NP-hard. It turns out it's maximizing a convex quadratic function over a polyhedron, and all sorts of NP-hard problems embed easily into that.
Yeah, but I don't know that I really answered your question. I don't know that I have a good answer for why, when the data structure for the polyhedron is vertices, it's easy to calculate the minimum volume covering ellipsoid but hard to calculate the maximum volume one inside, and why it reverses here. I don't really have a good story for that. Question?
Student:So given this description, for example, we have this [inaudible], can you actually envision and find vertices?
Instructor (Stephen Boyd):Excellent question. Now, of course, if you could efficiently transform between these two data structures describing a polyhedron, then right here, right now, we would have shown P equals NP. Now, I'm not ruling it out. Could happen someday, maybe around Week 8 or something, no. But I'm glad you pointed that out, because in fact there's a catch there.
So the question is: given a polyhedron described by inequalities, can you describe it by its vertices, and the other way around? These are just two different data structures to describe the polyhedron, right. Now, it's interesting because, for example, with ellipsoids we were just talking about multiple data structures describing an ellipsoid. Those are all equivalent. They're all about the same size; they all involve some sort of a matrix and a vector.
You know, roughly, transforming from any one to any other is just linear algebra. That's all very straightforward and everything is polynomial. It's all like n-cubed or whatever, right.
So those are equivalent; they're different, but polynomially equivalent parameterizations. Now, in general, there are some pretty shocking things; I'll just mention some of them. If you have a polyhedron described by inequalities, it is possible to list the vertices. It's possible; there are algorithms that just work, that just list them.
Here's the problem: the number of vertices is exponential in the problem size. Okay, so even for things you can just write down in a minute; for example, take 100 dimensions and 200 linear inequalities, okay.
That's nothing, right. You can solve an LP of that size; well, in three weeks you'll write a code for it, and it can be solved literally in milliseconds. Okay. So, you know, everyone has the right visualization skills: when I say minimize [inaudible] where A is 200 by 100, we're all visualizing the same thing. We're visualizing a polyhedron and a direction, and you kind of find the vertex that's farthest in that direction. Everybody got that?
Now, here's the problem: the number of vertices of a polyhedron in R100 defined by 200 linear inequalities is absolutely immense. It's some number on the order of like 100 factorial or something; I don't know what it is, and by the way, if that's not approximately equal to the number of subatomic particles in the universe, I'd only have to change those numbers very slightly and it would be. Trust me, okay.
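A standard example of this blow-up (my own illustration, not the instance quoted above) is the unit cube: in R^n it is cut out by just 2n inequalities, yet it has 2^n vertices, so the vertex list is exponentially larger than the inequality description.

```python
# The unit cube in R^n is {x : 0 <= x_i <= 1}: only 2n inequalities,
# but every choice of 0/1 per coordinate is a vertex, so 2^n vertices.
n = 100
num_inequalities = 2 * n
num_vertices = 2 ** n
print(num_inequalities)  # 200
print(num_vertices)      # about 1.27e30 -- hopeless to enumerate
```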
So that's the problem: going from inequalities to vertices can lead to a huge increase in the size of the problem. That's the first problem. Now, by the way, for certain cases it's not. If everything is in R2, anybody can figure out a way to transform between these two representations, vertex versus face, or in that case, edge.
Anybody can do it in R3 as well, and if you go to Google and type things like transform vertex face blah, blah, blah, you'll find tons of papers, but it'll be for specific cases like R2 and R3.
You'll even find, for example, that in some fixed R^k there'll be polynomial algorithms, but the dimension k has to be fixed. By the way, the other direction has the same problem. So anyway, let me summarize all of this, because that was a weird random walk.
Changing the data structure between the different parameterizations of ellipsoids, that's casual; it's just linear algebra. It takes, you know, some square roots, a couple of singular values, inverses, nothing, okay.
Changing the data structure describing a polyhedron from vertex to inequality description, that's not casual. So that's a very important thing to note, unless you're in R2. In R2, there doesn't seem to be any real problem anyway, because your eyeball can solve all of them and it hardly matters; okay.
So I didn't even consider those to be real problems.
Student:[Inaudible] the problem we just solved earlier, before we got this example, does that not correlate with what you're just saying about covering [inaudible] finding the minimum [inaudible]?
Instructor (Stephen Boyd):In the problem before, you're given the vertices and you're asked to find the minimum volume covering ellipsoid. That's easy. In this problem, you're given the faces and you're asked to calculate the maximum volume ellipsoid that sits inside.
Those two are the easy problems. You flip those two, re-match them, and they're essentially infinitely hard; not only that, you don't even get to this point. If someone just alleges an ellipsoid covers a polyhedron, you can't even verify it.
You wouldn't know, let alone optimize over such ellipsoids. So these are subtle, and you have to look at them carefully and think about them. Okay, now there's an amazing fact here, and it's actually beautiful; it's the sort of fact about geometry.
It's absolutely beautiful, and it's this. You take a convex, bounded set with nonempty interior and compute the Löwner-John ellipsoid. So that might be, say, this polyhedron here, and you compute the Löwner-John ellipsoid; that is, you find the minimum volume ellipsoid that covers it, okay. And that's this one.
The following is a fact; it comes right out of duality. I won't derive it; there are derivations in the book. If you take that ellipsoid and you shrink it by a factor of n, then you're guaranteed to be inside the set.
Okay, and by the way, the same is true here: if you take the maximum volume ellipsoid that fits in a set and you grow it by n, it covers the set. Now, you might say n is a big factor. Well, yeah, but at least it's a factor.
And it basically says the following: if you split the difference, I can say it this way: I can approximate any convex set in Rn, you know, bounded with nonempty interior and that kind of stuff, within square root n.
That's what it means, because I would split the difference here, right: I'd shrink this ellipsoid by square root n, and then I'd get one where if you scale it up by square root n, you cover the set, and if you shrink it by square root n, you're inside.
By the way, if the set is also symmetric, you can sharpen n to square root n, in which case, splitting the difference, I can even get an n-to-the-one-quarter approximation. The importance of this is extreme, both theoretically and practically, and let me say why. This basically says that ellipsoids are universal geometric approximators of convex sets.
That's what it says, because I can get a bound: there's a fixed factor whereby I can shrink and expand and fit inside and outside the set. Everybody got that?
Now, that's gonna come up; it's gonna have lots of implications, but what about this: what if someone says, how about a ball? A ball doesn't have that property, obviously, because, for example, here's a convex set I can make as skinny as I like; the minimum size ball that covers it is quite big, the maximum size ball inside is tiny, and there's no bound on the ratio.
So I can hardly claim to approximate a convex set in Rn by a ball. How about a bounding box, a rectangle? Can I do it by a rectangle? Well, the ball has only n plus 1 parameters; a bounding box has like 2n or 3n or something.
That's not enough either, because in this case, for example, the bounding box can look like that. That's the smallest box that covers the set; the biggest box that fits inside is gonna be tiny, period, and there's gonna be no bound on the ratio.
So the ellipsoid is sort of the first time you get enough degrees of freedom to carry out universal approximation of a convex set. And let me just ask you another one, just for fun: how about simplexes? That's the convex hull of n plus 1 points. By the way, the answer is not obvious here, so you're just gonna have to guess, but I'd say you can do it. So how about simplexes, what do you think?
Can you approximate any convex set by a simplex? You can, yeah. You have enough free parameters in a simplex to do that, and it's easy: you approximate the set by an ellipsoid, transform the ellipsoid to the unit ball or something like that, and then stick a simplex around that, and you can even bound the ratio and all that. It won't be n; it'll be n-squared or something. It doesn't matter; it'll be something. It'll be bounded. Question?
Student:[Inaudible] rectangle instead of...
Instructor (Stephen Boyd):Oh, you mean a rectangle that can rotate? Yeah, then you can do it, right. Of course, we don't know how to calculate such a thing, but if you could, that gives you enough degrees of freedom to cover it. Okay. So ellipsoids are not just sort of convenient things that describe shapes. It's actually very worthwhile to understand and remember that there's a sense in which they capture, let's say, the first-order approximation of the shape of any convex set.
By the way, let me mention one more property of this. Suppose you took, let's say, EE263. Let's just suppose you did. Okay, and someone says, "What are you doing?" Or you take a control class, or for that matter a statistics class, and someone says, "What did you do all quarter?" Everything you did involved quadratic norms, right? You had quadratic functions.
In control you'd have x transpose Q x. In statistics it would be hidden by something else or whatever, and someone would then ask a question like, "What did you learn in 263?" "Well, I can minimize a norm." And they'd say, "A two-norm, really?" And then they'd say, "Does the sum of the squares actually come up in your problem? Is that really what's important in your helicopter design?"
And you could look at them and see how gullible they are, and if they look gullible you could try to go, "Oh, yeah, absolutely. Yep, sum of the squares of the vertical acceleration, pitch rate, absolutely. That can't be more than 1.3." If they go for it, the conversation is over. If they don't go for it, then you'd say, "All right, I'll tell you the truth. No. We don't care about sums of squares of things. It's just because we can do it, and because that's the only class I took so far."
So that's generally the truth about least squares. Okay, so, all right, what does that have to do with this? Well, let's suppose there is something you really care about, like maneuverability or ride comfort; it doesn't really matter. And suppose what you do now is you go around and run huge tests, wind tunnel tests. You have pilots write things out and say whether they like the ride or not.
And anyway you end up with some weird convex set that looks like that. This is the set of acceptable rides as judged by, you know, wind tunnel simulations, actually getting pilots to ride it, and all that, and they'll say this is okay. This is of course in a multi-dimensional space, and one of these axes is like pitch rate, one's this, one's the RMS; it doesn't matter.
If you're in that set, everything's cool; that's it, okay. That's the real set, and it is not an ellipsoid. Let me tell you, it is not an ellipsoid, period. That's the real set, okay.
Now, what you know now allows you to go back and, in an a posteriori way, justify the 263 material. For that matter, you could go back and justify ordinary regression, and here's how you do it, like somebody suggested.
That's actually the set of acceptable ride qualities as judged by horribly long simulations, questionnaires with pilots, and blah, blah, blah, so all right.
This is where people, like, threw up and stuff over here. This is where they crashed, actually. This is where they threw up, but anyway, so what do you do next? Precisely this: you just make an ellipsoidal approximation. In this case, you probably want to make the inscribed ellipsoid.
Then you take this and you go back to the intern who has only taken 263 and you say, "This is the norm you're going to use." And it's, yeah, some stupid 6 by 6 matrix or 8 by 8 matrix with a bunch of entries, and they go, "Where did all those come from?" And you say, "Last year's intern," right.
It was a lot of work to get it, but you see my point here. By the way, as far as I know, no one actually uses the ideas I'm talking about in practice, but everyone should, right. This just comes up everywhere. If you're in control, a lot of the methods actually fielded involve quadratic forms like Q and R and S, and then if you ask where they really come from: people make them up.
They don't have to be made up. You can actually do something intelligent and get weights that actually capture, not exactly, obviously, but crudely, the shape of the set.
By the way, over here, this sufficiency factor was n. That's something that only matters to a theorist. Okay, and by the way, can anyone guess what's the worst set to ellipsoidally approximate, just in R2?
Student:Something like...
Instructor (Stephen Boyd):Let's first talk about ellipsoidal approximation in R1. That's a short conversation. How does it go? How well can you approximate a convex set in R1 by an ellipsoid in R1?
Instructor (Stephen Boyd):Oh, very well, because they're both intervals. Okay, so that was n equals 1. Let's do n equals 2. Well, you know it can never be worse than 2. Can it be as bad as 2, a factor of 2? That's what the theory says. And the answer is sure: a simplex is like the absolute worst thing you could be asked to approximate, because in that case your outer ellipsoid looks like that, your inner one looks like that, and they're 2 to 1. That's the end, and that's general.
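The factor of 2 for a simplex in R2 can be checked directly. For an equilateral triangle the optimal inner and outer ellipsoids are balls by symmetry, and their radii differ by exactly 2. A quick numerical check (my own construction, not from the lecture):

```python
import numpy as np

# Equilateral triangle (a simplex in R^2) with circumradius 1, centered at origin.
angles = np.deg2rad([90, 210, 330])
V = np.stack([np.cos(angles), np.sin(angles)], axis=1)  # the three vertices

# Outer ball: passes through the vertices, radius = circumradius.
circumradius = np.linalg.norm(V, axis=1).max()

def dist_to_edge_line(p, q):
    # Distance from the origin (the center) to the line through vertices p and q.
    t = q - p
    n_vec = np.array([-t[1], t[0]]) / np.linalg.norm(t)
    return abs(n_vec @ p)

# Inner ball: touches the edges, radius = inradius.
inradius = min(dist_to_edge_line(V[i], V[(i + 1) % 3]) for i in range(3))
print(circumradius / inradius)  # 2.0: the factor-n bound is tight for the simplex
```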
So this factor of n cannot be made better. Now, a simplex is kind of a weird, sick thing. It's got all sorts of corners on it; in fact, it's roughly as pointy as a convex set can be, okay.
Now, for a lot of convex sets that come up in practice, the approximation number is definitely not n. It's often much, much less. For example, if we actually went out and worked out ride qualities or things like that for a vehicle, you would find that most sets that arise through natural causes, and what I'm about to say is of course just total hand-waving, but I believe it to be the case...
Well, actually, since I'm not making a precise statement, it cannot be disproved; however, as a rough idea I will say this: most of the convex sets that come up through natural causes can actually be approximated by ellipsoids stunningly well, nothing close to n. Okay, so I just mention that for those of you who are worried about that factor of n; you needn't be. Okay, so that's our discussion of that.
Centering, that's an interesting topic. Centering works like this: you have a set and you want to find a point that's deep inside the set. By the way, we've already seen one application of this, which is design centering, which is yield maximization.
So in yield maximization, this set describes the set of acceptable values. In other words, the point that you're looking for is what you tell people to shoot for in manufacturing. That's what you want to do.
You can also think of less socially positive applications. This could be the range of points where, if you're within there, you take out a target, and then the question is, "Where do you put your sight?" Right, and the answer is you don't put it there; you put it right in the center. What center? You want to maximize the probability that you're in the set.
But we'll go back to manufacturing, okay. Now, the yield maximization problem, we already talked about that; it's actually a convex problem provided the probability distribution is log-concave. That doesn't mean you can solve it. You can, but not by methods from this quarter. Abstractly, it's a convex problem.
So a lot of people use various heuristics for that, and they work unbelievably well. One is this: you find the center of the largest ball that fits inside the set. Depending on what the set is; if it's a polyhedron, for example, we already looked at that, like Day 1 of linear programming.
There's a linear program to calculate the largest ball that fits inside the polyhedron; that's an LP. By the way, I can say largest, because you'd only use the word largest if there were a linear, total ordering.
For balls there is, because, at least if you're talking about the size of a ball, it's the radius, and so they're totally ordered. You would never say what's the largest ellipsoid inside a set, because that makes no sense. You have to say something like largest volume ellipsoid.
Okay, so that's an LP, and we just worked this out.
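That largest-inscribed-ball LP (the Chebyshev center) can be sketched in a few lines. This uses scipy's linprog on a toy polyhedron, the unit square, where the answer is obvious; the polyhedron data is made up for illustration:

```python
import numpy as np
from scipy.optimize import linprog

# Polyhedron {x : A x <= b}: the unit square [0,1]^2.
A = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
b = np.array([1.0, 0.0, 1.0, 0.0])

# Chebyshev center LP: maximize r subject to a_i^T x + r * ||a_i|| <= b_i.
# Decision variables are (x_1, x_2, r); linprog minimizes, so negate r.
norms = np.linalg.norm(A, axis=1)
A_ub = np.hstack([A, norms[:, None]])
c = np.array([0.0, 0.0, -1.0])
res = linprog(c, A_ub=A_ub, b_ub=b,
              bounds=[(None, None), (None, None), (0, None)])

x_cheb, r = res.x[:2], res.x[2]
print(x_cheb, r)  # center (0.5, 0.5), radius 0.5
```

The ball constraint a_i'x + r||a_i|| <= b_i is exactly the worst-case version of a_i'(x + r u) <= b_i over the unit ball, the same supremum argument as for the ellipsoid above.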
Calculating the maximum volume ellipsoid in a set is also a tractable problem, and its center is called the maximum volume ellipsoid center, x_mve. You can think of lots of others. This one has a very important property.
It's affine invariant. Affine invariance means that if I transform the whole problem, for example by scaling the coordinates, if I stretch it some way, that will radically change the Chebyshev center.
But it won't change the maximum volume ellipsoid center, because everything transforms together: the volume just picks up the factor det T, where T is the linear part of the transformation. So this is affine invariant. Now, as to which is better and which is not:
it just means that if someone's calculating a Chebyshev center, you should ask them the following question. How well do you trust your choice of coordinates? Have you scaled everything properly, this kind of thing? Is it true that x3 being on the order of 1 means about the same as x2 being on the order of 1?
If they don't immediately answer that question with, "Oh, yes, I've been very careful to scale everything here, very carefully, so they're all on the same order," then you need to poke them and bug them and say you'd better check your scaling, because it matters here. For the affine invariant centers, it doesn't matter at all.
So, okay. Another center is the so-called analytic center of a set of inequalities and equalities; that's a typo there. This we're gonna look at in great detail two weeks from now, when we actually talk about interior-point methods and how you solve all these problems, so we'll mostly save it for then.
But it's this: you set up a problem. You have some equality constraints and some inequalities, and these don't really matter. The margin here is minus f_i of x. So, for example, if minus f_i of x is 0, you'd say the constraint is tight; if minus f_i of x is 1, you'd say you have a slack or margin of 1, and you kind of want to maximize the margins.
And there are lots of ways you could maximize the minimum margin, but an interesting one is to maximize the product of the margins, which is the same as minimizing the negative sum of the logs of the margins, okay. So that's another center.
It sounds odd, but we'll get to what it means later. We'll also see that, although this looks rather complicated, it turns out to be shockingly low complexity; we're talking 15 lines of code, 15 lines that you will write soon. And it turns out, by the way, that in this case there's also an ellipsoid result: if you calculate this, you'll get an inner and an outer ellipsoid. Yeah.
Instructor (Stephen Boyd):Yeah.
Student:How would you extract that center?
Instructor (Stephen Boyd):That's part of the data. I mean, I forget how we parameterize it, with B or d; does anyone remember? Yeah, so that's easy. You might have to do some calculations, but whatever it is, it's very, very straightforward. You just transform it. It's a quick linear algebra calculation to get the center.
Okay. So here's an example; this will be important later, so I don't mind going over it. Also, by the way, just the idea of an analytic center is something that in many applications should be propagated. You'll see lots about this and related problems in the next two or three weeks, but it's something that should just be propagated, because anytime you find a problem where someone says, "Here are my specifications. Here are my inequalities. Pick me a point in them."
And you say, "Really, any point? What if I picked a point just barely in them?" And they go, "No, if you're going to pick a point in them, you might as well get one that satisfies them with some margin, or something like that." The analytic center is actually probably a good choice, and it actually comes up in lots and lots of things; it already comes up in things like maximum likelihood estimation, and we'll see all those connections; okay.
So, for a set of linear inequalities, we have a polyhedron, and these are the level curves of the so-called log barrier function for this set. You can see the level curves here: the really high level curves kind of hug the shape of the polyhedron.
It's a smooth convex function, so it's got a minimizer here; that's the analytic center. And near that minimum, this thing looks like a quadratic, period. Therefore, the level sets near the analytic center are ellipsoids.
That ellipsoid is a pretty good approximator of the shape of this set, and indeed there's a bound. Now, the bound is interesting: you can puff the ellipsoid up by a factor of m, which is m-squared on the quadratic form, but m is not the space dimension; it's the number of inequalities. So it's worse than the maximum volume ellipsoid or the Löwner-John ellipsoid. So that's it.
We'll get to these ideas later. Okay, let's look at another topic: the idea of discrimination, or classification; it's got lots of names. Let's see how that works. Here I have a bunch of points in Rn, and they're labeled; they have binary labels, so it's binary classification. I can think of it this way.
I'm labeling them by symbol, so the x's are one set and the y's are another. These could be a bunch of vectors where there was actually something present, and these could be a bunch of vectors where something was not present, or something like that, but I know which is which here. I'm told which are which.
And what I want to do is this. This is the oldest problem; by the way, linear discrimination goes back easily to MIT in the '40s and probably earlier, so it has a long, long history.
So what you really want to do is find out: is there a separating hyperplane for this data set? Very basic question, and if there is, it means the following set of inequalities is feasible: the x's lie on one side and the y's on the other. Now, the variables here are a and b.
Now, there is one difference here: these inequalities are strict, okay. In fact, we haven't dealt with strict inequalities yet, and we're going to now, and you're going to see a trick. By the way, let's work out the non-strict case; let's have that discussion right now. Under what circumstances can two sets of points in Rn be non-strictly separated? That means: when does there exist an a and b for which, let's say, this is true?
Instructor (Stephen Boyd):Always, and by what choice of a and b?
Instructor (Stephen Boyd):That's right: zero. If I choose a equals 0 and b equals 0, this always works, even if the points are all messed up and next to each other; in fact, even if they're identical. So that's why it doesn't make any sense to look at that problem. We have to look at the strict problem, and we haven't looked at strict inequalities yet.
So now Iíll just tell you the trick is very, very simple here. These inequalities Ė take a look at them. They are equivalent to a set of non-strict inequalities. They are equivalent to these. Right, put a 1 and a minus 1 there. Okay. This trick you will be carrying out this trick. You need to get it. Okay.
Here's the argument. Watch this. Suppose there existed any A and B where a strict inequality held here, and a strict inequality held here. This is homogeneous in A and B, and so is that. So suppose the A and B you give me that satisfies this has gaps of, you know, 1e-5 on one side and minus 1e-5 on the other. I can multiply that A and B by 10 to the 5 and make the gap plus and minus 1. Okay, so I can make it this way.
Now, conversely, if A and B satisfy these inequalities, obviously these imply that. So that's how you do this. These are tricky, and if you do them wrong or casually, you can get into trouble. Now, there are a lot of cases where a strict inequality can just be replaced with a non-strict inequality and no one's gonna get hurt; in fact, the original strict one didn't even make any sense.
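The scaling argument just given can be written out explicitly. Here is a sketch in standard notation; the gap symbol $\varepsilon$ is my own, not from the lecture:

```latex
% Strict separation: there exist a, b with
a^T x_i > b, \qquad a^T y_j < b, \qquad i = 1,\dots,N,\; j = 1,\dots,M.
% Let eps be the smallest gap, which is positive:
\varepsilon \;=\; \min\Bigl(\min_i\,(a^T x_i - b),\; \min_j\,(b - a^T y_j)\Bigr) \;>\; 0.
% Both sides are homogeneous in (a, b), so dividing by eps gives
(a/\varepsilon)^T x_i - (b/\varepsilon) \;\ge\; 1, \qquad
(a/\varepsilon)^T y_j - (b/\varepsilon) \;\le\; -1,
% which are exactly the equivalent non-strict inequalities.
```

Conversely, any $(a, b)$ satisfying the non-strict versions with the $\pm 1$ gaps clearly satisfies the strict ones.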
Here's an example. If someone says to you, "I'm designing a circuit and the power can't be more than 50 milliwatts," and you say, "Really?" and then later you get a design document and it says as Item No. 1, P is strictly less than 50 milliwatts, like that. It probably means whoever made this up hasn't thought very much about it, because you'd say, "Really, could it be exactly 50?" Anyway.
This is silly, but they probably meant the non-strict version, and there's no engineering difference between the two. The distinction makes absolutely no sense, because you'll judge this by SPICE or something like that, and your error, you know, won't be that small anyway.
Anyway, when you manufacture it, it's gonna be plus or minus several percent anyway. Just change the room temperature and it's gonna change. So the point is, that's a case where the distinction between strict and non-strict is totally irrelevant. This is just due to the ignorance of the person who wrote the specifications; they didn't know there was a difference, okay. So that's this one.
This is not one of those cases, because if you just casually make these non-strict here, the answer is 0, okay. If you're using some tool, you'll not get the answer you want: if you give a problem like that to a solver, it'll be very, very happy to give you an answer, and it won't be the one you want.
So there are cases where the strict versus non-strict inequalities actually matter in practice. They matter a great deal. I mean also this all matters conceptually, too. So you should not confuse the two ever frankly, but okay.
Now, checking whether there is a separating hyperplane is actually just a set of linear inequalities, that's this, in A and B. So for us it's an LP feasibility problem. There's nothing more to say, period. Okay, we will say more; we'll say it next week, but that's it.
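As a sketch of that LP feasibility check (not from the lecture: the data here is made up, and `scipy.optimize.linprog` is one solver choice among many), the normalized inequalities $a^T x_i - b \ge 1$, $a^T y_j - b \le -1$ can be stacked into one system $A_{ub} z \le b_{ub}$ in the variable $z = (a, b)$ with a zero objective:

```python
import numpy as np
from scipy.optimize import linprog

# Made-up, clearly separable 2-D data: class X near (2, 2), class Y near (-2, -2).
rng = np.random.default_rng(0)
X = rng.normal(2.0, 0.5, size=(20, 2))
Y = rng.normal(-2.0, 0.5, size=(20, 2))

# Variable z = (a1, a2, b).
# a^T x_i - b >= 1   ->  -x_i^T a + b <= -1
# a^T y_j - b <= -1  ->   y_j^T a - b <= -1
A_ub = np.vstack([np.hstack([-X, np.ones((len(X), 1))]),
                  np.hstack([Y, -np.ones((len(Y), 1))])])
b_ub = -np.ones(len(X) + len(Y))

# Zero objective: pure feasibility. Bounds must be opened up, since
# linprog's default is nonnegative variables and (a, b) is free.
res = linprog(c=np.zeros(3), A_ub=A_ub, b_ub=b_ub,
              bounds=[(None, None)] * 3)
print(res.success)  # True means a (strictly) separating hyperplane exists
```

If `res.success` is `False`, the point sets cannot be strictly separated; if it is `True`, `res.x[:2]` and `res.x[2]` give a valid $a$ and $b$ with gap at least 1 on each side.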
Let me just mention something that you might use this for. You might use it this way. You might actually take a whole bunch of snapshots of stuff where something actually happened like a target was present or something. Or a disease was present or something. This can be gene expression data.
This could be gene expression data, giant piles of gene expression data where disease was not present. Okay. These, of course, would have a dimension; these Xs could be a million dimensional, for example, something like that. Okay, then suppose there were a separating hyperplane. There wouldn't be.
But let's imagine that there were one. If there were a separating hyperplane, you could actually now do something quite interesting. Somebody shows up, you do the gene expression array, you get the data, and you plug it in, and this number here will either be positive or it'll be negative.
If it's minus 3, you might guess with some confidence, depending on whether you believe any of this, that some disease is present. By the way, if it comes out about 0, that's a good time to say we don't know. So we'll get to the shades business. So that's the kind of thing you would use separating hyperplanes for: basically, prediction on new points, where you don't know which of the two outcomes will happen and you want to predict it. Okay, so we'll quit here.
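The prediction rule just described, looking at the sign of $a^T v - b$ for a new point $v$ and abstaining near zero, can be sketched like this (the hyperplane parameters and the `margin` threshold are hypothetical, standing in for the output of the separation LP):

```python
import numpy as np

# Hypothetical hyperplane parameters, e.g. recovered from a separation LP.
a = np.array([1.0, 0.5])
b = 0.0

def classify(v, margin=0.5):
    """Decide by the sign of a^T v - b; near zero, say we don't know."""
    s = a @ v - b
    if s > margin:
        return "class X"
    if s < -margin:
        return "class Y"
    return "don't know"

print(classify(np.array([3.0, 1.0])))    # a^T v - b = 3.5  -> "class X"
print(classify(np.array([-3.0, -1.0])))  # a^T v - b = -3.5 -> "class Y"
print(classify(np.array([0.1, 0.1])))    # a^T v - b = 0.15 -> "don't know"
```

The "don't know" band is exactly the "comes out about 0" case from the lecture; how wide to make it is a judgment call about how much you trust the separating hyperplane.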
[End of Audio]
Duration: 77 minutes