Convex Optimization I - Lecture 05

Instructor (Stephen Boyd): They should install it before then. Okay. Great. We'll start [cuts out] announcements, the first of which we also put on the website, which is the authoritative source of information. It's that this week's section, which would have been yesterday, will in fact be today. As a matter of fact, it'll be at pretty much the worst possible time. It'll be at 11:00 a.m. today, in other words almost immediately following this lecture. So it'll be 11:00, 191 Skilling, so I guess that's upstairs. That's the section this week. Let's see, another announcement is just to remind you that if you were a victim of EE quals and have filled out the proper paperwork, then your Homework 1 is due today. Then you should jump right into Homework 2. And we're working on Homework 3 for you, so it should be -- Homework 3'll be nice.

Okay. I'm trying to think of this. Any other announcements? Oh, there's one other. In the next week, we're making the transition in the class -- actually, even in today's lecture -- from sort of boring background material to stuff that's actually useful. So that's gonna happen in the next week. But part of that is that by next Monday and by Homework 3, there'll actually be a software component to the class. So next Monday's section -- I'll announce this again, but next Monday's section will be on CVX, so we'll explain how that works and so on. It's actually quite well documented. I would recommend, if you get bored or something like that, if Homework 2 isn't quite enough for you, or who knows what, that you install CVX. Actually, you should reinstall it periodically as we silently fix errors, and you can watch the build number increment. But you should go ahead and install it, maybe read the manual. This is just if you get in some weird mood where you feel like doing this. Anyway, you won't have a choice. By next Monday, you will have to install it and start reading the manual and all that kind of stuff. And it'll tie into everything you've seen, so that should be okay. So that's not really -- it's just looking ahead. The other thing I should mention, of course, is the reading assignment: you should be reading Chapter 4. Actually, hopefully you've skimmed it already at the very least, because that's what we're gonna work on today, and probably the next lecture as well. So any questions about last time? Otherwise, we'll zoom forward and actually get to -- actually, it's gonna almost be useful soon, maybe even today.

Student: Are you gonna discuss generalized inequalities?

Instructor (Stephen Boyd): Am I gonna discuss generalized inequalities? No. That was the question. That was my answer, too. No. It's clear enough in the book. And when we get to something where it's relevant, like experiment design -- it'll also be relevant in detection and estimation -- then I'll go back over it. Also in multi-criterion optimization, which we're gonna do later. So I'll go back over it. Okay. So today we're gonna go through optimization problems. The first part is a bit boring. It's just setting down basic terminology, like what does it mean to be feasible, what are the constraint sets, actually what is a convex optimization problem. But then it'll transition to actually useful, so you actually find out what a linear program is, what a quadratic program is, a second-order cone program, and these types of things. We'll get to those maybe later -- hopefully this lecture. So we'll start with just the boring nomenclature of an optimization problem. So an optimization problem looks like this. The notation for it is: minimize an objective f0(x) subject to fi(x) ≤ 0, for i = 1,…,m, and hi(x) = 0, for i = 1,…,p. So this is sort of a standard form for an optimization problem.
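For reference, the standard form written out (this is the book's notation):

```latex
\begin{array}{ll}
\mbox{minimize}   & f_0(x) \\
\mbox{subject to} & f_i(x) \le 0, \quad i = 1,\dots,m \\
                  & h_i(x) = 0, \quad i = 1,\dots,p
\end{array}
```

with optimization variable x in R^n.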

By the way, if you read the Russian, the Soviet literature, you'd see different things -- and I guess many Russians are now teaching in U.S. and European universities, so it's propagated, and you might as well know what it looks like. They would write it this way. It's actually a perfectly good notation. So you will see this, and it means that f0(x), the objective, is to be minimized. I don't know how to pronounce this in Russian, but it's something like that. So you'll see this, and you'll see some other ones. Now you'll also see some things that drive me insane, and I'll mention one of them. You'll see this, and that really drives me insane, because the semantics is wrong. Min actually has a meaning. Min takes as argument a set of numbers -- in the strict definition, a finite set of numbers -- and it returns a number. Minimize is actually a verb. It's not a mathematical operator; it's part of the verb that you use to create an optimization problem. So these are different things. But you'll see this.

Actually, sometimes they just barely escape by putting a period at the end, which suggests that it wasn't min -- it was short for minimize. But anyway, you'll see that. Sloppy. Now here x is the optimization variable. This is called the cost. It's sometimes called the objective. It's got all sorts of other names -- if it's in a specific application area, it'll have a name there, too. Then these are called the inequality constraint functions, and the hs are the equality constraint functions. Now here the right-hand sides are zero, but we're gonna see very quickly how, if the right-hand side weren't zero, you could transform it. So it's just useful to have a standard form like this.

Now, the optimal value of this problem -- so the type of object or data that this is: that, with the minimize (or with the min period), is an optimization problem. By the way, it has no numeric value by itself; you can't say this problem is three or something like that. But you can talk about the optimal value of a problem, and that's simply the infimum or minimum of f0(x) over all x that satisfy the inequality and equality constraints. And there's a completely standard convention that the infimum of an empty set is infinity, so that's absolutely standard. That says the optimal value of this problem is infinity if there's no feasible x, if there's no x that satisfies these inequalities and equalities. And that makes sense, because if your objective is to minimize something, then the value plus infinity basically means you've done infinitely poorly. You could not have done more poorly, if your job was to minimize something, than if the value is infinity. And that is after all what infeasibility means. It means you didn't even get to the point where you could actually compare two points and say yours is better than mine, because there weren't any that met the hard constraints. Okay. Now, on the other hand, this could be minus infinity, and that would be the case if there were a sequence of xs, all feasible, with f0(xk) going to minus infinity.
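Written out, with the conventions just described:

```latex
p^\star \;=\; \inf\bigl\{\, f_0(x) \;\bigm|\; f_i(x) \le 0,\ i=1,\dots,m,\ \ h_i(x)=0,\ i=1,\dots,p \,\bigr\},
```

with p* = +∞ if the problem is infeasible (the infimum over the empty set) and p* = −∞ if it is unbounded below.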

By the way, in different problems this is referred to in different ways. For example, in something called risk-sensitive estimation, they have a wonderful name for the case when the problem is unbounded below. It's called euphoric breakdown. That's the technical name for that case. They have all sorts of other names for it in different fields and things like that. I guess it's infinitely good. What it really means is your modeling of your problem is probably not that great, but okay. So let me mention the idea of optimal and locally optimal points. Well, you say a point is feasible if it's in the domain of the objective and it satisfies the constraints. By the way, if it satisfies the constraints, it must also be in the domain of the constraint functions, because to satisfy the constraints, fi(x) actually has to return a number and not the NID token, the not-in-domain token. There's no way, if that returned not-in-domain, that you could actually say the inequality was valid. Okay? And a point is optimal if it's feasible and its objective value is equal to the optimal value of the problem. That's optimal. Capital Xopt -- not totally standard notation, but I think fairly standard -- is the set of optimal points. And this can be empty: for example, if the problem is infeasible, of course it's empty. If the problem is unbounded below, it's empty. And even when p* is finite, it can be empty. So you can have everything, in which case --

By the way, to say that Xopt is the empty set, the way people would say that is: the optimal value is not achieved. If it's finite -- I'll give you an example. Let's just minimize 1/x on the positive axis. So if you minimize 1/x, the optimal value is obviously zero, but it's not achieved, because there's no x in R++ for which 1/x is equal to zero. You can get as close as you like, and so on and so forth. So in this case, you would have Xopt empty, and you would say that's a problem with optimal value zero which is not achieved. Now, you say a point x is locally optimal if, when you add a constraint -- notice that this depends on x -- that you not stray more than R in norm from x, with R positive, x becomes optimal. If you can add such a constraint and x becomes optimal, then it's locally optimal. And then these examples -- you can just work through them. These are very straightforward. So if you minimize minus log, it's unbounded below, obviously: I just take a sequence of xs getting larger and larger, and negative log gets smaller and smaller.

Entropy here -- the negative entropy, x log x -- that actually has a perfectly well-defined minimum that occurs at 1/e. You just differentiate, set it equal to zero, and the optimal value is −1/e. So this would be one with none of the pathologies. If you have f0(x) = x^3 − 3x, it's unbounded below. Obviously, as x goes to minus infinity, this thing goes to minus infinity, so it's unbounded below, but you have a local optimum at x = 1. If you take a look at this, you will find that around x = 1, this thing goes like that. The derivative is zero, and in a little neighborhood around it, it's actually locally convex there, and that means as you stray from x = 1, the objective value goes up, so that's locally optimal. I'm going quickly over this because it's kinda boring. And then when we get to the actual specific problem classes, it's more interesting. Now, when you have this standard form problem, when you write this down, there's an implicit condition. This basically says this function is what you're gonna use to compare proposed points that meet the basic requirements. The basic requirements are that you should be feasible, but there's something even more basic.

You have to be in the domain of all the fs -- the objective and all of the constraint functions -- because if you're not in the domain of one of them, it means one of these things evaluates to something that doesn't make sense. And you would certainly not say something is feasible if you were out of the domain of one of these. So there's an implicit constraint that the only xs you'll consider are in the domain of all functions, the objective and all constraints. That's sometimes called an implicit constraint, because you don't write it down explicitly. Actually, merely by writing down the symbol fi and then calling it on x, you are imposing the constraint that x is in the domain of fi. So that's why it's implicit. Later we'll see there's actually a difference between implicit and explicit, and it will make a difference, but it's something to think about. When you write out the constraints explicitly like this, these are called explicit constraints. And you say a problem is unconstrained if it has no explicit constraints. Here would be a very common example, one in fact that we'll see a great deal of. It's: minimize the sum of −log(bi − aiTx). Now, to talk about the log of something -- at least if you're not in the context of a complex variables course -- the argument has to be positive.

And in fact, our definition of log, which would of course differ from the definition used in a complex analysis course -- our definition of log has a domain that is R++. So merely by writing log(u), I don't even have to put positivity in as a constraint. That has a side effect -- I'm talking about a side effect the way you would when you call a function -- and the side effect is that it requires u to be positive. So that's the way it will work. I mean, this is all standard, but that's the idea. So here, these have to be positive. Now, to say that bi − aiTx is positive says that there's actually a positive slack in the inequalities aiTx < bi. So the set of x where this even makes sense, the domain of this, is an open polyhedron. Now, on that open polyhedron, this function is convex, smooth, everything. So if someone asks you, "What about this problem?" you'd say, "Well, that problem is unconstrained." I guess if you really wanted to be specific, you'd say it's formally unconstrained. It has no constraint functions. That's what you'd say.
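Looking ahead to next Monday's section on CVX, here's roughly what this problem looks like there. This is a minimal sketch with made-up data; the point is that the implicit constraint b − Ax > 0 comes along for free from the domain of log:

```matlab
% Made-up data with a guaranteed strictly feasible point x_hat.
m = 50; n = 10;
x_hat = randn(n, 1);
A = randn(m, n);
b = A*x_hat + rand(m, 1);           % so b - A*x_hat > 0 componentwise

cvx_begin
    variable x(n)
    minimize( -sum(log(b - A*x)) )  % domain of log enforces b - A*x > 0
cvx_end
```

(CVX handles log via its successive approximation method, so this may run slowly, but it is accepted.)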

Of course, there's an implicit constraint -- and if someone says, "Unconstrained? Does that mean I can throw in any x I like?" the answer is no. You can throw in any x that satisfies this, and that's an implicit constraint inherited from the domain of f0. Okay. Let's talk about the feasibility problem. One variation on the optimization problem is this, and it's sometimes written this way: find x subject to -- by the way, you will also sometimes see "subject to" written as "such that." That's a question of style, and by the way, it's bad style. But nevertheless, it's between the green and the red -- it's somewhere in the yellow range for me. That's not what "such that" means in English, by the way. "Subject to" are the English words that say exactly what happens when you do something subject to restrictions. You don't do something such that restrictions. That's actually not English, but you will see it -- just not from me. And the worst of course is min this. Oh, and by the way, the initials s period t period can go either way -- s.t. read as "such that" is even worse, though. So: find x subject to these constraints.

This just says there is no objective, and what that means is that any x satisfying these constraints is equally attractive. You're completely neutral. It just means: find anything that works. That's called a feasibility problem, and it's really the same as this -- or it's convenient to make it the same as this. You make it: minimize the constant objective function zero, subject to these constraints. Now, that looks stupid, but it's not, and it encodes everything correctly. So what it means is this: the optimal value of this problem is either zero or plus infinity. If there is a feasible point, then any feasible point is optimal, because if I have a feasible point, I could ask what its objective value is, and the answer would be zero, but that's as small as the objective value gets among feasible points. If there is one, that's p*. Therefore, any feasible point is optimal here. On the other hand, if it's infeasible, then p* is the infimum of zero over the empty set, and that's plus infinity. So everything works out just fine when you do this. Yeah?
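In CVX, a feasibility problem is written exactly this way: you state constraints and no objective, and CVX minimizes the constant zero. A sketch with made-up data:

```matlab
% Made-up data for illustration.
n = 5;
A = randn(4, n); b = rand(4, 1);
C = randn(2, n); d = randn(2, 1);

cvx_begin
    variable x(n)
    subject to
        A*x <= b
        C*x == d
cvx_end
% cvx_status is 'Solved' if a feasible x was found, 'Infeasible' otherwise.
```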

Student: [Inaudible] Is Xopt just the intersection of every domain and every constraint?

Instructor (Stephen Boyd): No. It's not the intersection of domains. The optimal set here coincides with the feasible set, so basically in this problem, in this formulation, if you're feasible, you're optimal.

Student: But doesn't that include being in the domain of everything?

Instructor (Stephen Boyd): No.

Student: How can it be in a domain and not satisfy that?

Instructor (Stephen Boyd): Here, I'll make an example. Ready? Let's do one. I could say log(1 + x) is -- it doesn't really matter; I'll make it easy on myself -- something like that. Now I have to figure out what that means, but anyway -- and then I'm gonna add the following constraint. Or I guess -- let's see, we haven't turned things around yet, so I'm gonna write it like this. And then I'm gonna write another one, which is −x ≤ −10.

Now, to say that you're in the domain here: there's no domain restriction from this one. There is a domain restriction here, and it's that 1 + x should be positive. But that's not the same as the feasible set. The feasible set -- this says that x should be bigger than or equal to ten. That's the feasible set, so they're not the same. You have to go back and look at it carefully. Now I can say what a convex optimization problem is, a mere three weeks into the class. Actually, I kinda said it earlier. So here it is. There's a restriction, and the restriction is this: the objective must be a convex function, and the inequality constraint functions must also be convex. The equality constraints, that is the his, must be affine. And that means I can write them this way. I can also just write it as Ax = b in very compact notation.

Now, if the objective is quasiconvex, then this is called a quasiconvex optimization problem, and you might just write it out -- this is where I've substituted, for the equality constraints, a single matrix-vector equality constraint. Now, the feasible set of a convex optimization problem is convex, but watch out, because there's another definition of convex optimization problem. There's the strict definition, and there is the looser definition. The looser definition actually makes more sense, is more appealing, and so on -- and in terms of actually doing convex optimization, it's useless, but let me say what it is. And it sounds right. If someone just bumps into you on the street and says, "What's a convex problem?" you would say this: it's minimizing a convex function over a convex set. What could be more natural? And in fact, that's called the abstract, or the loose, definition. And in fact, it's okay, but I wanna point out that for us, being a convex optimization problem is not an attribute of the feasible set. It's an attribute of the problem description, period.

So there are other weird ways to write Ax = b, and they will not work with these things. For example, I could write ||Ax − b|| = 0. You have to admit, if ||Ax − b|| = 0, then Ax = b. So I haven't changed anything here. But that's not a convex problem in the way we'll describe it, even though it's absolutely equivalent to one with Ax = b. Okay, so for us, whether or not a problem is a convex optimization problem is an attribute of the description of the problem, and not, for example, of the feasible set or something like that. And this will be clear in a couple of cases. Here's a simple example. Let's scan this problem for convexity and see what happens. Well, what do you think of the objective? The objective is okay, because it's convex and you're to minimize it. So the objective is cool. Let's pass on to the first constraint.

Well, we're already in trouble with the first constraint -- even though this constraint defines a convex set. The first constraint is nothing but x1 ≤ 0, obviously, right? 1 + x2^2 is positive, so I can get rid of it. However, this is f1. As I've described this problem, that's f1, and that is not a convex function, so this problem is rejected. It is not convex. And the second one also is not convex, because -- that is indeed a convex function of x1 and x2, but you're not allowed to have a convex function equal to zero. You can have a convex function less than or equal to zero. So this -- I would have to call that f2 now -- this actually would've been okay. That would've been fine. That'd be a convex problem, because you'd have a convex function less than or equal to zero. But the point here is you take these and you rewrite them in an equivalent way. By the way, these are not identical problems. Two problems are identical only if the objective functions and constraint functions are identical. However, they're equivalent, and we'll use a kind of informal, but nevertheless completely clear, idea of what equivalent means. Equivalent means that by solving one, you can construct a solution of the other and vice versa. And "construct" means not using too much work. So if you've had a course on complexity theory in computer science or something, it's something like that, except we're not gonna be very formal about it. It's something like a polynomial reduction of one problem to another. So that's what we mean by equivalent, except that we won't get too fancy about what the equivalences are. For us, it's gonna be an informal idea. Yeah?
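(For reference, the example being scanned here -- it's the one in the book -- is:

```latex
\begin{array}{ll}
\mbox{minimize}   & f_0(x) = x_1^2 + x_2^2 \\
\mbox{subject to} & f_1(x) = x_1/(1+x_2^2) \le 0 \\
                  & h_1(x) = (x_1 + x_2)^2 = 0,
\end{array}
```

which is not convex as written, but is equivalent to the convex problem of minimizing x1^2 + x2^2 subject to x1 ≤ 0 and x1 + x2 = 0.)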

Student: When you say they share a solution set, do you mean both in terms of the optimal x and the optimal value?

Instructor (Stephen Boyd): Absolutely. In this case, I didn't change the objective, so that's why they --

Student: But you could change it?

Instructor (Stephen Boyd): I could change the objective. Right. So for us, equivalent problems mean this -- I think the best way to say it is that you could write a very thin wrapper of code around one problem that would then solve the other problem. That's what it means. And in this case, it would be very thin. What would happen is this: to solve this problem, you would solve that one, and then return the solution. So that's a pretty thin wrapper. In other cases, we'll see, you might do other things. Actually, we're gonna see lots of little equivalences, and I'll make it very clear what's happening. But that's the right way to think of equivalence as we'll use it: writing a wrapper around one to solve the other -- a wrapper, by the way, that doesn't do a huge amount of work. Okay. Let's say a little bit about local and global optima. So if you have a locally optimal point of a convex problem, it's globally optimal. This is, by the way, the kind of thing that people on the street would know. This is kind of the basic thing you'd know about convex optimization -- this, and variations on it. And it's actually very impressive if you think carefully about it. It basically says: you have a point which is a candidate for being optimal, and if you restrict your search arbitrarily closely, locally -- if you do a full search in there and find that there's actually no better point locally -- you can make the stunning conclusion, from having examined this R-ball, which is tiny -- in fact, it can be as small as you like -- that even if you were to search everywhere, there'd be nothing better.

Although, you know, after a while you get used to it -- the proof of these things is like three lines or something, so it's not a big deal. Still, what it's saying is actually stunning if you think carefully about it. It's actually highly implausible when you think about it. It basically says: if you do a competent search nearby and find nothing better, then in fact you don't need to go and look anywhere else. This is the best you can do. Of course, that's inherited from all the properties of convex functions and so on. So let's see how you would show this. It's actually totally straightforward. Suppose you have a point that's locally optimal but not globally optimal. What that means is there's another point that's optimal -- in fact, you don't even need that point to be optimal. You just need a point that's feasible and has an objective value better than your locally optimal point. So suppose that happens. Then it's very simple what to do. You construct the line segment between your local optimum and this better but feasible point, and you start moving from where you are, the local optimum, toward the better point. Of course, as you move on that line, you remain feasible, because x is feasible, y is feasible, and the feasible set is convex. Therefore, all along that line segment you will be feasible.

Then what can you say? Well, now you have a convex function that is locally optimal at first, but then later actually achieves a lower value. And of course, that's impossible: restricted to the segment, the function is convex, so it lies below the chord between the endpoint values, and points arbitrarily close to x would already beat x, contradicting local optimality. So that's the idea. It's very simple to show this, and I won't go through all the details, but what I just described is it. Okay. Let's look at an optimality criterion. Again, it looks very simple. It's very geometric, and it's gonna generalize what you've seen before -- the idea of the gradient vanishing being the condition for optimality. So here it is. A point x is optimal -- this is for a convex optimization problem -- if and only if it's feasible and the gradient of the objective at x has non-negative inner product with y − x for every feasible y.
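Written out: a feasible x is optimal if and only if

```latex
\nabla f_0(x)^T (y - x) \;\ge\; 0 \quad \text{for all feasible } y.
```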

Now, geometrically, it means this. Here's your feasible set, and these are the level curves of the objective. So this is clearly the optimal point -- the point in the feasible set with the lowest objective value, right here. You draw the negative gradient of the objective function. That would be the normal to the hyperplane here, the local hyperplane of approximately constant objective, and the condition says this hyperplane actually supports the feasible set at that point. It looks like that. That's the picture. I can actually sort of explain this. It just comes from the basic inequality, right? If f is convex, then f(y) ≥ f(x) + grad f(x)T(y − x), and this is true for all y. So if these are the level curves of your function, and I evaluate the gradient, say, here, and the negative gradient points in that direction -- if I draw this hyperplane, the inequality tells me something. It says that every point in this entire half-space has a function value higher than or equal to the function value there, because if grad f(x)T(y − x) is non-negative, then f(y) is at least f(x). So the basic geometric conclusion is: when you evaluate the gradient of a convex function, the half-space with that normal -- all those points have worse value of f, and there's no point even looking at them, because it's absolutely guaranteed.

So what this says is that all feasible points live within this half-space, where the objective is worse than or equal to the value at x. Well, guess what? That means that point is globally optimal. Okay? And by the way, to show the if-and-only-if is very easy. I just showed one direction. The other direction is even easier: if the condition is true, then the basic inequality proves it for us. So it's quite straightforward. So this is it. By the way, this innocent little condition -- notice what happens if the problem is unconstrained; let's see what this says. We'll look at these one by one. By the way, we're gonna come back and look at optimality criteria in much more detail later, much more detail. But for now, this is just your first glimpse at optimality criteria for convex problems.

Student: What happens if the objective is not differentiable at that point?

Instructor (Stephen Boyd): Then this doesn't apply. Yeah. So the question is what happens if the objective is not differentiable at that point, and my response was: then this doesn't apply. Okay. So let's look at an unconstrained problem. Well, let's see. What would it mean if it's unconstrained? Then this has to hold for all y in Rn, so wait a minute. If I can choose y to be anything I like in Rn, this vector y − x can be anything I like. What does it mean to say that I have a vector -- let's call it g -- with gTz ≥ 0 for all z in Rn? What's your conclusion? Yeah, g = 0. But how do you know that, by the way? What's a quick argument for that?

Student: Because if g is not zero [inaudible].

Instructor (Stephen Boyd): Okay. Yeah, sure. Here's a quick snappy one: try z = −g, and you'll get your conclusion real fast, because you get −||g||^2 ≥ 0. So g has to be zero. That's the only way. So this condition, which looks kinda complicated -- actually, the only thing you have to remember is this picture. If you remember this picture, there's nothing more to say. So here, the condition is that the gradient is zero. Equality constraints -- we can work this out too, and it will recover things like Lagrange multipliers, which you've probably seen earlier and will see again. So you wanna minimize f0(x) subject to Ax = b. Well, x is optimal if and only if it's feasible and grad f0(x) + ATν = 0 for some ν. Now, to show that -- and I'll just give a quick plausibility argument, not the whole argument -- the condition says grad f0(x)T(z − x) ≥ 0 for all z that satisfy Az = b. Okay? There are many ways I can write this. Actually, what's even more interesting: we also know that Ax = b, and so z − x ranges exactly over the null space of A. That's what this says.

So this says that the gradient of f0 at x has non-negative inner product with everything in the null space of A. However, if you have a non-negative inner product with everything in a subspace, reverse the sign of z − x; it's still in the subspace. So you have a non-negative inner product with a vector and a non-negative inner product with its negative. That means the inner product has to be zero. So grad f0(x) is orthogonal to the null space of A; it's in null(A) perp. Now, that is R(AT), the range of AT, so grad f0(x) is in R(AT). That means grad f0(x) = AT times some u, and I'll mess with u to make it fit our form. But actually, I'm kinda done, because I can write this as grad f0(x) + AT(−u) = 0 for some u. And I guess if I'd been [cuts out] smart, I would've put the minus in from the start, but it would've been kind of weird, so anyway. There we go. That's the condition. By the way, don't worry about these. This is just your first glimpse at this. We will spend a week, week and a half, on all these things later in much greater detail. Okay.
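Collecting that derivation: for the problem of minimizing f0(x) subject to Ax = b, the optimality condition is

```latex
x \ \text{optimal} \quad\Longleftrightarrow\quad Ax = b \ \ \text{and} \ \ \nabla f_0(x) + A^T \nu = 0 \ \text{for some } \nu,
```

which is the classical Lagrange multiplier condition.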

Now it starts getting interesting. I presume you would have heard of this: generally, Lagrange multipliers are taught as a recipe in an advanced calculus class somewhere, where someone says, how do you solve that, and you're told: then you should do this. And it doesn't make much sense, but you're told that's what has to happen. This next one would maybe be the first one where it's not obvious, and something maybe you haven't seen. So: under what conditions does a point x in the non-negative orthant minimize a convex function? That's the question. Well, let's just apply the criterion -- we'll end up deriving it, and it's gonna look something like this. It says grad f(x)T(z − x) ≥ 0 for all z ≥ 0 (componentwise). That's our basic condition. And let's simplify this now. This has to be non-negative for any non-negative z, so let's see what happens. First of all, I can plug in a bunch of things. I can plug in z = 0, and I get that grad f(x)Tx ≤ 0. Everybody agree with that? That's from z = 0.

And now I can do the following. If an entry of the gradient were negative, I'm in big trouble, because if the ith entry of grad f(x) is negative, I take z = t times ei, and then I let t get super big. Then, as t goes to infinity, grad f(x)T(z − x) is an affine function of t with a negative slope, and there's no way it stays bigger than or equal to zero. So we can also conclude that this vector is non-negative: grad f(x) ≥ 0 componentwise. There's a squiggle there -- the scalar inequalities are straight; the vector ones are squiggled. Okay? But now it's getting interesting, because look at this. I have the inner product of two non-negative vectors, and it's less than or equal to zero. Now, when you take the inner product, you take the sum of the associated products, but obviously if you're doing that with two vectors that are non-negative and you end up with something less than or equal to zero, there's only one choice. It has to be zero, No. 1. And No. 2, it has to be complementary -- the products have to vanish term by term. In other words, we can say this: if xi is positive, then that component of the gradient is zero, and vice versa -- if a component of the gradient is positive, then xi is zero, period. This is called complementarity, and you're gonna see it a lot, or you'll see it a bit. Okay? So that's the condition here, as we just worked out. And these are already not totally obvious. Okay. The next topic is equivalent convex problems. As I said, we're gonna use an informal notion of this. You say they're equivalent if the solution of one is obtained with modest effort from the solution of the other. What we'll do is just give a bunch of common transformations that preserve equivalence, just examples to give you a rough idea.

Here's one: you can eliminate equality constraints. So I wanna minimize f0(x) subject to fi(x) ≤ 0 and Ax = b. I can eliminate these, and the way I do that is I find a matrix F and a point x0 so that Ax = b is equivalent to x having the form Fz + x0 for some z. Now, how do you get F? These are quite easy. The columns of F would be a basis for the null -- not a basis, sorry. They would span. That's all we need: they would span the null space of A. And then x0 is any particular solution of Ax = b. For example, it could be the least-norm solution. So this is linear algebra: going from (A, b) to F and x0. But the point is, this is what people would call a constrained representation of an affine set, and this is what people would call a free parameter representation, parameterized by z, which is now free. Okay? If it's a minimal one, you could say something about the size of these things, but that's just linear algebra.

So what I'll do now is make a new problem, which is this: I'll substitute x = Fz + x0, and I get this problem here. I don't need this constraint, because A(Fz + x0) = b for all z automatically: AF is zero -- that's how you constructed F -- and Ax0 is b -- that's how you constructed x0. So you don't need the constraint. You drop it, and you get this. Now, these are not the same problem. They're totally different. They have different numbers of variables. This one has equality constraints; this one has none. But they're clearly equivalent, and the idea goes something like this: if you wanted to solve this problem, you would solve this one, and then return as x the value Fz + x0. That's what you'd do. Okay? Now, the idea that it has to be a modest transformation has to do with the following fact: when you get A and b, it has to require modest computation to calculate F and x0, and indeed it does. That's just basic numerical linear algebra.
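Concretely, in MATLAB the computation of F and x0 is a couple of lines of standard numerical linear algebra (a sketch with made-up data):

```matlab
% Made-up data: A is fat, so A*x = b has infinitely many solutions.
A = randn(3, 8); b = randn(3, 1);

F  = null(A);       % columns form an orthonormal basis for null(A)
x0 = pinv(A)*b;     % a particular solution: the least-norm one

% Check: x = F*z + x0 satisfies A*x = b for any z.
z = randn(size(F, 2), 1);
norm(A*(F*z + x0) - b)   % ~ 1e-15
```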

Student: [Inaudible] F again?

Instructor (Stephen Boyd): Yes? Oh, F -- the columns of F are, for example, a basis for the null space of A. That's good enough. There are many ways to construct such an F. Okay. And by the way, if this problem is convex, so is this one. The reason is that in the objective you pre-compose with an affine function, and pre-composing with an affine function preserves convexity. Okay? This one is unconstrained -- no equality constraints. Now, you can do the opposite, and in fact -- somewhere near the ninth week of the class, you'll think back on what the shocking things were. A lot of it kinda sorta made sense intuitively, but there will be things where you'll say: that wasn't obvious. Now I'm gonna tell you one. This is just a topic for now, but we'll get to it. It's this: it's actually often extremely useful to introduce equality constraints -- to "uneliminate." Everyone has a bias towards elimination, because people like to solve problems with smaller numbers of variables. I will cure you of that before this quarter is over. So if you know what you're doing, not only is it sometimes not a good idea to eliminate constraints, it's often a good idea to introduce new variables and constraints, which sounds very odd when you first see it, but trust me, before the quarter's over, it'll be very natural to you. Yeah.

Student: Yeah, why do the equality constraints have to be affine? If we're just gonna compose it and need to preserve convexity, couldn't we just have any convex --

Instructor (Stephen Boyd): Well, my answer to that -- the question was why do the equality constraints have to be affine -- is gonna be very unsatisfying to you. It's this: that's our definition of a convex problem. By definition, in a convex problem the only equality constraints we have are affine. Sorry. Well, generally speaking, with non-affine equality constraints the constraint set is generally not convex, and then you don't even meet the broader definition of a convex problem, which is minimizing a convex function over a convex set. So sorry, it's by fiat: a convex problem has affine equality constraints. I told you you wouldn't like the answer. Yeah. Oh well. Sometimes there are good answers; sometimes there are unsatisfying ones. That was the latter. Okay. So you can actually introduce equality constraints, which by the way some people call unelimination. I kinda like that. It works like this. You have a problem that looks like this. You could do the following: I'll introduce a new variable y0 here, I'll introduce new variables yi here, and I'll have this problem: minimize over x, and now also the yi -- so I've added new variables -- f0(y0), subject to fi(yi) ≤ 0, and then these equality constraints.

Now, you could reasonably ask why on earth you would add extra ridiculous variables, because this doesn't look like progress. The problem started with no equality constraints. After this unelimination step, you have added variables and added equality constraints. This does not look like progress. Later you'll see, shockingly, this is the first step to lots of progress. I know it's weird, but this will all be clear later. Another trick is to introduce slack variables for linear inequalities. That works like this. You have aiTx ≤ bi. That's fine. What I'll do is convert that to an equality constraint: aiTx + si = bi. It's almost stupid -- this is the same as saying si is the slack, bi − aiTx -- and then I write that the slacks are bigger than or equal to zero. Or, to put it in our absolutely canonical form, I would write −si ≤ 0. Okay? Again, these are simple. Actually, what's remarkable is that we're gonna look at a bunch of transformations, most of them really simple like these, and -- yeah, it's like everything else. You learn about 15 trivial things, and then you put them together in just the right sequence, and all of a sudden something that was not obvious at all comes out. So, yeah?

Student: When you minimize that problem, don't you know for a fact that the minimal si will be zero, or s will be zero, since [inaudible] push up against the bound?

Instructor (Stephen Boyd): No. Also, that's not true. That's just not true.

Student: So si can actually be positive?

Instructor (Stephen Boyd): Yeah. The slack can be positive. Sure. It depends. By the way, each of these needs an argument that they actually are equivalent, so don't just trust me. After you do a few of these yourselves and work them out, you'll see. Also, you'll probably do some of these numerically, and basically if you don't know what you're doing, you'll just come out with the wrong answer, so that'll be another feedback mechanism for you. But let's just check here that they're equivalent. Well, if I solve the original problem, I can define si = bi − aiTx. Since x is feasible, the sis are non-negative, and (x, s) is feasible for the slack-form problem -- and in fact optimal, but I'd have to argue that, which I'm not gonna do. Then you go backwards. Suppose someone solved the slack-form problem. We're gonna make the wrapper that would allow you to solve the original problem if you had code, or the solution, for this one. So if I solve the slack-form problem, the solution gives me x and s, and I simply drop s and forward x, and argue that it's actually optimal for the original. So that would be the argument. I'm not gonna do it, because these are actually shockingly boring, but it should be done once or twice or so -- not many more times than that -- and you will. I promise.

Student: [Inaudible] x and s -- is there another objective? Or what is --

Instructor (Stephen Boyd): No. It just means I have more variables. When you minimize over x here, it means that x is what you can mess with. Here, when you minimize over x and s, you have x and s to mess with. So that's what it means. Here's a simple trick. It's called the epigraph trick. It's actually quite useful, and it's widely used in practice. Let's see how that works. I mean, it's really dumb. Here's what it is. You wanna minimize f0. You introduce a new scalar variable t, you introduce this new constraint, which is f0(x) ≤ t, and you minimize t. Now, if you solve this problem, you'd be -- well, not an idiot, but a t with f0(x) strictly less than t couldn't possibly be optimal, because your goal is to minimize t. So at the optimum, f0(x) = t. In fact, it would be an equivalent problem to write equals here; that's absolutely equivalent to the original problem. Unfortunately, with an equality it's not convex unless f0 is affine. However, with a less-than-or-equal-to, it's convex.
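So the epigraph form of the general standard-form problem is

```latex
\begin{array}{ll}
\mbox{minimize (over $x$, $t$)} & t \\
\mbox{subject to} & f_0(x) - t \le 0 \\
                  & f_i(x) \le 0, \quad i = 1,\dots,m \\
                  & Ax = b,
\end{array}
```

which is convex whenever the original problem is, and has a linear objective.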

It's a really dumb thing. Let me just draw this. Here's a function that you wanna minimize. The original problem basically says: find the point here that minimizes f0. This one does this ridiculous thing and introduces a new variable t that's up here, and it says that's your feasible set -- the region above the graph -- and then it says go as far as you can in the direction of decreasing t, like that, and you would recover this point here. By the way, this has immediate practical application. It says, for example, that if you're gonna write code to solve convex problems, then without any loss of generality, you can just minimize a linear objective. That's quite standard. So if you go to Google and type "C source" and the name of some convex problem, many times they won't even say it. It'll just say it solves the following problem, and it's a linear objective. You go, hey, I can't use this -- my objective is not linear. Well, it's just expected that you know this. By the way, it's generally not in the documentation, which is actually interesting, because that's how widely this is supposed to be known. It's also how unhelpful the documentation is. So this is the epigraph trick, and you'll see it used quite frequently. Oh, the other thing you'll hear is this: you'll actually have people say, without loss of generality, the objective can be assumed to be linear, because here t is a linear function of (x, t). So that's just completely standard.

You'll also hear people say things like this: a linear objective is universal in convex optimization, whatever that means. It means something like: if you can handle a linear objective, you can do anything. A couple of other equivalences -- here's one. You can minimize over some variables. So if you have a problem that says minimize f0(x1, x2) subject to fi(x1) ≤ 0, for example, I could minimize this thing over x2, and that would give me this function, f0 tilde of x1, and this reduced problem. By the way, for those of you who have seen dynamic programming, a fancy version of this would just give you all of dynamic programming, and basically then there's nothing more to say about it. Oh, except one interesting point: dynamic programming preserves convexity of a problem. You have a problem with a few variables, or in the general case a whole bunch of variables -- for example, stages in an optimal control problem or something like that. If you minimize over some of the variables, you have to take account of that in the constraints, the objective, everything, but when you minimize over one family of variables in any convex function, the result is a convex function.

So we can now say something like this: dynamic programming, if you know what I'm talking about, preserves convexity. When you optimize over the last-stage variable, if the original problem was convex over capital T stages, the T−1 stage problem is also convex, and the T−2, and so on. Okay. Just out of curiosity, how many people have heard about dynamic programming and stuff like that? Okay, so not an empty set. Okay. Quasiconvex optimization -- here f0 is quasiconvex. There are lots of actually practical problems that are quasiconvex, and we'll see lots of them. And here you have to be very careful, because, for example, the most basic thing, which says locally optimal implies globally optimal, is completely false. And here's a perfect example. I'm making it exactly flat here. So that's a locally optimal point, no doubt about it. And it is obviously not the global optimum. And that's a perfectly valid quasiconvex function.

So a lot of things are different now, and things that were true before need not be. But here's how you solve these with total effectiveness. It works like this. Generally speaking, if you have a quasiconvex function -- unless you're just doing a theoretical analysis, if you're actually gonna do anything, that is to say, construct an algorithm that solves a problem, solve an actual problem, or anything other than just talk about it -- then what you want is a convex representation of the t-sublevel set. A convex representation looks like this: you look at the sublevel set, and you write it as -- here I've written it as one convex inequality parameterized by t, but you don't have to. It could actually mushroom into a bunch of convex inequalities. It just doesn't matter. You need a convex representation of the sublevel set. Here's an example. It's a general fact that if you have a ratio of a non-negative convex function over a positive concave function, the ratio is quasiconvex. It's very easy to show -- well, there are some details, but let's do this. You wanna know: is the t-sublevel set convex in x?

By the way, if this were a convex function of x and t jointly, that would prove this function is actually convex in x. It's not convex; it's quasiconvex. That means t is considered fixed, and you wanna know: is the set of x that satisfies this convex? That's the question. Okay? So if you look at this, you just multiply through, and you get p(x) ≤ t q(x). Now you're in pretty good shape, because I write that as p(x) − t q(x) ≤ 0. p is convex, q is concave, and t is bigger than or equal to zero without loss of generality, because if t is negative, the sublevel set is empty, hence convex, so I've already dealt with that case. Then this is a convex function minus a non-negative multiple of a concave function, so the whole thing is convex. That's a convex representation. Okay? It depends on t affinely, but that doesn't matter. In fact, the dependence on t is completely irrelevant. Okay? Now, the way you solve a quasiconvex problem is this. If you have this convex representation for any t, you can do the following. You can ask a question like: is the optimal value less than or equal to t? And you solve a convex feasibility problem to determine it.

So you could say: is the optimal value less than three? And the answer could be no. Then you'd find a value of t where the answer's yes, like t equals 10. Then of course you'd do bisection: you'd have the interval [3, 10], and the next query would be at the midpoint, 6.5, and so on and so forth. So that's basic bisection, and it looks like this. You start with a lower and an upper bound on p*. By the way, you might not have those, in which case a practical version would start with t = 1 and query. If that's infeasible, you'd try t = 2, then 4, then 8, then 16, and so on; when you get to a number so big that you wouldn't be interested even if it were feasible, you'd quit. If t = 1 is feasible and you don't have a lower bound, you'd try a half, then a quarter, and when you get down to something very small, you'd quit. (That's all on the assumption that it's a positive function you're minimizing.) So: you start with a known upper and lower bound on p*, you query at the midpoint of that interval, and you solve the convex feasibility problem. This is just bisection. And at every step, you end up with two numbers, an l and a u. For u, you have an x that actually achieves f0(x) ≤ u. And l, on the other hand, is a value known to be unachievable: there's no feasible x with f0(x) ≤ l. At each step, that interval of ignorance divides by two, so your ignorance halves at each iteration. In ten steps you reduce your ignorance by a factor of roughly 1000, and in 20 steps, a million, roughly. That's the idea. And yes, you'll definitely do this at some point.
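Here is roughly what that loop looks like in code, for the concrete case of a linear-fractional objective. This is a sketch: all the problem data, the tolerance, and the initial bounds l and u are made-up assumptions.

```matlab
% Quasiconvex minimization by bisection:
%   minimize (c'*x + d)/(e'*x + f)  subject to  A*x <= b,
% where the numerator is >= 0 and the denominator > 0 on the feasible set.
n = 4;
A = [eye(n); -eye(n)]; b = ones(2*n, 1);    % box constraint: -1 <= x <= 1
c = rand(n, 1); d = 5;                      % numerator positive on the box
e = 0.1*rand(n, 1); f = 2;                  % denominator positive on the box
l = 0; u = 10;                              % assumed bounds on p*

while u - l > 1e-3
    t = (l + u)/2;
    cvx_begin quiet                         % feasibility problem: f0(x) <= t?
        variable x(n)
        subject to
            c'*x + d - t*(e'*x + f) <= 0    % convex representation of f0(x) <= t
            A*x <= b
    cvx_end
    if strcmp(cvx_status, 'Solved')
        u = t;                              % feasible, so p* <= t
    else
        l = t;                              % infeasible, so p* > t
    end
end
```

Each pass through the loop halves the interval [l, u] of ignorance about p*.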

Student: [Inaudible] wrote up there has objective zero implicitly?

Instructor (Stephen Boyd): Right.

Student: Okay. And what does the colon-equals mean?

Instructor (Stephen Boyd): Where?

Student: In the bisection method?

Instructor (Stephen Boyd): Here?

Student: Yeah.

Instructor (Stephen Boyd): That's an assignment.

Student: Okay.

Instructor (Stephen Boyd): Okay. Now we're gonna start looking at some problem families. This is actually quite interesting, quite useful. What we've done so far is all very general and would follow a traditional course on this material, but a traditional course would stay at that level. The things we're gonna talk about now are extremely interesting theoretically -- in fact, some of them are at the height of fashion in various fields at the moment; not linear programming, but other things we'll see. The absolute height of fashion, even among people doing strictly theory. But actually, what's mostly interesting is that all of these things are extremely doable, so you can do this. In fact, you will, shortly. We're now moving into things that are more than stuff you talk about. These are things you can actually do. So I just thought I'd point that out. So: linear programming. A linear program is almost the simplest optimization problem there is, in one sense. All of the functions involved are affine. The objective is affine. The inequality constraint functions are affine -- by the way, people call them linear, and that's okay. And the equality constraints are affine, again called linear constraints. It depends on whether, when you say linear, you allow an arbitrary right-hand side. If you do, then of course when someone says linear, it means affine. Okay.

So that's an LP. By the way, this is not a new topic, just to make that clear -- not remotely a new topic. Fourier wrote a paper about this, which is a hint that this is not exactly a new topic. In fact, he wrote a paper on how to solve linear programs. It was a method that would be very effective if there were like four variables, but hey, that was the beginning of the 19th century. Anyway, the modern history has a lot to do with Stanford -- to be fully honest, it has a lot to do with Moscow State University and later Stanford -- because this was popularized around 1948 by a bunch of people who were here, like George Dantzig. So the feasible set of course is a polyhedron, so your image of what an LP is should be this: it's a polyhedron, and the objective is affine, so the level sets of the objective are hyperplanes. Here's negative c, and you're told: go as far as you can in this direction. And this little example makes it totally clear that's the solution. Okay? Now, before you imagine that this picture really fully does the trick of understanding it, I should point a few things out. If you have a linear program with, I don't know, 30 variables and like 100 inequalities, the feasible set is a polyhedron. Wanna know how many vertices it has? Someone wanna take a guess?

I forgot what numbers I said, so what I'm about to say is a lie -- it would become true by increasing those numbers modestly. The answer is just completely astronomical. It goes up very quickly. So you can write down what's called a small linear program, and the number of vertices of the feasible polyhedron will be literally astronomical. It doesn't take much. You will solve in this class many linear programs where the number of vertices of the feasible set will exceed, probably by a long shot, the number of subatomic particles in the universe. You will solve many of them -- quickly, I might add. So I just point this out because this is the picture you should have. This is what I think of when I think of an LP, but be forewarned that it doesn't give you the full idea. If your roommate asks, "What are you doing?" and you say, "Yeah, we learned today how to go as far as you can in this direction on this set," they'd look at that and they'd go, "When's the drop deadline?" So just watch out. The pictures explain it, but I don't think you get the full idea from them. So let's look at some examples. Here's a famous one, just here for historical purposes: the diet problem. You're supposed to choose a bunch of foods -- you have n possible food types, and you're gonna choose some quantities that you're gonna feed some group of people, presumably soldiers. I don't think anyone would willingly eat what's gonna happen here, but you're gonna feed some people some amounts of food.

One unit of food j costs cj, and it contains aij units of nutrient i, per 10 grams of the food or whatever our unit is here. A healthy diet requires at least bi units of nutrient i. So how do you find the cheapest healthy diet? Well, you minimize cTx, where xi is the quantity of each type of food you're gonna put into this mixture; the inner product with c gives you the actual cost. So you minimize the cost subject to Ax >= b, where A is nothing but this nutrient matrix and b is the minimum daily allowance or something like that. And x has to be nonnegative: you can't feed people negative amounts. Okay? So this is a famous problem. Now, whether you'd wanna eat this diet is another story, and of course it doesn't take much to realize it's a fairly poor model; it assumes these things add linearly, which might not be the case. Could I impose an upper bound on the amount of a nutrient here? Of course I could, and it would still be a linear program. But this is just for historical reasons; it's sort of the first one you would hear about.
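[Editor's note: a minimal sketch of this diet LP in CVX, the course software mentioned earlier; the data A, b, and c here are hypothetical placeholders.]

    % Diet problem: minimize the cost of a diet meeting all nutrient requirements
    % A (m x n): A(i,j) = units of nutrient i in one unit of food j
    % b (m x 1): minimum required units of each nutrient
    % c (n x 1): cost of one unit of each food
    n = size(A, 2);
    cvx_begin
        variable x(n)          % quantity of each food
        minimize( c' * x )     % total cost of the diet
        subject to
            A * x >= b;        % meet every nutrient requirement
            x >= 0;            % quantities must be nonnegative
    cvx_end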

Okay. Here's one: piecewise-linear minimization. You wanna minimize the maximum of a bunch of affine functions, that is, minimize over x the function max over i of aiTx + bi. Now, that's an interesting problem because it's not remotely linear, not even close. In fact, when it is linear, it's boring, because I know how to minimize a single affine function: the answer's minus infinity. So this is not linear, not remotely; there's nothing linear about that problem. It's also quite useful, because piecewise-linear functions can be used to approximate any convex function. So although this looks kinda silly, it's way more powerful than you think. So minimize some piecewise-linear function: it's an LP. You just put it in epigraph form. You write it this way: I'm gonna minimize a new variable t subject to aiTx + bi <= t for all i. Done. Now the variables are x and t, and these are affine inequalities, or, to use the term most people would use, linear inequalities, in x and t. That's it. By the way, there's a reason you need to know this: if you went to Google and typed "source code PWL minimization," what you'd find, if you found anything at all, is terrible stuff from total backwater fields where they don't even know it's a linear program. It'd be the worst stuff you've ever seen.
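[Editor's note: a minimal CVX sketch of the epigraph trick; the matrix A, whose rows are the aiT, and the vector b are hypothetical placeholders.]

    % Piecewise-linear minimization: minimize max_i (ai'*x + bi)
    n = size(A, 2);
    cvx_begin
        variables x(n) t       % t is the new epigraph variable
        minimize( t )
        subject to
            A * x + b <= t;    % ai'*x + bi <= t for every i
    cvx_end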

On the other hand, among people who do know this stuff, everyone knows piecewise-linear minimization is equivalent to an LP, so nobody bothers to write special-purpose piecewise-linear software; I think that's the problem. What you really want is an LP solver, of which there are tons. In fact, sadly, if you went to Google and typed that, you'd probably land on this PDF file, but the others would be very bad, I assure you. We'll look at another one: the Chebyshev center of a polyhedron. Now, we already talked about this a little bit when we talked about the idea of yield maximization. Let me set this up so you'd know why you would wanna do, for example, Chebyshev centering. Yield maximization was this: we had some polyhedron of acceptable values of some parameters, and we're gonna set a target for manufacturing, so we're gonna set our machines to attempt to manufacture this point. In manufacture, however, a random perturbation gets added to this point, so what you'll get is a PDF that has its mean value at this point but otherwise tails off. And then we're interested in maximizing the probability that you're in this set. That would be the yield. Okay?

That's actually generally a hard problem, although it's convex if the PDF is log-concave. But there's a simpler approach, which is to say: look, what you really want is for that point to be well-centered, deep in the set. For example, if the perturbation were N(0, I), that is, Gaussian and uncorrelated, then the level sets of the PDF would be circles, and you might argue that what you want is the point where the largest circle centered at it fits inside the set. It's a heuristic, but it's something that would actually work very well for a nice rounded perturbation. So that leads you to the idea of the Chebyshev center of a polyhedron. And what is that? Well, I have a bunch of inequalities defining my polyhedron, and what I wanna do is find the center of the largest ball that fits inside. Another way to say it: that's the point that is deepest inside the set. I can define the depth of a point in a set as the minimum distance to the complement of the set; it's basically how far you are from the outside. And what I want to do is find the maximum-depth point, the point deepest in this set. Now, by the way, when you're drawing pictures with spheres and balls, and the whole thing says find me a Euclidean ball, that's not gonna trigger the linear portion of your brain. It shouldn't anyway, right?

So if you talk about linear things, the portions of your brain that light up when you think about flat things should be flashing. But when I start talking about things like ellipsoids and balls and round things, that should suppress all the brain activity in the lobe that handles affine things. It turns out that's wrong, so watch this. I'm just saying that when you see "ball," you should think round; round should suppress the linear, and you would not at that point imagine this is gonna turn into an LP. Well, it is, and the reason is quite straightforward. The ball is inside the polyhedron exactly when aiTx <= bi for all x in the ball of radius r centered at xc, for each i. That's easy to check: I simply maximize aiTx over the ball and see whether the maximum is less than or equal to bi. But this maximum is nothing. It's aiTxc, the value at the center, plus the sup of aiTu over ||u||2 <= r, and that equals aiTxc + r||ai||2. So the condition that the ball stays on the right side of the half-space is aiTxc + r||ai||2 <= bi, and shockingly that's linear in xc and r. So you can do Chebyshev centering this way: you maximize r subject to aiTxc + r||ai||2 <= bi for all i. Now watch out, because the norm should also be triggering your nonlinearity detectors. However, the norm is applied to data, so it has no effect. The variables here are xc and r, and these are linear inequalities in xc and r.
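[Editor's note: a minimal CVX sketch of Chebyshev centering; the data A and b, describing the polyhedron {x : Ax <= b}, are hypothetical placeholders.]

    % Chebyshev center: the largest Euclidean ball inside {x : A*x <= b}
    n = size(A, 2);
    anorm = sqrt(sum(A.^2, 2));        % ||ai||_2 for each row; this is data, not a variable
    cvx_begin
        variables xc(n) r              % ball center and radius
        maximize( r )
        subject to
            A * xc + r * anorm <= b;   % the ball stays inside every half-space
    cvx_end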

Student:I don't see why this thing has to be unique, especially with like a regular [inaudible].

Instructor (Stephen Boyd):Why what has to be unique?

Student:The Chebyshev center.

Instructor (Stephen Boyd):Right. It doesn't have to be. You're right, it does not have to be unique, so that's a very good point. However, if it's not unique, this LP will construct an optimal solution. And by the way, I should mention that's essentially the semantics of an optimization problem.

So if I put an optimization problem down on the table in front of you, all that can be asked of you is to produce a solution. If someone then comes back later and says, "Hey, wait a minute. I've got two different solutions. That's not cool," it's only not cool if one of them is either infeasible or has an objective better than the other, in which case you have to get rid of the person who alleged that their non-optimal point was optimal. By the way, if it really matters to you that the solution be unique, there are ways to add a secondary objective or something like that to this. But that's a very good point, and in fact for a Chebyshev center it doesn't have to be unique. Just take a rectangle, and it's completely clear: you can fit a ball in it, but that ball can slide, and any of those centers technically solves the Chebyshev centering problem. Now, for yield maximization, you would further want that ball to be in the center, right? But that's a very good point. Yes?

Student:Kinda like the [inaudible] LP?

Instructor (Stephen Boyd):No, this would be easily generalizable to other norms, very simple. All that happens here is that if that's a general norm, then this becomes the dual norm: the constraint reads aiTxc + r||ai||* <= bi, where ||.||* denotes the dual norm. Here, I'll just do it right now. There. Done. Now it's a general norm. Now you can't call it the Chebyshev center; it's the general-norm maximum-depth-ball-in-a-polyhedron problem or something like that. You could probably get some good acronyms out of that, but we won't do it because we're out of time.
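[Editor's note: a minimal sketch of the general-norm variant, taking the infinity-norm ball as an example; the dual of the infinity-norm is the 1-norm, so ||ai||_1 replaces ||ai||_2. A and b are hypothetical placeholders as before.]

    % Largest infinity-norm ball (a box) inside {x : A*x <= b}
    % sup{ ai'*u : ||u||_inf <= r } = r * ||ai||_1, the dual norm
    n = size(A, 2);
    adual = sum(abs(A), 2);            % ||ai||_1 for each row
    cvx_begin
        variables xc(n) r
        maximize( r )
        subject to
            A * xc + r * adual <= b;
    cvx_end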

I think the only thing I will say about this is that it's a really simple example, but it's actually very cool. Let me tell you why. Examples like piecewise-linear minimization and Chebyshev centering should be giving you a hint about a seemingly boring problem family like linear programming, one that's been around since Fourier and was presumably fully understood by Gauss (actually, I don't know that, but it would appear to be a safe statement, since everything, I think, was fully understood by Gauss). What's cool is that this very innocent-looking thing, which in the OR department comes with diets and some other stuff, actually has lots of applications, and a lot of them are not obvious. It's not obvious that finding the biggest ball that fits inside a polyhedron is a linear program, for the reasons I mentioned before. So we'll quit here. I should remind you that at 11:00 today we have the section that should have been, would have been, yesterday.

[End of Audio]

Duration: 80 minutes