Instructor (Stephen Boyd): Right, this far - we're on? So we didn't get very far. We just started talking about the basic ideas of barrier methods. But we'll really do it today, so we'll look at how it all works.
Okay, so the idea actually is very simple. If you can go down to the pad here. You start by rewriting the problem. You take the constraints, and you simply put them this way. It's f0 plus - there we go - plus this, where I_minus is the indicator function of the negative reals.

And so I_minus is this function that's zero, and then goes to plus infinity if you're above zero. This, of course, encodes the semantics of a hard constraint. It basically says you have no irritation if it's less than or equal to zero, and if it's positive, you're infinitely irritated, meaning that that x is completely unacceptable.

So this - I mean, we can then say, look, it's just an equality constrained problem here. But of course, this is just rewriting things, and it doesn't really help anything, because while it's true that the only constraints now are equality constraints, the objective is, of course, highly non-differentiable, because we've put this here. That's exactly the point.
Okay, so what we're going to do is replace this indicator function with a smooth approximation of the indicator function. There are many ways people talk about this. When I'm making fun of people who propose things like this, I usually say it's the same as getting out some sandpaper and saying you don't like this corner here, so you sand off the corner, or something like that.

So we talked about this at the end of last lecture. A very common approximation is the log barrier. The log barrier goes like this. You have the sum of the logs of the negatives of the fi's. Obviously, if fi becomes zero or positive this goes to minus infinity, and with a minus sign, it goes to plus infinity. It's a barrier. And you put a one over t in front. t is going to be a parameter that controls how closely this thing approximates the indicator function.

And it's important to understand the tradeoff here. The tradeoff is this. The larger t is, the closer this function looks to that. That's clear. But that's both good and bad. It means that by solving this problem you get - well, we hope it means; we'll see soon if it's true - a closer and closer approximation of this problem. But on the other hand, the larger t is, the more this function looks like this indicator function with the corner sanded down, and that's going to mean this is a harder problem to solve using Newton's method.
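As a quick numeric sketch of the approximation (my own illustration, not from the lecture): at any fixed strictly feasible value u < 0, the barrier term -(1/t) log(-u) shrinks toward the indicator value 0 as t grows, while the blow-up at u = 0 is preserved for every t:

```python
import numpy as np

def log_barrier_term(u, t):
    # smooth approximation of the indicator I_-(u) for u < 0: -(1/t) * log(-u)
    return -np.log(-u) / t

u = -2.0  # a strictly feasible constraint value, f_i(x) = -2
for t in [1, 10, 100, 1000]:
    print(t, log_barrier_term(u, t))
# the printed values shrink toward 0 = I_-(-2) as t grows; near u = 0
# the term still goes to +infinity for every t, so the wall remains
```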
Okay? So that's going to be the tradeoff here, and we'll see how to work around it. So that's the idea - we're going to solve this using Newton's method. By the way, there's nothing wrong with solving this with Newton's method. We know how to solve this. For any value of t, if you have a feasible point - actually, a strictly feasible point - Newton's method will work perfectly on this. It's just no problem.

Okay. So this function here, which is the sum of the minus log of the minus constraint functions, is called the logarithmic barrier function associated with the constraints. So that's the log barrier. And this minus log minus fi - that's a convex function by one of the composition rules, because minus log of minus something is increasing and convex, and fi is convex as well, so the composition is convex. It's a log barrier, and you can get a rough idea of what it is. For example, another way to say it is this: it's minus the log of the product of the minus fi's.

So these are slacks, because that's literally how much extra you have in the inequality fi(x) < 0. So minus fi is the slack, and this is the negative log of the product of the slacks. That'll come up a bunch of times.
Okay. For the record, we'll work out what the derivatives are. The gradient is simply this - that's just the chain rule for the log. And the Hessian looks something like this: it's got these rank-one terms here, and then it's got these terms with scaled versions of the Hessians of the individual functions, okay? So these are the gradient and Hessian. We'll come back to this; we'll actually need the gradient very soon.
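For linear constraints fi(x) = ai^T x - bi the individual Hessians vanish, and the formulas reduce to sums over the slacks. A small sketch (my own, not from the lecture) computes both and checks the gradient against finite differences:

```python
import numpy as np

def barrier_grad_hess(A, b, x):
    # phi(x) = -sum_i log(b_i - a_i^T x), assuming A x < b strictly
    s = b - A @ x                              # slacks s_i = -f_i(x)
    grad = A.T @ (1.0 / s)                     # sum_i a_i / s_i
    hess = A.T @ ((1.0 / s**2)[:, None] * A)   # sum_i a_i a_i^T / s_i^2
    return grad, hess

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 2))
x = np.zeros(2)
b = A @ x + 1.0                                # choose b so all slacks are 1
g, H = barrier_grad_hess(A, b, x)

# finite-difference check of the gradient
phi = lambda x: -np.sum(np.log(b - A @ x))
eps = 1e-6
g_fd = np.array([(phi(x + eps * e) - phi(x - eps * e)) / (2 * eps)
                 for e in np.eye(2)])
print(np.allclose(g, g_fd, atol=1e-4))         # True
```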
All right. Now, when you minimize f0(x) + (1/t) phi(x), it's the same as minimizing t f0(x) + phi(x). It's the same thing. And when you minimize this, we'll refer to the minimizer of this, the argmin, as x*(t). So that's going to be the minimizer of this function, subject to that, and we'll call that x*(t).

We're just going to assume here that there's a unique minimizer, and we'll call it x*(t). Well, we'll get into that later - actually, if the sublevel sets are bounded there's a unique minimizer - but we'll just say the minimizer is x*(t).

This is a function of t, of course, and it traces out a curve, a path, and that's called the central path of this optimization problem. The intuition is something like this: as t goes to infinity, you would sort of guess that x*(t) is going to go to an optimal point of the original optimization problem, because as t goes to infinity, the difference between this thing and this - I have to include the minus sign there - basically goes away. Certainly pointwise - well, except if you're at zero - but pointwise it's fair to say that this thing converges to that as t goes to infinity.
So you'd sort of guess that this is the case. And here's an example for an LP. This is an LP in R2, of course, so it's kind of silly. But here you've got some inequalities - I guess one, two, three, four, five, six inequalities, like that. And then these dashed curves show you the sublevel sets of the barrier function. That's this thing.

Like that. Those are the sublevel sets of the barrier function. And you can see the one in the middle - that's with t = 0. That's where you minimize just the log barrier. By the way, that's the same as maximizing the product of the slacks, or the geometric mean of the slacks. So that's actually called the analytic center of this polyhedron, and that's right there. And then as you increase t, what happens is - well, let's talk about what phi does. Phi is a barrier, and what it does is it wants to keep you away from the boundaries. Because if you get near a boundary, one of the fi's is going to be negative but near zero.

That means one of the minus fi's is going to be small, the log of that is going to be big and negative, and with the minus sign, it's going to be big and positive. So this barrier function, if you can visualize it, goes something like this: it's got its minimum there, and then it curves smoothly up - it's an analytic function - and when it hits these walls it goes to infinity. So you should visualize that here.
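To make the analytic-center idea concrete, here's a small sketch (my own made-up box example, not from the lecture) that finds the analytic center of a polyhedron {x : Ax <= b} by running Newton's method on the barrier alone:

```python
import numpy as np

def analytic_center(A, b, x, iters=30):
    # Newton's method on phi(x) = -sum log(b - A x), from a strictly feasible x
    for _ in range(iters):
        s = b - A @ x
        g = A.T @ (1.0 / s)
        H = A.T @ ((1.0 / s**2)[:, None] * A)
        dx = np.linalg.solve(H, -g)
        step = 1.0
        while np.min(b - A @ (x + step * dx)) <= 0:   # stay strictly inside
            step *= 0.5
        x = x + step * dx
    return x

# asymmetric box: 0 <= x1 <= 3, 0 <= x2 <= 1
A = np.array([[1.0, 0], [-1, 0], [0, 1], [0, -1]])
b = np.array([3.0, 0, 1, 0])
xc = analytic_center(A, b, np.array([0.1, 0.1]))
print(xc)   # approximately [1.5, 0.5], the maximizer of the product of slacks
```

The backtracking here only enforces strict feasibility, which is enough for this toy problem; a production centering step would also use a sufficient-decrease condition.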
Now when you crank up t, what happens is - by the way, you'll never find a point that is outside the interior of the feasible set, because the domain of this function is the interior of the feasible set. So no matter how high t goes, you'll always have a point that is interior. And by the way, that's why one of the names for these things is interior point methods.

We'll see later that modern interior point methods don't actually work this way anymore, but the name has stuck, and that's why they're called interior point methods: because every point you produce is in the interior. That used to be the case, anyway.

Okay, so what happens is you crank up t, and you're going to move more and more toward the optimal point. This is the curve, this is the central path, and you can see that as t goes to infinity you're going to approach this optimal point. Okay? So that's the picture. That's how this works. By the way, so far you should be deeply skeptical of all this - for the reasons I talked about last time, but I'll talk about them again.
Let's see, the other thing I should mention is that this is a sort of famous method. There's a general class of methods that work this way, and they're called homotopy methods - like this. And I can explain a little bit about how these work.

A homotopy method works like this. You have a problem that you want to solve, but it's hard - you don't know how to solve it, or something like that. What you do is you introduce a parameterized family of problems, parameterized by, let's say, a parameter t, and you arrange for the problem to be easily solved when t has some value, like zero. In this case, you've already written code to do that, like two homeworks ago or something. So it's simple to find this analytic center.

Then, as you crank the parameter, the problem you're solving transforms into the one you want to solve. In this case that happens, roughly speaking, as t goes to infinity. So the homotopy method works like this: you have a parameter value, you solve the problem, then you crank up the parameter a little bit and solve it again, starting from the solution you just found. Hopefully that works. If it doesn't work, maybe you were too aggressive in your t update or something, and you back off. I mean, you can think of all sorts of variations on this.

By the way, these methods are very widely used - I mean, all over the place. So that's what this is. I'll come back to that. Okay.
Now let's actually do something. Let's see what happens on the central path. Well, if you minimize - if you solve this problem - then it means that the gradient of this plus A transpose nu is zero. That's the dual residual being zero, and of course Ax = b. So the optimality condition for minimizing this barrier-augmented problem is this: you have to have t grad f0 plus the gradient of the barrier plus A transpose nu equal to zero, and Ax = b. So that's the optimality condition.

Now if you stare at this long enough, you will look at it and realize you've seen stuff like this before. And in fact, what you saw before is this. We can take this thing and divide by t. That's the first thing we'll do - divide by t, like that. And then you stare at this long enough and you realize these coefficients are just positive numbers, so I'm going to call them lambda i. And then you look at this and say, wait a minute, this is grad f0 plus sum lambda i grad fi plus A transpose something equals zero. And you realize, wow, look at this: if you define lambda i this way, this equality means that if you are on the central path, you actually minimize the Lagrangian with this value of lambda - this value here. Okay? So that's the picture.

Okay, that's great. If you minimize the Lagrangian, then it means you actually have a point that's dual feasible. That's what it means. And that means immediately you're going to get a lower bound on your original optimization problem. So let's see what happens. It's actually kind of interesting. It says if you minimize this - which, by the way, you can do very easily for any value of t; this is Newton's method to minimize this - then whether you like it or not, you will actually get a dual feasible point for the original problem. That's what's going to happen.

So you're going to minimize that, and when you finish you're going to get this: you're going to define lambda i as minus one over t fi(x), like this. That will be lambda i, and you'll get a dual feasible point. And then of course you have a dual feasible point and you want to evaluate g, the dual function, at that point, and you work that out. So I'm going to plug this lambda into g.

Now g is nothing but this. It's f0 - and I don't have it written here, but I'll write it out - it's going to be f0(x) plus sum lambda i fi(x). Well, and then it's plus nu transpose (Ax - b), but that's going to be zero. So I'm just going to plug in the lambda i I found. This is going to be g of lambda; it's going to be that. But lambda i is minus one over t fi(x), okay? So the fi cancels, and each term is minus 1/t. That's a sum over the constraints, so this is equal to f0(x) minus - let's see - it looks to me like f0(x) - m/t, period. Okay?
Now, that's very interesting. It says that if x minimizes this thing - something you can do by Newton's method - then I've also inadvertently calculated a dual feasible point, and that gives me the following lower bound.

And that tells me the following: the optimal value of this problem clearly has the following property. Certainly it's less than f0(x) - I have a feasible point x, so the optimal value is clearly less than that. But it's bigger than g of lambda, and g of lambda is f0(x) - m/t. Well, it's beautiful. You know what that says? It says that if you minimize this function here - which is a completely smooth function, so that's Newton's method here - the solution will be no more than m/t suboptimal for the original problem, okay?

So now you have a hard proof that, in fact, as t goes to infinity you actually compute an optimal point. In fact, it's much more. The truth is that as you solve this problem and vary t, you can say much more. You can say the following: when you solve this problem, you will produce not just a primal feasible point but a dual feasible point - you'll really produce a primal-dual feasible pair. And the duality gap associated with that pair - the difference between the primal value and the dual value - will be exactly m/t. So that's what this means. You have a question?
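Here's a small numeric check of that claim (my own toy box-constrained LP, not from the lecture): center at some t with Newton's method, form lambda_i = -1/(t fi(x)), and verify dual feasibility and a duality gap of exactly m/t:

```python
import numpy as np

def center(A, b, c, t, x, iters=50):
    # Newton's method on t*c^T x - sum log(b - A x), from a strictly feasible x
    for _ in range(iters):
        s = b - A @ x
        g = t * c + A.T @ (1.0 / s)
        H = A.T @ ((1.0 / s**2)[:, None] * A)
        dx = np.linalg.solve(H, -g)
        step = 1.0
        while np.min(b - A @ (x + step * dx)) <= 0:  # damp to stay interior
            step *= 0.5
        x = x + step * dx
    return x

# toy LP: minimize c^T x over the box -1 <= x_i <= 1  (m = 4 constraints)
c = np.array([1.0, 0.5])
A = np.vstack([np.eye(2), -np.eye(2)])
b = np.ones(4)
m, t = len(b), 100.0
x = center(A, b, c, t, np.zeros(2))

lam = 1.0 / (t * (b - A @ x))          # lambda_i = -1/(t * f_i(x)) > 0
print(np.allclose(c + A.T @ lam, 0))   # dual feasible: Lagrangian gradient is 0
print(c @ x + b @ lam, m / t)          # duality gap: both are m/t = 0.04
```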
Student: Should g of lambda be the minimum of the right-hand side instead of f0(x)?

Instructor (Stephen Boyd): What's that? You mean the right-hand side? Should the -

Student: Should g of lambda be the minimum of the right-hand side over x, so that you -

Instructor (Stephen Boyd): Right - but my x minimizes the Lagrangian. You know how I know that? I know that because this is the optimality condition for minimizing the Lagrangian for this choice of lambda. So I do know that it minimizes it. If I plug in my x*(t), it does give the infimum over x of the Lagrangian. I don't know if that made sense.
Student: Where did m come from?

Instructor (Stephen Boyd): m is the number of constraints. Right, so this is pretty cool. And by the way, this gives a method for solving the original problem. We already have our first method for using Newton's method to solve a constrained problem, and in fact, it's got a name - we'll see why it's called that later. It's called SUMT, which stands for sequential unconstrained minimization technique. So I'll tell you what this is.

Oh, and I like the name. I'll tell you why. It's kind of a retro name, because it points out that everything we're doing so far was actually known in the 60s. And so when you mention this name it's sort of a reminder, because you can type it into Google. Well, actually, if you type it into Google you'll probably get a lecture I wrote, but below that you'll find whole books on this from the 60s.

That irritates the people who imagined that all of this was done in the last 15 years - and that's why I like to use the term. So this is the unconstrained minimization technique, and it works like this. This is really dumb - watch this. You want to solve this constrained problem: minimize f0(x), subject to fi(x) < 0 and Ax = b. The method says this: pick t and solve the following problem - I'm assuming here, by the way, that you have a strictly feasible starting point; we'll talk about how to get one if you don't. It says minimize f0(x) plus 1/t times the sum of minus log minus fi(x). That is a smooth function. Newton's method will minimize it - well, I should say, certainly in theory. So will the gradient method, by the way, in theory.

But the point is you can minimize this. When you minimize this you will have a point that is feasible - actually strictly feasible - and you will absolutely be guaranteed that you are no more than m/t suboptimal. So you might say, done. That's it. We're just done. Pick your tolerance - you want 1e-6 tolerance? Pick t, whatever it is, ten to the sixth times m or something like that. End of story. In theory that's actually correct, so Newton's method in one shot does it.
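As a sketch of that one-shot idea (my own toy example, solvable in closed form rather than with Newton): minimize x over -1 <= x <= 1, so m = 2 and the optimal value is -1. Picking t = m/eps guarantees at most eps suboptimality:

```python
import numpy as np

def x_star(t):
    # central point of: minimize x subject to -1 <= x <= 1 (m = 2).
    # Setting the derivative of t*x - log(1-x) - log(1+x) to zero gives
    # t*x**2 - 2*x - t = 0; this is the root inside (-1, 1).
    return (1 - np.sqrt(1 + t**2)) / t

eps, m = 1e-6, 2
t = m / eps                 # one-shot choice: guaranteed gap m/t = eps
x = x_star(t)
print(0 < x - (-1) <= eps)  # True: actual suboptimality is within eps
```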
We'll actually see that it's when you combine this with the homotopy idea that you start getting methods that are, a, practical and, b, pass muster with our complexity theory friends, okay? So that describes the basic method.

By the way, you shouldn't make fun of it, because there are plenty of practical applications where it's entirely appropriate to just pick a value of t, minimize, end of story.
Student: On the previous yellow sheet where you wrote the dual function, I was wondering how you got Ax - b equal to zero?

Instructor (Stephen Boyd): Well, because, look - Ax = b is part of the optimality condition.

Student: Right, but is that going to be zero?

Instructor (Stephen Boyd): It's zero. Yes, I know. I solved - here it is, there we go - I solved that problem, and the constraint is Ax = b, so I assure you that Ax = b. Ax = b is not negotiable here. We have a point that satisfies Ax = b. Okay.
Everybody got this? So this is - it's also pretty simple. We're up to about 1967 here. Yes?

Student: Should it be Ax = b there also?

Instructor (Stephen Boyd): Where? Oh, over here?

Instructor (Stephen Boyd): Well - yes, I should. You mean if I want it to be true?

Instructor (Stephen Boyd): Yeah, sure. Whatever, okay. Sure, there. Sure. These are details. You know you don't have to get everything right in a 300 level class, right? It's an official rule - I mean, I can switch a minus sign. There's some number of errors above which you're not supposed to go, but it's expected. In fact, you wouldn't even want everything to be right here, would you? It would be insulting. This is a 300 level class, you know?
So - but thanks for pointing it out. Okay, all right. So let's interpret this stuff via the KKT conditions, because you can, right? You can actually interpret it lots of other ways. Here's another one, and in fact, this one is also quite cool. It's this. Let's take - this is the point on the central path. x*(t) minimizes - wherever it is; I've lost it; here it is - it solves this problem here. Lambda*(t) is the lambda defined by that. Let me point one thing out: when you fix t, you don't know lambda*(t) until you've calculated x*(t). So you don't know what dual point you're going to end up with until you do this.

Curiously, though, you do know exactly what duality gap you're going to have. Your duality gap is going to be exactly m/t, period. So in fact, one way to say it is when you solve this problem - which appears to be a primal problem - whether you like it or not, you're actually going to calculate a primal-dual pair for the original inequality constrained problem with a duality gap exactly equal to m/t. That's what you're going to do.

In fact, you can even think of t as a parameter - well, m/t is simply the duality gap. So you can literally dial in whatever duality gap you want and be done.

Okay, and that would be the SUMT method. Okay. So let's look again at what it means to be central. Oh, I should say, if you solve this problem you would refer to a point like that as central, or on the central path, or centered. These are the names you would use for that.
Okay. Well, here's what these points satisfy. Well, in fact, this one I can make stronger: you're actually strictly feasible, obviously, because the objective that defines x*(t) has log minus fi in it. So the inequalities hold strictly.

And in fact, these lambdas are strictly positive, okay, because they're 1 over (t times minus fi), which is strictly positive. Okay?

Now here's the cool part: the optimality condition. Well, this is our definition of what lambda i is. We actually get minus lambda i fi(x) - if you're on the central path - equal to 1/t. Now what's cool is the following. We write down these conditions, and the optimality conditions for the original inequality constrained problem are really, really simple, and I will draw them right now - ready? It's just that.

Now, true complementary slackness is this, right? It says that lambda i fi is zero. So if, in fact, you have x and lambda that satisfy - well, now I do have to change this back, like that, okay - these one through four with this zero here, that is exactly the KKT conditions for the original inequality constrained problem. Okay?

If you're on the central path you satisfy all four of these, except the third is modified from exact complementary slackness to approximate complementary slackness, with this 1/t. So that's another way to say it: when you have a point on the central path, you've actually found a set of points that satisfy three of the four KKT conditions, and the one where they come up short is complementary slackness. And in that one, instead of the minus lambda i fi's being zero, they're all equal, and they're all 1/t.

So that's another interpretation. This is actually the most common interpretation these days of these methods. So if you look at books and things on these methods, they'll just start from KKT.
Okay, now you can also think of this in terms of a nice force field interpretation - and that's this. Forget the equality constraints, to make it easy. What we do is we have a force field here - well, we have a potential. So if you think of this as a potential, I'm going to think of each of these terms, with the minus sign, as the potential contributed by each constraint. So each constraint right now has a barrier function, but I'll think of that as a potential, which means a force field - the negative gradient of the potential is the force. So that's the way I think of it.

Now, what happens is this: when you minimize the potential, the forces balance. And that means here that the force contributed by our objective - this is the potential, the thing you want to minimize - is balanced by the forces contributed by the barriers.

Now, the barrier forces actually have a beautiful interpretation, especially when you have linear constraints. That's the easiest case, so I'm going to do that. Let me describe how it works there. So here, for linear constraints, here's the way it works: the force field from the objective is simply constant.

And by the way, you can make this gravity - so this can be g times whatever, down, okay? So that's what this is: simply a constant force field in the direction minus c. That's the objective. So here's our point, where c is apparently this direction, so the force contributed by the objective is simply a constant force in this direction. It's pushing you toward values of low c transpose x, which is obviously what you want.

Now, the force field associated with each log barrier term: you take the derivative of the log and you get one over something, and it turns out it's exactly an inverse distance - a 1/r force field.

So this gives you a beautiful way to picture this and understand exactly how it works. Here's what you do. Take each constraint and spray it with some repellant, okay? And the repellant has the following property: wherever you are here, this plane will put a force on you. It will point away from the plane, and its magnitude will be exactly one over the distance to that plane. Everybody got that?
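That claim is easy to check for one linear constraint a^T x <= b (my own tiny sketch): the force from the term -log(b - a^T x) points along -a, and its magnitude times the distance to the hyperplane is exactly 1:

```python
import numpy as np

a = np.array([3.0, 4.0])          # constraint a^T x <= b
b = 2.0
x = np.array([-1.0, 0.0])         # a strictly feasible point

slack = b - a @ x                 # b - a^T x = 5
force = -a / slack                # minus the gradient of -log(b - a^T x)
dist = slack / np.linalg.norm(a)  # Euclidean distance from x to the hyperplane
print(np.linalg.norm(force) * dist)   # 1.0: magnitude is exactly 1/distance
```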
By the way, there are no such repellants. I tried thinking of some case in some number of dimensions where you could spray positive charge - no, there are none. But anyway, imagine there were such a thing.

So you spray these - and this means, for example, if you're here, let's actually work out what the force field is there. You feel a very strong force from this plane, and a very strong one from this one - you're about equidistant. Oh, by the way, what kind of force do you feel from these three? Much less. So basically the force that you feel at this point is - in fact, someone tell me which way to draw it. Like, this way? This way? This way? Well, it's like that, right? So basically the force would be like that.

And by the way, that's actually good. That's where the barrier force is not fighting the objective. It's actually pointing you away, because basically you're at completely the wrong point, okay?

And you can imagine all sorts of things. Like, what if you get right up to one of these constraints? Which way is the force pointing? Well, it's basically pointing you away from the constraint, okay? All right.

Now, imagine we put no objective force on it - so in fact, we'll just lay this out flat, sitting level. What happens now is the particle will settle to the analytic center. That's where all these forces balance. It will be, you know, a point there or there, I don't know - somewhere in the center, right?

Now all I have to do is this. That's c - let me see if I can get it right - yeah, okay. So here's what you're going to do: take this page and start tilting it, because that's exactly what you're doing, right? You're putting more and more gravity force on it - and when you tilt it all the way, well, gravity has to be way, way high here.

But as you rotate this thing, what happens is there's a gravitational pull, pulling this in the direction minus c. And it will move over a bit - for each angle it will hang down a little bit - until the repulsive forces from these three counterbalance the gravity force. Everybody see this?

Now, as I crank the gravity force up super high, you're going to settle over to a point here. Of course, you won't be touching these, but you'll be very, very near - you'll be pinned right up against the active constraints. Near, but not actually touching, the active constraints. And there, their repulsive forces will balance the attractive force generated by the objective.
This, it goes without saying - none of this actually matters. This is just for you to get a feeling for how this works, okay? Actually, let me ask you a quick question to make sure you do get it. Let me ask you - how many constraints did I draw here? One, two, three, four, five. All right, you know what? I'm going to put in some more constraints - ready? Like this, ready? Tons of constraints, like that. Let's say ten of them, okay?

Now let me point something out. The constraints I just added do not change the feasible set. So what's the optimal point? Of course, it's the same - it's right here. What does the central path look like now? Does it move? We haven't changed the feasible set - has the central path moved? Where is it? Okay, what happens is these constraints, though they do not in any way change the feasible set - in any way - are felt very strongly by this particle, because 1/r is a very slowly decaying force field. And what happens is the central point moves over here somewhere, and the central path probably sneaks up on the optimal point that way.

Why? Because it's feeling a repulsive force from all these planes, even though they're outside the feasible set. Everybody see this? I mean, this doesn't matter, but it's just to give you a feeling for this.
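A one-dimensional sketch of that effect (my own example, not from the lecture): take the interval -1 <= x <= 1 and repeat the constraint x <= 1 k times. The feasible set never changes, but the analytic center gets pushed away from the repeated wall:

```python
def analytic_center_1d(k):
    # analytic center of -1 <= x <= 1 with x <= 1 repeated k times:
    # minimize -log(1+x) - k*log(1-x); setting the derivative
    # -1/(1+x) + k/(1-x) to zero gives x = (1-k)/(1+k)
    return (1 - k) / (1 + k)

print(analytic_center_1d(1))    # 0.0: the symmetric center
print(analytic_center_1d(10))   # about -0.818: pushed away from x = 1
```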
Student: So as you tilt the plane, what tells you to tilt it in the direction that actually follows the path?

Instructor (Stephen Boyd): I'm going to put a gravitational force in the direction minus c.

Student: So c is going to determine how you tilt it?

Instructor (Stephen Boyd): c will tell you how to tilt the table, yeah. So you tilt it along c. Okay? So that's that. Okay.
Now we're ready to tell you about the barrier method - so the barrier method is this. By the way, there are lots of names for this. SUMT - I already told you that one. Definitely you should try to use it, because it's irritating to people who work on these things, and that's always good. And you should also drop things like saying, oh, wasn't all that known in 1969? And I'll tell you a story about this shortly.

Okay, so here it is. It's the barrier method, otherwise known as the SUMT method - it's got other names, too. It works like this - ready? You start with a strictly feasible x - we'll talk about how to get one later - an initial value of t, which is positive, a parameter mu, and a tolerance. And here's how it works.

You simply do centering steps: you compute x*(t) by minimizing this. And this could be using Newton's method - it doesn't really matter, but if you want a complexity bound, and if you want this method to work well, you're going to have to use something like Newton's method here, okay?

So you do this, and you start from the previous value - the previous x* found. Then you update x. Now, once you have x*(t), of course your duality gap is exactly m/t. If m/t is less than your tolerance, you quit - and by the way, when you quit, you could actually return both x*(t) and lambda*(t). In fact, if you write a solver in convex optimization, it's considered polite to not just return a nearly primal optimal point, but to also return a nearly dual optimal point with a duality gap that's less than some number, right?

Because that way anyone can check. They don't have to trust you: if you've asked for a tolerance of epsilon, they return a primal-dual point. None of your business how they calculated x, lambda, nu - totally irrelevant. You just check the duality gap. So they're actually returning an epsilon-suboptimal point and a certificate proving it's epsilon-suboptimal, okay? So you could do that here.

And then you increase t: you multiply t by mu.
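The loop just described can be sketched in a few lines (my own toy instance again: a box LP with a basic Newton centering step; the lecture's version would also carry along the equality constraints Ax = b):

```python
import numpy as np

def center(A, b, c, t, x, iters=50):
    # Newton centering on t*c^T x - sum log(b - A x)
    for _ in range(iters):
        s = b - A @ x
        g = t * c + A.T @ (1.0 / s)
        H = A.T @ ((1.0 / s**2)[:, None] * A)
        dx = np.linalg.solve(H, -g)
        step = 1.0
        while np.min(b - A @ (x + step * dx)) <= 0:  # stay strictly feasible
            step *= 0.5
        x = x + step * dx
    return x

def barrier_method(A, b, c, x, t=1.0, mu=10.0, eps=1e-6):
    m = len(b)
    while True:
        x = center(A, b, c, t, x)   # centering, warm-started at the previous x
        if m / t < eps:             # duality gap at x*(t) is exactly m/t
            return x
        t *= mu                     # increase t; the gap shrinks by a factor mu

# minimize c^T x over the box -1 <= x_i <= 1
c = np.array([1.0, 0.5])
A = np.vstack([np.eye(2), -np.eye(2)])
b = np.ones(4)
x = barrier_method(A, b, c, np.zeros(2))
print(x)   # very close to the optimum (-1, -1)
```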
And here you should imagine a tradeoff in the choice of mu. Mu is how much you increase the parameter t in each step. You should imagine values like 1.05. If you use 1.05, here's what should happen. Let's imagine you use Newton's method, because it's important to get the intuition. If mu were 1.05, it should take maybe only two Newton steps to recalculate the new center. Let me draw a picture. Here's your central path. You're right here; you've calculated that point. You crank up t by 5 percent and you want to calculate this next point, but you've just minimized a problem that was very, very similar. The truth is, where you're starting is nearly optimal anyway, and so in two steps of Newton's method you should be right back on the path, okay?
Now the problem is you only made a 5 percent reduction in duality gap, because the duality gap is exactly m/t. If t went up by a factor of 1.05, your duality gap went down by a factor of 1.05. Of course, that means if you do it, I don't know, a hundred times, it will work pretty well, or something like that.
So these are little mousy steps, and this is what you'd imagine. This is what a normal homotopy method is. In a normal homotopy method you should imagine little mouse steps, because a homotopy method says: I have a parameter; I can solve this problem; I can't solve that problem. So you crank your parameter just a little tiny bit, making each step not too hard, and you hope, you pray to the god of Newton, whoever controls Newton's method, for [inaudible] convergence in a couple of steps. That's a standard homotopy method.
However, it's going to turn out here that the actual updates that can be used are very aggressive. Oh, by the way, this model of little mouse steps is exactly what the complexity theorists will tell you to do. Later we're actually going to bound the total number of Newton steps required, and, in fact, the complexity theory will require us to take mouse steps there, on the order of mu equals one plus one over the square root of the problem size, okay? And that choice of mu optimizes the overall bound on the number of Newton steps, okay?
In practice it turns out the values of mu can be very aggressive. They can be things like two, five, ten, fifty, depending on the problem, and these will work very, very well. Shocking. That's really not a homotopy method anymore; these are huge, aggressive steps. So if you crank t up by ten, it means you're going from this point; your next point is going to be here, and your next point will be even closer.
What happens then is it might take you a bunch of Newton steps, so you're going to have a whole bunch of Newton fun and you're going to land up there, okay? So this is not a normal homotopy.
Okay, I'll give you an example of a homotopy. How many people here use Spice? Well, that's great; for the record, that was like no one here. Anyway, no, it's a couple of people.
Okay, so to those people: you know in Spice one of the things you have to do is, you have a nonlinear circuit and you want to calculate the bias, the equilibrium point. You have a set of horrible nonlinear equations; you can't just solve them. And you can already have big trouble if you have bi-stable things, if it's got memory or stuff like that; it has multiple equilibrium points. But you want to calculate an equilibrium point. Here's the way you do it. So, addressed to those people: take an amplifier or something like that. Do you want to know what the homotopy parameter is?
This is fantastic. It's totally cool. Anyone want to guess? Actually, I know a lot of you people have been tortured with this, because a bunch of you are from EE. Now don't hide, don't think you can hide. I know a lot of you are in EE. If you're in stat, you're safe, that group over there, but the rest of you, there's no way. You know you've seen this stuff.
All right, so the homotopy parameter is the supply voltage. Take an amplifier and let's set the supply voltage to zero. Anyone want to tell me what the voltages and currents in the circuit are? They're zero. Thank you. That was easy. Okay.
And then I take the supply voltage and ramp it up to, what's your favorite supply voltage?
Instructor (Stephen Boyd):Fifteen volts, see that? No, no, no, no. I know what he's talking about. That dates him. I mean, you're way too young for that. That's crazy. I might say that, plus or minus 15 volts. I can say that, but why can you say that? You can't say that.
Instructor (Stephen Boyd):Oh, Sedra and Smith, because you read old stuff. Okay, fine. All right. Your choices are five, depending on your age, okay? Or I guess the age of the people whose stuff you read. So it could be five, 3.3, two, on down. Now, if you're really cool you could say 1.1, or 0.8 if you're super cool, right? See, I knew people knew and were just holding out on me. All right, it doesn't matter. So what happens is this. We take a big circuit. We set the supply voltage to zero. Everyone agrees:
All voltages and currents are zero, right? Okay, now you set the supply voltage to 100 millivolts. What happens if you take a big, horrible digital logic circuit and set the supply voltage to 100 millivolts? By the way, how well does it work as a logic circuit? It doesn't work at all, okay? However, can I calculate the bias point? You know this, right? Come on, you held back on me on the satellite stuff, even though you're from Harbin and should know this, right?
You've had this circuit stuff? A little bit, okay, all right. So take a logic circuit, VDD is 100 millivolts. What does the circuit look like? Very, very sub-threshold, right? It looks like a resistor circuit. So we can solve it in one Newton step or two, right?
No problem. You get these things because the transistors aren't even on. I mean, it's ridiculous. They're not even transistors yet; they're resistors, okay?
Then you go to 200 millivolts, but you start from the solution you just found, and then you go to 300, 400, and so on. And if you go, by the way, from one volt to 1.3 volts and it fails to converge, then you go, oh, I'll try 1.15. Everybody got the idea? Anyway, there you go. That's a homotopy method. That's how Spice works, okay?
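That warm-started ramp, with step-halving on failure, can be sketched generically (my own sketch; the hypothetical `solve_at` stands in for Newton's method on the circuit equations at a given supply voltage):

```python
def homotopy_solve(solve_at, x0, p_start=0.0, p_target=1.0, step=0.1, min_step=1e-4):
    """Generic homotopy continuation, as in the Spice anecdote above.

    solve_at(p, x_guess) returns a solution at parameter value p, warm-started
    at x_guess, or raises RuntimeError if it fails to converge.  We ramp p
    toward p_target, and halve the step when a ramp step fails.
    """
    x, p = x0, p_start
    while p < p_target:
        p_next = min(p + step, p_target)
        try:
            x = solve_at(p_next, x)   # warm start from the previous solution
            p = p_next
        except RuntimeError:
            step /= 2.0               # e.g. 1.0 V -> 1.3 V failed: try 1.15 V
            if step < min_step:
                raise                 # give up if the steps get absurdly small
    return x
```

The step-halving branch is exactly the "it fails to converge, so I'll try 1.15" move from the lecture.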
Student:Is there any downside to using a large mu, like an aggressive [inaudible] or something?
Instructor (Stephen Boyd):Here?
Student:[Inaudible].
Instructor (Stephen Boyd):Yeah. I'll tell you what the tradeoff is. In a circuit problem you're trying to find the equilibrium. The downside is real simple. If you update VDD too aggressively, Newton's method, which is what is used to calculate the equilibrium point, will fail. It'll just fail to converge, okay? So that's the tradeoff there. For us, we cannot fail, because we solve a convex problem at every step. We cannot fail.
You can crank t up to ten to the nine, and at least the theory says we can't fail, because we can minimize any smooth convex function with equality constraints, right? So we can't fail, but here's what can happen if you update t super aggressively. Remember, we can trot out all our theorems and talk about numbers of steps and blah, blah, blah. What'll happen is the number of steps will be absolutely huge, both in theory and in practice. So that's the problem.
Student:We can [inaudible], so.
Instructor (Stephen Boyd):Yeah, Matlab doesn't do anything. Let's remember that, okay? It does nothing. Remember, LAPACK is doing the work. Don't forget that, okay? Yes.
Student:The fact that the Newton method rolls over at some point in the transition?
Instructor (Stephen Boyd): Yeah, exactly.
Student:Do we use that to choose mu?
Instructor (Stephen Boyd):That's very interesting. There are whole methods that do exactly this barrier method, and they choose mu so small that they can prove that when you update t by that amount, you're still in the region of quadratic convergence. That's called a short-step method. There are lots of them, and a lot of the proofs go that way. Basically, you can prove that by cranking up t by a small enough amount, if you were in the region of quadratic convergence before, you will remain in it for the new value of t, and so on. So that's one way.
Now, I can tell you how we pick the update; well, I can tell you the reasonable way to pick it, actually how you will pick it. Ready? Mu equals two. There. Here, I'll pick another one. Mu equals ten. How about mu equals 30? These are sort of the numbers that work. There are much more complicated ways if you get into the ultra high-end solvers. By the way, they don't do a whole lot better, honestly, than these simple choices. That's the embarrassing part. You can get 50 papers written on how to update this parameter, right?
I mean, some of these are quite fancy and effective, at least for some problems. But empirically, anywhere between two and 50 appears to be the right number.
Okay. And there are lots of heuristics for choosing t(0). This is something that's discussed in the book; you should actually read about it. T(0) implicitly sets your original duality gap: m/t(0) is your original duality gap. So if you had some reason to believe that your current point was suboptimal by, say, five, it says a reasonable value of t(0) is on the order of one-fifth of m or something like that. That would give you the same duality gap, and you'd just center. Okay.
So let's do the convergence analysis. Showing that it works is actually kind of easy; I mean, we know that, but we can say a lot of things. Now, the number of outer steps. By the way, the outer steps here are called centering steps, or outer iterations, or whatever.
By the way, it should be clear that there's lots of effort you could save here. For example, when you do a centering step, you don't need to polish it off; you don't have to compute x*(t) to 16 significant figures. That's totally clear because this whole thing is just a method to guide you toward a solution, right? So you don't have to do full centering.
But with Newton's method it's cheap. It's a couple of Newton steps, and the difference between getting an okay solution and an unbelievably accurate solution is two or three Newton steps, so there's no reason to worry about it too much.
So there's a lot of stuff here you can imagine can be sharpened up, but for an algorithm that's very short to write, it's amazing how well it works. Okay, so how does this work? Well, you can say the exact number of outer steps. It's very simple: it's log of m over epsilon t(0), divided by log mu. The duality gap after the first step will be exactly m/t(0).
Then each time you do an outer step, that shrinks by a factor of mu, so you can immediately work out the exact number. It's the ceiling of this expression. That's the number of steps it's going to take.
Okay? So that's the number of outer steps, period. Notice this depends on nothing but t(0), epsilon, and m. Now, the centering problem. Oh, and let's look at how this varies with mu. If mu is big, this number is kind of small. If mu is a little mouse step, like 1.01, this number is big, right? If mu is 1.01, log mu is about .01, so basically that puts a factor of 100 in front here. Right? That's basically what this is. So what's that?
Student:Mu is [inaudible].
Instructor (Stephen Boyd):No. Log of 1.01 is about .01, very close, right? Am I wrong? No. Okay, all right. So that correctly reflects that the number of outer steps is monotone decreasing in mu. Mu is your update aggressiveness parameter, and that says the more aggressive you are, the more duality-gap reduction you get per outer step, so you take fewer outer steps.
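The ceiling formula for the outer-step count is easy to tabulate; a small sketch of my own, using m = 100, t(0) = 1, epsilon = 1e-8 as illustrative values:

```python
import math

def outer_steps(m, t0, eps, mu):
    """Number of centering (outer) steps: ceil( log(m/(eps*t0)) / log(mu) )."""
    return math.ceil(math.log(m / (eps * t0)) / math.log(mu))

# Monotone decreasing in mu, as discussed (m=100, t0=1, eps=1e-8):
#   mu = 1.05 -> 472 outer steps
#   mu = 2    ->  34 outer steps
#   mu = 50   ->   6 outer steps
```

This shows the outer-step side of the tradeoff; what it does not show is that each centering gets harder as mu grows.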
Now the problem is, the more aggressive you are in your update, the harder the problems you're solving by Newton's method. So you can approach this intuitively and say: for our problem class, just try it out, adjust mu, solve giant families of problems.
Oh, by the way, Newton's method cannot fail, so we have a proof that Newton's method will converge, and so on and so forth. If you use the classical analysis, it would involve all sorts of constants that you don't know, but it doesn't matter. You would at least have, at that point, a conceptual proof of convergence, right? Which is to say this method can't fail.
So that's fine. But if you want the modern results, they come by using self-concordance. If f0 and the log barrier are self-concordant, then t f0 plus phi is self-concordant for t bigger than one.
And we'll get to that a bit later, but let's just look at some examples and see how this works. This is the dual of the problem you're solving on your homework, but it's an LP with 100 inequalities and 50 variables. And this shows you the number of Newton steps. By the way, the computational effort is measured simply in units of Newton steps, period; that's the cost here.
So the number of Newton steps is plotted here against the duality gap. It's plotted as one of these staircase curves, and the width of a stair is exactly the number of Newton steps required to carry out that centering.
The height of a stair is exactly the reduction in duality gap obtained when you finish re-centering. Now, in this method the duality gap simply goes down by a factor of mu, so since that's a log scale, the height of the steps within each of these plots is identical; in fact it's exactly a factor of mu down.
So here are little mouse steps; they're not even that mousy if you're doubling. And you can see everything; you can guess all sorts of things here. You asked about Newton quadratic convergence and stuff like that. You can see down here you're taking one or two steps; maybe these are two steps each, right? So basically: double t, do two Newton steps; double t, do two Newton steps. That's this. You're making pretty good progress, and you can see down here what you get.
The interesting part is here. This is mu equals 50. For mu equals 50, what's happening is the treads are much wider, because you're taking more Newton steps to re-center.
But, of course, the real measure of progress is duality-gap reduction per Newton step. That's what you want, because when you start with a duality gap of 100 and you want to reduce your duality gap, which is your ignorance, by a factor of ten to the eight, then somehow you have to make that progress in some number of Newton steps.
And you can see here, for mu equals 50, that total number is actually something like 30-odd steps. This is probably a good time to stop before we get into all the details of mu and t and self-concordance and all that, and just take a moment to realize what this says.
By the way, you see this picture? It looks the same if there are a million variables. It looks the same if it's a maximum entropy problem. It looks the same if it's a problem from finance, signal processing, image processing, machine learning. They all look just like this, essentially no different, okay? And it's completely independent of the size of the problem. It just looks like this, okay?
Now this is really quite ridiculous, because if mu equals 50, it means that with a total on the order of 35 Newton steps you've solved an LP to extremely high accuracy. Let's step back and think about what that means. First of all, it's quite ridiculous. The feasible set is a polyhedron defined by 100 inequalities in 50 variables. How many vertices does such a polyhedron have? How many?
Student: Two to the fifty.
Instructor (Stephen Boyd):Two to the fifty. Okay, sure, yeah, sure. That's fine. If by that you meant "a lot," then I'll accept your answer, and that's a good answer. The answer is there are a lot of vertices on such a polyhedron, and this is a baby, baby, baby problem. There are a lot of vertices on that polyhedron, right? Okay.
One of them, generically, is our optimal solution, okay? So what is insane is that this method has done the following. Thirty-five times you stopped in the interior of the polyhedron, asked for directions, and were told to go in some direction. You then tried a step of one; if you didn't like that, you backed off to a half and then to a quarter.
So somehow, after 35 times asking for directions, you went from somewhere in the middle of this polyhedron in R^50 to a point which is essentially optimal. Everybody see what I'm saying? The whole thing is highly implausible. Thirty-five steps is smaller than the dimension of the problem, if you think about it: there are 50 orthogonal directions you can go in, and you asked for directions 35 times. Since this big figure looks the same when there are a million variables, it gets totally beyond anything you can possibly imagine.
And it looks the same, which is ridiculous. It means that, to solve a convex problem with a million variables, basically 30-odd times you stop and ask for directions, and after 30 such steps you have an extremely accurate solution. It's really quite ridiculous when you think about it from that point of view.
Student:How would this compare to using [inaudible] method?
Instructor (Stephen Boyd):The simplex method would also be very, very fast on this. It's nothing we're going to talk about, but the simplex method is when you go along the outside of the polyhedron and take edges greedily. That also works stunningly well.
Student:So for each [inaudible] mu, or mu squared?
Instructor (Stephen Boyd):Oh, the cost per Newton step? The cost per Newton step depends on the problem, right? So in this case, what's the cost per Newton step? You have 50 variables and 100 inequalities. Yeah, it's something like n cubed, but in fact what you'll find, believe it or not, is that forming the Hessian is more expensive. So it would be 100 times 50 squared.
Yeah, so that's what it would be: the cost of a Newton step is 100 times 50 squared. Does someone want to calculate that? Is Jung here, or someone? Let's figure out how long this actually would have taken, say, written in C calling LAPACK directly.
So it's 100 times 50 squared; let's just say that's the complexity. Should we round it up and make it 100 cubed? Sure, let's make it 100 cubed. That's four times too big, but that's okay. We'll round up, so 100 cubed. How long does it take to solve a system of 100 linear equations in 100 variables? I keep coming back to this, and I'm going to keep coming back until people start answering the right way.
Instructor (Stephen Boyd):Thank you. A millisecond. Okay, sub-millisecond, okay? So how long did it take to solve this LP? Thirty milliseconds, okay? By the way, not a lot of people appreciate this fact, and, well, why not? I'll go off on a weird tangent about why not.
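The back-of-the-envelope above, spelled out in code (my own arithmetic; the 1 Gflop/s rate is an assumption for compiled code calling LAPACK, not a figure from the lecture):

```python
m, n = 100, 50                  # inequalities, variables
flops_per_step = m * n * n      # forming the Hessian dominates: ~ m * n^2
newton_steps = 35               # roughly the total for the whole LP, as above
flop_rate = 1e9                 # assumed: ~1 Gflop/s in C calling LAPACK

seconds_per_step = flops_per_step / flop_rate           # sub-millisecond
total_seconds = newton_steps * seconds_per_step         # a few milliseconds
```

Rounding 100 times 50 squared up to 100 cubed, as in the lecture, overestimates by a factor of four and still lands the whole solve in the tens of milliseconds.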
Okay, so basically, linear programming has been around since 1948 or whatever, and people imagine, if you close your eyes you picture a Cray computer and some guys wearing white lab coats, and it's a machine that costs a million dollars. It's an expensive thing, and you imagine you would use it for, I don't know, scheduling the airlines tomorrow, pricing a derivative, designing a nuclear weapon. You think of big iron, you know? You know what I'm talking about?
That's kind of what you think. That's how a lot of people think of linear programming, especially if you're an older professor or something like that, because that's when you learned about all this stuff, right? So that's your picture of what solving an LP is, right?
And it's cool. That's how United schedules their airlines tomorrow; in fact, that's how they do it, roughly. And, in fact, yes, that's how you optimize portfolios and things like that.
But what's happened since then is, well, Moore's law: computers are, I don't know, ten to the eighth times faster or something, and there have also been algorithmic improvements. So the idea that you can solve an LP of this size in milliseconds I think really hasn't hit people yet. People don't know it. People who do control don't know it, for example; they sure don't know it. They still think you can carry out four floating-point operations per control calculation. They don't know it.
And people over in our own department here don't think about doing things like this. So this is important for embedded applications. All right, enough of that; we'll move on. That was just a big aside. Okay, yeah.
Student:So what causes the mu equals 150 case to take longer? Was that just haphazard?
Instructor (Stephen Boyd):You got it. No, no, no, no, the tradeoff makes perfect sense. If mu is too small, what's happening is this: you're not making enough progress in each centering step. The centering steps are easy, but you're not decreasing your gap fast enough.
When mu is too large, you've bitten off more than Newton can chew. So what happens is, when you finish the centering step you've made a big reduction in gap, but you've taken a whole lot of Newton steps to get there.
Now the cool part is that these two effects trade off against each other, and you get this huge flat region. It's not uncommon for it to look like this. The point is, this is not a parameter that needs to be tuned, let's put it that way. It's usually set and then forgotten. Another question?
Student:[Inaudible] to changing it over the problems?
Instructor (Stephen Boyd):Oh yeah, absolutely. Yeah, I know, you keep coming back. The truth is, you will, in fact, code an amateur version of this before next Tuesday. It will be a very small number of lines.
Now, fancy solvers don't update by a fixed parameter mu; you're exactly right. They actually have a very complicated scheme that, by the way, for LPs and QPs works amazingly well. It's called a Mehrotra predictor-corrector, but the shock is that something this simple works really quite decently.
So yes, sophisticated methods will crank up t by some adaptive scheme. That's correct. There was another question. Do you have one?
Student:I was just going to ask [inaudible].
Instructor (Stephen Boyd):About what?
Student:About a PIC? Like a small processor.
Instructor (Stephen Boyd):On a PIC? I don't know, but I don't see why we couldn't do it. I mean, all you have to do, the entire thing, is multiply a matrix which is 100 by 50 by its transpose.
Instructor (Stephen Boyd):Or all you really have to do is a QR on it, that's all. So yeah, you could put this on, you know, an ARM or a PIC. I don't know that people have done that. That would be cool; someone could do that next quarter as a project.
But some interior-point thing on an ARM processor or something weird like that, for no reason, just because it would be cool. Okay, all right. So here's a geometric program, a GP. It looks the same; same story. And then here's what happens if you have a family of standard LPs, where you're changing the size. And what is this? Oh hey, this is your homework problem. What do you know?
Oh, that's interesting, because, you know what? It means Google will find the solution to your homework, but it doesn't matter. We all know that anyway.
Anyway, isn't this what you're doing? I think so, so yeah, fine. If you did it right, here's what will happen. You'll make something that looks like this, and then the most amazing thing happens. It's this. What is m? Oh, A is m by 2m, so this is the number of constraints, and the number of variables is twice this number. Okay, so that's 20 variables, ten constraints, up to one hundred variables, two hundred constraints.
Did I say that right? Yeah, I reversed those; anyway, you get the idea. At the far end it's 2,000 variables, 1,000 constraints. You generate a whole bunch of random problems and solve them, and this is the number of steps required to reduce the duality gap, probably down to ten to the minus six or something, or to reduce it by a factor of ten to the eight.
When you look at this, you see something amazing: it's basically not growing. It just doesn't grow. I can tell you a little [inaudible] story about this. When these methods were first looked at, and we'll see this next lecture, the theoretical computational complexity says that the total number of Newton steps is less than a constant times the square root of m, where m is the number of constraints.
Okay? And by the way, the constant out front is just a number. It's not one of these mysterious, useless convergence proofs that has a Lipschitz constant and all sorts of things you don't know. It's just a number, so it's some number times square root of m. That's the number of steps it takes, period, for these things.
Now, if you're a complexity theorist, you'd celebrate this, because you'd say: well, there you go, the number of iterations grows like the square root of the problem size. Each step takes, in the simplest case, n cubed or something, so you'd have an order n to the 3.5 method, and you'd have a polynomial-time party, a big celebration or something, and walk away.
Now what happened with that? By the way, that's where the complexity theorists are right now: they'll tell you the number of iterations grows like the square root of m. But after people saw this over and over again, somebody finally, timidly said: no, it clearly grows less than square root of m. And somebody said, I don't know, the fourth root. And then somebody said the log. And what's become completely clear is that it grows like O of one.
Okay, that's become an empirical fact; that would appear to be the case. In fact, to be much more specific, it appears to be between 20 and 80 steps for all problems. So let that sink in for a second, because it's quite ridiculous, and it's actually the essence of why these interior-point methods matter. We'll look into centering and duality gap and complexity and all that, but I don't want the main message to be lost in all of that.
The main message is this. They take between 20 and 80 steps. A problem in finance, machine learning, it doesn't make any difference; the size doesn't make any difference. It's between 20 and 80 steps. I think I may have told you this earlier, maybe on the first day, but now you know more about it. We had a talk last year by a guy named Jacek Gondzio from Edinburgh, and he came and solved some giant finance problem with one billion variables, okay? And it was dense, by the way. No, seriously. It was some big tree; it had some structure, but this was not a small problem, let me say that. It wasn't super sparse or anything like that. It had a billion variables.
So he solved it, and we were like, how do you do that? He had some room with a Blue Gene, you know, a whole room full of thousands of processors and all this kind of stuff. And I said, what algorithm do you use? And he goes: the same one everyone uses. By the way, not quite this one; he used an adaptive step size, but it's not a whole lot different. It's basically this.
Basically what you're going to write. It's a billion-variable problem, and I said, well, how did you pick the parameter value? He said: just the same one everyone uses. It doesn't matter anyway.
And I said, how many steps did it take? He said 21. So basically, that's a single data point. Oh, I think I mentioned this earlier: it took 21 iterations. Now, by the way, each iteration took several hours, and the lights in Edinburgh dimmed when each iteration was carried out.
Well, I mean, it's solving a billion variables with a dense system, blah, blah, blah, in some giant room full of computers with a big 12-kilovolt three-phase feed going in for a while. But the point is, at least we have one data point out here at ten to the nine, provided by our friend. And, oh, this plot is using our unsophisticated method; the sophisticated methods are down here at the 20 level, although that hardly seems to matter, okay? So, question?
Student:This technique, like he's running in a roomful of computers, did this scale in [inaudible]?
Instructor (Stephen Boyd):No, his was not fully dense. It had structure; in fact, it was an event tree. He was doing stochastic programming for some finance problem or something like that. Expand the event tree out by 15 steps or so, and that'll give you a billion variables.
So the question is how parallel it is. Let's talk about what you'd see if you profiled one of these, in fact, the one you're writing right now. What will it actually be doing? I mean, yours will actually spend more time on interpreter overhead than on actual computing, but if you scaled yours up big enough, you do 20 steps; what are you actually doing in each step? What is it?
Instructor (Stephen Boyd):You're solving linear systems; you're computing a Newton step. The line search and stuff like that doesn't cost anything. So, in fact, the correct thing to say is: if you want to solve a convex optimization problem, you're going to solve 20 systems of linear equations. All right, 80, whatever; something between 20 and 80 linear systems, period. That's all you're doing.
And you can use any method. We've just looked at the simplest thing, where you use a Cholesky or something. If there's some structure right under your nose, like bandedness, of course you should exploit it, but then there's this whole world of solving huge linear systems that we haven't even touched on, which exists and can be applied.
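As a sketch of one such linear solve per Newton step (my own illustration: H and g stand for the Hessian and gradient of t·f0 + phi at the current feasible point; a real solver would use a Cholesky-based elimination and exploit structure rather than a dense solve):

```python
import numpy as np

def newton_step(H, g, A):
    """Newton step dx for minimizing a smooth convex function with gradient g
    and Hessian H, subject to A x = b (x already feasible), via the KKT system
        [H  A^T] [dx]   [-g]
        [A   0 ] [w ] = [ 0].
    """
    n, p = H.shape[0], A.shape[0]
    K = np.block([[H, A.T], [A, np.zeros((p, p))]])
    rhs = np.concatenate([-g, np.zeros(p)])
    return np.linalg.solve(K, rhs)[:n]    # discard the multiplier block w
```

Each barrier iteration is a handful of these solves plus a cheap line search, which is why "20 to 80 linear solves" is a fair summary of the whole method.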
Okay anymore questions about that? That was just like a weird, I donít know, detour. So okay. Letís talk about feasibility in phase one methods. So in fact, youíll have to do this. So the feasibly problem is you want to find an x for which fi(x) is less than zero. Well, I tell you the truth, is you really want to find something like that. If you want to initialize it in a simple interior point method like the one we have. So this is what you want.
So thereís lots of ways to do this. Theyíre actually all kind of cool. The standard tricks, these go back to 1948 or something like that. I mean, a standard trick is to solve phase one by solving a similar problem. So hereís what you do. You solve this problem. This is the most classic one. You have a new variable s, and you say fi(x) is less than s here. Ax = b, and you minimize s here in this problem. Thatís a convex problem, too, however, for this problem I can always find a strictly feasible point.
Well, sorry -- provided I know any x that's in the domain of the fi's and satisfies Ax = b. I choose whatever x I like, okay? It's not feasible for the original problem -- a lot of the fi's are way bigger than zero. Then I just find the max of the fi(x)'s, and I take s equal to that max plus one, and that's now a strictly feasible point. I apply a barrier method or SUMT to this thing, and then I can put in logic that says: if s ever gets negative, just quit, because that's good enough. You've actually produced a strictly feasible point.
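To make the classic phase one construction concrete, here's a small sketch (my own, not from the lecture) for the special case of linear inequalities F x <= g, using scipy.optimize.linprog; the helper name `phase_one` and the toy data are mine.

```python
import numpy as np
from scipy.optimize import linprog

def phase_one(F, g):
    """Classic phase I for F @ x <= g: minimize s over (x, s)
    subject to F @ x - s * 1 <= g.  If s* < 0, x* is strictly
    feasible; if s* > 0, the system is provably infeasible."""
    m, n = F.shape
    c = np.zeros(n + 1)
    c[-1] = 1.0                                 # objective: minimize s
    A_ub = np.hstack([F, -np.ones((m, 1))])     # F @ x - s <= g
    res = linprog(c, A_ub=A_ub, b_ub=g,
                  bounds=[(None, None)] * (n + 1))
    return res.x[:n], res.x[-1]                 # candidate x, and s*

# x <= 1 and -x <= 1 is strictly feasible, so s* comes out negative.
x, s = phase_one(np.array([[1.0], [-1.0]]), np.array([1.0, 1.0]))
```

In a real barrier implementation you would also add the early-exit logic the lecture describes: stop as soon as s dips below zero rather than solving to optimality.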
And then you would drop to phase two. Phase two would start from there and would actually optimize the original objective, which would be f0. f0 is not in here. Everybody see this? That's the idea.
So this is the standard one. Now, if you minimize this problem and s* is positive, you actually now have a proof that the original problem is infeasible. In fact, the dual point found by solving this problem gives you a certificate establishing that the original system of inequalities is infeasible.
Okay? And there's one case where the results are ambiguous, and that's when p* is zero, okay? That's when you're right on the boundary. It means, let's see -- it means all sorts of sick things could happen. It means the problem could be feasible but not strictly feasible, and so on and so forth. Okay.
Now, by the way, let's talk about how you imagine this is going to work. So let's do that. If the problem is feasible, this will work perfectly. s will keep going down; at some point s will become negative and you terminate. You put logic in there to terminate, so you call your barrier method with a flag that says phase one, and it then knows to stop -- not when your duality gap is low, but whenever s is negative -- and quit right there.
However, suppose it's infeasible. Let's see what happens if something is infeasible. These are fi(x) here, and let's say I start with a problem like this. I don't know why I'm drawing them up like that. I don't know what that was, sorry. Do it again, all right.
So here's zero, our critical point. Here's fi(x). Here's a bunch of inequalities that are satisfied, like that, and here's a bunch that are violated, okay? Now, this phase one method puts s over here. It puts a bound above these guys -- that's s -- and it starts pushing this thing to the left, right? Okay, yeah. I was checking if that was the left, yeah. That's pathetic, isn't it?
Oh well, all right. So you start pushing this to the left, and of course if you ever push past zero, then they're all on the left. You exit and you jump to phase two, okay. Now, what do you suppose happens when it's infeasible? You're going to push and push and push, and you're going to stop because you can't find a point that has all the fi's negative.
When you stop, what do you imagine is going to happen? Suppose when you stop, s* is one. What do you think this plot is going to look like?
Student:A bunch of functions [inaudible].
Instructor (Stephen Boyd):You got it. A whole bunch of constraints are going to be violated by one. Okay? So a lot of constraints are going to be violated, because when you start pushing points like this, right, they'll pile up -- a bunch of them will pile up -- and you'll keep pushing and it'll get harder. And then you'll come up against a wall and you stop.
Oh, by the way, what's the force that you have to push with? The Lagrange multipliers, right? The sum of the Lagrange multipliers. Mechanically, that's exactly what it is. So you push and push and push and push and then you hit the wall. You hit the wall and a whole bunch of things are violated, okay?
Well, that answers your question: your problem is not feasible. But there are other very interesting variations on this, and I'll talk about one and then we'll quit, actually. Let's see. So the sum of infeasibilities method works like this -- I introduce a private -- oh, by the way, the interpretation of s here is: you're looking for a point where fi(x) is less than zero. That's what you're looking for. So you say, well, look, I'll give you a little extra slack, a little extra bonus, you know. You don't have to get below zero; you can make it less than seven, and then seven gets pushed down, okay?
So here, though, everyone gets their own private little slack. By the way, we've seen this before. This is the beginning of the story on the support vector machine, okay? Actually, it's the beginning of the story on a lot of things.
So what you do is this, and everybody gets their own slack. If the slack is zero, it means that inequality is happy -- it's satisfied -- and you want to minimize the sum of these infeasibilities. Now, already, before you even do this, you should have all the neural connections firing: you should have a feeling for what's going to happen when this is infeasible.
What's going to happen in this case is that this is a heuristic for solving the problem: please find me a point that violates as few of the inequalities as possible. Okay? This is a heuristic for it. This is how the support vector machine works. That's how all these things work. Okay?
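Here's a small sketch (again my own, not from the lecture) of the sum-of-infeasibilities problem for linear inequalities F x <= g -- one nonnegative slack per constraint -- solved with scipy.optimize.linprog; the function name and toy data are mine.

```python
import numpy as np
from scipy.optimize import linprog

def sum_infeasibilities(F, g):
    """Sum-of-infeasibilities phase I for F @ x <= g:
    minimize 1's  subject to  F @ x - s <= g,  s >= 0,
    one private slack s_i per inequality.  s_i = 0 means
    inequality i is satisfied."""
    m, n = F.shape
    c = np.concatenate([np.zeros(n), np.ones(m)])   # minimize sum(s)
    A_ub = np.hstack([F, -np.eye(m)])               # F @ x - s <= g
    bounds = [(None, None)] * n + [(0, None)] * m   # x free, s >= 0
    res = linprog(c, A_ub=A_ub, b_ub=g, bounds=bounds)
    return res.x[:n], res.x[n:]

# Three copies of x <= 0 against one copy of x >= 1 (infeasible):
# the heuristic sacrifices just the single conflicting constraint.
F = np.array([[1.0], [1.0], [1.0], [-1.0]])
g = np.array([0.0, 0.0, 0.0, -1.0])
x, s = sum_infeasibilities(F, g)
```

The point of the toy data: the plain phase one would spread the violation around, while minimizing the sum of slacks concentrates it on as few constraints as the heuristic can manage.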
When you solve this problem, here's what happens. Say you have 100 linear inequalities in 50 variables, infeasible -- I guess I plotted it backwards; these are supposed to be positive. When you push these up here, here's what you get. With the basic phase one, you get to a point where basically most of the inequalities are violated. With the sum of infeasibilities, there are only 21 in the end that are violated. Is 21 the minimum number of constraints that can be violated? Who knows? No one knows, but it's a heuristic for violating just a few.
So this has lots and lots of applications. If you're doing engineering design, for example, and you have a whole bunch of inequalities and it's infeasible, of course the main message to tell somebody -- which is useful -- is: sorry, but your specs can't be met. That's the most important thing.
But after that, someone says, gee, thanks for the bad news, but can you tell me which ones I should relax? You know, which constraints do I have to go back to somebody -- whoever set the specs -- and actually beg and grovel for more on, right? That's what basically happens.
So in a normal phase one, you know, everything is violated. You don't really have an idea of which ones were the more violated ones or anything like that. In this case you'd actually go back and zoom in, in one step. I mean, you don't know this is the minimum set of violations, but it gives you much, much more information. So, okay, we will quit here.
[End of Audio]
Duration: 77 minutes