Instructor (Stephen Boyd):Great. Okay. Can you turn off all amplification in here? Thatíd be great. So let me start with a couple of announcements. The first is Ė letís see. Oh, Homework 3 is posted. I know youíll be very happy about that. Actually, Homework 3 start early. Do not start next Tuesday. Just a hint. Start early. In Homework 3, thereís numerical stuff, so although it should fit you. Youíll see the X. Although, youíre absolutely welcome to go the Python route. Actually, anybody here gonna go the Python route? There was at least one person. Typically, that would correlate with people who wake up at letís say 1:00 p.m., so itís not surprising that theyíre not here. They are in the class but sleeping right now. So you will use CBX. It should install in about 25 seconds or something like that. But just in case it does not, thatís why I recommend you start immediately. And the other thing is like anything else, you just wanna start early and get all the stupid questions out of the way and all that early. So please start Homework 3 early.

The other thing is that the Monday section, Jacob will give, and itís gonna be on using CBX, so youíre strongly urged to go to that for this. And of course, weíll post both the video and weíll also post the slides for Mondayís section. Letís see. I wanna make one other very general announcement about the homework is we had a lot of questions about what do we mean sort of in the homework, like when it says which of the following functions is convex. And we got people asking things like do I have to have a full proof and all that sort of stuff. So I thought Iíd say something about that. The main thing actually is that you should sort of understand it, and work through everything, and try to figure out which things are quasi-convex or quasi-concave or whatever they happen to be. Some of the problems we asked of course, these are not by any means easy, so if you have a headache trying to figure out if the quartile as a function of the probability distribution is whatever it is Ė if you have a headache from that? Thatís normal. Thatís as it should be.

Most of the applications wonít involve stuff that tricky, but some actually will, so thatís kind of the idea behind that. As for how much detail you have to give to justify this, thatís kind of up to you. Youíre welcome Ė we donít need to see some long proof or something like that, but actually some justification would be good unless you have developed some kind Ė well, I guess you all know about the concept of like an idiot savant. These are people who sit in institutions, drool most of the time, and then about once a week look up and spit out a 32-digit prime number. Everyone knows about these people. Apparently not. So I havenít encountered it yet, but if in fact Ė maybe not such a severe case, but if it turns out you can look at a function, just think about it, make no calculations, just look at it, smile and say quasi-concave, thatís fine. Actually, if thatís the case Ė if you find that you have that ability, Iíd like to speak to you. Barring that however, the important part is that you should have some kind of a justification. If you happen to have had a class on analysis, or if your background is strong in math, feel free to give us a nice proof. If itís not, donít worry. We just want some kind of a justification. You wanna look at the sublevel sets or something like that. So any question about last time? If not, weíll continue right along. You should be reading of course Chapter 4 by now, and in fact should probably be finished by around today or something like that. So between the homework and the lectures, you should be okay.

If you can go down to the pad, last time we looked at linear fractional program. Thatís a very famous Ė and maybe the most famous quasi-convex problem. There are lots of variations on it, and a variation Ė itís a generalized linear fractional problem is this. You minimize a linear fractional function here. Now always in a linear fractional function, you have to decide which sign the denominator has, and by convention this would be positive. That technically has to be specified. You have to specify the domain. So hereís a linear fractional function. You minimize a linear fractional function subject to linear inequalities and linear equality constraints. Now thatís a quasi-convex optimization problem, and you can solve it by bisection. Thatís easy to see because if you wanna know is F0 of X less than T, thatís the question. So youíre really asking is the optimal value of this problem less than equal or T. Can it even be achieved, the objective value T?

All you do is you take this thing which is positive, and you multiply it out, and it turns out thatís equivalent to the linear inequality like this. Now thatís a linear inequality, and therefore youíd call it an LP feasibility problem or something like that. To check if the optimal value Ė if thereís an X that satisfies all this and has F0 of X less than T, to solve that Ė to determine that feasibility problem, you would solve an LP feasibility problem. And then you could bisect on T and solve this. Okay? So thatís one way to solve this. It turns out though, some quasi-convex problems can actually be transformed to convex ones. It hardly matters frankly, but in this case itís good to see how this works. And let me point something out. If you were to take this approach, youíd have to solve an LP for each bisection step. Now the number of bisection steps youíd probably have to solve would be somewhere between 10 and 20. 20 would give you two to the minus 20 in bisection, and you would reduce your ignorance Ė that is the difference between a known feasible point and a known bound Ė youíd reduce that by a factor of a million. So youíd have to solve by 20 LPs.

By the way, for most things it wouldnít make any difference at all. And itís even more than that. Later in the class, youíll understand that you can actually do something called warm start, which means you actually solve one LP, which is very close to the LP you just solved, and it can be done very fast, so the effort is not even a factor of 20. Itís probably a factor of four. Nevertheless, itís interesting I think theoretically, and itís actually interesting in practice that this quasi-convex problem can be turned into actually just an LP. Yeah?

Student:That second line, should that C transpose X plus be on our right side, should that be a [inaudible]?

Instructor (Stephen Boyd):Should it be C transpose Ė in fact, what on earth should it be? Should it be E transpose like that? Are you happy now? Good. Youíre happy because what I had was wrong, and this is right. Thatís good. Thank you. It was supposed to be the denominator. It just didnít work out that way. So letís see how to do this. Itís very interesting, actually. It turns out this is nothing but the perspective transformation. Or some people call this homogenization, so letís see how that works. What you do is you write down this problem. And letís actually see what happens here. What you do is you write down something Ė well, here. Itís equivalent to this LP, and let me go through the argument, or at least Iíll give you some of the argument. The full argument you can find in the book.

Here what you do is you introduce a new Ė I start with the original problem. Let me just start with the original problem up here, and Iím gonna introduce a new variable called Z scalar. I hope thisíll work. So I wanna minimize C transpose X plus D divided by Ė this time Iím gonna get it right Ė this. I wanna minimize that. Note the period. Everythingís cool here. Subject to Ė thatís st, but it means subject to of course Ė H and AX equals B. Now you know when you do things like you have a quadratic function and you add a new variable thatís equal to one to convert it to a quadratic form, I guess that general principle is called homogenization. You make the problem homogeneous by introducing a new variable. So the way weíre gonna do that is this. Iím gonna introduce a new variable called Z, and Iím gonna say itís one. So I havenít hurt anybody, and I can most certainly do this. I guess the Z shouldíve gone in front of the H, but you know itís okay. In loose parsing, you can put a scalar after a vector. Everybody cool on this? I mean how could you not be? I just multiplied a bunch of things by one.

However, if you stare at this problem long enough, without the Z equals one you realize the following: itís completely homogeneous. In other words, scaling X and Z up here by a positive number has no effect whatsoever on the inequalities, none, has no effect on the objective. That means in fact I can normalize Ė my current normalization just equals one, but I can normalize it any way I want and simply divide out later. It will make no difference. So Iím gonna renormalize this problem. And Iím gonna renormalize it Ė now I am cheating. Iím looking down here. Iím gonna normalize it so that this Ė Iím gonna comment this out, and Iím gonna add this constraint in. Thatís just a different normalization of the same thing. Itís just fine. Now Ė whatís that? Z has to positive. Thatís correct. Sorry. Thank you. Actually, technically Z has to be strictly positive, so in fact Iíll even write it that way like that. Now if this is one, the denominator goes away. That goes away. And I am left with an LP here that I have to solve. Thereís a question.

Student:Sorry. Whatís Y?

Instructor (Stephen Boyd):What is Y? Itís nothing.

Student:[Inaudible] even shows X?

Instructor (Stephen Boyd):No. Itís just nothing. Itís supposed to be X. Itís just one of those days, I think. I have a feeling where yeah, fine, good Ė keep me on my toes here, not apparently that itís gonna do any good today, but weíll see what happens. I shouldíve had that second espresso. I think that was the problem. I didnít. I meant to, but I didnít. All right. This is now an LP here. And in fact, if you replace this with this, you get all sorts of information out of this. I wonít go into the details, but for example if you were to solve this and Z comes out positive, then by simply renormalizing here, you get this solution. Thatís the first part. If Z turns out to be zero, thatís like this one case where this thing is unbounded below or something like that, but thatís the idea. By the way, this is not a simple trick, but itís actually one that comes up a couple of times, and itís pretty cool that you can do this. By the way, for most quasi-convex problems this trick wonít work, but for some it does.

Now we get to the idea of a generalized linear fractional problem, and thatís this. You wanna minimize the maximum of a bunch of linear fractional functions. And thatís on the polyhedron where all the linear Ė well, itís at the intersection of the domains of the linear fractional functions. Thatís quasi-convex. Thereís no way to convert that to an LP in general, so that youíre gonna solve by bisection. And hereís an example thatís actually quite interesting. Itís the von Neumann growth model, and it works like this. Weíre gonna optimize over X and X plus the following. So X is gonna be a set of activity levels. I guess we have to have X positive here too, or something like that. X is a bunch of activity levels at one period, and X plus are the activity levels in an economy at the next period, so thatís what these are. Then we have matrices AX and AX produces Ė this gives you the amount of Ė AX tells you the amounts actually produced by an activity level X, so this could be for example in our M, if the N activity levels produce say M different classes of goods. So if you have an activity level X, AX is a vector telling you how much goods Ė the vector of goods produced by that activity level here. Thatís AX. Now when you operate the economy at the next step, thatís X plus Ė are the activity levels. BX plus Ė thatís a vector telling you how much is consumed. And then you simply say that the amount consumed at the next stage here, it uses some Ė it has to use the goods produced in the first stage, so youíd have an inequality that looks like that. Okay?

These describe Ė itís a polyhedron, of course. Right? The set of activity levels current and next that satisfy these inequalities is a polyhedron. And now what we want to do is we wanna maximize over these the minimum activity growth level. Thatís XI plus over XI, so thatís what we want. I guess we donít need this because when you write this thing, X positive is implicit here. To maximize this is the same as minimizing the negative of this, which is minimizing the max of XI over XI plus, each of those, so thatís the same thing. And thatís exactly of this form. Thatís a quasi-convex optimization problem, and you can solve it directly, so thatís one. What you wanna do is you wanna set the activities in order to maximize the growth rate of the slowest growing sector, so thatís the problem here. And thereís lots of variations on this.

So that ends up sort of problems that are linear and related to linear. Now we move on to quadratic. By the way, I can tell you what the year is on some of these. The linear programs as I mentioned earlier go back to Fourier at least, but the year maybe generalized linear fractional stuff is maybe the Ď50s or something like that, already widely used in the Ď50s. Quadratic programming is the same. This would be mid-Ď50s. So in a quadratic program, you have a convex quadratic objective. Thatís here. So P is positive semi-definite, and you minimize this over a polyhedron, so a set of linear inequalities and a set of linear equalities. By the way, if you just said P equals zero, you recover an LP, so this is a strict extension of LP. And the picture now is this. Hereís your polyhedron which is your feasible set shaded, and your objective function now instead of being affine or linear is in fact quadratic and convex quadratic. So the sublevel sets are ellipsoids, and so this is showing you the sublevel sets of a quadratic function. Of course, the gradient points down the slope, and is the normal to the tangent hyperplane at the level surface. So for this particular problem instance, thatís the solution there. One thing you see immediately that you didnít have for an LP Ė itís a basic fact, although not of particular importance for us, that in a linear program you can always take a solution thatís at a vertex. This is very basic. Itís kind of obvious geometrically. Itís also completely obvious analytically, but it doesnít seem actually to have too much use for us and in most cases, although itís made a big deal of I think mostly for historical reasons.

But you can see clearly here that the solution does not have to be at a vertex. It can be inside a face. Again, it doesnít make any difference to us or whatever, but thatís Ė so this is a quadratic program. Weíre about mid-Ď50s here in where we are historically. So letís look at some examples. The most basic one would be just least squares. Thereís no constraints. Thatís basic linear algebra. The solution is just given by Ė or sorry, a solution I should say in the general case is given by this pseudo-inverse or Moore-Penrose inverse. Thatís just A dagger B. But immediately for example, you can take a least squares problem, and a very common thing Ė I mean this is really dumb, but you just add some linear inequalities. In fact, [inaudible] balance. So thatís a quadratic program. By the way, what youíre going to see later in the class is that for example a least squares problem with bounds on the Xs, that can be solved basically as fast and as reliably as just a least squares problem with a very small modest factor, totally reliable.

Let me actually say something sort of culturally about how the course goes. Thereís a lot of people that know about least squares, okay? A lot. I guess not a huge number, but you know thereís a lot of fields where least squares is widely Ė a lot of people know about least squares in signal processing, statistics, communications, machine Ė it goes on and on. A lot of people know about that, okay? You just throw in something as dumb as this, ranges on the Xs, and that number of people who recognize that as basically a problem that is completely solved is Ė just technology. We can just solve that, period Ė drops by at least a factor of ten. What you will find instead Ė do people have things like this? Sure, they do. But of course, what they do is they do heuristics. They either invent their own algorithm thatís impossibly long, might actually work, but itís fantastically complicated, or they make some iterative thing with regularization. You wouldnít believe what happens when people encounter things like this. And people make a big deal out of it. Thereís no reason for it. So I think the zeroth order thing to do is to look at a problem like that and say QP, therefore weíre done. Itís just done. Thereís almost nothing interesting to say about it. Youíll write code that solves this. Itíll be 30 lines. Itíll be totally reliable. So itís not a big deal, but at the same time it has huge practical uses.

So if you wanna do parameter estimation but you have bounds on the Xs Ė hereís a stupid one. Suppose youíre estimating powers or intensities. Well, these are generally speaking non-negative, so thatís just like non-negative least squares. So for us, thatís just a QP. Iím telling you now, thatís already Ė youíve moved into a much smaller group than the people who know about least squares. For example, a whole huge thing in statistics Ė this is just amazing actually to me, but whatever. So thatís the constraint that the Xs be ordered. So of course thatís a set of N minus one linear inequalities. I think you had it on some homework, like Homework 1 or something. I donít know. Write us a set of linear inequalities. So believe it or not, thereís a whole subfield called isotonic regression or something. Thereís books written on this, literally on minimizing Ė on doing least squares with that. Why is there a book on it? Because thereís no analytical solution like there is for just ordinary least squares. It comes up enough anyway. For us, monotonic regression? Thatís below the level I would even assign as a homework problem. No, I would go that low. But it would be one of a Ė when weíre counting up the amount you have to do in a week. That would count for like none, very little anyway, and whole books on it.

So Iím just mentioning these things are simple, but actually so far this has not diffused out to a lot of application areas because people havenít gotten the word yet, which is very strange to me, that you can do this and it is completely straightforward. Okay. Weíre back on track here. Hereís an interesting one is linear program with random cost. So thatís another Ė letís do a specific example. If you wanna make a specific example, letís do diet Ė the infamous diet problem. So you are Ė X represents the amounts of a bunch of foodstuffs youíre gonna buy, something like that. And there are constraints on it. There are things that have to balance out exactly, and thereís inequalities. Of course, this can include lower bounds and upper bounds on for example nutrient levels and things like that. So our constraints become a polyhedron in X here. Now what weíre going to Ė the problem now is weíre gonna formulate this diet now, but weíre gonna implement it next month, and we donít know the prices of the foodstuffs, so itís not as simple as saying just minimize C transpose X which would give you the cheapest diet that would meet the requirements because you donít know C. You should know something about C. C for example has a mean, C bar, but it also has a covariance sigma, something like that. This doesnít matter here. There are many things you could do. In practice, Iíll tell you what would probably be done would be that. You would simply ignore the variation and just work with the expected values.

Now itís not hard to imagine that that could be a seriously bad choice because itís very easy to do. All you do is take one foodstuff, or one component, or whatever and you make it slightly cheaper than another one that has similar nutrients, but you make it have much more price volatility than the other one. Everybody see what Iím saying? The LP will Ė I mean if you ignore the risk Ė this is price risk Ė if you ignore the price risk, the LP of course will use the cheap stuff based on this expected value. So thatís an extreme example, but thatís the kind of thing you could do. So what you really wanna do is trade off mean price and price risk. This would be done Ė youíll see weíll do more of this later Ė by simply minimizing a weighted sum of the expected cost and the variance here. Gamma is a risk aversion parameter, and thereís lots of ways to figure out what it is. For that matter, you would probably solve this problem for a whole family of gammas and get a tradeoff of expected price versus the variance in the price. And then you select based on your risk aversion where you should operate on that curve. So thatís the point. Okay. But if you work this, take a look at it, itís nothing but a Ė this is a QP. Thereís one condition. Gamma has to be positive here. Otherwise, this is no longer convex. So whatís interesting about that is that you can do risk averse design by convex QP, but you canít do risk seeking. If this were negative, it would be risk seeking. Youíd actually prefer if thereís two diets that have about the same mean cost, meet all the constraints, in risk seeking youíd actually prefer the one that has higher variance in cost. Fortunately, thatís not something you wanna do. So it turns out once again what you wanna do lines up with what you can do.

I should make one other comment here, and thatís this that in general Ė by the way, not entirely always Ė if P is not positive semi-definite, so if itís got just literally one negative eigenvalue or something like that, this problem goes from being extremely straightforward to NP-hard. So if you know what Iím talking about, thatís fine. If you donít, I will just say this. It goes from a problem that is in theories Ė actually multiple theories would assert that that problem is now very hard to solve, and also in practice. Youíd go from a problem where you can solve the exact solution if X can be huge like 10,000 by Ė thatís just no problem. The 30-line code youíll write to solve this for example will easily solve that. If this becomes indefinite, oh, you can solve these problems, but for example with X in R20, youíd have to have a lot of luck and a big old cluster, and you might get the answer later that day. Youíre just in a completely different regime. If X is in R100 when P is indefinite, this problem essentially becomes impossible. So convexity makes all the difference with quadratic programs. Okay.

Now another generalization Ė I mean itís kinda silly but Ė is a QCQP, thatís a quadratically constrained quadratic program. So here the constraints can also be convex quadratics. Now here, if these are zero, you recover a QP. And of course, if all the Ps are zero, you recover a general LP. Here, these things Ė if P is positive definite, of course this represents an ellipsoid, so here the feasible set is an intersection of Ė well, ellipsoids and degenerate ellipsoids. Now, a degenerate ellipsoid is one that can have a flat face and be infinite. So a half space is actually degenerate ellipsoid, for example. But for example, if P is positive semi-definite but only has Rank 3, this is sort of a degenerate ellipsoid. It means that in N minus three dimensions, itís sort of infinite. In three dimensions, itís got curvature. For example, it would be a cylinder in R2 for example Ė R3, sorry. A cylinder in R3 would be a degenerate ellipsoid. That would be degenerated by a Rank 2 Ė a three by three matrix P thatís Rank 2, and this inequality would then be not a cylinder, but it would be an ellipsoid like this, and then it would have an infinite extent in another direction. Okay? So thatís it. Itís a convex problem and can do this.

Now we get to the first thing. Again, this is maybe Ė weíre still in the mid-Ď50s. Now weíre jumping ahead. Now we get to Ė well, it depends how you wanna count the history, but youíre into the Ď90s now, so this is new roughly. Although, actually thereís papers written in the Ď50s that refer to this. They knew how to do it when there was one inequality like this, but not in general. So letís take a look at this. A second order cone program Ė this is SOCP. There was a confusing while by the way in the Ď90s when these were referred to as Ė some people were trying to call these QPs and it never stuck. But there was a brief period with a small group of where they tried to get this to be called QP, but it just didnít Ė fail. And it seems like SOCP has won out as the term. So here it is. You minimize the linear function subject to Ė and the constraints are quite interesting. Itís a norm of an affine thing is less than or equal to an affine thing. Notice Ė very important here, it is not norm squared. If this was norm squared, this problem would be a QCQP. It is not norm squared. This sort of the square root of a convex quadratic. Thatís what this thing is. Okay?

And itís called second-order cone programming because each constraint is nothing but this. You have an affine mapping that maps X into AIX plus BI, thatís a vector, comma, then a scalar Ė CI transpose X plus DI Ė and that that should be in the unit second-order cone, the Lorentz-cone in RN plus one. Okay? So thatís each of these constraints. Now this is more general than everything weíve seen so far. It includes Ė well, not the linear fraction. It includes linear programming. Thatís obvious because you just make that zero here. It includes linear programming. You can rewrite quadratic programming and you can rewrite QCQP in this form. But itís more. There are problems you can write as an SOCP you cannot write as any of these, so this is a real extension. Thereís also something very interesting to point out here. If you write the Ė if you rewrite this in sort of our standard form, thatís AIX plus BI minus Ė I guess youíd write it this way, right? Minus CI transpose Ė minus DI is less than zero. So this would be FI of X. Okay? That would be our standard form to write this. Hereís a very interesting fact about FI of X. Itís not differential. This is the first problem weíve seen that has that feature. It looks differential. I mean Ė well, it actually Ė this function Ė of course, thatís affine. Thatís differentiable everywhere. By the way, when is the norm differentiable? The two norm? Where is differentiable? Where is it not differentiable?

Student:[Inaudible] is the origin.

Instructor (Stephen Boyd):Itís not differentiable at the origin. How about everywhere else? Itís analytic everywhere else. Itís got derivatives of all orders, right? And in fact, just think because of the graph. The graph is like this ice cream cone. Itís perfectly smooth, no problem except for that tip.

So by the way, you might say then, ďOh, come on. That one point makes that much difference?Ē This is like what are we in the math department? Only a mathematician would worry about that Ė these are the kind of things you would think but not say. You would say what kind of a human being would worry about this one point where itís non-differentiable out of all RN. What do you think of that? Letís not worry about it. Well, you know obviously that argument is totally wrong. It turns out what is interesting about second-order cone programs is that of course the solution is very often at the point. Obviously, if you have pointed things and you optimize things like linear functions, of course you end up at the point. So yes, these functions are non-differentiable at only one point. Well, okay at the whenever AI transpose X plus BI is zero, itís non-differentiable. However, it turns out thatís where the solutions lie. Thatís what makes this useful in applications and so on. So itís not just something Ė well, people in math are correctly interested in the non-differentiability. So this is a non-differentiable problem.

By the way, it would be completely unknown how to solve this efficiently letís say in 1988 or something like that. There were some people who knew about it by 1988 in Moscow. That was about it. So now itís completely routine, so itís as if itís always been here, so people do this all the time. By the way, I think itís in Excel now, so to give you a rough idea of whatís happened. Well, a lot of these things actually are. Okay. Letís look at an example of second-order cone programming. So very interesting is this Ė letís start with a linear program. Letís minimize C transpose X subject to AI transpose X less than BI, and what weíre gonna do is weíre gonna add the idea that thereís uncertainty in C, AI, and BI. And in fact, you know what? Weíll worry about uncertainty in the A and B. By the way, earlier in the diet problem, weíre worrying about cost variation. Iíll just make C fixed, and I wanna worry about variations in AI. By the way, if you wanna go back to the diet problem and figure out what that would mean, what would it mean that there would be variations in AI? I mean in terms of the diet problem. Whatís the implication? What would it mean? What does it mean.

Student:Maybe thereís variation in how much nutrition you absorb from certain food.

Instructor (Stephen Boyd):Exactly. Thatís exactly right. So the point is when you get some foodstuff and you analyze it on Monday, and Tuesday, and Wednesday, itís close but itís not the same. And therefore, if you compose a diet actually, and you ask for so many calories or whatever, and you form it this way and you just use the mean, you might not actually on a sample by sample basis actually have the number of calories you want, in fact the number of calories in distribution. So thatís exactly what it is, so it might be nutrient variation or something in the foodstuffs. Okay. Actually, pretty much any LP thatís ever solved, hereís the way it works. The coefficients in AI, the actual numbers in it Ė by the way, often they are zero. And by the way, when theyíre zero, they probably really are zero. It basically means if the third entry of A1 is zero, it means that X3, the third variable, doesnít enter into the first constraint. And the ones by the way often also have the flavor of being actually one because youíre sort of adding something. And typically the minus ones, too. Actually, when you see a minus one, it typically really means minus one.

Any other number you see is probably Ė probably traces back to Ė if you trace the provenance of that number, it will go back to something which is not known. It will go back to some analyst or some estimate of a price. Itíll trace back to some geometry and some Youngís modulus or whatever. Itíll trace back to all sorts of crazy things. But for sure, any number like that has some variation. You might know it within 1 percent. You might even know it within 0.1 percent. It depends on the field and application, but itís not at all uncommon that you would know it to like 10 percent for example. So thatís just to point out that this is quite realistic to have this Ė to model variation in data in a problem. Now you could do this lots of ways, and later in the class weíre gonna look at this in much more detail, but this is just for now just an example of SOCP. Two common ways are this, and itís strange people argue about which is better, and the answer of course is it depends on the application. So in a deterministic model, hereís what you would do is you would say something like this. You would say that each of the AIs lies in an ellipsoid. That would be something like an uncertainty ellipsoid. And you would insist that the constraint hold for any AI in the ellipsoid. And so the other names for that would be like a worst case or model or something like that.

By the way, the people who would do this would go and give you long philosophical arguments. Theyíd say the greatest advantage of their approach is that they donít assume a probability distribution because there arenít any and blah blah blah. They go on and on with this. Or itís a worst case, and it doesnít matter. Detractors would say thatís way too conservative because youíre assuming the worst thing that when you buy all these foodstuffs and put them together and blah blah blah Ė anyway, it all makes no sense. And then the other model of course is stochastic. So here youíd say AI is a random variable, and this is called a chance constraint. Youíd require that the constraint be satisfied with some reliability. So [inaudible] might be 0.9, more commonly 0.95, 0.99. You might even see 0.999. And by the way, the embarrassing thing is that these two are not that far apart because for example if thatís 0.999 and you have a three sigma ellipsoid and you put that over here, youíre very close to doing the same things, but it doesnít keep these people from Ė they still fight with each other. So thatís the Ė and itís very strange because it doesnít make any sense at all. It depends on the application, whatís the cost of violating the constraint, and so on and so forth. It all depends.

So [inaudible], both of these turn out to be SOCPs, second-order cone programs, and the way you do that is this. Letís parameterize the ellipsoids as a center plus the image of the unit ball under a matrix PI. So this is one parameterization of an ellipsoid. And then the robust LP looks like this. It says minimize C transpose X subject to AI transpose X is less than BI. Thatís for all A in this ellipsoid. By the way, some people make a huge big deal out of this. They call this constraint here a semi-infinite constraint. Got it? Why is it semi-infinite? Because you see this represents a single linear inequality for each element of an ellipsoid, and thereís an infinite number points in an ellipsoid. Did you know that? So thatís why this is a semi-infinite thing. Big deal. Right? Although it doesnít keep people from writing whole books on it. Iíll go on. But this is easy to Ė when is it true that AI transpose X is less than BI. Letís fix X. You fix B. A varies over this ellipsoid. When would this hole for every element in the ellipsoid Ė you have to check whether AI bar plus PIU transpose X Ė whether thatís less than BI for all U of norm less than one. Okay? But thatís easy to do because this is nothing but AI bar transpose X plus Ė and then Iíll make it this way. Iíll write it as U transpose times PI transpose X like that. Now if I wanna maximize this over all U with norm less than one, thatís a constant. This is an inner product of U with a vector. And so obviously by the Cauchy-Schwarz inequality that this thing, the largest that number could be is the norm of PI transpose X. And by the way, the worst U would be PI transpose X divided by norm PI transpose X. Thatís the worst U. Okay?

So I plug in the worst value and I get this, and I insist that that should be less than BI. And that is right here. Now if you stare at this closely, youíll realize thatís an SOCP because you have just what you Ė you have a norm, two norm, of an affine function, and a linear, and a constant. Itís SOCP. By the way, let me tell you a little bit about what the implications of this are. People can solve SOCPs now basically as fast as LPs, so almost as fast. Certainly for all reasonable Ė for huge numbers of situations, itís the same speed, same reliability, everything. That means people can solve robust LPs basically at no additional cost Ė little additional cost over an LP. That has lots of implications. By the way, it is Ė that fact or observation is propagating out, so it has now hit finance. So thatís why SOCP solvers are in Excel now because whenever you solve an Ė basically, whenever you solve an LP, then you ask the person how well do you know the data. If they donít say perfectly Ė and if they do, theyíre probably a liar Ė but if they donít say perfectly, then you should say then you should be solving SOCP. And actually, a lot more people these days will say thatís exactly what weíre doing.

So this actually has important Ė lots of practical consequences. Itís not fully diffused, but itís getting there. Letís look at the stochastic approach. So in the stochastic approach, weíll assume that the constraint vector is Gaussian with mean AI bar and covariance sigma I. Now when you form AI transpose X, thatís just a scalar Gaussian random variable. It depends on X of course, but weíre fixing X. Itís got mean AI bar transpose X and variance sigma X transpose sigma I sigma. And so the probability that you meet your constraint is equal to the CDF. Thatís a normalized random variable of course. Right? Thatís BI minus AI Ė this is the probability. Thatís the normalized random variable or something like that. This one. And then you put BI here, and this gives you the probability that you meet this constraint. Okay? By the way, these are called chance constraints, and problems that generally involve constraints that involve probabilities of something are Ė thatís a whole field called stochastic programming. This one actually is [inaudible] enough to actually in my opinion deserve a field name. Thatís not true of some of the others, but this is actually complicated stuff, and Iím okay that thereís whole books written on this.

But in this case, in this simple case we can write this out analytically as this, and itís an SOCP. Actually, itís interesting. Itís an SOCP provided the inverse CDF of a Gaussian evaluated at eta is positive. That happens for eta bigger than 50 percent. So actually, itís extremely interesting. We can do chance-constrained linear programming. In other words, I can have a linear program with Gaussian random constraint vectors, and I can impose the constraint that each constraint is satisfied with a certain probability. The important part is that probability has to be more than 50 percent. By the way, which corresponds exactly to risk aversion versus risk seeking. So the one weíre really interested in probably is eta equals 0.95, 0.99. These are the ones weíre really interested in. Weíre probably not interested in 10 percent. Thatís risk seeking. Anyway, that doesnít matter because we canít solve a problem for 10 percent risk aversion anyway. Okay. Our next topic is geometric programming. By the way, for all of these thereís no reason Ė you should have read the chapter, so this should not be the first time youíre seeing these things. You should not be following everything Iím saying because Iím going fast, but I assure you all of these problems you will encounter multiple times during the rest of the quarter, so you should just be kinda getting a rough idea here. Youíll see all of these again. So donít be concerned if Ė Iím going fast. Thatís all Iím saying.

Geometric programming Ė this is an interesting special class of Ė weíll see they transform to convex problems. This also by the way has a long history. It goes back into Ď70s. Actually, this goes to the Ď60s, including the first book written on it. Itís super cool. Itís this book where they Ė itís a whole book on geometric programming. Iíll say what it is in a minute, but in the last paragraph in the book it actually says Ė literally like in the last paragraph it says, ďYou know, itís entirely possible that some day these problems could be solved using a computer.Ē No, I mean it. It says that. The whole book is about these cases where you would solve it by hand by various ways, but itís very cool. By the way, they had it exactly right. So this has a long history. Actually, probably elements of it were known in statistics in the Ď60s. And it comes up in a lot of other fields. Itís a weird thing. When you first see it, youíll think, ďI have never ever heard of such a thing,Ē but once you start looking for these, theyíre like everywhere, or in certain fields theyíre everywhere.

All right. Hereís what it is. So unfortunately, thereís some deeply unfortunate nomenclature here, but it came from the Ď60s and nothing we can do about it now. Just for the record, right Ė a monomial Ė people in math have been using the word monomial for like I donít know 400 years or something like that? And it always means the same thing. Itís a product of a bunch of Ė itís a single term polynomial. It has always meant the same thing. Thereís absolutely no idea as to what monomial would mean, but they decided they would take this term which everyone agrees on has a meaning and extend it. So in the context of geometric programming or GP Ė you have to be very careful with GP, though. If you type GP into Google, youíre going to get several things. Youíre going to get geometric programming, but youíre also going to get something called genetic programming, very different. Iíd better not say anything about it. Weíll just move on here. So in the context of GP as in geometric programming, you have something called a monomial function. Itís a local definition, so donít ever say this outside in public or something like that. Donít ever say that like X to the one half is a monomial because if thereís anyone Ė I mean unless youíre only around people known to be talking at that moment about geometric programming because youíll sound like an idiot. Itíd be like changing the word polynomial or something like that, and saying, ďWell, thatís what you call a polynomial, but I call a polynomial this.Ē

So hereís whatís a monomial. Itís a positive coefficient times a product of variables, each one to a power, and the power can be anything. It can be an integer, it can be irrational, and it can be negative. So thatís a monomial. This does come up in a bunch Ė I mean obviously itís clear this comes up in a lot of engineering. People call this a scaling law or something like this. It depends on the field youíre in. Itís a scaling law or something. Now again, donít look at me. This was not my idea just for the record. They came up with this thing called a posynomial, which first of all sounds stupid, No. 1, and it also is stupid because itís supposed to combine positive and polynomial. Thatís okay fine, but the point is these arenít even polynomials because the exponents here can be like minus 0.1, and they can be 1.1, so anyway I guess languages are living, and thatís a stupid name and it stuck, so posynomial. There it is. Itís a sum of these things. So you know an example would be something like this, okay? There. Thatís a monomial. Okay? Thatís a monomial, and hereís a posynomial. X1 X2 plus square root X2. There you go. And thereís a posynomial.

Hereís a GP, a geometric program. You minimize a posynomial subject to some posynomial inequalities less than one, and a bunch of monomials are equal to one. Okay? Now you might ask why the one as zero had been our standard before. Thereís a real good reason. The theory of a GP with right-hand side zero is a very short theory because monomials and posynomials are always positive. So the branch of GP where the right-hand side was a zero here didnít go very far because all such problems are infeasible. So thatís a GP. Now by the way, this is not remotely a convex problem. For example, you could say minimize square root X and finish with some stuff down here. That would be a GP. Obviously, itís not convex. Square root X is concave. Okay? So these are not convex problems. However, these can be changed by a change of variables. These can be converted to convex. And this is it. Itís actually quite interesting, the conversion, and itís not at all unexpected. So if the variables are positive, you should have at least a slight urge to take the log. That would be normal. So if the variables are positive Ė weíll just take these variables YI so that XI is E to the YI. And then letís see what happens. Weíre also gonna do the same thing Ė weíre gonna minimize. Well, you can put log in front of anything you minimize because log is monotone, and I could put log FI less than or equal to zero, so now our friend zero is back on the right hand side. Okay?

But letís see what happens. What is [inaudible] zero? And Iím gonna write this in kind of a Ė Iím overloading some notation. Thatís the vector whose entries are E to the YI. So the question is what is this thing. Well, letís take a monomial, and letís take the log of this monomial. Well, you get log C. Thatís B. Then you get plus, and then you get the exponents. You get A1 log X1, but log X1 is Y1, so when you take Ė when you change variables by this logarithmic transform, and you take the log of a monomial, you get an affine function. Thatís good. Now take a posynomial. So here I take the log of a sum of these things, but thatís the same as log sum X of an affine function. Okay? Because each of the Xs I replace with E to the YI like this. Then I take a sum of those things, and I take the log. Thatís log sum X. Thatís this thing. And if you look at this, this function is convex in Y because itís log sum X of an affine function of Y, so it is definitely convex. Thatís not remotely obvious by the way here, not even remotely obvious. I mean it is once youíve seen it and all that, but Ė in fact, articles on geometric programming that didnít understand that it converts to convex programming by a change of variables persisted until like the early Ď80s. So this is not so obvious. So what that means is this GP over here which is not remotely a convex problem Ė I mean also look at the equality constraints. The equality constraints would be things like X1 to the 0.3, X2 to the minus 0.7, X3 to the 1.2 is equal to one. Well, that has no place in a convex problem. The only equality constraints you can have in a convex problem are affine Ė are linear. But it transforms to a convex problems like that, so it just transforms right to a convex problem, and is actually it turns out Ė by the way GP comes up in lots of fields. It comes up in statistics and machine learning all the time. It has to do with exponential families and maximum likelihood of estimation. Weíll also see that itís intimately connected to entropy type problems. Weíll get to that. It comes up in information theory, but it also comes up in circuit design and seems to be completely ubiquitous. And it comes up in power control and wireless. Iím just mentioning this, and weíll see a lot of these later.

Student:How did you get the monomial F to transform into the log is equal to [inaudible] in between?

Instructor (Stephen Boyd):Can you go over to the pad here, and Iíll go over that real quickly down here? So the question was how did I get Ė if you go down to the pad here, Iíll just work this out. Okay. Iíll just describe it. You take the log of a monomial, and the first term Ė oh, you can see it. I see, but maybe itís not on Ė okay. Thatís fine. If you can see it, thatís fine. But for those of you who are sleeping right now, this is what happens when you sleep through the class. There we go. All right. What weíre gonna do is weíre simply gonna take the log of this. Thatís log C plus A1 log X1 plus dot dot dot, plus A2 log XN, but log X1 is Y1, so I get exactly that. Thatís how that transforms. And what I was saying was that it comes up in a lot of applications. Once youíre sensitized to GP, youíll start seeing it all over the place. In fact, kind of in any problem where the variables are positive, they represent powers, intensities, or something like that, densities, you can start thinking how GP might come into play, and weíll definitely see lots of examples of these, of GPs. Actually, itís quite interesting because theyíre quite natural. For example, in wireless Ė and weíll see examples of this, but for example in wireless power allocation or something like that, the Xs are gonna be the powers of transmission, or they could be like the powers allocated to channels in some big multichannel system or something like that, or different frequency bands. And what this transformation says Ė I mean if someoneís doing optimization, they say you know what itís better to work with the log of the powers, but thatís what engineers have been doing for like decades and decades because these are called decibels. So you work with Ė so it says you should work with decibels. No one would be surprised by that.

And then the constraints would be things like if signal to interference ratios should exceed like some factor. When you take the log of the constraint, youíre basically saying that you should write your constraint this way, that the signal to interference ratio should exceed eight decibels. Again, thatís how they would do it. People donít the SIR has to be bigger than like 2.8. They say itís gotta be bigger than you know 8 dB or something like that. So itís actually interesting that exactly the way people who would actually work in these fields would use these things turns out to be Ė corresponds precisely to the convexifying transformation of variables. Weíll see lots of examples of this. Letís look at a mechanical example, and itís design of a cantilever beam. So here you have a bunch of segments. This one just has four, but you know obviously youíd have lots more. And what weíre gonna do is this. Itís gonna be Ė so itís gonna stick straight out, and a force will be applied to the tip, and two things will happen. Of course, this thing will deflect. Weíre not gonna do the huge horrible deflection. Weíre gonna do the small deflection so that we can use linearized approximations. So itíll deflect substantially, but weíll assume weíll use linear models and things like that. It will deflect, and thereíll actually be a strain Ė a stress Ė well, both on each segment. Okay?

And what we wanna do here is obviously if I design Ė the thicker I make the cross-sections, obviously the stiffer the beam. Yeah. Stiffer meaning if I apply a force here it deflects less and the stress is less. So if I make it a big thick thing all the way out Ė and you can easily Ė and you probably have a pretty good idea intuitively of what you want here. You certainly wouldnít want a beam that looks like this, that tapers out like that because now youíre putting a whole bunch of weight over here. Actually, youíre just making it much worse. Youíd probably want one that tapers like that. Thatís just completely intuitive. So the question is gonna be to get the optimal tapering of a beam. Thatís a cross-section. Weíre gonna design the shape of a beam, cantilever beam. Okay. So weíll minimize the total weight subject to Ė and what weíll do is weíll Ė and weíre gonna actually Ė these are not Ė these have a width and height, so thereís a height. I know very little about mechanics except I do know that sort of the strength of Ė well, from carpentry that the strength of thing goes up like the cube of the height, for example. Beyond that, Iím sure there are people here who know a lot more about this than I do. So Iíve got lower bounds on the width and the height of the segments. Weíll limit the aspect ratio so you donít get like an infinitely thin thing like this. Weíll upper bound the stress, and weíll upper bound the vertical deflection at the end of the beam, which is actually where the maximum deflection occurs. So thatís the same as Ė Now, this is actually quite a complicated problem. Itís highly nonlinear and so on and so forth. So but letís say itís actually a GP. To see how that works, we look at the total weight. Now, these are variables. By the way, what kind of function of W and H is that? Again, outside the context of GP, if someone walked up to you on the street, what kind of function of W and H is that? It is a function of W and H, and I wanna know what kind it is. Linear? No. Itís linear in W if H is fixed. And itís linear in H, so this is not Ė well, maybe I wasnít clear. There wasnít two questions, so Iíll say it again correctly. What kind of function is this of left paren W comma H right paren? Whatís the answer?


Instructor (Stephen Boyd):Itís quadratic. Okay. Is it convex quadratic? Hey, wait a minute here. You should not hesitate.


Instructor (Stephen Boyd):No, itís not. It couldnít possibly be. What kind of quadratic form does this have?


Instructor (Stephen Boyd):Thereís nothing on the diagonal. Diagonals would be things like WI squared, HI squared. Thereís nothing on the diagonal. This is block like zero diag diag zero. Now, do you really have to think whether a matrix like that is positive semi-definite? I donít think so. Positive semi-definite things need non-negative on the diagonals. Of course, this has zero on the diagonal, but thereís a rule. If you have a positive semi-definite matrix and itís zero on the diagonal, that row and column is zero. So this has split eigenvalues. So this is a quadratic function of W and H, but it is not remotely convex. Itís got split eigenvalues Ė or concave. However, itís a posynomial because each of these is a monomial, and itís a sum with positively weighted blah blah Ė itís a posynomial. So that part is cool. Now the aspect ratio and inverse aspect ratio are monomials, so if I set them less than a number, I could divide by it, and Iíd get a monomial Ė well, theyíre posynomials, so I can make an inequality of them. Thatís fine.

And the maximum stress is given by this, and thatís a monomial here. Now the vertical deflection and slope are very Ė itís quite complicated to actually work out how much this thing Ė the tip deflects as a function of the width and height of each of these beam segments. Itís very complicated. Okay? But itís given by a recursion, and the recursion looks like this. It says you take VI, and thatís 12 times I minus a half here. I think this Ė in this case Ė this is for I equals one to four Ė no, it says for I equals one to N. Sorry. Everythingís cool here. The VI is this thing. Itís F over Ė thatís a constant Ė E is Youngís modulus, WI Ė these are the variables, so this term Ė hopefully, I can get this right. Yes. Okay. What is that term in GP terminology? Whatís that?

Student:Monomial [inaudible].

Instructor (Stephen Boyd):Monomial. Thatís a monomial, right? Itís got a positive coefficient in front because I is one is the smallest thing here, right? F is positive. Itís a positive force on the tip. Youngís modulus, positive, and then itís actually W to the minus one, H to the minus three Ė HI to the minus three. Okay? So thatís a monomial. And now you add this one thatís gonna go from the tip back plus this. So actually recursively here, the first one is a monomial that Ė sorry, the last one is a monomial, and going backwards you take a monomial and you add it to the previous one. So by recursion, the VIs are monomials Ė sorry, posynomials. Theyíre all posynomials. Theyíre sums of posynomials all the way to the front. Then you get the YIs this way. Well, each YI is the next one Ė well, in the first one itís gonna be this plus that. Thatís in fact in the last case itís actually a monomial, so the last YIís a monomial. And the second one is posynomial, and all the others are posynomial. So what that says is that this is quite Ė these are very complicated expressions. You wouldnít want to see them written out, but they are indeed posynomials, all of them. So thatís actually quite interesting. Itís not remotely obvious. And let me say some of the things that it implies. Thatís not obvious, and I really doubt Ė actually, I think I can say with extremely high probability thereís absolutely no one in mechanical engineering including someone with super good lots long design history stuff like that would know this. And let me just say one of the things you can conclude right away.

It says that if you have two designs, two cantilever beam designs Ė so imagine two. One is quite wide and high, and it makes no difference what they are. Iím gonna form a compromised design from the two beams. However, what I know to do is the following. I know I should take a log of everything. So the compromised design, instead of taking like WI and WI bar, and taking the average of the two Ė actually, my compromised design is gonna be the geometric mean because I know I really should be working Ė the problem mathematically is more natural in the log of the variables. So if you have one log width, you have another, I should take the average of the log widths and then take the X of that. Thatís the geometric mean. So the first thing it says is the way you should blend two cantilever beam designs is by geometric mean. By the way, that fact alone is not obvious at all.

And then thereís a stunning conclusion about it. It would be this. If weíre solving this problem Ė if you have two designs that meet all these specifications Ė these are fantastically complicated. When you start talking about the tip deflection and stuff like that Ė two designs that do it. If you take those two designs and form a compromised, a blended design given by the geometric means of the other ones, I can say something about the total weight of that blended design. I can say that itís less than or equal to the geometric means of the weights of the two designs. Okay? So for example, if those were two different designs that satisfied the constraints and had the same weight, that blend could only be better than either of them. Everybody see what Iím saying here? By the way, in other areas Iíve always tried this. I go to a field. I find someone who alleges to be a good designer with lots of experience and stuff, and I talk them through what the implication of geometric programming is in their field. And in only one case Ė it was Mark Horowitz who actually said yeah, that makes sense. He didnít say it right away either, by the way. Actually, I believe him. This is in the context of circuit design. I really believe him. He says now that makes sense. Thatís how you blend two circuit designs and so on.

I think there might be someone who does enough mechanical engineering that this would at least not sound stupid. Actually, it never sounds stupid. The question is whether or not Ė because you could make very strong statements about what happens when you make these blended designs. Thatís just one of the implications of what it means to be a GP. So how do you write this as a GP? Well, thatís a posynomial directly, and you write out all the inequalities, which I wonít write out. I wonít go through this. I think you can go through this yourself. Let me look at one more example of a GP. Itís a more generic one, also very interesting example. I donít know any practical uses of it, but there might be some. Actually, Iíve been itching to find one for years, never have. Here it is. Letís take a matrix thatís element-wise positive, so a square matrix element-wise positive. So you might know about something called Perron-Frobenius theory if youíve had a class on Markov chains or something like that, but Iíll tell you what it is basically.

Itís this. It says that the eigenvalue of A Ė Aís not symmetric, so it can have complex eigenvalues. But it says that the complex value of largest magnitude Ė that itís spectral radius is that magnitude Ė is positive, No. 1, and also the associated left and right eigenvectors can be chosen to be positive. So thatís Perron-Frobenius theory. I wanna see how many people have seen this in some context. Iím just sort of curious. Thatís all? Markov chains. Somebody took a class on Markov chains, right? Whatíd they say about that? Really? They didnít mention this? Okay. How do you know that the steady state equilibrium distribution is positive? Or in this case positive. This is the strong form. Where did you hear about it?

Student:A Markov chain class.

Instructor (Stephen Boyd):A Markov chain class? Here? No. Itís not looking good for Stanford, is it? Where was that?


Instructor (Stephen Boyd):University of Washington. Oh, okay. All right. And this is called the Perron-Frobenius eigenvalue. Itís positive. And of course, what it does is it tells you the asymptotic growth rate of A to the K. So if you take powers of the matrix, if itís a dynamical system with positive entries, it says itíll tell you the growth rate. So for example, I mean this would come up in lots of dynamical cases. This would be for example an economy. It would tell you how an economy Ė or it might tell you how a multispecies system would work. Actually, the condition that A should be element-wise non-negative means that everything is I guess you would call it excitatory and nothing is inhibitory. I donít know if those are words, but you know what I mean, right?

So it means you start with a positive X. These are [inaudible] concentrations of something, and the concentration of anything does not inhibit the growth of anything else, so it would just lead to it. It would increase it. It would be excitatory. So in that case, A to the K tells you how this dynamical system would propagate, and the asymptotic growth rate, which could be negative by the way, is simply completely determined by this number. If this number is 1.05, it says that by the time you do this 100 steps, youíve grown like 1.05 to the 100. If itís 0.96, it asymptotically decays as 0.96 to the K. So thatís what this is. Fantastically complicated function of A, right? Because you calculate the eigenvalues of A, which is to mean you form the characteristic polynomial, which as a function of the entries of A you donít wanna see written out because it has something like N factorial terms. Thatís No. 1. Youíre gonna have to write Ė youíre gonna have to find the roots of that polynomial, which again you canít do if N is 5 or bigger because there is no formula for the roots of a polynomial of degree 5 or bigger.

And then, once you get these N roots, you wanna take the absolute value and take the maximum of them. And I think by now you should be convinced, this is a fantastically complicated function of the matrix A. Okay. Well, another way to characterize this is that the Perron-Frobenius eigenvalue of a matrix is itís the smallest lambda for which AV is less than or equal to lambda. By the way, for some positive V Ė it turns out here the first thing you actually end up showing is itís that. It turns out you never get inequality here. Itís always equality here, which of course makes sense because now lambdaís an eigenvalue like that. But whatís amazing is this Ė is you can now minimize the Perron-Frobenius eigenvalue of A where the entries of A are themselves posynomials by making a problem that looks like this. You minimize lambda subject to A of X sum Ė and these are all monomials in the variables, which are X, V, and lambda, and these are posynomial inequalities. So basically, log sum X comes out of this. Thatís how that works.

And you can make up some fake applications of this, I suppose. I think we attempted one in the book or something like that, but you can Ė itíd be nice to actually find a real one somewhere. Thatís a challenge, by the way. So I think what weíll do is weíll quit here, but let me say just a few things about where we are. Weíre sort of marching through all these problems. You will get to solve in a theoretical and in a numerical and practical sense every type of problem weíre talking about multiple times. So weíll be doing all this for the next seven weeks actually, weíll be doing this stuff. So this is just your first introduction.

[End of Audio]

Duration: 73 minutes