Convex Optimization II - Lecture 09

Instructor (Stephen Boyd): We should make sure the pile of projects is somewhere. Who's got them?

I guess we'll start. The most obvious thing, as most of you have figured out by now, is that we have looked through these preliminary project proposals, and they're floating around, so make sure you get yours. Do not throw those away, because we didn't copy them and we want to make sure there's progress; when you resubmit them, we want to look at the old one again too, and then judge whether enough progress was made to justify our even looking at it again.

So let me make a couple of big announcements, and then I have a list of complaints. I just happened to have read all of them this morning, in order; but let me say a couple of things about it. They were okay; we marked up a lot of things. You might not be able to read what we wrote, so you'll have to be in touch with us on these, and they'll all be resubmitted.

A few were in really good shape, and some of them we just didn't get, or there's bad notation, so we just marked them up and then stopped. So what we'll do, I think it's next week, nominally, is that correct? The end of next week is the mid, what do we call it, the mid-term something or other, so by the end of next week you'll resubmit. By then it should not just be a rewrite. First of all, a lot of them just need serious editing. Some of those, by the way, you could get to us earlier; that would actually not be a bad thing. But by the end of next week, there should actually be a problem. You should have tried some of these things. These days, with things like CVX, it is very, very easy to actually do these things. It's faster to write the code than it is to write a LaTeX and English description of it that's coherent. So by next week you should have done that. The way that's gonna work, and we'll announce this on the website, is that you'll be turning in the next round of these things next week. If we didn't like your typesetting or something like that, turn that in immediately, because there's just no reason for us to read things that are sloppy.

So what I'm gonna do now, I know this is weird, but I'm just gonna rant for about five minutes; then I'll be fine, everything will be okay, and we'll move on. So here are the things. I'm gonna say them once, and then I don't ever want to see any of these again in anything that comes in from anybody in this class. It actually drives me nuts if I see it from anybody. All right, I'm just gonna say it once so it's said, official, and on the record. I have a list. The first thing is capitalization. We found completely random capitalization: section titles half capitalized, half lower-case. Unacceptable. Choose a consistent style and stick with it. We don't care what it is; there are many choices. The title of a paper is generally all caps. By the way, you should use our template. We didn't put it up there to not be used. I know some of you decided to typeset things your own way; don't. Typeset things our way. Your way may be okay, there are other styles, but in this class you'll use our style, period. It's just easier for us and for everybody involved. So the capitalization thing is silly. Look at section titles and subsections; there is a standard, and if you decide that your capitalization for a section is that the first word is capitalized and nothing else, fine. Make it that way for every section. It's totally unacceptable to mix these, and you'll see us mark them in these things.

The other thing is problem format. A lot of people did use our templates; other people decided to typeset things themselves. Don't do it. Use our method. I don't want to see "max" and then "s.t." and some weird thing where it's right-justified or something. I don't want to see your creative LaTeXing. It should be our format, period. So basically, if the thing you produce doesn't look like the materials we produce for this class, we're not gonna read it. Now, for LaTeX there are some really basic things I want to talk about; these are horrible crimes in LaTeX. One is to typeset a mathematical symbol in roman. Like saying "the variable x" where the x is just the upright roman letter x. Come on, this is amateurish. Some of you did this, not many. Clean this up instantly; we don't want to see it. That's about as basic as it gets. Now the flipside, which I'll explain a little bit, is this: when you have a subscript or a superscript, and it represents or derives from English or any other spoken language, then it is not to be typeset in math italic. For example, suppose for some reason you have some f and you want to talk about the suboptimal point chosen by some scheme you've come up with, and you want it to look like f with a "subopt" subscript. That subscript is English, or derives from English, and therefore the only correct way to do this in math mode is with \mathrm. That's the only way to do it.
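To make the subscript rule concrete, here is a minimal LaTeX sketch (the symbol f and the label "subopt" are just illustrative):

```latex
% Wrong: "subopt" is English, but this sets it in math italic,
% as if it were a product of the variables s, u, b, o, p, t.
$f_{subopt}$

% Right: text-derived subscripts are set in roman via \mathrm.
$f_{\mathrm{subopt}}$
```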

So that's the only way to do it. I'm gonna say these once, and I don't ever want to see them wrong again; I want to see these looking right. If you look through everything I write, and other people who write well, you'll find these rules are followed flawlessly. And by the way, if you want to see an example of a perfect style, look at Knuth, because that's perfect; there are no exceptions of any kind in anything he's ever written. It's not exactly the format I use, but it is flawless. Just take a look at it; you will find zero inconsistencies. For example, he has a rule which I try to follow, which is this: no sentence starts with a mathematical symbol. Let's see, a couple of other things. For the typesetting, just follow our format: full-page, 12-point article. I don't want to see two-column and your fancy thing. I don't want to see a horizontal line and everything; I know how to make an \hrule too. We don't want to see it, so just use ours. It's generic and that's fine. Also, we have a rule that when you declare a problem, you must say what the variables are and what the data are.

Also, some basic things. Oh, let me mention one other thing, sorry: punctuation in equations. This is something that's actually not done right in a lot of cases, but there's no reason you shouldn't do it. You cannot just have an equation out in the middle of nowhere. You can't finish a sentence and then all of a sudden write y = Ax + b. You just can't do it; it makes no sense. It's not English. The correct structure is something like this: if you say, well, we have an output y and we have a state x and so on, and they are related by, then you can have an equation that says y = Ax + b. If that's the end of the sentence, there's a period after that thing, because the equation is syntactically a little phrase within the sentence, and it has to have a period after it. If instead you want to say they are related by, then the equation y = Ax + b, then a comma, and you can say where A is the such-and-such, that's fine. There has to be a comma there. I don't see any reason why you shouldn't just write this way. There's just no reason not to.
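As a sketch of the punctuation rule, here is how a displayed equation carries the sentence's own punctuation (the symbols and wording are illustrative):

```latex
The output and the state are related by
\[
  y = Ax + b,
\]
where $A$ is the system matrix and $b$ is an offset. If the
equation instead ended the sentence, it would end with a period:
\[
  y = Ax + b.
\]
```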

Okay, back to this: when you state a problem, number one, use our format; that will come from just following our template. But then you must say what the variables are. Oh, by the way, another of the most amateurish possible LaTeX errors, this is just complete amateur stuff, is to put a blank line after an equation, which will, of course, in LaTeX generate a new paragraph. There's just absolutely no reason for us to see things like that: y = Ax + b, then a blank line, and then the continuation of the sentence starting a fresh paragraph. That's wrong. The blank line is not harmless; it has a meaning, it's interpreted by LaTeX, and it's not what you want. So things like this we don't want to see. The other thing is spelling errors. There is no reason for us to accept spelling errors. Then BibTeX: use BibTeX, and use it right. Not only must your references be in BibTeX, but it must be done correctly, and I want to mention a few things here, because these are actually reasonable things. Let me say what they are. You must use BibTeX, but don't trust what you get if you go to Google Scholar or CiteSeer or something like that and import BibTeX.
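The blank-line crime looks like this in the source (a minimal sketch; the surrounding words are illustrative):

```latex
% Wrong: the blank line after the display starts a new,
% indented paragraph in the middle of the sentence.
They are related by
\[
  y = Ax + b,
\]

where $A$ is the system matrix.

% Right: no blank line, so the sentence simply continues.
They are related by
\[
  y = Ax + b,
\]
where $A$ is the system matrix.
```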

A lot of those entries are amateurishly and incorrectly done. Read the manual; read the first pages of the LaTeX book, read about BibTeX. There are mistakes when you get entries off Google Scholar, and there are conventions that differ, for example between American and Commonwealth English. So let me show you what some of those are, because they're wrong. Writing an author's initials without periods is not right, nor is running them together with periods but no spaces; in British or U.K. convention some of that is okay, but it's not okay in this class. The initials should be typed as, for example, "A. R." The other thing is that in BibTeX you need to know to escape things; otherwise they'll be made lower case. BibTeX is gonna ignore your capitalization and enforce its own, because it knows it can do it more consistently than you can, but it won't know to capitalize things you don't tell it about. So, for example, if there's a colon in a title and you want the next word to be capitalized, that letter has to be escaped in braces. So, okay; for those of you who came in late, I'm just ranting because I read 30 projects in a row this morning to review them all. Okay, now, for some of these we never found out what the problem was. We got a lot of background, we heard about this, that and the other thing, but we never found out what the problem was. So what we want is to get quickly to something that says, here's the problem, and it should be in mathematical form. You can have some preamble saying we're looking for this, that and the other thing, but we want to see quickly what the mathematical problem is. The idea is that the problem should be understandable by anybody in a mathematically sophisticated field. That includes math, applied math, computational mathematics and engineering, anyone in electrical engineering, computer science, statistics. So you want to write high BBC-style math: nice accent, understandable by everyone across the world who does math. It means it's not like you're telling a story: well, we did this, then we added that, and oh, I forgot about this, did we tell you about this? And also we want this to be smooth. Oh, and also there are these other constraints.
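Going back to the BibTeX escaping point for a moment, here is a hypothetical entry (the author, title, and journal are made up purely for illustration) showing how to protect a capital letter after a colon with braces:

```latex
@article{doe07,
  author  = {J. Doe},
  title   = {Decomposition methods for large problems:
             {A} subgradient approach},
  journal = {Journal of Hypothetical Results},
  year    = 2007
}
```

Without the braces around {A}, BibTeX's standard styles would lowercase it when they enforce their own title capitalization.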

No, you're gonna get to the problem. So you can say you're doing portfolio optimization. You can have a paragraph that's English; I view this, I see it in formal XML terms. So I see an English description that says we want to do a portfolio: you have a number of investments, notice this is English, each has a mean return and a riskiness of return, they're correlated, and so on, and the problem is to make an investment that maximizes return subject to some down [inaudible] risk. That's English. Then it starts: "Let x denote", that's if you're following the Knuth rule that a symbol does not start a sentence. You say, let xi denote the amount invested in asset i; and there should then be a problem statement there such that anyone in math, a random person in the math department you can grab and say, read this, should read the first part and say, I don't know anything about investments, but then read that second part and know exactly what the problem is. Everyone see what I'm saying? That's the high BBC style. It shouldn't have any jargon from some field or something like that. It's just readable by everyone. So okay, in a bunch of the proposals we never really got that crisp statement: the crisp statement that I can take to, for example, Brad Osgood, or one of my friends in statistics, or anybody I want on campus, or actually throughout the world, who is trained in any mathematical field, and ask, do you understand it? They'd say, well look, I don't know anything about finance, but that problem I understand perfectly. There should be no undefined symbols and so on. Okay, the other thing is that a lot of times we got the problem, but we didn't really get the approach. That's okay, because that was a preliminary proposal for your project; but when you resubmit, we'd like to hear a little bit more about the approach, because at this point you should have thought a little bit about approaches.

Oh, and then I'll get back to something. The other thing I want to say: please never mix the problem statement and the method or approach, ever. These must be separated by paragraphs, or even better by sections. In fact, a very good way to do it is to label each paragraph, using the \paragraph command in LaTeX. This is actually a very good way to develop a document. It's typeset in bold, it's a paragraph title, like a sub-sub-subsection, and you write what the paragraph is about. You can later come back and comment all of these out as you generate the final product; but if you can't write such a label for each paragraph, it's not well written. The labels could be things like "The approach", "Our problem", "Background", or "Other methods". I often start by writing everything this way, and then these get commented out in the end. No one knows they were there, but it is a very good way to develop stuff.
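The paragraph-labeling trick looks like this (the labels and text are illustrative):

```latex
\paragraph{The problem.}
We consider the problem of allocating ...

\paragraph{Our approach.}
We solve this via dual decomposition ...

% Later, comment the labels out for the final version:
% \paragraph{The problem.}
```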

So I think that finishes my ranting and raving. Actually, we think some of the projects are going to be really good, so it should be fun. Just come and talk to us. These minor things, like I said, BibTeX, putting it in our format, misspellings: you deal with that, and then if you want, bring the clean version back to us and then we'll talk. But do that quickly so we can give you some real content feedback. If we stopped reading because we didn't like your typesetting, for example, which I think is a legitimate reason to stop reading, then fix it, come back to us, and we'll talk about the content and stuff like that. So okay, that finishes my rant. Now, some serious stuff: we finished last night two lectures that may be useful to some people. They're preliminary, we haven't posted the code yet, but we will. One is on sequential convex optimization; we'll post the code as soon as it's clean enough. This is for people doing non-convex problems. I know there's a couple, Mike for example, or you guys, so read those; and honestly, for you guys it would have been better if we'd gotten that lecture done earlier, but it's fine. Go and look at that; it should give you some rough idea about the ideas of sequential convex optimization. And we have another one that's on MPC, model predictive control. I'm not sure that any projects are directly using that, but you might; you could, if you wanted to go to the stochastic case, so it's related. So okay, don't mind me. We'll continue.

Today is my big office-hours day, so I'll be around all day; please come by, and if you want to get me today for content and style feedback before you go and fix all of the BibTeX and LaTeX and that kind of stuff, that's fine. Okay, let's continue with decomposition methods. So, decomposition methods: we looked before at a generalization of this idea that if you had a problem that was completely separable like this, you would solve the two parts separately. The idea in a decomposition method is that you should imagine a problem that is almost separable, meaning that you can imagine x1 has dimension 1,000, x2 has dimension 1,000, and y has dimension 5.

And so the idea is you have a problem with 2,005 variables, but it's got a particular sparsity pattern: only five variables couple across the two parts. You should picture a graph with maybe two dense subgraphs and five nodes in the middle that connect to both sides. That's the canonical picture; none of the mathematics requires any block to be one size or another, because it all works no matter what. The question is how to solve the problem in that case. And we looked at primal decomposition; primal decomposition is simple. You simply fix the coupling variable y and you minimize the first part and the second part, and they can be done separately, because once you fix y, thinking in terms of that graph, you are removing those nodes as variables, the graph disconnects, and you've got two, let's say, dense graphs, which can be solved separately. Now, if you fix y and minimize each part, there's no reason to believe that's the global optimum; but if you then optimize over y, you do get the global optimum.
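As a sketch of primal decomposition, here is a toy instance with scalar variables (all functions and data below are made up for illustration): the two inner minimizations have closed-form solutions, and the master problem runs gradient descent on the complicating variable y.

```python
# Toy almost-separable problem:
#   minimize f1(x1, y) + f2(x2, y)
# where y is the complicating variable shared by both parts.
def f1(x1, y): return (x1 - 1)**2 + (x1 - y)**2
def f2(x2, y): return (x2 + 1)**2 + (x2 - y)**2

def phi1(y):              # minimize f1 over x1 with y fixed
    x1 = (1 + y) / 2      # closed-form inner minimizer
    return f1(x1, y), x1

def phi2(y):              # minimize f2 over x2 with y fixed
    x2 = (-1 + y) / 2
    return f2(x2, y), x2

# Master problem: gradient descent on y.  Here phi1 + phi2 works out
# to y^2 + 1, with gradient (y - 1) + (y + 1).
y, alpha = 5.0, 0.25
for _ in range(100):
    g = (y - 1) + (y + 1)
    y -= alpha * g

v1, x1 = phi1(y)
v2, x2 = phi2(y)
print(y, v1 + v2)   # y near 0, optimal value near 1
```

The point is the structure: for fixed y the two subproblems are solved independently, and only y (and gradient information) passes between them and the master.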

So primal decomposition works by this method: you simply broadcast what the complicating variable y is. You have a question?

Student: [Inaudible].

Instructor (Stephen Boyd): Yeah, right.

Student: It can be a very complicated variant, right?

Instructor (Stephen Boyd): Right, so that's a very good question: how do you get expressions for φ1 and φ2? In some applications you really do get expressions for φ1 and φ2. In the vast majority of applications, there's no expression for them. In fact, as we move forward, the interesting thing to find out is: how will we access φ1 and φ2? We're not generally looking for formulas for them. The access to φ1 and φ2 is going to be the following.

We're gonna evaluate φi at a particular point, so the method requires only that we can evaluate φi; and there's one other method we need to implement: we need to be able to get a subgradient of φi. So in this case, let's suppose that this represents something like a linear program.

So if this is a linear program, then in fact this function of y is piecewise linear, but it's fantastically complicated. Even for a linear program with, like, ten variables and 50 constraints, the optimal value is indeed piecewise linear in y; for a quadratic program it's piecewise quadratic. But the number of regions it has, if you wanted to write an expression for it, would be something like 10^30, so you're not gonna write an expression for it.

What you need is the ability to evaluate φ1(y) and a subgradient. So it's very important to think about this: we write the symbol, but the question is what you actually have to implement to carry out the method, and the answer is you have to evaluate the function and evaluate a subgradient. Those are the two things you have to do.
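To illustrate the "evaluate plus subgradient" interface, here is a toy φ, chosen purely for illustration so that the inner problem min over x of { x² : x ≥ y } has a closed-form solution. The object exposes only the two methods a decomposition method needs; no formula for φ is ever handed to the caller.

```python
# Toy phi(y) = min over x of x^2 subject to x >= y.
# If y <= 0 the constraint is slack and x* = 0; otherwise x* = y.
class Phi:
    def value(self, y):
        x_star = max(y, 0.0)      # closed-form inner minimizer
        return x_star ** 2
    def subgrad(self, y):
        # The optimal Lagrange multiplier of the constraint x >= y
        # is a subgradient of phi at y; here it equals 2*max(y, 0).
        return 2 * max(y, 0.0)

phi = Phi()
v_pos, g_pos = phi.value(2.0), phi.subgrad(2.0)    # 4.0, 4.0
v_neg, g_neg = phi.value(-1.0), phi.subgrad(-1.0)  # 0.0, 0.0
print(v_pos, g_pos, v_neg, g_neg)
```

A master algorithm would call only `value` and `subgrad`; in a realistic setting each call would solve an LP or QP rather than use a closed form.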

Okay, so that's the picture at a distance; here's what actually happens. You actually minimize this function and, in the process, calculate a subgradient; that's what you have to do. In particular, you do not have to form an expression. Now, there are cases where you can form an expression. For example, by the way, if you have a pure quadratic unconstrained problem, convex, and you minimize over one set of variables, what is the result in the remaining variables?

Student: [Inaudible].

Instructor (Stephen Boyd): You get a quadratic function. It's this problem exactly: this function is quadratic jointly in x1 and y, and I'm asked to minimize over x1. You minimize a quadratic, which you can do by solving a linear equation, and what is the remaining function? If you minimize a quadratic over some of its variables, it's quadratic in the rest, and what is the quadratic form? For a quadratic you can give an expression, because you can give a matrix. What is the matrix? The Schur complement, exactly.
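This Schur-complement fact is easy to check numerically. A sketch with random data (the dimensions are chosen arbitrarily): for a positive definite block matrix P = [[A, B], [Bᵀ, C]], minimizing (1/2)[x; y]ᵀ P [x; y] over x leaves (1/2) yᵀ (C − Bᵀ A⁻¹ B) y.

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.standard_normal((5, 5))
P = M @ M.T + 5 * np.eye(5)          # random positive definite matrix
A, B, C = P[:3, :3], P[:3, 3:], P[3:, 3:]

S = C - B.T @ np.linalg.solve(A, B)  # Schur complement of A in P

y = rng.standard_normal(2)
x_star = -np.linalg.solve(A, B @ y)  # minimizer over x for this y
z = np.concatenate([x_star, y])
val_min = 0.5 * z @ P @ z            # minimized joint quadratic
val_schur = 0.5 * y @ S @ y          # quadratic given by the Schur complement
print(np.allclose(val_min, val_schur))  # True
```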

Okay, so that would be a case where there's an explicit form: you minimize this thing and you actually form a Schur complement and stuff like that. And that pretty much wraps it up; that's pretty much the only case where there's an explicit form.

Okay, that was primal decomposition, and we looked at an example; then we got to dual decomposition, and this is where we'll start today. So, this is how dual decomposition works. It is very interesting. It turns out it's different, but I'll tell you what it is. Again we want to minimize f1(x1, y) + f2(x2, y), and what we do is we now split y into two variables, but we insist that they're equal. Now, this is the kind of thing you do in duality all the time, where you make some incredibly stupid-looking transformation and you think, how can a transformation this stupid actually lead to anything good? And we'll see that it does.

So here's what we do. By the way, I like to think of y1 as a local version of the complicating variable, y2 as another local version, and then this as a consistency constraint. So if, for example, you are sharing area and power, say an area and power budget in designing an integrated circuit, and there are two subsystems here, then y1 might be the number of milliwatts that this guy gets; let's just make it power. And then y2 over here is actually also the number of milliwatts this guy gets, so the other one gets some total power budget minus y2, or something like that.

So basically, they would both have a version of what this shared variable is, and this constraint means consistency. Okay, now we form a Lagrangian, and you get the first function plus nu transposed times the equality constraint: the objective plus nuᵀ(y1 − y2). And now what you realize is that the Lagrangian L is separable. It splits into an (x1, y1) part and an (x2, y2) part; these separate precisely because we split y into two local versions, and the price term is linear, so it's separable.

So it splits, and what that says is that if you are asked to minimize L, you can actually split it and do it independently: at two processors, two locations, however you want to visualize why it's good to do it separately. These separate, and so g1 you can calculate separately from g2, like this. And we can attach all sorts of nice meanings to what nu is here; we can think of nu as a price, for example, and we'll see that later.

Okay, now the dual problem is to maximize this thing, and I have two functions that can be evaluated separately, so in dual decomposition here's what happens. You distribute not a resource allocation to the two subsystems; you publish a price, nu, and then each subsystem uses the price to calculate an action, and that's what it does. The nu's, by the way, can be of either sign, so they can be prices or subsidies, whatever you want. What this basically says is: go ahead and use as much y1 as you like, or as little, but you have to account for it in your accounting. The term nuᵀ y1 says these are the prices, and basically you're paying for it, and then it becomes a subsidy for the other subsystem. That's how that works. The master, the dual problem, manipulates the prices, and you can see clearly what happens: if you crank a component of nu up, that tells this guy that the corresponding component of y1 is more expensive, and when it re-solves its problem, presumably it'll arrive at a point where less of that component of y1 is used. So that's the idea. Now, a subgradient of either of these is actually nothing but the residual, and that comes from stuff we've done before. And here's dual decomposition: this is using the subgradient method, but you could use the ellipsoid method, analytic center cutting-plane, anything you want, any method for non-differentiable optimization. It works like this: you solve these dual subproblems, then you update the dual variables; this is the subgradient update. What happens in this case is that at each step you get a valid lower bound on the optimal value, but you don't have feasibility. You only achieve feasibility in the limit in dual decomposition, which is interesting. Actually, if you're ever feasible here, you stop, because then you're optimal. So, okay.
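Here is dual decomposition on the same kind of toy problem used above (the data are made up for illustration): y is split into local copies y1 and y2, each subproblem has a closed-form solution for a given price, and the master does a subgradient price update on nu using the residual y1 − y2.

```python
# Toy problem: minimize (x1-1)^2 + (x1-y)^2 + (x2+1)^2 + (x2-y)^2.
# Split y into local copies y1, y2 with the constraint y1 == y2,
# priced by the scalar dual variable nu.
def sub1(nu):
    # argmin over (x1, y1) of (x1-1)^2 + (x1-y1)^2 + nu*y1
    x1 = 1 - nu / 2
    return x1, x1 - nu / 2
def sub2(nu):
    # argmin over (x2, y2) of (x2+1)^2 + (x2-y2)^2 - nu*y2
    x2 = nu / 2 - 1
    return x2, x2 + nu / 2

nu, alpha = 0.0, 0.25
for _ in range(100):
    (x1, y1), (x2, y2) = sub1(nu), sub2(nu)
    nu += alpha * (y1 - y2)   # subgradient ascent on the dual:
                              # the residual y1 - y2 is a subgradient

y_bar = (y1 + y2) / 2         # averaging the copies gives a feasible point
print(nu, y1 - y2, y_bar)     # nu -> 1, residual -> 0, y_bar -> 0
```

Note that the two subsystems never see each other; they only see the published price, and consistency emerges as the price converges.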
So let me say a little bit about how you find feasible iterates. At each iteration you maintain an x1 and a y1 and an x2 and a y2, and in general it's not feasible, meaning you don't have y1 = y2. So basically your two systems have sort of designed themselves separately. They have a variable that is supposed to be the same, but the copies aren't the same, and the idea is to adjust prices to drive the inconsistency to zero, to drive the two to consistency. Let's stop for a minute: if I have one electronic cell driving another, for example, this one might design itself assuming it's driving, say, a capacitive load of whatever, 10 femtofarads.

The other one might design itself with the requirement that it present a load of 12 femtofarads, so they're off; and if you put them together it won't work, because this guy is driving a higher capacitive load than he was designed for. See what I'm saying? So what you do is you adjust the prices so that these loads, the 10 and the 12 femtofarads, come into consistency. When they hit consistency, they're done, absolutely done: when this guy is driving a 10.3 femtofarad load and was designed for that, and the other one has designed its cell to present a 10.3 femtofarad load, you're done. That's the picture. Now, when you've got an iterate here, a very good guess of a feasible point is the average of the two local copies. You take y-bar, and this is essentially the projection of the pair of infeasible points onto the feasible set, because the feasible set is just y1 = y2. That gives you an upper bound; it might be infinity in general, in which case it's still an upper bound, just not a really informative one. Now, you can also do this: if you have a primal interface to f1 and f2, you can do better. A primal interface says: I'm fixing your interface variables; please optimize your internals to minimize your cost. So if I make the call to f1's primal interface, if I say, please fix these; a primal interface is exactly the interface you'd use if you were running primal decomposition.

Primal decomposition says: I'm assigning you this variable; now go and optimize your private variables to deal with it. Dual decomposition says: use whatever you like on these interface or public variables; I'm charging you for it, or subsidizing it, as before. So if you are actually able to minimize over x1 with y-bar fixed, that's a primal interface, and you can get a better upper bound here; it has to be better, because the cruder bound kept x1 and x2 the same. Okay, this is the same old example we did last time. It's a trivial example, but here it is. This is now a function of the price, the dual variable: the first system has a dual function g that looks like this, and the second system's looks like that, or the other way around, it doesn't matter. You add them up and maximize, and I guess the optimum is somewhere around in here. Okay, and this would be dual decomposition using bisection; and notice that dual decomposition at every step always gives you a lower bound. It looks like the optimal value is a little bit more than 1.7. This is that crappy upper bound where you simply run dual decomposition, average the two y's, and see what the objective is; and you can see, by the way, that by here you could terminate with a known duality gap.

If you actually have a primal interface to the subsystems, then you can see that you could stop after, I guess, three steps. Essentially, after three steps you have done two things: you have produced a feasible solution with an objective of whatever that is, 1.71, let's say; and you have produced a lower bound from duality that's, like, 1.705 or something. So for many problems that's called quitting time. That's how this works. Now, this one would require a primal interface to the two subsystems, and this one requires a dual interface. Okay, so for the interpretation, I think we've already gone over this, but here's one. You can think of y1 as a vector of resources consumed by the first unit and y2 as the amounts generated by the second, and you have to have something like supply equals demand. Then nu is a set of prices, and the master algorithm adjusts the prices at each step rather than allocating resources directly; that's what primal decomposition does. The reason you get the minus and the plus is that one subsystem is a supplier and one is a consumer. So if you go back over here and look at what happens, if you want to know why the sign difference, it's because one of these is interpreted as a supplier of the goods or resources and the other as a consumer: one gets revenue by selling it, and the other pays to use it. And as you can imagine, this is all very basic in economics and stuff like that. It's not this clearly stated in economics, by the way, but oops, I'm not supposed to say that. Sorry, it's just a different language. Okay, so that's that.

Okay, now we're gonna look at what happens when we have constraints, and it's not much different, so let's see how that works. The constraints we can group into two kinds: these are to be thought of as private constraints, local to subsystem one, and these are private constraints local to subsystem two; and then these are the constraints that couple. By the way, in this problem the two subproblems are otherwise completely separable; they're only coupled through these constraints. You could make a more complicated, general form where they're coupled through the objective and the constraints, but to make it simple, we're gonna couple only through constraints. So this one is literally the shared-resource problem. For example, that's subsystem one, or one unit of a company or something like that; that's subsystem two, another unit of a company; or that's cell one in an integrated circuit and cell two, or something like that in some system; and these are shared resources. This says, for example, the total power has to be less than 30 milliwatts, and the total area has to be less than a square millimeter, or your total warehouse space has to be this, or there's some regulatory constraint that covers everything. So that's it, and the real question is how to allocate these shared resources between the two subsystems. For example, the first component of this says that the sum of two numbers has to be less than or equal to zero, and the question is, well, how do you do that? Do you make this one -1 and this one +1? Do you make this one +5 and this one -5? Or do you just make them zero and zero? That's the question of resource allocation.

Okay, all right, so this is the picture, and primal decomposition again is kind of obvious. It basically says the following. You introduce a variable t, which is a fixed resource allocation, and you simply say t is the amount of resources you're gonna throw at the first subsystem. So you say, okay, you get this much warehouse space, this much cash, this much power or whatever you like, it doesn't matter. These are your fixed resources, and then sub-problem one says: with these resources fixed by the central office, you go and operate yourself optimally, given these resource allocations. Well, if you give t to this guy, you have to sort of take it away from over here, because the sum has to be at most zero in each component. So here it means you take it away over here. Well, take it away if the entry is positive; if it's negative here, you've taken it from this guy and you've given it to this one. So you say minimize this second system subject to this, and the t and -t guarantee that if you add these two together, it's less than zero. That's how it's done; you have an explicit decomposition of the resources. You've said you get 30 milliwatts, you get 70, just period. Now it says deal with it. One possibility, of course, is that these are infeasible, because this thing needs more than 30 milliwatts, let's say. That's no problem, because this will simply return the value of phi1 as infinity. So that's simple enough, and the master problem is to minimize over t the optimal value of this thing plus the optimal value of that one, and this is sort of the idea here, and the cool part is that once you've fixed t, these can be done separately. Okay, so primal decomposition works like this. You have a value of t, you solve the two sub-problems, and you have to find the Lagrange multiplier, Lambda. To find a Lagrange multiplier – sorry, a subgradient. I've already given it away.
So here, if you wanted to find a subgradient for this problem – it's absolutely classic. If you stare at this long enough, you realize that t is a vector here.

This is actually something you study in 364a. It gives you the tradeoff curve of optimal cost as a function of these resources. It's convex in t, that's clear; but the other thing is that if you solve this problem privately and then look at the Lagrange multiplier here, for this thing – if you call that Lambda one – that gives you exactly a subgradient of this function. Here you get a Lagrange multiplier, and because that's a minus sign, it switches the sign; it's minus. Okay, so you find the Lagrange multipliers Lambda one and Lambda two – and I'll give an interpretation of what these are in a minute – and then it says you update the resource allocation, and so that basically says you update t. This is the pure subgradient method; you could use anything else. You could use ellipsoid, any of these other things, smoothed subgradient, all these things. This updates that. These are usually the easiest to describe, if not the ones that work best, but that's it. So, okay, now this gives you a subgradient of the sum, and in this case all the iterates are feasible. I mean, that's assuming that these are finite; so if you have a value of t that works – yeah, when the sub-problems are feasible, it's finite.
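As a rough numerical sketch of this loop – the quadratic subsystems, the data, and the step size below are all invented for illustration, not from the lecture – here is primal decomposition on a single shared scalar resource, with each sub-problem solved in closed form so the Lagrange multipliers are explicit:

```python
# Toy problem: minimize (x1-5)^2 + (x2-3)^2 subject to x1 + x2 <= 4.
# The master gives budget t to subsystem 1 and 4 - t to subsystem 2;
# phi below is a sub-problem's optimal value, lam the multiplier of
# its local resource constraint (a subgradient of phi w.r.t. the cap).

def sub_problem(a, cap):
    """minimize (x - a)^2 subject to x <= cap; return (value, multiplier)."""
    x = min(a, cap)
    lam = 2.0 * max(a - cap, 0.0)    # multiplier of x <= cap
    return (x - a) ** 2, lam

t, alpha = 0.0, 0.1                  # initial allocation, fixed step size
for _ in range(100):
    phi1, lam1 = sub_problem(5.0, t)        # subsystem 1 gets t
    phi2, lam2 = sub_problem(3.0, 4.0 - t)  # subsystem 2 gets the rest
    g = lam2 - lam1                  # subgradient of phi1(t) + phi2(4 - t)
    t -= alpha * g                   # master update: shift the resource

print(round(t, 3))                   # allocation settles at t* = 3
```

At the fixed point the two multipliers agree (lam1 = lam2 = 4), which is exactly the equilibrated-marginal-values stopping condition.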

Okay, so here we go. This is two problems. Let's see. There's two quadratic functions, the Ci are polyhedra, and they're sharing two resources here. There are two resources that they have to divide up, and this shows the progress of the cost minus p*. So this is as the iteration goes; you can see that it's already quite close to optimal in terms of sharing, and what's being worked out is how these two resources are split between these two subsystems. And here's the resource allocation. They just start both at zero, and then immediately subsystem one, that's the blue one, starts getting more of the resource. By the way, there's a beautiful interpretation of this. Let's make this scalar, just make it scalar. So t is a scalar, a single resource is to be divided between two subsystems; then Lambda one has a beautiful interpretation. If it were differentiable, it's the partial derivative of this optimal cost with respect to that resource. That's what it would be. So let's say you're doing minimization, and suppose that – generally, in most minimization problems, the actual cost, when you really work it out, is dollars. So let's say the units of phi1 and phi2 are dollars and t1 is power; it doesn't matter. I'm dividing power up between two subunits, right. Then Lambda one has a beautiful interpretation. If it's like five, that tells you – let's see if that's right. Would it be five in that case? It would be – sorry, it would be five, and that means that for every watt of power I'm given, I'll do better by $5 in cost. It's the partial derivative of the optimal cost with respect to the resources allocated.

It tells you the marginal value of power. This one here calculates the same thing, and suppose it's three. So if I have two subsystems – you design something, you design something, I allocate power – and then I ask: you have to design yourself; please, I don't even ask you to reveal your design. That's x1 and x2, because you're competitors and you hate each other. You just happen to be designing subsystems that are going into the same system. So all you have to do is report back to me what the cost is, in dollars, of your design, and your marginal utility for power; and if you say it's $5 per watt and you say it's $3, then this tells me how I need to shift resources. So, if you can use it more than he can – that's what the five means – at the next step I'm gonna say, okay, was that 30, 70? Because now you're gonna get 32 and you're gonna get 68. I don't know if this makes sense, but these things, once you think about them – you have to understand them on many levels. You have to understand them at the convex analysis, convex optimization level, but it's very important to understand that these methods also just make complete sense intuitively. They just make sense. By the way, if I overshoot, what'll happen is these will switch, and now you need more, and then I'll shove back more, and that's actually what these are. That's me going back and forth, basically shifting microwatts back and forth between you, and at each step, your marginal utility is higher; and now we're out in the third digit, so actually, in that example this is what we call done, and then I'd be shifting power back and forth between you. But at that point, your marginal utilities are equilibrated more or less, and that's how we know we're done. Okay, and this is the resource allocation, so the optimal resource allocation turns out not to be – those add up to zero. Do they, or –? Thank you, there are two. Sorry, thank you very much. That's two, thank you.
I confused myself. Here there's two resources, and in the verbal example there was one, so, sorry. These don't have to add up to one. You give .5 to this one, and then there's a corresponding one down here, the one that looks like it's negative, and then this one goes like that. Okay, okay.

So let's do dual decomposition for constraints. That works like this; it's very easy. You simply form a Lagrangian and everything just splits out. It's completely separable. That involves x1, x2, and in this case, these are prices, and so again, it works like that. If we go back to our scalar example: you're designing a subsystem, you're designing a subsystem. Instead of allocating you 30 milliwatts and you 70, I now say, make a fictitious internal charge in your design of so many dollars per milliwatt – but use as many as you want. You're gonna use five watts? Go ahead, but here's the price. So that's how that works: I publish a price for power, and you guys each design your thing, taking this internal accounting into account, because that's what this is, and then you report to me how much power you use and I add it up. If it's more than 100 milliwatts – that's my limit – I crank my price up, because too much power is being used, and then presumably I send back to you a message, which is a dual update message, that says: I've updated my prices, redesign yourself. You do that, and then you come in. I overshot on the price; now together you're using 90 milliwatts, and so I now reduce the price a little bit, and I keep doing that until between the two of you, you're using 100 milliwatts. That's the price update; that's dual decomposition here. So that's that.
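Here is the same made-up scalar example run by prices instead of allocations – a sketch, with assumed data and step size, of the price-update loop just described:

```python
# Same toy problem: minimize (x1-5)^2 + (x2-3)^2 subject to x1 + x2 <= 4.
# The master publishes a price lam on the shared resource; each subsystem
# privately solves  minimize (x - a)^2 + lam * x,  giving x = a - lam/2.

def best_response(a, lam):
    return a - lam / 2.0

lam, alpha = 0.0, 0.5                # initial price, fixed step size
for _ in range(100):
    x1 = best_response(5.0, lam)
    x2 = best_response(3.0, lam)
    usage = x1 + x2                  # total resource consumed at this price
    lam = max(0.0, lam + alpha * (usage - 4.0))  # raise price if over budget

print(round(lam, 3), round(x1, 3), round(x2, 3))  # price 4, x1 = 3, x2 = 1
```

The `max(0, ...)` is the projection in the projected subgradient step: prices stay nonnegative, and at the fixed point the market clears (x1 + x2 = 4).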

Okay, so here's dual decomposition, and it's quite simple. By the way, this is the vector of resources used, and the + is the violation. So this is a projected subgradient method, because here the prices have to be positive. So this is the projected subgradient method, and it works; like all the other stuff we've done, it just kind of works, and it's actually quite cool. It's a price adjustment method and so on. Here the coordination is being done through prices – so you can coordinate through prices and you can coordinate through allocation, and some people talk about market economies and controlled economies; this was very popular in the '60s or '70s or something like that, to interpret these things in economics. Okay, so again here's our same example, but in this case, we're adjusting resource prices, and this shows the prices on the two resources; and we started them off, I don't know, I guess we started them off up here or whatever, and they shot around. It looks like they're converging to whatever it is, .25 and .35, which would've been the right prices. So the idea is, if you get the prices right – I mean, this is sort of the essence of duality – if you get the prices right, then independently, the market will clear, meaning, No. 1, we will not use more resources than we have, and No. 2, you'll actually be optimal. So okay, and you get the same sort of thing: in a handful of steps, in dual decomposition you get a lower bound, and then this gives you the projected feasible allocation here, and you can see that by this step, you're pretty much done, for all practical purposes. Okay, and again, this is the same – I won't go through this. So, okay.

So now what we're gonna do is – we've talked about all of this with two subsystems, which is actually a little bit boring. It's the easiest, the simplest thing to look at, these two subsystems, and you have to understand everything for two subsystems first; but now we're gonna talk about general decomposition structures, where it's much more interesting, and next lecture we're gonna do a lot of applications in network flow and I forget what other areas we're doing. The most interesting ones are when you have much more general decomposition structures. So, here's what we're gonna do. I should add that there's no real reason to force a canonical form. The way I view all of this decomposition stuff is that it consists of a set of ideas – primal decomposition, dual decomposition – and it's best just worked out on the fly. So when you have a problem, if you're doing networking or wireless or cross-layer optimization or something like that in communications, or whatever application you're doing this in, it's best to sit down and simply use the ideas, not the actual formulas. Maybe the formulas will fit your case, but generally speaking it's better to just take the ideas and derive it all again from scratch in whatever your particular problem is. But what we'll do here is actually take a canonical form, not because you should do everything by putting it in canonical form, but because we'll actually look at some of these higher-order ideas – and actually, it's where decomposition gets really interesting.

Okay, so here's what we're gonna do. We're gonna have multiple subsystems, not just two. We're gonna have variable and/or constraint coupling between subsets of systems, so it's not gonna be described by a graph; it's gonna be a hypergraph. So for example, I'll draw a hyperedge if there is a constraint that links a subset of the systems. So we're gonna have a hypergraph here, not a graph. And a hypergraph, by the way – an undirected hypergraph is nothing but – a hyperedge is nothing but a subset of the nodes. It comes up in a bunch of areas, but that's all it is: it's subsets of nodes. Now, what we're gonna assume is that all coupling is via consistency constraints. In other words, whether it's resource sharing or any other kind of coupling constraint, we're gonna make them all consistency constraints, and the way you do that is simple. If you're sharing a resource, you simply take a copy of the resource constraint, replicate it locally, and then you have to enforce consistency. So, without loss of generality, all constraints are consistency constraints. And it would be something like this: if you and I are sharing 100 milliwatts, then you would have a variable, which is how much of that you think you're allowed to use. I would also have a separate copy of that, which is how much I think you're allowed to use. I will then use 100 minus that number and you'll use that number. If they're consistent, it means that we're actually correctly divvying up the power. So everything comes down to consistency; all we have to do is drive towards consistency.
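As a small check that this local-copies reformulation loses nothing – the quadratic costs and the 100 milliwatt budget below are invented for illustration – you can sweep the single consistent value of the copies and recover the optimum of the original coupled problem:

```python
# Original coupled problem: minimize (x1-60)^2 + (x2-70)^2
# subject to x1 + x2 <= 100 (sharing 100 mW).
# Reformulation: subsystem 1 keeps a copy y1 of "my budget", subsystem 2
# keeps its own copy y2 of that same number and uses 100 - y2; the only
# coupling left is the consistency constraint y1 == y2.

def phi1(y):   # subsystem 1's best cost given budget y
    return max(60.0 - y, 0.0) ** 2

def phi2(y):   # subsystem 2's best cost given the remaining 100 - y
    return max(70.0 - (100.0 - y), 0.0) ** 2

# Enforce consistency by sweeping the common value y = y1 = y2.
ys = [i / 10.0 for i in range(0, 1001)]
best = min(phi1(y) + phi2(y) for y in ys)

print(best)  # 450.0 -- the optimal value of the original coupled problem
```

The minimum lands at y = 45 (x1 = 45, x2 = 55), which matches solving the original problem directly.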

Okay, so let's do a simple example, something like this. It's already more complicated than the ones we've seen. So I have three subsystems here, and I have private variables like x1, x2, x3, and I have public variables y1, y2, y3 and y4. Okay, so here the private variables are inside, and the pins – that's to use the electronics analogy – the pins are associated with public variables, and an edge – we'll see later what an edge represents, but an edge is gonna represent a constraint, a consistency constraint. Okay, so in this thing you have two simple edges, and this structure represents this problem. You have f of x1, y1 – that's the cost function term here. That's the cost of this one; this cost function depends on both this variable and that variable, and then the last one is this. And then the constraints are: those are constraints local to subsystem one, constraints local to subsystem two, and constraints local to subsystem three, and then – I've introduced private copies of the variables, and here we insist that y1 should equal y2, and y3 should equal y4. These are the consistency constraints. So without these consistency constraints, everything separates. By the way, the electrical analogy is very interesting, and for those who know about electrical engineering and circuit analysis and stuff like that, it's actually quite useful – again, if you don't know about this, it's fine – but if you do, you can think of y1 and y2 like this: if you hook a wire between two things, what's equal on the two pins? The potential, or the voltage. So you should think of y1 as like a voltage.

Oh, and what can you say about the currents flowing in here and the currents flowing in here? They sum to zero, exactly; so the first one is like KVL and the second one is KCL. Okay, so the currents, in this case, are gonna have to do with the Lagrange multipliers, right, because we're going to expect that. So again, for those of you who don't know about circuits, no problem. Okay, so here's a more complex example; it's quite interesting. You've got like five subsystems here, and what happens here is one constraint says that these three variables have to be the same: this one, this one and this one. Oh, by the way, what can you say about the voltages on these pins in this case? They're equal. And what can you say about the currents entering subsystems one, two and four? They sum to zero, that's [inaudible]. Okay, so we'll get to that. So this is the idea. By the way, this is now quite a complicated system, right. It's not a simple thing; it's not given by a block diagram with little arrow things like that. It's a more complex structure here, an interaction structure; that's it. Okay, so the general form is gonna look like this. We're gonna minimize a sum of local objectives, subject to some local constraints, but a local constraint and a local objective can mention two things: it is allowed to mention private variables, and it is allowed to mention public interface variables. In fact, the x's don't make any difference at all. They actually just go away, because you minimize over them. I just draw them here to remind you that there are private variables, but in fact you don't have to; you can just get rid of the x's. You don't even need them. In fact, at some level, they're just confusing things.

Then you have the only thing that couples. This is utterly separable, except for these consistency constraints, and the consistency constraints say this. For every hyperedge I introduce a net variable, so I'm gonna have a vector of variables whose size is equal to the number of hyperedges here, and then Ei is gonna be a matrix; it encodes the netlist or hypergraph, and it tells you, for each entry – each pin – which hyperedge it's in. So these Ei's are matrices whose rows are unit vectors. I mean, this is kind of a silly data structure for it, but still, it's good enough; it's fine, actually. It just tells you, for a given component of y – say the fourth component – which net it's in, or which hyperedge it's in. So primal decomposition is gonna look like this. I'm gonna minimize this with these things fixed. If you like, I'm gonna fix the voltages here. I'm gonna fix the voltages, and what you do is distribute the net variables to the subsystems; that's what this is. This looks like a matrix multiply, but it's not; it just distributes the given levels. Then you optimize the subsystems separately, and each of these is gonna give you a subgradient here. Then, if you want to get a subgradient of this sum here, you simply do this: you sum – yeah, Ei transposed times gi; it looks complicated, but in fact it's nothing but summing over the edges. And by the way, these are the currents flowing into these pins, if you want a circuit analogy. This is the sum of the currents on that subnet. That should be zero; that's KCL. So g is in fact the current, the KCL residual. It's the amount by which KCL doesn't hold. That's what g is, and then it says you update the net variables. So you update the voltages depending on this current. This is the current residual, like that.
Again, if you know electrical engineering, that's how that works.
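A tiny sketch of this netlist bookkeeping – the hypergraph, pin "currents", and all the numbers here are hypothetical, just to show the two roles of the Ei matrices:

```python
import numpy as np

# Hypothetical netlist: 2 nets (hyperedges).  Each row of an Ei is a unit
# vector saying which net that pin of subsystem i belongs to.  Then
# y_i = E_i @ z distributes the net variables ("voltages") to the pins,
# and sum_i E_i.T @ g_i adds up each subsystem's pin currents per net.

E1 = np.array([[1.0, 0.0],    # subsystem 1, pin 1 is on net 1
               [0.0, 1.0]])   # subsystem 1, pin 2 is on net 2
E2 = np.array([[1.0, 0.0]])   # subsystem 2's only pin is on net 1

z = np.array([1.1, 0.7])           # net variables ("voltages")
y1, y2 = E1 @ z, E2 @ z            # distribute: y1 = [1.1, 0.7], y2 = [1.1]

g1 = np.array([2.0, -1.0])         # pin currents (Lagrange multipliers)
g2 = np.array([-2.0])
g = E1.T @ g1 + E2.T @ g2          # per-net KCL residual

print(g)   # net 1: 2 - 2 = 0 (in equilibrium); net 2: -1
```

The master's primal update then nudges z against g, net by net.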

Dual decomposition – and I'll draw a picture in a minute that makes this look like a circuit. So dual decomposition looks like this. Instead of fixing the voltages at the pins, I'm gonna actually fix the currents flowing into each pin, like this, but the currents are gonna be consistent – I'm gonna allocate currents, so when I have a net with three things coming together, I'll say that's +1 amp, -2, and then whatever, let's do +1 here, and they add to zero. So that's the idea: I'll make something that's KCL compliant, but now what I need is something like a KVL residual. You'll set the currents at the pins and collect the voltages, and if they're all equal, you're in KVL compliance. Actually, in that case, you're primal feasible and you're done; but instead you'll calculate the average value of the public variables. This looks complicated – it looks like a pseudo-inverse – but E is this matrix whose rows are unit vectors, so this is just the matrix formula. It is a projection; this is a projected subgradient method. Actually, all it really is, is averaging the public variables over each net, and then you update the currents that way, and I'm gonna draw a picture and we'll see how this works. So this is this bigger, more complicated example, just showing how these things work, and I think I won't go into all of this because – I mean, obviously it works, right.
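The averaging interpretation is easy to verify numerically; the stacked E matrix and the reported pin values below are made up:

```python
import numpy as np

# With E made of unit-vector rows, the projection E @ inv(E.T @ E) @ E.T
# just averages the public variables on each net.  Hypothetical netlist:
# pins 1 and 3 share net 1, pin 2 is alone on net 2.

E = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 0.0]])
y = np.array([3.0, 5.0, 1.0])             # public variables at the pins

zbar = np.linalg.solve(E.T @ E, E.T @ y)  # per-net averages: [2, 5]
yhat = E @ zbar                           # projected (consistent) values

print(yhat)   # [2. 5. 2.] -- pins on net 1 replaced by their average
```

E.T @ E is diagonal (it counts the pins on each net), which is why the pseudo-inverse formula collapses to per-net averaging.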

That's kind of obvious, but it is kind of cool to actually put these things together and see that they work. I should mention all the source code for this is online. This is actually running; this is a dual decomposition method, and this is the residual, the total residual – the error in your KVL or whatever, your voltage consistency; it's just the error in your voltages, and that's it. Okay, so what I'm gonna do now is draw the pictures for what these updates are, and then I think that will make it clear. For people in electrical engineering, it will make it very clear. So let me draw that picture. Let's do this: let's have three pins like that, and let's stick them together like that, okay. So primal decomposition works like this. I fix a voltage on this; I enforce a voltage on this net, okay. Now, a voltage – that's basically fixing the value of y1, y2 and y3. I just fix a common voltage, I say it's 1.1 volts here, and tell each of these to deal with it. So they deal with it, and then they will each report a current, let's say nu1, nu2 and nu3, okay. Now, if these add up to zero, you're actually optimal; the whole thing is optimal.

And by the way, in the circuit case it means you're in equilibrium, okay, so that's what that means. Okay, so instead what happens is I'll add up the currents here, and, for example, if they're positive, I will do the following: I will decrease the common voltage, okay, because then less current will flow. By the way, if you want, we can do it both in circuits and in economics. We can do either one; let's do circuits for a while. In fact, if you want to know what this is, I'm gonna show you what it is. It's that, okay. So, let's work out why it's this. Let's take a voltage on – well, that's a wire, and the capacitor; there's a common voltage y on all of these things, okay. As a result, a current flows here, here and here, and the total current that flows out of the capacitor is minus the sum of these currents. If that's zero, done, everybody's in equilibrium, okay. However – actually, let's propagate forward in time some appropriately small step. If you propagate forward in time, it simply decrements the voltage off of here.

So primal decomposition is basically running this electrical circuit until it comes to equilibrium. When it comes to equilibrium, you will have the following: the voltage here, here and here will be the same – well, sure, because there's a wire connecting them – but more importantly, the sum of these three Lagrange multipliers will be zero, and that's KCL. So for those of you who know electrical engineering, this is kind of the picture. Makes sense? So that's the picture, okay. You can also do it by economics. Economics basically would go something like this: I'll simply, from the headquarters of my firm, announce what y is – just say y is this; each subsystem, deal with it. They each deal with it, but they come back with a number that says how much better they could do if they were given a little bit more of that resource, okay. Now, if the sum of those numbers doesn't add up to zero, it means I could change the resources and actually do a little bit better, so that's sort of the idea there. Anyway, so this is primal decomposition. Dual decomposition I'm gonna do in a different way. I'm gonna do it for the two-terminal case, because most people don't know about multiple-terminal inductors, so if you don't mind. Now we're gonna do dual decomposition. Dual decomposition is this picture. In dual decomposition, remember, y1 and y2 are not the same. So here's dual decomposition; that's dual decomposition. I claim – and let's see how this works.

Well, what is fixed in dual decomposition is a current like that, and it's a positive current for this guy and a negative current for that one. So I distribute a current nu, and one of these subsystems gets a minus sign and the other one a plus sign, okay. Now what happens is they get a current, and then they develop voltages here, y1 and y2. If the voltages are equal, we're done; everything is in equilibrium. Not only that – if the voltage across an inductor is zero, then the derivative of the current is zero, so the current is stationary. Otherwise, what will happen is this: y2 might be bigger than y1, in which case, a small time later, this one will actually decrease. It will have nu dot; L nu dot is y2 minus y1. That's the inductor equation. So in dual decomposition you're actually integrating this system forward, and you're waiting for equilibrium. At equilibrium, you will get a constant current, which is the optimal Lagrange multiplier, and the voltage across here and here will be what at equilibrium? Zero, yeah. So that's again a way to do it. So in this case, you're forcing a current and waiting for the voltages to equilibrate; and in the other case – the two-terminal thing – the voltage is fixed across them, you have primal consistency, and you're waiting for the currents to equilibrate, which is – you're waiting for dual feasibility or something like that. So I don't know if this makes sense, but this is actually a very, very good way to think of these things, okay. Okay, so I think this finishes what we – next time, instead of abstract decomposition, we're actually going to do specific applications of decomposition methods and things like that; and as I say, for those of you who came in after or during my rant or something like that, I should say that we'll be around.
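A forward-Euler sketch of this inductor picture, with invented quadratic subsystems (the data, step size, inductance, and sign orientation are all assumptions for the toy, not from the lecture):

```python
# Two subsystems share one consistency constraint y1 == y2.  A current nu
# is forced through both (with opposite signs); each responds with its
# best voltage y_i, and integrating L * nu_dot = y1 - y2 (this toy's
# orientation) drives nu until the voltages agree.

def y1(nu):            # argmin over y of (y - 4)^2 + nu * y
    return 4.0 - nu / 2.0

def y2(nu):            # argmin over y of (y - 2)^2 - nu * y
    return 2.0 + nu / 2.0

nu, h, L = 0.0, 0.1, 1.0           # initial current, time step, inductance
for _ in range(200):
    nu += (h / L) * (y1(nu) - y2(nu))   # forward Euler on the inductor

print(round(nu, 3), round(y1(nu), 3), round(y2(nu), 3))  # 2.0 3.0 3.0
```

At equilibrium the voltages agree (y1 = y2 = 3) and the constant current nu = 2 is the optimal Lagrange multiplier of the consistency constraint.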

Please get in touch with us about your projects. Some of them were very, very clean, but think about the things I said, or go back and watch my rant and make sure that none of it applies to your project; but come and talk to us, because we now know what they're all about and we'll have constructive comments for all of them. So we'll quit here.

[End of Audio]

Duration: 70 minutes