Instructor (Stephen Boyd): Today we're gonna do -- last time we finished up a bunch of abstract stuff about decomposition, but today we're just going to look at a handful of applications in detail, to see how decomposition actually works, or where it's actually applied. So the first one we're gonna do is rate control in a network. Actually, this is sort of a big topic right now, so if you were to go to Google or something, you'd find zillions of papers on this topic. Is that bounce in your -- what's that?
Instructor (Stephen Boyd): Oh, it's good for you. The monitor was just confusing me. That's okay. So let's just jump in and look at rate control. This is a very, very basic problem. It's non-dynamic; it's completely static. So the question is, how do you assign rates to a bunch of flows in a network? Here's the way it's going to work. We're going to have n flows. The routes are fixed and they're not going to be split. Although, as you'll see in a minute, it's fine to do multicast or anything like that, even split flows, that sort of thing. But this is just to keep it simple. Once you understand how to do this, you can understand how to do all these extensions. So the variables are going to be the flow rates of the different flows. And each flow rate is going to be associated with a concave increasing utility function. That tells you how much benefit you derive from flowing at a given rate. These utilities could be all sorts of things; I'll talk about them in a minute. The traffic on a link is the sum of the flows that pass through it. So these are flows. They flow on links, and multiple flows can go through the same link. You add up all the flows on that link and that gives you the total traffic. And the topology of the network is given by a routing matrix. So Rij is one if flow j -- that's the second index -- passes over link i, and zero otherwise. And the link capacity constraint is that the traffic -- this is a vector of traffic on the links -- has to be less than the capacities of the links. So that's the limited resource right here, the edge capacity. By the way, once you understand this, you realize that this exact framework instantly handles multicast. In other words, if you have a flow that passes over a bunch of links, splits, and then goes out to a tree, which would be the multicast thing, this handles it instantly. It'll also do split flows, if you care about that. You can do things like put a number less than one here.
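The setup just described is easy to write down concretely. Here's a minimal sketch in Python; the tiny network, capacities, and rates are made-up numbers for illustration, not from the lecture:

```python
import numpy as np

# Hypothetical tiny network: 3 links, 2 flows.
# R[i, j] = 1 if flow j passes over link i, else 0.
R = np.array([[1, 0],
              [1, 1],
              [0, 1]])

c = np.array([1.0, 2.0, 1.5])    # link capacities
f = np.array([0.8, 0.9])         # candidate flow rates, one per flow

t = R @ f                        # traffic on each link: sum of flows crossing it
print(t)                         # [0.8 1.7 0.9]
feasible = bool(np.all(t <= c))  # the link capacity constraint R f <= c
print(feasible)                  # True
```

Note that the middle link carries both flows, so its traffic is the sum 0.8 + 0.9 = 1.7.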
None of this matters. So here's the rate control problem. You want to maximize the total utility derived by running the flows. That's going to be this, and it's separable, subject to respecting the traffic capacity limits on your edges. So that's the picture. It's a convex problem, obviously, because this is concave and this is a set of linear inequalities. In fact, the interesting part is here: the objective is completely separable. That's the hint. When you look at a problem, that's the thing that should start lighting up the part of your brain that's associated with detecting structure -- and by the way, you'll see it's the same part of your brain that detects targets for decentralization. So let me just ask you a couple of questions for fun here. If you were to solve this by an interior point method, how would that go? Just roughly, how would you write an interior point method to solve that? Let's start with the objective. What does the Hessian of the objective look like?
Instructor (Stephen Boyd): Why block?
Instructor (Stephen Boyd): So diagonal. It's diagonal. That should get your attention instantly. And why? Because it's separable. So immediately, separability should light up two parts of your brain -- really the same part of the brain: decomposition, and structure in the Hessian. So the Hessian's diagonal. And of course, the flows are coupled by the constraints. When you see something like that, where the objective is separable but the variables are coupled only by the constraints, things like dual decomposition and primal decomposition should come to mind immediately. So let's look at dual decomposition. You'd form a Lagrangian. So here you form a Lagrangian. There are two ways to handle maximization problems: one is to keep in mind that everything switches, or you can put a minus sign in front and make it a minimization problem. That's what's done here. The Lagrangian for minimizing minus U is minus U, the objective, plus lambda transpose times Rf minus c. And the lambdas are going to be nonnegative here. Now, when you do this, a very cool thing happens. You write this as R transpose lambda, quantity transpose, times f. That's this thing here; that's the jth component of it. And then you have this minus lambda transpose c that comes out over here. So these lambdas are associated with the links -- each one with a link capacity constraint. So lambda 4 is associated with the fourth link and its capacity constraint. And we're gonna see -- it's gonna be clear, anyway -- that we're gonna interpret lambda_i as a price per unit flow. So it's a tariff or a rate; it's a price rate for using that link. And in fact, we can interpret all sorts of other stuff this way. In rj transpose lambda, R is a matrix of zeros and ones that tells you which flows go over which links. So R looks like this. Here, these are the flows.
And then here you have zeros and ones, like this, to tell you -- if this row is link one, it's got ones in the entries for the flows that pass over that link. So this is R. And if you look at R transpose lambda, you've just transposed this thing, so you go down a column instead of along a row. If you look at a row here, you're looking at a link, and the ones tell you which flows are on that link. If you look down a column, that tells you about a flow. For example, the first column is a bunch of ones, and it gives you the first flow: it tells you the list of links that flow one goes over. And if you take this column, transpose, times lambda, that's actually a beautiful thing. It's exactly the sum of the lambdas over the route taken by that flow. And those lambdas have an interpretation as a rate. Then you can even interpret this beautifully. This right here is the total price rate: if you go over a link that costs you $1.00 per megabyte per second, and you flow over another one that's $2.00, and another one that's $0.50, then altogether that's $3.50 per megabyte per second, or whatever. That's what this thing is: the total tariff per unit flow rate on that route, if you're charged by carriers to carry your traffic. This here is, of course, the negative utility. Or if you want, you can flip the sign and say it's the utility minus this, and then it makes perfect sense: this is the amount of utility you derive from having a flow rate fj, and this is the net utility, because you subtract off the cost of running your flow over all those edges. So this idea of the lambdas being prices -- that's exactly what it is. And this is the negative net utility.
I want to back up before I go into this, because I did want to mention a few things, like what the rate functions -- not the rate functions, the utilities -- might look like. So let me say a little bit about that. They have to be concave and increasing. So here's your flow, and here's your utility. The simplest utilities are something like this. Now, this one actually violates our assumption that U is strictly concave, but let's just ignore that for the moment. This utility just says the more flow you get, the happier you are. Getting another ten kilobits per second, even when you've already got a lot, makes you just as happy as the first ten kilobits per second.
So here there's no satiation effect; the marginal utility of additional flow is constant. That's this one. This, by the way, might be a good model if the flow represents email or something where you're paid by the amount of stuff that goes through -- the more the better. But you can have all sorts of other things. You can have this. That's a very interesting utility. What does it say? It basically goes up to some f0. You're happy if you get to f0, and above that it doesn't matter to you. And of course, you can do lots of other things. You can do things like this. With a utility like this, you're basically saying: look, the more flow I have, the happier I am. I really appreciate flow, at whatever this slope is, up to some amount of flow here. At this point I sort of get sated, and this part says I'm still happy with more, but the marginal utility goes down. So that's one picture. You can also do this: something that goes to minus infinity, like that. That says I will get infinitely unhappy if you fail to provide me with a minimum flow. Actually, this is kinda stupid, because if you have a minimum flow requirement on a flow, you can remove it and just decrement the capacities of all the links it goes over, and the problem remains the same. But still, this is the idea. So these are pictures of utilities. Here are some that people use that are very common. One is a log; that's extremely common, a log utility. This comes up in networking and all sorts of other areas. What's interesting about this one is that it says a percentage increase in flow is what's equally attractive to you. Every time the flow you get goes up by 10 percent, it makes you equally happy. Going from one kilobit per second to 1,100 bits per second, or from one megabit per second to 1.1 megabits per second -- equally happy. That's log.
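That percentage-increase property of the log utility is easy to check numerically. A quick sketch (the particular rates are just illustrative):

```python
import math

def log_utility(f):
    """Log utility: equal percentage increases give equal utility gains."""
    return math.log(f)

# Going from 1 kbit/s to 1.1 kbit/s...
gain_small = log_utility(1.1e3) - log_utility(1.0e3)
# ...makes you exactly as happy as going from 1 Mbit/s to 1.1 Mbit/s:
gain_large = log_utility(1.1e6) - log_utility(1.0e6)
# both gains equal log(1.1), because log(1.1 x) - log(x) = log(1.1)
```

The scale x cancels out of the difference, which is exactly the "10 percent more makes you equally happy" statement.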
And there's all sorts of other things in between -- square roots and so on. That's the picture on the utilities. Okay, I just wanted to mention that. So here's our Lagrangian. And the thing to notice about the Lagrangian now is that it's separable. When the coupling is linear, as here -- and this works for any linear constraints -- whenever you have linear constraints and you form a Lagrangian, everything splits. So this is completely separable. If I ask you to minimize this over the vector f, it can be done for each flow completely independently. Each flow can optimize itself, because this is a sum of functions of the individual flow rates, and therefore it can be done completely separately. And it's actually quite interesting to think about what happens when each flow minimizes this function -- or maximizes Uj minus that, the net utility. You're basically doing this: you're publishing prices on all the edges, on all the links -- a tariff. And then you tell each flow: you know what, don't worry about the link capacities, but charge yourself, really or fictitiously, a certain amount for your flow rate on every link, and then just optimize your net utility. That's how this works. We'll get to these. So the dual is easy to work out. You just minimize this over each fj; that's a single-variable problem. And you recognize it: it's the minimum of a convex function plus a linear term, and that can be written in terms of a conjugate function easily. So it's negative Uj star -- that's a conjugate function; that's convex -- evaluated at minus rj transpose lambda. So you get this. And the point is, these things here are just scalar functions.
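For a concrete instance of that conjugate: with the log utility $U_j(f_j) = \log f_j$ (a standard choice, and the one used in the example later), the scalar minimization is explicit:

```latex
\min_{f_j > 0}\;\bigl(-\log f_j + p_j f_j\bigr), \qquad p_j = r_j^T \lambda ,
```

where $r_j$ is the $j$th column of $R$, so $p_j$ is the total price along flow $j$'s route. Setting the derivative $-1/f_j + p_j$ to zero gives $f_j^\star = 1/p_j$, with optimal value $1 + \log p_j$. So each flow's term in the dual function is the explicit scalar expression $1 + \log\bigl(r_j^T\lambda\bigr)$, and each flow's update is simply $f_j := 1/\bigl(r_j^T\lambda\bigr)$.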
So you get the following dual rate control problem. And actually, the interesting part is that the primal rate control problem is from the point of view of the users. It basically says: calculate flows that don't violate the resources available on the network -- the edge capacities -- and maximize the sum of the happiness of everyone operating on the network. The dual problem is this one, and the variables are the link prices. This is often the way it works with primals and duals, as you know by now: the dual problem takes the other point of view, so you could argue the dual problem is the one faced by whoever's operating the network. Basically, the dual problem involves prices, and it asks: how should the network price all the links? You're gonna price them so that the users back off and -- if you adjust your prices right -- they come in under the capacity. So these lambdas are to be interpreted as link prices. This is a pricing problem. And what you want to do is price the whole thing in such a way that you minimize the net happiness. Sounds kinda evil, but that's exactly what this is. How do you get a subgradient of the negative of this, which is, I guess, lambda transpose c plus this sum? It's very simple; it's stuff we've looked at several times now. It's this: given some lambdas, you evaluate the optimal flow under this pricing -- optimal meaning each flow optimizes its net utility. That's f bar here. Then you go back and look simply at R f bar -- that's the actual traffic on the network -- minus c, and that's an element of the subgradient set for this negative dual function. So that's what that is. And this makes perfect sense. This thing here is the capacity violation, or something like that.
If it's positive, it basically says that the pricing -- well, if it's positive, it tells you something about the subgradient: if you increase the price, something's going to -- whatever it is. It's all mixed up now, because I'm maximizing and minimizing and I've squished it all around in the middle. But that's the picture. I should add, by the way, that g is strictly convex and differentiable -- sorry, if the U's are strictly concave, this is actually differentiable. But I'll still write this as a subgradient, because then I cover cases where you have kinks in the utility. By the way, if you have a kinked utility that goes up linearly and then transitions to a new slope, what do you think the optimal solution is going to look like? What would you expect when you solve a problem on a network where a utility looks like this? Like that. Let's suppose the kink is at one. What do you think?
Student: It'll be bunched up where the derivative is discontinuous.

Instructor (Stephen Boyd): Where else are they gonna be bunched up?
Instructor (Stephen Boyd): Yeah. Look, this is like your old friend, the exact penalty type thing. And then that's another kink. So in fact, if you optimize a network with, whatever, 10,000 flows and 50,000 edges, when all the smoke clears, we expect the following to be true: a bunch of flows will be sitting right at one. Some might be out here, because there might be some light-traffic areas where it's not really congested. If a flow goes through a really congested area, or multiple really congested areas, where do you think that optimal flow's gonna be? What do you think? Zero. That's this thing. And by the way, if that happens, then you can go back and brag and say things like, "Wow, I'm using convex optimization to do admission control." Because that's actually admission control -- if the optimal flow rate is zero, it says that any amount of flow I allocate to you will lead to a decrease in the public good, the sum of the utilities. You're too expensive. And that would be a flow that goes through a whole bunch of congested areas -- then you're not worth admitting to the network. So that's admission control.
Student: Does it have to do with the fact that it's a linear program?
Instructor (Stephen Boyd): No. It has to do with the fact that the slope here is -- so if I did this, like that, and these are both curved, it has to do with the non-differentiability here and here. This could be x to the 0.8 over here, and this could transition to x to the 0.5. You'd get exactly the same thing. So it's the kink that matters. So here's dual decomposition rate control. It works like this: you start with an initial link price vector. For example, you say, look, every link is $1.00 per megabit per second. Initialize that way; it doesn't make any difference. Then, for each route, you sum the link prices along the route -- that's calculating this thing -- and that's the total price for flow j. So each flow walks along -- you can even imagine some protocol for this. This is obviously not how real flow control protocols work, but actually a lot of people have done a lot of work on back-interpreting real flow control protocols, as in TCP/IP, into this framework. But nevertheless, you can imagine a protocol where, as your packet moves across each link, somebody -- the operator -- goes into the header somewhere, finds a price field, and increments it by that link's price. It gets to the end and returns, you look at the header, and it tells you the total price on that route. So this step says: collect the prices on the routes. And then it says: optimize the flows completely separately. This is completely independent -- every flow just independently optimizes itself. What it does is maximize what it really wants, that's this utility, minus -- and this is either a fictitious charge or it could absolutely be a real one -- the charge for its flow rate. That's what this is. This makes perfect sense. What would happen here if one of these prices were zero? What's the fj then?
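The whole algorithm just described fits in a few lines. Here's a hedged sketch in Python, assuming log utilities (so each flow's optimum is one over its route price) and made-up step size, iteration count, and example network:

```python
import numpy as np

def dual_decomposition_rate_control(R, c, alpha=0.1, iters=200):
    """Dual decomposition rate control: projected subgradient on link prices.

    R : routing matrix (links x flows), c : link capacities.
    Log utilities U_j(f) = log f are assumed, so f_j = 1 / (route price).
    alpha and iters are illustrative choices, not tuned values.
    """
    lam = np.ones(R.shape[0])                   # initial link prices, all $1
    for _ in range(iters):
        prices = R.T @ lam                      # sum link prices along each route
        f = 1.0 / np.maximum(prices, 1e-9)      # each flow optimizes its net utility
        s = c - R @ f                           # slack on each link
        lam = np.maximum(lam - alpha * s, 0.0)  # lower prices where slack, raise where violated
    return f, lam

# Hypothetical tiny network: 3 links, 2 flows.
R = np.array([[1, 0], [1, 1], [0, 1]])
c = np.array([1.0, 2.0, 1.5])
f, lam = dual_decomposition_rate_control(R, c)
# flows should converge near (1, 1); the slack third link's price falls to zero
```

Note how decentralized each line of the loop is: a flow only sees its own route price, and a link only sees its own traffic and slack.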
Instructor (Stephen Boyd): [Inaudible] plus infinity. No problem. These are increasing. And if it's costless, you'll just say, not a problem, and you'll open up your flow control, you'll open up your valve to the widest it can be.
Student: Are there routers that make a decision which way to send the packet?
Instructor (Stephen Boyd): No. In the problem we're looking at right now, the topology is completely fixed; the routes are not changing. The next thing we're gonna look at is gonna do exactly that. Could you combine them? Yes. Once you get the idea on these things, you can solve any of these problems. Routes are absolutely fixed. Everybody got this? So that's the picture. By the way, there's no reason to believe that when you set the flows this way your link capacities will be respected. There are some obvious things you can say. Let me ask you this: what happens when you get to step three? Step three gives you flows calculated with no direct knowledge of the link capacities. It doesn't tell you whether you're over or under or anything like that. It just says: that's the price, please deal with it. Please crank your flow up until the marginal utility you get is equal to the price. That's all you're doing: getting U prime equal to the route price. Then you calculate this, and let's actually just figure out some things intuitively. If s turns out to be quite positive, what does it mean? It's the slack on the link capacities -- they're all positive and they're all substantial. What does it mean?
Instructor (Stephen Boyd): Yeah. It means that the flows are not large enough. The flows are not large enough because the prices are too high. So basically, you told them some ridiculous price, they all said, "Oh, that's expensive," and they all backed off to some slow rate, just dribbling stuff down the network, and you're way under-utilized. So if this is positive, it says the prices should be lowered. Good. By the way, it's a 300-level class; if the signs came out wrong, it wouldn't bother me in the slightest -- that's your homework to figure out. But the story still goes. So in this case, this is the dual subgradient update. It says: take the old prices, subtract from them some positive multiple of the current slacks, and then take the plus part of that. By the way, can a link charge, a lambda, become zero? If this becomes negative and you take the plus part, wouldn't this become zero? Is there a pathology there? When you solve an optimal flow problem, can an optimal lambda be zero?
Instructor (Stephen Boyd): Sure. No flow. Any other cases?
Student: If you're gonna pay people to use it.
Instructor (Stephen Boyd): We're not allowed to do that. Lambda's gonna be positive.
Instructor (Stephen Boyd): When would that happen?
Instructor (Stephen Boyd): You mean it flattens out? This is simple. If, at the optimal flow solution, a link capacity constraint has positive slack, then the optimal Lagrange multiplier is zero. That's all it is. It just tells you that. So how could you get a link that is -- in fact, someone tell me, what's the simplest case? Show me a case where a link capacity is not active at the optimum. It doesn't sound right, because the more flow you get, the more -- can someone give me an example?
Student: If one link with high capacity was surrounded by links with low capacity?
Instructor (Stephen Boyd): There you go. Your example was exactly right. But here's the simplest one. It would be this: two links, like that. This one has a capacity of ten, and this one has a capacity of one, and there's one flow, like that. Done. So obviously, the optimal flow is one here -- I don't care what the utility is, as long as it's increasing. So the capacity constraint on the big link is never active, and therefore the Lagrange multiplier on it is going to end up being zero; everything is completely determined by this one. So that's the picture. Okay. So these are fun. Like all dual decomposition algorithms, they always fit a story. In other words, you can explain this to someone -- you can explain this to undergrads, you could probably explain it to high school students easily. And then you might ask, so why did you sit through 364 and various other classes to get here? What would I say? I'd say, "You know more. You have a more sophisticated view." Or, "There'd be no reason to believe that that method would work, and now you know otherwise." You know that it would work. Is that really worth sitting through all those classes? I'll have to think about that. I'll get back to you on that.
Student: Is it possible to have a price [inaudible].
Instructor (Stephen Boyd): Yes, it is. I'm showing you the simplest possible dual decomposition method, straight subgradient. You could do all sorts of others. You could do a two-term recursion and update lambda depending not just on the current iterate but on the last one as well. These would immediately work much better. And in fact, for this particular problem, a lot of people have looked at writing the lambdas as the exponential of some other variables and updating those, which leads to a multiplicative update. And I'm just trying to think -- there is one minor hole in this, but it doesn't matter. Here's what can happen if you start with too high a step size. If lambda is large and alpha k is large, you're gonna be way under-flowing; your surplus is going to be high. But alpha is large, so in the next step all the lambdas go to zero, and now everybody goes insane, because everybody says: sorry, it's costless on my whole route, not a problem, and you open up your pipe to the maximum rate. So that's the only hole in it. I'm not gonna do all the details to fix that; you can do all sorts of things there -- you could put upper bounds on the flows, or whatever. Doesn't matter. That's it. It's actually a quite beautiful method. And you can see this is very far from the silly little things we were looking at before, where you had two subunits of a firm setting prices and each optimizing separately. This decentralizes across huge, vast systems. I might add that things like this can actually be done, and actually are done. So this is kind of obvious, and I've already emphasized it, but the idea is that this is completely decentralized, utterly decentralized. The links only need to know the flows that pass through them.
And the flows only need to know the prices on the links they go through. So you can easily imagine, for example -- not that this would do anybody any good -- making an object-oriented implementation of all this, where very little information is available to anyone. Each flow needs to know absolutely nothing except the current total price: the sum of the prices on the links it flows over. The flow doesn't even need to know, for example, how many links it has -- totally irrelevant. All it needs to know is the current total price. Each edge doesn't need to know anything either: it doesn't need to know how many other edges there are in the network, doesn't need to know the topology, doesn't need to know the IDs of the flows -- doesn't even need to know which flows are going through it. It only needs to know how close it is to capacity. That's how decentralized this is. Okay. Now, the iterates are often infeasible, but in the limit you do have feasibility. There are lots of ways people deal with this. By the way, there's something people do that's a bit confusing, but it's widely done, and it works like this: they conflate iterations with actual time. Here, in the subgradient method, the iterations don't actually flow anything. In fact, probably the best way to conceptualize this is to imagine tiny little packets that have no payload, just a header, and they fly along the route and are tagged -- each edge tags one with its price -- and it comes back to the process that's going to run the flow. And each edge is adjusting its price. And this goes back and forth and back and forth until, basically, somebody somewhere gives an all-clear: each edge finally says something like, all right, you'll never actually get under capacity, but you've gotten close enough.
At that point, a little token goes back to every flow that says: we're cool, we're good to go. And then you turn on your flow. Everybody see what I'm saying? That's the right way to conceptualize this. Unfortunately, a lot of people imagine the iterations here as actually being run with real traffic. So basically, at each step, when you try a step in the subgradient method, it really means: pump that flow down that tube. What happens then is actually kind of interesting, and it can all be interpreted via dual decomposition. It's this: if you pump too much down a link in this model -- and I think this is what leads to all sorts of very strange and confusing things -- what would actually happen, in a dynamic situation, if I have a link with capacity one but the total flow going through it is actually 1.1? What would happen in a real network? It depends, actually, on exactly what the routers are supposed to do. One thing they could do is take 10 percent of the packets and just dump them, terminate them right there. The other thing is they could push them onto a buffer. And if they push onto a buffer, then basically, if you're asking for a flow of 1.1 down a pipe that only has capacity one, you pump at one, and your buffer is growing at a rate of 100,000 packets or bits per second, or whatever. Everybody see what I'm saying? So it turns out you can go back, in that situation, and you get a two-term dual subgradient method. And there's a whole class of algorithms in network flow control now called back pressure, and that's exactly what this is -- it's nothing but a dual decomposition method. I think I just confused everybody by warning you that there are interpretations of this that are deeply confusing.
Is anyone not confused? Because I can confuse them separately. Good. It's not like seven pages long, is it? It shouldn't be. Next. What do people think? Let me try one thing, then. That's nice -- we can see the weather. That's good. It's sunny outside. That's always good to know. Am I missing -- let's just see. Did this get truncated? I don't know what happened here. I thought so. I'm sure you know there are several things that scare professors. That's one of them. The other one is when someone runs up to you and says, "My God, you're teaching in half an hour." And you're like, "I am? What class?" And it's like, "Music history." And you're like, "No, no, that's a mistake. It's impossible." And they're like, "They're all waiting there." Anyway. This is the other one: you walk in with the wrong lecture notes and it's three pages long. Here we are. Now let's see if everybody's happy. Look at that. When you do this, as I said, what's happening is that only in the limit do your flows respect the capacity constraints. But there are ways to produce a feasible flow from a proposed flow, if you have to, and there are tons of ways to do it. One is proportional back-off -- you can imagine a whole industry and thousands of papers to be written on this. It would work something like this. You let eta_i be the traffic on the edge divided by its capacity. That, of course, should be less than or equal to one. If it's less than one, it means you're under capacity; if it's bigger than one, it means you're over capacity. And you could do the following: if there's a proposed set of flows -- that's the current one -- and someone says, enough of this silly decomposition nonsense, it's time to actually put some payload in your packets and start flowing, then what you would do is this.
This is a very simple method for generating a feasible flow. It goes like this: you take the flow and you do a proportional back-off. You go over all of your links here -- this is like the overload factor on your link -- and you find the worst overload factor on your flow's route. By the way, if that's 0.8, that's great news, because then this is telling you, please crank your flow up by one over 0.8. If it's three, it means you're way, way off -- at least one edge on your route is oversubscribed by a factor of three -- and it tells you to back off. So this clearly produces a feasible flow. Totally obvious. And in fact, this is actually a step in primal decomposition for this problem, but that's often the case in these things. So let's just look at an example and see how this works. I have ten flows and 12 links. I should also add, just to make sure this is completely clear: if you need to solve a flow problem with 1 million flows and 2 million or 5 million edges, whatever, by far the best method would be an interior point method. Period. That's by far the best method. By far. So the whole point about this dual decomposition is -- if you could collect all the data in one place, you'd be way better off doing it that way. The nice part about dual decomposition is that it's got all sorts of data hiding, and bits and pieces are encapsulated, and all that kind of stuff. That's the advantage -- not speed, not quality of solution, nothing else, just the decentralization. Okay. So this is just a simple example. There are 12 links, the link capacities are randomly chosen, and we'll use the log utility. By the way, if you go to Google and type in "log utility network," you'll find tons and tons of stuff. And there are some arguments you could make, like it delivers proportional fairness, or something like that, to everybody.
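The proportional back-off rule just described can be sketched in a few lines. A minimal sketch, with a made-up example network; the per-flow scaling by the worst overload factor along each flow's own route is what guarantees feasibility:

```python
import numpy as np

def feasible_flow(R, c, f):
    """Proportional back-off: scale each flow by the worst overload factor
    along its own route, giving a flow that satisfies R f <= c.
    (A sketch; assumes every flow crosses at least one link with traffic.)"""
    eta = (R @ f) / c                      # overload factor on each link
    worst = np.array([eta[R[:, j] > 0].max() for j in range(R.shape[1])])
    return f / worst                       # back off (or crank up, if worst < 1)

# Hypothetical example: flow 1 oversubscribes its first link by a factor of 2.
R = np.array([[1, 0], [1, 1], [0, 1]])
c = np.array([1.0, 2.0, 1.5])
g = feasible_flow(R, c, np.array([2.0, 1.0]))
# g is feasible: every link's traffic is now within capacity
```

Feasibility holds because each flow j is divided by a factor at least as large as the overload factor of every link it crosses, so every link's traffic shrinks to at most its capacity.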
By the way, if you put log utilities, you don't do admission control, and that's kind of obvious. Because the utility associated with denying somebody entry to the network -- setting a flow to zero with a log utility -- is minus infinity, which basically means infinitely uncool, and so it doesn't happen. Everybody gets a little bit in a log utility type thing. So the optimal flow, you can just get a formula for this. It's very, very simple. You put a log minus this and it's a conjugate of a log.
I forget what it is, but it's something explicit. And we set all the initial prices to one. And we'll take a constant step size. Actually, this, of course, would not work, or would work only in a limited sense, if these things were non-differentiable. It turns out, in this case, the dual function is completely differentiable, in which case, what you can say is this: if the step size is constant and small enough, you're guaranteed to converge. That's the statement for the differentiable case. And in this case, it's differentiable. Okay. Here's an example. These are iterations running. And this shows you the utility derived here by the current point. And this shows you g of lambda, which is an upper bound on this utility. So that's what's happening. And you can see here that in something like 50, 60 steps, you've gotten the utilities about right. Oh, this is the utilities of the corrected ones, where you back off here. So that gives you a primal feasible point, and that gives you a dual feasible point, which is the upside. Now as you're running, you monitor the maximum capacity violation. And that looks something like this. I guess when you first start, this violation is as big as about one, or something like that -- roughly, 0.7 or 0.8. Something like that. Now, all the capacities themselves are between 0.1 and 1, so we can get a scale for this. It says that already, by about 20 steps, your capacity violations are pretty small. And certainly, by the end of this, your capacity violations are ten times smaller than the smallest capacity edges on the network. That gives you the rough idea. So now we're gonna do -- that's odd. That's very odd. Now we're gonna do a different problem. And these are just supposed to be two different problems in networks and stuff like that. And actually, you'll see that once you get the ideas, you can do zillions of generalizations of these and you can combine them and all that stuff. But these are sort of the two prototypes of these things.
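The whole dual decomposition scheme for rate control with log utilities fits in a few lines. This is a sketch on a made-up 3-flow, 2-link instance (the lecture's 10-flow data isn't reproduced here); for U_j = log f_j, the flow update has the closed form f_j = 1 / (sum of prices along route j), and the prices take a projected subgradient step. The small price floor is my own safeguard to keep flows finite early on.

```python
import numpy as np

def rate_control(R, c, alpha=0.1, iters=500):
    """Dual decomposition for: maximize sum log f_j  s.t.  R @ f <= c."""
    m, n = R.shape
    lam = np.ones(m)                   # link prices, initialized to one
    for _ in range(iters):
        q = R.T @ lam                  # total price along each flow's route
        f = 1.0 / q                    # each flow maximizes log f - q f privately
        # projected price update; floor keeps prices strictly positive
        # (an assumption of this sketch, harmless when links are saturated)
        lam = np.maximum(1e-6, lam - alpha * (c - R @ f))
    return f, lam

# Flow 0 uses link 0; flow 1 uses both links; flow 2 uses link 1.
R = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0]])
c = np.array([1.0, 2.0])
f, lam = rate_control(R, c)
# traffic R @ f approaches the capacities c, and both links end up priced
```

Each flow needs only the sum of prices on its own route, and each link needs only its own traffic -- exactly the decentralization the instructor is emphasizing.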
So this is single commodity network flow. Single commodity means there's one kind of flow or traffic on an arc. Single commodity means you should think of electricity, water, gas, diesel fuel -- something that's interchangeable. And you don't mind what it is. By the way, in a communication network, a single commodity flow would be just packets all destined for the same place, or something like that. That would be a single commodity flow. So it's basically like a sensor collection network.
Then you have single commodity flow, something like that. So everything is the same, it's all interchangeable and so on. Okay. What you're given is an external source or sink flow at each node. And the sum of the external source or sink flows is going to be zero. We'll see in a minute that that has to be. And we'll have a node incidence matrix, and this is the graph incidence matrix. It just tells you which edge goes into and out of each node. And it's the traditional one, where the rows give you the edges and the column index, j, gives you the -- did I get that right? No, nodes. It's nodes by edges, like that. Flow conservation says this: it's Ax plus s equals zero. So if you look at Ax, the vector, let me just -- I shouldn't have to -- let's just write it down to get the picture. So we've got a bunch of nodes, like this. And you have directed edges. And then each one has an external flow into it. By the way, these edges do not mean that the flow has to go from one -- it's simply a reference direction, like in an electrical circuit. So when the edge goes like that, it says that if the flow is actually that way, it's positive. If it goes the other way, we're gonna call it negative. So it's just like an electrical circuit. You take a branch, you label the arrow; that does not mean the current goes that way. That's the reference direction. And it merely means that positive current means you're going in that direction, negative current means you're going the other way. Likewise, these sources that enter the network like this, they can be sources or sinks, depending on the sign. And if you form Ax -- are we using x for the flow? Yes, we are. So if x is a vector of edge flows, Ax is a vector, and its height is the number of nodes. And Ax tells you precisely the violation.
Ignoring these flows, it tells you precisely the net flow at that node, where you add up everybody coming in and you subtract everybody going out. And what's left over, if you add in the s, it means that all the flow balances. So you can think of this as the basic conservation equation. These would be the exact same equations in an electrical network. This would be absolutely identical. So here the x is the current on a branch, s is an external current injected, and you'd have Ax plus s equals zero, in which case this would be the Kirchhoff current law. And what we're going to do is -- this, of course, describes lots and lots of feasible -- there are lots of x's that satisfy Ax plus s equals zero. Well, not lots; it depends on the null space of A, of course. So any flow and then you have -- oh, by the way, what is an x in the null space of A called? I'm just asking for the English word. You can almost guess what it's called. What does it mean to say Ax equals zero in a network like this? Let's say Ax equals zero, what does it mean? I'll draw a picture and then you can tell me what the null space is. We're going to do a super simple network; it's going to have two nodes.
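The incidence matrix and the conservation equation are easy to check numerically. A sketch on a 3-node graph of my own choosing, using one consistent sign convention (an assumption; the lecture leaves the convention open): A[i, j] = +1 if edge j enters node i, -1 if it leaves.

```python
import numpy as np

# Edges: e0: 0 -> 1,  e1: 1 -> 2,  e2: 0 -> 2   (nodes x edges)
A = np.array([[-1,  0, -1],
              [ 1, -1,  0],
              [ 0,  1,  1]], dtype=float)

# External injections: 1 unit in at node 0, 1 unit out at node 2.
s = np.array([1.0, 0.0, -1.0])
assert abs(s.sum()) < 1e-12        # the sources must sum to zero

# One feasible flow: 0.5 along 0 -> 1 -> 2 and 0.5 along 0 -> 2.
x = np.array([0.5, 0.5, 0.5])
assert np.allclose(A @ x + s, 0)   # flow conservation at every node

# Each column of A has exactly one +1 and one -1, so 1^T A = 0; hence
# A x + s = 0 forces 1^T s = 0, which is why the sources have to balance.
assert np.allclose(A.sum(axis=0), 0)
```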
Instructor (Stephen Boyd):There. I've got two nodes and two flows. And I'd like to know, what on earth, in this case, does the null space of A mean? Give me an element in it and please tell me what it means. What's an element in the null space the way I've drawn it here? Student:
Instructor (Stephen Boyd):Any multiple of one-one. Right. Everyone agree? If x1 is equal to x2, Ax equals zero, because the net flow here is zero and the net flow there is zero. So you wouldn't be surprised to find out that an element of the null space of A is actually called a circulation. So in this case, it's literally a circulating flow. But in the general case, you'd call it a circulation, too. By the way, it does not have to be a flow that goes along a path like something like this. It does not have to be. Actually, those are in the null space, but you could have some weird thing where there's all sorts of positive -- anything in the null space is called a circulation. Just to give you the idea for how all this works. So there are lots of things in the null space. Another way to say it is this: find one feasible x, a flow that satisfies your flow requirement, your external flow, and then you're really asking, how do you optimally add a circulation to that? And what you want to do is minimize a flow cost function. And if you wanted a picture for this, it doesn't really matter; this can be a wireless network, and this can be the power required to support a level of flow. So in a wireless network, for example, if you say you're transmitting a megabit per second, you calculate on your rate curve what power it takes to do that. And then this would minimize the total power on the network, for example. But that's the idea. So this is the single commodity network flow setup. The network flow problem, the single commodity flow, looks like this: you minimize this separable flow cost subject to Ax plus s equals zero, and that's convex, readily solved with standard methods. I mean, completely readily solved. We'll see the dual. The dual is going to be quite beautiful.
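The two-node example can be written down directly. A sketch using the same incidence convention as before; here edge 0 runs node 0 to node 1 and edge 1 runs node 1 back to node 0 (my reading of the picture, since any multiple of (1, 1) is in the null space).

```python
import numpy as np

# Two nodes, two edges pointing in opposite directions.
A = np.array([[-1,  1],
              [ 1, -1]])

x = np.array([1.0, 1.0])           # any multiple of (1, 1)
assert np.allclose(A @ x, 0)       # a circulation: flow just goes around

# Given one feasible x0 with A @ x0 + s = 0, every feasible flow is
# x0 plus a circulation, so the optimization is really over null(A).
```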
As you won't be surprised, for a flow problem here, the dual is going to involve a problem involving potentials. Again, if you know electrical engineering, you would suspect this. This would hardly be surprising. So we'll see that when we work out the dual, the dual variables are going to be potentials at the nodes. And in fact, it's more than that. You're going to find out that all that really matters is the potential difference across each edge. I'm just saying, you should be prepared for this. It's going to happen, but I don't mind saying it now. Again, if someone just walks up to you and says, "Solve this problem," and there's nothing else here, there's no other reason or excuse for decentralization -- by the way, the argument that it's cool doesn't really work. It is cool, but that doesn't matter. You'd be way better off solving this using a standard method from 364a. So sparse matrix techniques would just beat anything, if you could collect the data in one place. We'll see a decentralized solution. It's going to look different from the flow one, but you'll get the idea. So we form a Lagrangian. That's a separable function of x, plus nu transpose times (Ax plus s). And of course, the constraints are linear, so when you form a Lagrangian, everything separates. So this is just completely generic with linear constraints. And now, this function here is completely separable in the x's. Why? Because this is separable -- it starts that way. And this is linear. Everything linear is separable. Any linear functional -- function -- is separable in all variables. I said linear functional, and actually, that's correct. A linear functional is a linear function that's real valued. And that's actually very interesting, because this says that if I told you what nu is, these Lagrange multipliers -- by the way, the Lagrange multipliers are associated with nodes.
The variables are associated with flows on edges, and not surprisingly, the Lagrange multipliers are associated with nodes, because this vector has the height of the number of nodes. And in fact, it really tells you the node -- Ax plus s is very simple. It's the vector of KCL violations. It's how much you've violated conservation of flow at that node. This is associated with nodes. And we'll see that they're potentials. We'll see that very quickly. So this is completely separable in the x's. And let's actually look at what A transpose nu is. So the -- let me try to get that right. That's gonna be -- so we go (A transpose nu) transpose times x. That's the component in the Lagrangian associated with the constraints, the linear component. And this has various entries, but the entries look like this. It's a_j transpose nu. And the a_j's are columns that have exactly a plus one and a minus one in them. And that encodes, for the jth edge, the head and the tail node. So that's what a_j is. In the A matrix, it looks like this: each column has a plus one in it and somewhere a minus one. Everything else is zero. And that encodes that this edge goes into one node -- I forget what the convention is -- and out of the other. Okay. This thing is beautiful. This vector gives the difference in potentials across that edge. So if I have the jth edge and I form a_j transpose nu, and if this is nu_k and this is nu_l, then a_j transpose nu is nu_k minus nu_l. So it's quite beautiful. And in fact, you can already see something very, very cool. Notice that the Lagrangian in nu has the property that you can add a constant to all entries of nu and nothing happens. I can add 27 to every entry of nu, and the Lagrangian is completely unchanged. Is that true? No. Sorry, it is. Sorry, I confused myself momentarily.
Let's see, it's obvious in one case, but not in the other. In one case, it's easy. Every one of these vectors looks like it's got a one and a minus one. If I add 27 to every entry of nu, it has no effect on that, because this is calculating potential differences across an edge. Here is the reason it doesn't matter in the other term: because 1 transpose s is zero. So that's why it doesn't make any difference here. The sum of the s's is zero. Okay. So we use the notation delta nu_j to denote the potential difference across edge j. Actually, these examples are fun anyway, because they should kind of be in 364a, just because you get to see duality in more applications, where you get beautiful interpretations. That's almost always true in duality. Okay. So here's the network flow dual. It looks like this. The dual function is: you minimize this Lagrangian. And this is completely separable now, because for each edge, to determine its optimal flow in the Lagrangian, it turns out it needs to know absolutely nothing other than the potential at its head and the potential at its tail. Nothing else is needed. You take that potential difference, you multiply by your flow, you add in your cost for flow, and you minimize. And you can write that this way: this is the conjugate of the edge cost. And the dual problem is simply to maximize this g. So that shows you how the primal is a flow problem with a conservation constraint at each node -- that's what a flow problem is, so I guess that's okay -- and the dual is one involving assigning potentials to nodes. So it's like the rate control problem, but with a different twist. How do you recover the primal from the dual? That's easy.
It's easy only if these are strictly convex; then you get a unique minimizer, x_j*, of that. And if these are differentiable, you have a formula for it. It's very easy. And you get the optimal flows from the optimal potentials. And that looks like this: x_j* is -- this is the conjugate one. And those are the optimal potentials. Now by the way, this formula is quite interesting. In electrical circuits, this formula is going to be the nonlinear IV characteristic, the current-voltage characteristic. Because basically, you think of y here as a voltage across an edge. And as a result, this function here, the inverse function of the derivative, is the arg min of this thing. This function here is actually the thing that maps an applied voltage across that edge to a resulting current flow. So for example, for a resistor it's linear, and that actually corresponds to quadratic flow potentials, in which case these are energies. That's for the electrical analog, if you know what that is. And a subgradient of the negative dual function, that's easy. It's actually just this: all the flows are totally uncoordinated. There's no reason to believe you have flow conservation. They look at the potential at their head and at their tail, they privately minimize something, which is sort of a net power, and then they say, that's my flow. And then, basically, at each node, you look and you find out what is the flow conservation violation. That's this. If that's zero, you're done. That tells you, of course, that g is maximized. If it's not zero, this thing here, the flow conservation residual, gives you a subgradient. Okay. So here it is. You do the following. You start with some initial potential vector, nu. And you determine the link flows from the potential differences.
So you simply calculate this way. Then you calculate the flow surplus at each node. And we'll call that capital S_i. That's the surplus. And then it says you simply update the node potentials. It's actually quite beautiful how it works. I will change my story to get the sign right -- just assuming the sign is right here. Let's try that. So what happens is this. Let's make it an electrical current, so you're flowing from one node to another. The first thing you do is you calculate the node potential difference. And then you simply calculate a flow that goes across that. Then, if the receiving node turns out to have too much net current going in, it says that potential should go up. That's it. That potential should go up. If that potential goes up, it says that everybody flowing into that node is going to flow a little less in, because the voltage just went up. And by the way, the outflows from that node are also going to increase a little bit. If you pull the voltage on the node up a little bit, the outflows are going to increase a little bit. But if the inflows go down a little bit and the outflows go up a little bit, it says that if that node was oversubscribed before -- too much net current -- then that's actually going to be a correction move. And the point is just that these are embarrassingly simple to write down. So these are not algorithms you couldn't figure out or make up intuitively, but the nice part is now you know exactly what they are and how they work. That's it. So it's really embarrassingly simple. Well, as I said, it's completely decentralized and so on. And you get flow conservation only in the limit here. And once again, the same story holds.
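The three-step iteration just described can be sketched directly. This is a minimal version with quadratic edge costs phi_j(x) = x^2 (my choice, so the private flow update has a closed form x_j = -delta nu_j / 2), on the same small 3-node graph and sign convention as the earlier examples.

```python
import numpy as np

# Dual subgradient method for single commodity flow, a sketch.
# Edges: e0: 0 -> 1,  e1: 1 -> 2,  e2: 0 -> 2
A = np.array([[-1,  0, -1],
              [ 1, -1,  0],
              [ 0,  1,  1]], dtype=float)
s = np.array([1.0, 0.0, -1.0])      # 1 unit in at node 0, out at node 2

nu = np.zeros(3)                    # node potentials
alpha = 0.2                         # constant step size
for _ in range(300):
    dnu = A.T @ nu                  # potential difference across each edge
    x = -dnu / 2.0                  # each edge minimizes x^2 + dnu * x privately
    S = A @ x + s                   # flow surplus at each node
    nu = nu + alpha * S             # nodes with too much inflow raise their potential

# x converges to the optimal flow (1/3, 1/3, 2/3) for this instance,
# and the surplus S goes to zero, i.e. conservation holds in the limit.
```

Each edge needs only its two endpoint potentials, and each node needs only its own surplus -- nothing else is shared, which is the whole point of the method.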
If you decide to actually run the flows -- if this is not just a discussion among edges and flows, adjusting potentials before you turn on the flows, but the flows are actually running -- then if you have a residual at a node for that step, it increments or decrements a buffer. In an electrical network, that would be capacitors at the nodes. And if there's too much flow for an iteration, no problem, the charge on that capacitor just increments up. It could also be warehouses, where you're storing things, or buffers in a network or something. If too much stuff went in at that step, all it does is pop up the queue length, for example. By the way, if too much stuff flowed out -- that's capital S_i negative -- your buffer actually dropped down a bit. So that's the picture. It's probably not a bad idea to explain all this fully, although I think I've said a lot of this. This is related to an electrical network.
And this is only for people who know about electrical engineering, so if you don't, that's fine. So here, the variable is the current in the branch. The source is an external current injected at that node, and they have to sum to zero, otherwise there's no solution of Ax plus s equals zero. I guess that's kind of obvious. This Ax equals s -- wait a minute. Did I just reverse the sign on s? I think I did. Wasn't it Ax plus s before? Do you mind if I just write Ax plus s equals zero? And then we have to figure out if that's injected or pulled out. It's one or the other. Then we'll make it zero, just for aesthetics. The dual variables correspond to node potentials. And then these are the branch voltages. Of course, in electrical engineering -- anywhere, physics or electrical engineering -- if you say "potential," by implication that's a quantity where differences matter, not the absolute values, unless you have some reference or datum value, a ground potential. That's the same thing here. And the branch current-voltage characteristic is exactly this thing, which is the inverse of the derivative of that function. And what I can mention here is the following: if the local flow cost functions are quadratic, then it turns out this is linear. That's obvious, right? Because then the derivative is a linear function, and the inverse of a linear function is linear. So in fact, this goes back to -- this goes back to Maxwell, who announced that in an electrical network, if you have a bunch of flows in a linear resistor network, the currents arrange themselves in such a way as to minimize the total energy in the network. When they minimize total energy in the network, then almost by accident, you get flow conservation. That's the other view of electrical engineering. The more primary view is to say, no, there's flow conservation, or something like that. But it turns out it's a minimization problem. That's it. So let me give you an example.
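Maxwell's principle can be checked on a small resistor network. A sketch under my own assumptions (the triangle graph from earlier, resistances of my choosing): with quadratic costs phi_j(x) = R_j x^2 / 2, the KKT conditions of the energy minimization are exactly nodal analysis, the dual variables are the node voltages, and the IV map is Ohm's law.

```python
import numpy as np

# Edges: e0: 0 -> 1,  e1: 1 -> 2,  e2: 0 -> 2   (nodes x edges)
A = np.array([[-1,  0, -1],
              [ 1, -1,  0],
              [ 0,  1,  1]], dtype=float)
Rb = np.array([1.0, 1.0, 2.0])               # branch resistances
s = np.array([1.0, 0.0, -1.0])               # 1 A in at node 0, out at node 2

# KKT: diag(Rb) x + A^T nu = 0 and A x + s = 0 give the node equation
# (A diag(1/Rb) A^T) nu = s. Ground node 2 (potentials matter only up
# to a constant) and solve for the remaining voltages.
G = A @ np.diag(1.0 / Rb) @ A.T
nu = np.zeros(3)
nu[:2] = np.linalg.solve(G[:2, :2], s[:2])
x = -np.diag(1.0 / Rb) @ (A.T @ nu)          # Ohm's law on each branch
assert np.allclose(A @ x + s, 0)             # KCL holds at every node
```

The currents that minimize total dissipated energy automatically satisfy KCL, which is exactly the "other view" the instructor attributes to Maxwell.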
It's going to be minimum queuing delay on a single commodity network. So the flow cost function is going to be something like this: it's going to be x_j over (c_j minus x_j). And this is something like a queuing delay, roughly. It doesn't matter, but this is roughly a queuing delay. And you can see here that it's got a capacity: as you run the traffic up to the capacity, what's gonna happen is this denominator is gonna get very small and your queuing delay is gonna get very large. So this is that.
On the other hand, if you lightly load it, you're gonna get a very small queuing delay here. This is the queuing delay. And that's, of course, a convex function. And the conjugate of this, you can just work it out; it's not very hard at all to work out the conjugate of that convex function. And it turns out to be this: it's (the square root of c_j y, minus one) squared over some range, and zero elsewhere. Actually, I think we have some plots of these. So here are some plots. Here's the queuing delay. It says if you lightly load the network, then the queuing delay on that edge is going to be small. It's going to increase rapidly as you approach -- this is with c equals 1 -- as you approach the capacity, your queuing delay is going to go nuts, like this. And the conjugate looks like this. It's zero out here, and then it goes up this way. So that's the conjugate. Okay. So this is the mapping from potential difference to flow: you take the inverse of that function from before and you get exactly this. It's the mapping from a potential difference to a flow, something like that. By the way, the potential differences here are actually going to be kind of interesting. We'll be able to see that this analogy is going to work beautifully, because if you set everything up right and scale time properly, you get the following: your dual variables are actually going to be the queue lengths at the nodes. So you can have a queue at the node. And what's gonna happen is you're gonna look at the difference in the queue lengths -- that's called the back pressure -- and then you're gonna flow depending on the back pressure. And in fact, this is gonna tell you how much to flow, like that. That's the way this is going to work. So here's a little baby example. You have five nodes and all the capacities are one here. And it's a single commodity network, and I don't see where the input and output is.
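The queuing delay cost, its conjugate, and the induced potential-difference-to-flow map can be written out and checked numerically. A sketch: phi(x) = x/(c - x) on [0, c), phi*(y) = (sqrt(c y) - 1)^2 for y >= 1/c and zero below that, and the flow induced by a potential difference y is the maximizer x(y) = c - sqrt(c/y), clipped at zero.

```python
import numpy as np

def phi(x, c=1.0):
    """Queuing delay cost on an edge with capacity c, for 0 <= x < c."""
    return x / (c - x)

def phi_conj(y, c=1.0):
    """Conjugate: sup_x (x y - phi(x)) = (sqrt(c y) - 1)^2 for y >= 1/c."""
    return np.where(y >= 1.0 / c,
                    (np.sqrt(c * np.maximum(y, 1.0 / c)) - 1.0) ** 2,
                    0.0)

def flow_from_potential(y, c=1.0):
    """Inverse IV map: the flow an edge chooses given potential difference y."""
    return np.maximum(0.0, c - np.sqrt(c / np.maximum(y, 1e-12)))

# Numerical check of the conjugate at one point, by brute-force sup.
y = 4.0
xs = np.linspace(0.0, 0.999, 100_000)
sup_val = np.max(xs * y - phi(xs))
assert abs(sup_val - phi_conj(y)) < 1e-3
assert abs(flow_from_potential(y) - 0.5) < 1e-9   # 1 - sqrt(1/4) = 0.5
```

Below the threshold y = 1/c the edge flows nothing, which matches the flat part of the conjugate plot described in the lecture.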
I believe the input's supposed to come in here and here, and it's supposed to go out there. It's interesting that it doesn't say it here. I guess it doesn't have to say it, does it? Because those are the s's. I guess that's fair. So here's the optimal flow. So these are given. That's an input. So stuff is pumped into the network here and it's pumped in here. And it's pulled out here. This, obviously, is the sum of these two.
So that's -- call it a data collection network or whatever you want. By the way, we're minimizing the sum of the queuing delays on the network. And basically, that's actually the number of packets in the network, roughly. So that's what -- sorry, no, I take that back. It's not that. It's the average queuing delay. Okay. So let's take a look at what happens. It says that what comes in here, I guess -- I don't know why that doesn't -- somehow, this should have the same width as that. I don't know why. That's very interesting. Wow. These are weird. It's spooky. Just ignore it. It's weird. It's obviously wrong. The only thing I'm concerned about here is the width of the -- I guess the intention is that the width of the arrow shows you the flow level. But I think if you look at this, I'm getting some real bad -- I'm getting kind of a KCL headache here. This does not look good. And this doesn't look good, because the sum of the width of this and this, I think, is probably that. And that goes here. So I'm not quite sure what we're seeing here. I'm going to have to think about this. We'll examine this and figure it out. Maybe it has an explanation. I'm just confused right now. Anyway, okay. So this is the optimal flow. And in particular, on these cross-links, there's zero flow. What's shown here, these are the potentials. For simplicity -- I could add any number I like, like five, to all potentials and it would make no difference whatsoever -- I just took the termination node and made that the zero, or the ground, potential. So these are the potentials here. And then you can look at the potential difference. Maybe that is the difference between that and that. Who knows? Can you eyeball them? It is. Maybe this tells you the potential difference, or something like that. The width of this is supposed to give you the flow level, but that's deeply suspicious, because, for example, this thing is way too small or something here.
We'll have to figure out what actually happened here. Okay. That's the picture. And then here's how this would actually run if you did this. What you'd do is the following: this is just showing several different constant step sizes. So here you just initialize all the potentials to one or whatever, and you get all sorts of flow errors, or something like that. And what happens is this: if you take alpha equal to 0.3, this dual function shoots up and approaches the optimal one pretty quickly. One maybe goes up more quickly or something -- maybe this is the point. Sorry, this is the 0.3. That's one and that's three. I got it all backwards. I should have just read the legend. So 0.3 is a nice, low step size. It's going to go up to the optimum. One is maybe the optimal step size, and three is so much that you're actually oscillating a little bit and you're not going to converge to the actual solution here; you're going to converge to a neighborhood of it, or something like that. And 2.48 is the optimal value. Okay. The primal residual is the norm of the flow error, and you can see it go down and so on with the different step sizes. This one is not converging. It's going to converge merely to a neighborhood of the solution. And so, it's probably not going to get any better than that. That's the picture here. And here is the convergence of the potentials to their optimal values. I guess there are five nodes. One of them has potential zero, so we don't plot it. And this just shows how the other ones are going. And this is for the step size of one. And this shows the potentials converging to their optimal values. Okay. That finishes up these two applications. Let me just mention a couple things.
If you study these and get a feel for them, you'll be able to construct and solve problems that are much more complicated. And let me just mention a couple of examples. Let's do multi-commodity flow. So multi-commodity flow is the same as the single commodity flow problem, where you can have flows splitting and all sorts of other stuff, except you have different flows. They are completely separate; they are coupled only by the costs on the edges, which could just be capacity costs. Everybody see what I'm saying? So basically, you have the same trucks that take material from one place to another, but you have multiple products. The number of trucks you use to send stuff from one warehouse to another is what matters; what the product mix in the truck is doesn't make any difference. It's the same price. So you have the multiple flows. That's one. You can go on and on. A common one in networks, which would sort of combine both of these, would be something like this: instead of having a fixed route from source to destination -- that's our first problem -- you would have ten good routes. So ten good routes. And you're allowed to split across those ten routes. So that's a common setting that's in between these two. Everybody see what that is? For each source-destination pair, you have ten things. You can split the flows across them. So it's not like the single commodity flow, where you can completely split the flows at every node and then recombine them. This allows you to pump your traffic along ten different routes, or along any divisible combination of ten different routes. That's that one. Okay. So I think we'll quit here. And we'll start a new section of the class next time.
I have some questions for you. We haven't quite decided what we're going to do next. And let me tell you what some of the options are, and maybe you have an opinion. They might have to do with your projects, but it's up to you. So one option is that we can jump on to these methods for solving extremely large problems, like 1 million variables and more. That's option one. It's very interesting material. That's one option. The other option is that we can do some applications of what we've just worked on, like sequential convex programming; that's for non-convex problems. Is that a vote? We're getting a vote for that. Two. Okay. Three, four. How about solving the super large problems with truncated -- wow, yikes. Don't worry, because they're not -- it's only going to matter by a week. Wow. That was very close. You know what, I'm very sorry to say this, but we may have to wait until the convention to decide. Let's try the hands one more time. Superdelegates for sequential convex programming, please.
Student:Are there other options?
Instructor (Stephen Boyd):Yeah, we've got other options, but these seem to us to be the ones that are going to be useful to you very soon. Sequential convex programming? It's totally split. I'm not going to be reduced to counting. Okay, I will. How about the super large problems? Oh, God. That might have a slight edge. We will confer this weekend and decide in private, in backrooms, what's going to happen. And then we'll move forward.
[End of Audio]
Duration: 77 minutes