Instructor (Stephen Boyd):All right. Now I guess Iíll take that as the Ė as a sign that weíre on. So actually, first of all, embarrassingly, I guess I have to introduce myself even though itís the second week. And I have to apologize for not being here last week. Iíve Ė actually, for the record, Iíve never done that before.
Iíve never missed an entire first week, and I promise I wonít miss any more classes, maybe, mostly, itís Ė anyway, Iíll make a very Ė if I miss anymore classes, itíll be for a really, really good reason.
Anyway, you were in very good hands with Jacob, so thatís okay. A couple of announcements [inaudible], Iíll start with a bunch of administrative things. The first is that we have two more TAís. This is Argerio Zimnus, who is sleeping, as most senior PhD students would be at this hour, and Cwong Mo Co is over here.
What weíll be doing is adding more office hours both for them. We now have Ė since we now have essentially like an army of TAís, weíll Ė we can have Ė we were joking about just having office hours like 24/7. I donít think we can pull that off yet, but weíre going to do something like that, and in fact, I do have some questions. Also Iím going to add office hours for myself, and we donít know Ė weíre trying to figure out what Ė I donít know, if people have opinions about where we should distribute the office hours, so I donít know. We have our own conjectures, but if anyone has any other?
For example, in my case I might add them Tuesday and/or Thursday afternoons. What do you think? Yeah? Oh, that was a yes, oh, okay. Okay, all right, fine. Then Iíll do that. Iíll add office hours for myself, and for the TAís.
Now this week, of course, is chaos week, otherwise known as the equalsí week. If youíre not in this department youíll Ė you actually Ė you cannot even imagine what this week is like, but just look around you and youíll get the idea.
If youíre in this department, you know full well. So those office hours, at least in my case will kick in Ė have to kick in next week, but look to the website for that. Letís see, a couple of more administrative things. Hereís one, and I Ė it has to do with the homework.
What weíre going to do is we actually did a little bit of planning for the homework, and the way itís going to work, and actually itís a good excuse for me to say a little bit about how the course is gonna go for a while, and this is actually substantive.
Homework 1 and Homework 2, Homework 1, which youíve already started Ė yes? Okay, so Homework 1 and Homework 2 are going to be just these big long slogs through Ė I donít know, itís Ė youíll have nightmares about it, and people coming up to you on the street and saying, is this set convex, is this function convex, and things like that for years.
But the nightmares will subside eventually. Theyíll always be there to some extent, so thereíll be little triggers later in your life that will set it off again. Someone will say something like, which of the following, and youíll Ė something will remind you of it and youíll get all on edge.
But seriously, what weíre going to do is this; weíre trying to go fast over the first material, which is actually a little bit, I mean, it is interesting. Weíre going to use all of it eventually, but itís sorta the mathematical basis of what weíre gonna do, and so the idea is to actually not cover it as well as we should.
Meaning, weíre going to go fast. Faster than you Ė thereís no way you could kind of absorb every subtlety. So what weíd like you to do is read the book. I mean, that goes without saying. Read the book, youíll do the homework, listen to the lectures, and youíll get maybe 75 percent of it, something like that.
Actually, thatís enough because whatís going to happen then is in Ė by Homework 3, itíll actually get interesting and weíll be doing a lot of applications from then on, kinda mixed in with new material, and at that point it gets interesting. So the idea is to avoid having something which for four weeks, non-stop, is essentially just a math course, so thatís sorta the rational behind this.
On your part what it means is this; if you think weíre going fast, youíre right, we are. But it also means donít worry about getting absolutely everything. All the things weíre gonna to see youíre gonna get lots more chances, and itíll be more interesting later because itís gonna be in an applied context.
Having said that though, this is your time to kinda get down some of the basics. Okay, so thatís our plan for how the course is gonna go. I should also say that there was one change. I donít know if it was announced yet, and that is that the homework weíre gonna try to have graded a little more seriously.
So the Ė then for example 263; in 263, the homeworkís were graded on a scale of 0 to 4, very crudely. Thatís nominally two bits. The actual amount of information from the entropy was on the Ė of .2 bits. We havenít worked out actually what the entropy is, but it was something like that.
So weíre going to try to do Ė at least give you a little more feedback and make this 0 to 10. Itís going to be imperfect, but weíre going to attempt that. So itís going to look something like that. The idea is if itís perfect, thatís a 10, if you Ė anyway, weíll Ė when we Ė weíll get into that later.
Another admin Ė maybe just two more admin things and then weíll move on to real material. The other is that we posted another reading assignment, and so right now you should be reading Chapter Two, and when you finish Chapter Two, keep reading, just go right on to Chapter Three, and four for that matter after that.
And then letís see, one more admin announcement. This is a bit specific, but itís a good excuse to mention something. We got several questions about what do inf and sup mean, and thatís a good excuse for me to say a little something.
So, you are welcome Ė if you donít know what these mean, youíre welcome to treat these as min and max, okay? So, two Ė one thing is actually for questions like that. When we get more than 10 or 15 questions, weíre going to start a FAQ, on the website, so please let Ė what? Unknown Speaker:
Itís already there.
Instructor (Stephen Boyd):Sorry, itís already there. Okay, it wasnít there two minutes ago, but anyway. So thereís a Ė thereís going to be a FAQ section in the website. Already pushed over? Okay. So there is a FAQ section in the website. And so please check there if you have some sort of basic question first. And this is anything ranging from, I canít get this software to work, to what do inf and sup mean, or when is Homework 3 due, or I donít Ė I guess thatís posted on the website, but that kind of stuff. Weíll just post stuff there.
And if you have suggestions, actually, for us to add stuff there, just send us an email. Okay. And on this topic, I should say a little bit; if you have had analysis, of course on mathematical analysis, then you know what infimum and supremum mean. If you havenít, I would simply substitute min and max.
And the way the course is going to work is something like this; the book, by the way, is not Ė I mean, there are many more mathematical books, so if thatís your interest on your website as pointers to other books, you can go insane. And it can be as formal as you like.
By the way, outside the lectures youíre happy to hunt me down. Iíll tell you what I know, if this is your interest. Frankly thatís not really the whole point of this class, but Iím happy to answer any question about that.
And Iím talking about things like detailed conditions, is this open, is this closed, what happens if the boundary Ė all these kinds of things. You know, what are the exact conditions under which this holds and things like that. In lecture Iíll be very caviler about these things.
So boundary conditions donít hold, everything I say is modulo, some technical conditions. In the book we, generally speaking, donít lie. If you look in the book thereís just a few things that are wrong. Well, you can treat it as true, the book. But in lecture Iíll just be happy to just go broad brush over these things.
And I think the class, by the way, is Ė just from past experience, if youíve had Ė if you have a deeper understanding of all the analysis and stuff like that, great. But I havenít noticed that it makes any difference in the experience. You donít have to know it; you can work with an intuitive idea of these things. Not really worry too much about boundary issues and things like that and youíll be just fine and totally effective when Ė at using these methods.
Okay, so thatís all my admin stuff, any other questions? Otherwise, weíll jump in. So today weíll cover convex functions. And let me say a little bit about how this is going to work. If you go down to the pad here, that would be great. There we go.
So weíre going to look at convex functions. First of all, just define what one is, and then look at various aspects of it. And let me say operationally why you need to know this; because one of the things youíre going to need to do is actually determine if a function is convex because to use this material, thatís like the most basic skill.
So thatís why we want to Ė thatís why you want to look at all this and see how this works. Okay. So letís just jump in. Well a function is convex if its domain is convex. Thatís the first requirement. And the second is that it satisfies this inequality for theta between zero and one.
So this says that if you evaluate F at Ė this point you saw in the convex sets lecture. This is a convex mixture, a convex combination of X and Y. So geometrically itís a point on the line segment between X and Y.
This says if you evaluate a function at a point on a line segment between X and Y, the result is actually less than the same mixture of the values of the end points. Or in terms of the graph, it says that if you take two points on the graph of the function and then draw the straight line that connects them, and I guess an old word for that is the cord, or something like this. Right, so thatís a very old word, is the cord, and it says Ė this says the cord lies above the graph.
So thatís what Ė thatís what this inequality says. And actually youíll get a very good idea of what this means eventually. Another way to say it just very roughly is upward curvature. So it just means curves up, thatís all. And by the way, of course, a convex function can look like that. It can be monotonic decreasing. Nevertheless, the curvature is upward for a function like that.
It doesnít have to be bowl shaped, but it should have positive curvature. Okay, now you say a function in concave is ĖF is convex. That means negative curvature, downward curvature, something like that. And itís strictly convex if itís convex. And not only that, but this inequality here holds with strict inequality provided data is strictly between zero and one. So thatís strict convexity. And you have strict concavity too.
So letís look at some examples; well Ė and these are just on R. So these are just functions you can draw. So the first is just an affine function, so thatís linear plus a constant. Thatís Ė it has zero curvature, so itís convex. And in fact what happens for a function that is affine is the following; is that in effect you have equality here. For always, for an affine function thatís exactly what it means.
It means that if you do a linear interpolation between two points, you actually get the exact function value. So thatís essentially the boundary of a set of convex functions. Exponential, doesnít matter what the coefficient is in here, this is convex. So if A is positive, itís increasing, but the curvatureís upwards. If itís negative, itís a decreasing function, but itís convex.
Powers separate out. It depends on the values of the exponent. If the exponentís one or bigger, or if itís negative, or Ė well zero [inaudible] thatís just a constant one, in that case itís convex. I mean, these are things you can just draw and see. So I think the question is to whether or not a function on R is convex, is a non-issue.
Hereís how you check; you draw it, and you use your eyeball to see if it curves up. So thereís really no issue there, okay? So these Ė weíll find formal ways to verify all these, but I think in terms of a function on R, there is no issue. You just Ė you draw, does it curve upward or not, thatís the question.
And you have things like power of absolute value would be another one. Negative entropy is X log X. Thatís something that goes like this; itís got this infinite slope here and then goes like that. Something like that, and Ė but at point as it curves upwards, thatís negative entropy.
So entropy, which is minus X log X is going to be concave. Okay? Now in concave functions examples would be like an affine function, and in fact, of course, if a function is both convex and concave, then itís affine. And thatís clear because it says basically; this inequality holds one way and the other. That means its equality. This in fact implies the function is affine.
Okay? This powerís in the range between zero and one, you know, like square root for example, you just draw this, and itís clearly concave. Log rhythm, itís another famous example. Okay. A little more interesting example on RN and RM by N, thatís the set of M by N real matrices.
So hereís Ė of course you have an affine function on RN, thatís [inaudible] plus X plus B. Thatís a general linear function plus a constant. So this is the form of a general linear function, affine function on RN. And thatís going to be convex, itís also concave.
Norms; so any norm is convex. That follows actually from triangle and equality, or Ė I mean, thatís part of the definition of being a norm. And examples are things like this; the so-called p-norms, which is the sum of the absolute value of XI and then to the one over P. Now for P equals one, thatís the sum of the absolute values for P equals two. Itís usually Euclidean norm, but for example, for P equals three, itís the three norm, which is the cube root of the sum of the cubes of the absolute values of a vector. Yes?
Student:Is there a half norm?
Instructor (Stephen Boyd):Thatís a very good question, is there a half norm? So some people would Ė let me answer it first Ė well let me just first give the answer. The answer is no because the square of the sum of the square roots is actually not a convex function, and therefore it canít be a norm because norms have to be Ė all norms are convex. Itís actually concave.
Now having said that, it is currently very popular in statistics and a bunch of other areas to use a Ė as a eristic for finding a sparse solution that involves Ė weíll see this later, by the way, using things like the so called p-norm for P less than one, but itís not a norm, so itís weird.
Student:Why do all the norms have to be convex?
Instructor (Stephen Boyd):That actually follows from the definitions of Ė norm has to satisfy a triangle in equality and a homogeneity property. And then from those two you can establish it has to be convex. By the way, I should mention something here. Itís not easy to show Ė if you put Ė just for general P, just take the three norm. Itís not easy to show thatís a convex function. Itís not easy at all. So itís Ė I mean, itís not hard, but you have to know exactly just the right five or six steps to do it.
So this is maybe the first thing Iíve said that wasnít trivial, and itís not. Okay, letís look on matrices; actually, every now and then weíre going to Ė there are going to be problems where the variables are matrices. Sometimes square, sometimes theyíll be rectangular, but letís look at a couple right now.
What is an affine function on matrices? Well the general form looks like this. A trace of A transpose X plus B Ė by the way, when you see this you should read this as follows; this Ė by the way, some people write this as the inner product of A and X plus B. Thatís the standard inner product on matrices, is trace A transpose X.
If you work what it is entry by entry, itís just this. So itís as if Ė well itís this, let me write it another way, letís see if I can do this. If Ė thatís not totally Ė thatís mixing weird and Ė that mixing notation, but the point is this says that if you string out A as a vector, string out X as a vector and then calculate the ordinary inner product, well you would just get this, like that. Okay?
And thatís the same as this thing. So this notation, you might as well get used it. When you see trace A transpose B Ė A transpose X, that really means you can think of it this way. And, by the way, another notation for this is A big dot X or something like that, so youíll see that.
Okay. Hereís an interesting function, which is the norm of a matrix. So thatís Ė and Iím talking about here, the spectral norm or the maximum singular value, or you can call it many other things, and thereís not that many other Ė actually thereís probably just a couple other names for it, the L2 induced norm and maybe the Ė now Iím running out of obvious ones.
So this is the square root of the largest [inaudible] X transpose X. Now I want to point something out, that is very Ė thatís a very complicated function of a matrix. So thatís Ė it is not a simple Ė to take the matrix form X transpose X, find the largest [inaudible] value of it then take the square root of that.
Thatís a chain of quite complicated operations. So thatís a function which is not simple, but itís a norm and itís convex, okay? Now hereís one extremely useful property for convex functions, is this; a function is convex if and only if itís convex when itís restricted to any line. Thatís very, very useful.
And in fact, itís one of the best ways to actually establish convexity of a function. And essentially it means the following; I already said that convexity of a function on R is a non-issue. Plot and use your eyeball. This says in principle, convexity of any function is a non-issue. Now the only hitch is you have to unfortunately plot and use your eyeball on all possible lines that pass through the set of which there are [inaudible] number.
Okay, so thatís the only Ė but conceptually it means that thereís nothing subtle about convexity. Itís basically if you take some complicated horrible function, multiple dimensions, and you take a line, then Ė and then view that function on that line, you should view something that looks like that. And if that happens no matter what line you choose, itís convex. Thatís what it is.
So this is not too hard to show, and may indeed be coming up on a homework or something for you. I mean, to establish it, this is the case. And letís look at an example; so hereís an interesting function; if the log of the determinant of a positive definite matrix Ė by the way, thatís a complicated function right there.
For example, if X were 10 by 10 and I wrote this out, it wouldnít Ė it would take like a book to write out what that is because youíd have 10 factorial terms in log det X. So this is a log of a polynomial of these 100 by Ė well okay, itís 10 by 10 so thereís only 55 entries because the bottom is the same as the top or whatever. So thereís 55 entries, you know, free variable.
So this is the log of a polynomial in 55 variables, and that polynomial probably would take a book. Iím just making a wide guess. By the way, it could be a whole lot bigger than a book too. I could be off a bit. But Ė so thatís Ė this is really a fantastically complicated function. Actually, this function is concave. Thatís going to be very important. Itís going to play all sorts of roles later.
So letís actually show that, that this is concave using this technique. This is Ė well itís the only technique you know except for applying the definition. Now Ė by the way, what does it mean to say its concave? Basically it says that if you have two positive definite matrices, and you evaluate the log and determinance of them, and then you form a blend of the two, letís say the average.
It says, the log of the determinant of that average is bigger than or equal to the average of the log of the determinants of the end points. Does everybody follow that? So thatís Ė and this is not an obvious fact. I mean, once you know it itís obvious, but Ė well itís actually not obvious, letís just say that. Even when Ė once you know it itís obvious only because you know it.
Okay, so letís see how this works; so to establish this we have Ė what we have to do is we have to pick an arbitrary line in symmetric matrix space, okay? So what does a line look like in symmetric matrix space? Well it looks like this Ė and itís one that passes through the positive definite cone.
So itís going to look like this; without lost generality it looks like a point. X, which is positive definite, plus T times a direction V. Now this direction V is a symmetric matrix, but it does not have to be positive semi-definite, right, because thatís a line in a direction Ė thereís no reason the direction has to be positive Ė it does have to be symmetric. So this thing describes a function of a single variable T.
And we have to check that this is in fact a concave function of T, so thatís what we have to do. By the way, if youíre looking at some function, you have absolutely no idea if itís convex or concave. First thing to do when youíre near a computer, sit down, generate arbitrary line and plot and look. Look at 10 or something like that.
And by the way, if you find one that doesnít curve up then you destroy all evidence that you did this and then comfortably say, obviously thatís not convex, and then erase all the files that Ė itís three lines, but the point is then you erase the evidence.
If they keep coming up like this, then after 50 times itís like, well okay, here it goes. And then you roll up your sleeves and you move in to prove it. And it may not help you, but itís actually quite useful to do this. Iíve actually failed to follow this advice on several occasions, jumped in 45 minutes later, wandered Ė got tired, wandered over, typed in a few, quickly turned up, found a point where it had the wrong curvature and then realized I was just Ė I had just been completely wasting my time, so Ė okay, I just mentioned that.
Donít tell people where you learned that Ė where you heard that. All right, so letís work out what this is; well a right X Ė Iím going to write this Ė thereís lots of ways to do it, but X is positive definite so it has a square root. So Iíll take half out on each side, and this will look like this, T, itís gonna look like that, times X half.
Thatís what Ė this matrix is this. But the determinant of a triple product is the product of the determinance. And you take logs and it adds and all that, so you get this because the log of X half plus log Ė sorry, log det X half plus log det X half is log det X. So you get this thing here.
Itís still not too obvious, but what weíre going to do now is this; Iím going to write the Ė this thing is a symmetric, but not necessarily positive Ė itís certainly not positive semi-definite or anything like Ė you donít know. Matrix, weíll take its igon expansion. Weíll write this as whatever, Q lambda Q transpose, and you get this with a T there like this.
What Iíll do now is Iíll do the trick of writing I as QQ transpose then pull it out on either side and Iíll get det Q. It doesnít Ė det Q is either plus or minus one, but it doesnít matter. And then Iíll end up with det I plus T lambda. Thatís a diagonal matrix. I know what the determinant of a diagonal matrix is, itís a product of the entries, and so I get this thing, okay?
So I went over this a bit fast, but I promised I would go fast, so Ėyes?
Student:How did you choose the directional matrix V?
Instructor (Stephen Boyd):Itís extremely important that itís completely arbitrary. So the technical answer to your question, how did I choose the direction V is, I didnít. Or arbitrarily I suppose is the actual correct adverb or whatever, okay? Thatís Ė because if V is Ė if I chose V in any special way then my final conclusion is not gonna hold, itís not right.
To be convex has to be every line. It has to be convex when restricted to any line through the point. Okay. Now weíre in pretty good shape, because I know what log 1 plus T lambda looks like. For any real number of lambda Ė if lambdaís positive it looks like one thing and it goes like this, or sorry, it goes like that. Nope, sorry, that was from my point of view. I did it right the first time, it goes like this. If lambda has the other sign it goes like that; either way curvature is negative, and so this is concave.
Okay? By the way, this is gonna turn out to have sorts of implications. If youíre in statistics, if youíve taken information theory, communications, a lot of other things like that, actually, if youíve done any computational geometry, it turns out some things you know are actually related to concavity of log det X.
Things involving like entropies of ram Ė of galcie invariables and things like that. So actually throughout the class, when we get Ė when weíve actually covered some material youíll see all sorts of things you know from other classes are going to start to connect to various things here.
Next topic is Ė this is just a bookkeeping thing. Itís quite eloquent, but it Ė and itís actually something good to know about. It works like this; when you have a convex function, it turns out you can encode the Ė itís convenient to encode the domain of the function by simply assigning the value plus infinity when youíre outside the domain.
So if you have a function F with some domain, then we simply Ė we define an extended valued extension as follows; itís gonna agree with the function if youíre in the domain and weíll assign it infinity outside the domain. And, I mean, letís not worry about this here, but technically thereís a different between F and F tilde.
And the difference occurs when you call F at a point outside the domain. If you call F at a point outside the domain, you get the dreaded Ė whatís returned as the dreaded NID token, otherwise known as Ďnot in domainí. Of course, what would actually happen is some exception would be thrown at you or something like that, or a NAN would come back or something like that, but thatís what that is.
Whereas F tilde evaluated at a point outside the domain is of F, simply returns the token plus infinity. Okay? Now what happens is then everything kinda works. So if you have a function, sort of it looks like that, and its domain is from here to here, thatís a convex function. What we simply do is we simply assign it the value sort of plus infinity outside; here, thatís plus infinity, okay? So it looks like that, so you make it plus infinity.
Actually everything works, including this inequality if you say Ė if you take this point and this point, everything works. If you draw Ė everything will work because this line will now have a slope going straight up and it all just works.
So this is just something to know about, itís absolutely standard in convex analysis. So itís Ė you should know about it. Now for concave functions itís the opposite. You encode the domain Ė or I should say, you encode the not domain as minus infinity in the function value, okay?
Now we get to first order of condition. First of all, just to remind you, a function is differentiable if its domain is open. I mean you could also talk about being differentiable at a point, and youíd say that itís differentiable to point if the point is in the interior of the domain, but weíll just talk about being differentiable period.
So a point is differentiable if its domain is open and the gradient exists at each X in [inaudible]. Actually, this is a bit circular and what you really want to say Ė you really want to say this is not Ė this is informal so Iíll just leave it that way.
So hereís the first order condition for convexity; this is very important, and actually itís a hint as to why convex opposition actually works very, very well. Here it is, it says the following; form that is the Taylor approximation. The first order Taylor approximation of F at X. Okay, that says as a function of Y. Thatís the Taylor approximation.
So thereís F, here is this Taylor approximation, of course this is drawn in R, but in general this is the Taylor approximation. And of course what the Taylor approximation is this; it Ė at the point in which you expanded, itís perfect. In other words, it coincides with the function. Nearby, so near X, F Ė the Taylor expansion is very near, by which I mean to say, formally near squared.
So itís small Ė the error is small squared. So what the Newton tells you or calculus tells you is that these two functions, one is this affine function and one is this one, is that the are very near as long as Y is near X. For a convex function this thing is actually always an underestimator. So thatís the important part.
It says that the Taylor expan Ė the first order of Taylor expansion is a global underestimator of the function. Thatís what it means. Now, what Ė let me Ė I mean, this is actually the key to everything. I mean itís Ė once you know all these itís kinda trivial, but this is the key to everything because let me explain what happens; later in the class weíll formulate all sorts of crazy problems as convex problems.
And weíll come back and weíll say, Iíve solved it, this is the solution. It will be some horrible complicate problem with hideous no non-linearityís, things that are non-differentiable, god knows whatís in it. And someone will say, well how do you know thatís the global solution? That problem is highly non-linear, itís got all Ė I mean, thatís ridiculous.
I mean maybe itís a local solution, Iíd believe that, but how could you possibly assert that in all over our 100 thereís no point that does better than your point. Maybe thereís some sick little region you havenít examined yet where the function miracously does better.
Everybody see what Iím saying? And itís really a preposterous statement that youíre making. I mean after a while youíll get used to it, but itís preposterous. And then you said, oh no, no, because this is a convex problem, I assure you this is the absolute best there is.
And someone will say, how can you say that, itís ridiculous. This is it, and the reason is this; from local information, thatís a gradient, a gradient is local information. It says, from local information you get global conclusion, which is this inequality. So just Ė although the inequality is not a big deal, the fact that it says something that you evaluate a gradient somewhere, thatís completely local.
To evaluate a gradient all you need to do is wiggle X around near that point and see how the function varies locally. You donít have to do any big exploration far away. What is says if that function is convex then from that little local exploration you can make global bounds on that function that hold arbitrarily far away. Everybody see what Iím saying?
So not a big deal, but this is Ė if you ever Ė whatís going to happen is three weeks into the class things that are just preposterous will be asserted, youíll be asserting them in fact. And if you ever wanted to go back and say, well what Ė you know, but you know how it always is, right, thereís just a string of little trivialities and the next thing you know youíve said something quite deep, and then you always want to go back and say, where exactly did that happen?
Anyway, I say it happened here. Thatís what I say. Okay, second order of condition; I guess youíve seen this. This youíve probably seen in a calculus class or something like that. Usually it has to do with sort of this part, the suficion conditions or checking if somethingís a minimum or maximum or something like that.
And it basically says that the Ė it says, if itís twice differentiable, it says itís a hessian, which is the makers of partial derivatives. If that is Ė it says the following; it says itís convex if and only if that matrix is positive-semidefinite, okay? So thatís the condition.
And then thereís a gap in characterizing strict convexity. And the gap says something like this; it says, certainly if the hessian is positive-definite everywhere then the matrix Ė Iím sorry, and then the function is strictly convex. Actually the converse is false and an example would be S to the fourth on R. Thatís a strictly convex function; X to the fourth, but its second derivative at zero is zero, okay?
Thatís a case where the second derivative fails to be positive at the origin. So this is the condition. So this is useful sometimes. Actually, this is less useful, in fact, and also when Ė although you will have to Ė I mean, weíll see to it, let me put it that way, that youíll have to use this method to establish convexity of something. But as youíll see later, this is to be avoided because generally itís a point to show something as positive-semidefinite, unless itís really simple, the function.
Now we can knock off some examples real quick. Letís characterize all quadratic functions right now. So a quadratic function looks like this; itís a quadratic form, itís a p-asymmetric here, a liner function and a constant, okay?
So the gradient of this function is PX plus Q, and the hessian is P. So thatís the Ė itís constant, itís got a constant hessian. So a quadratic function is convex if and only if P is bigger than or equal to zero, if itís a positive-semidefinite matrix. And an example would be least squares objective. That just Ė immediately. So here the hessian is A transpose A and thatís going to be positive-semidefinite so thatís always convex.
Hereís a function that is not obvious, itís one Ė this is probably one of the first functions you encounter that is not obvious, itís not obvious that itís convex. Maybe other than minus log det X, but hereís one; X squared over Y just two variables. So itís Ė this is convex in X and Y provided Y is positive.
And letís take a look at that Ė first of all, letís just check a few things. If you fix Y itís convex in X thatís clear. If you fix X itís convex and Y because itís a one over Y in that case, okay? Now by the way, Iím using there this idea that these are essentially lines. I mean thatís a line, one is aligned with the X axis, one with the Y axis. So if a function of many variables is convex, it better be convex in each variable separately.
The converse of course if completely false; you can have a function convex in each variable separately, but not jointly convex. And the answer in Ė I mean, the distinction comes completely from the Hessian, right? So you have a, for example, a function of X and YQ variables, you have a two by two hessian that is required to be positive-semidefinite.
To say that itís convex in X and Y, it says the diagonals of that hessian are non-negative. Thatís what it means. To Ė that tells me nothing about the off-diagonals, so thereís a further condition linking the two and the cross condition. Question?
Student:Is there any way to prove convexity or show convexity from [inaudible] projecting it on to one dimension [inaudible] number of times and Ė
Instructor (Stephen Boyd):Be careful because weíre not projecting onto a dimension, weíre restricting to one dimension. And the answer is, yes. In principle if you restrict it to any line, any sort of one dimensional set that passes [inaudible] and the result is always convex, then itís convex, but it has to be understood thatís in principle unless youíre prepared to do something symbolic with every line like we just did with log det.
Okay. So letís take a look at this function, and letís show that itís con Ė by the way, hereís a plot of it, so it looks Ė I donít know, some people have told me it looks like a boat or something like that, the bow of a boat or Ė I donít know, anyway, so and if you look in Ė I guess, one slice itís quadratic and itís Ė I guess itís Ė I donít know what it is anyway so, if you slice it different directions you get obviously convex things.
So lets work out the hessian, well I worked it out, and Iím not going to calculate in front of you all the partial derivatives of the things here, but you calculate partial squared F, partial X, partial Y, and the diagonals, and you fill it out and you get a matrix which indeed is rank one, and can Ė and positive-semidefinite because you write it this way.
And here weíre using the fact that Y is positive here, to ensure that this is positive-semidefinite. So thatís convex, so thatís your Ė maybe one of your first non-obvious convex functions. Hereís another on; the so called log sum X function. Thatís the log of a sum of exponentials of variables. Actually, before we go on let me just say a little bit about this function; itís Ė first of all itís sort of like a smooth max, or in fact some people call Ė there are fields where this is simply referred to as soft max.
So I know whole fields where this is universally called the soft max. The reason itís soft max is this; if you take a bunch of variables, X1 through XN, then X increases of course very quickly. So the biggest X is a lot bigger. If thereís a good gap between X Ė the largest X and the next one, X will accentuate the spread.
Okay? If you add these up and then take the log Ė if for example one of the Xís is kind of isolated and far away from the second Ė if the biggest is away from the second biggest then you can actually quickly see that basically this sum X is basically X of the largest. You take the log of that and you get the largest. So this is sort of a smooth Ė itís a soft max some people call it.
I should also mention this is Ė if youíre in electrical engineering, this is the DB combining formula. So this is how you combine powers that add incoherently. So if you have for example, if you have a bunch of powers adding, and the powers come in at, minus 15 DPM, minus 22, minus 37 and minus 3DBM, everyone in here knows what the power of the result is, except I forgot the numbers I just listed, but I think itís minus three with the largest one. The answer is itís minus three.
So, whatever that is, minus three DB reference to a millawatt. And let me explain what that is; X convert from decibels to power. This does incoherent power addition and log converts back to decibels. By the way, obviously if youíre not in electrical engineering you donít have any idea what Iím talking about, and thatís just fine.
So Iím just saying this is a function, youíve seen it before and it comes up in statistics as well in exponential families, this comes up all the time. Okay. So, in a little statistical mechanics too, itís the denominate Ė a fancy version of this is some normalization constant or whatever. Iím sure people here know a lot more about this than I do.
So thatís a convex function, and the argument would go something like this; you take the hessian Ė by the way, if youíre curious, no one can look at this and write this formula out for the hessian. Just, you know, so if youíre looking at this and saying like, oh, that looks pretty easy, anyway, itís not.
You have to sit down and first you try to find some secret rules for Ė chain rules for calculating the hessian of some function, thatís the first thing you do. Then that gives you a headache. Then I usually just basically Ė at some point you give up and you just go in there Ė you just calculate partial squared F, partial XI, partial XJ, thereís just no way around it. You get this huge horrible mess then you have to go from your mess, your index by index representation to a matrix representation and then youíre off and running.
Then you go back and erase all that and then write things like this. So Ė just like this and then thatís the idea, but lets take a look at it and see what it says; now when you do this you look at it and you see well thereís this first term, Z here is this X of X, so itís obviously Ė itís non-negative.
This think here, itís interesting. Thatís a non Ė thatís a positive diagonal matrix so this is positive definite Ė positive-semidefinite Ė well itís positive-definite. And of course what youíre hoping for is to add that to something positive definite at which point you Ė itís easy because you just say the sum of two positive-semidefinite matrices is positive-semidefinite, done.
And thereís a little minor problem, which is that this sign goes the wrong way, and that could mean two things; it could mean either this is not convex, or you have to work harder to show this thing is positive-seimdefinite, and itís the latter here.
Student:Is Z a vector here?
Instructor (Stephen Boyd):Yep, Zís a vector.
Student:What is diag of [inaudible]?
Instructor (Stephen Boyd):What is diag of?
Instructor (Stephen Boyd):Of Z here?
Instructor (Stephen Boyd):What is diag of Z? Oh, diag of Z, you take a diag Ė this is actually entering Ė of course its mat lab slang or something. But itís entering Ė itís sort of entering mainstream mathematical notation.
Instructor (Stephen Boyd):Exactly. Yeah. So diag of a vector imbeds the vector into the diagonal elements of a diagonal matrix. What department are you in?
Instructor (Stephen Boyd):What?
Instructor (Stephen Boyd):Double A, okay. Now wait a minute here, have you never used [inaudible]?
Student:A long time ago [inaudible].
Instructor (Stephen Boyd):It was a long time ago.
Instructor (Stephen Boyd):Maybe, okay.
Student:Last year, I think.
Instructor (Stephen Boyd):Okay, last year. Okay, fine. Okay, thatís fine, just checking. I thought you were going to say math or something like that.
Instructor (Stephen Boyd):Okay, all right. So it was in a CS class last year, which Ė the memory of which Ė you didnít remember what class it was or Ė what was it?
Instructor (Stephen Boyd):220?
Instructor (Stephen Boyd):Eight, okay, I figured. Okay. So back to our thread here, itís not over yet, you still have to show this is positive-semidefinite. How do you show a matrix positive-semidefinite? You simply show that the associate quadratic form in always non-negative. So you put a V on the left to be transposed, a V on the right. You plug that in here and see what happens and you get this thing.
And this turns out to be greater than or equal to zero using the Cauchy-Schwartz inequality applied here, okay? These are not Ė this is not obvious. Okay? This is not something that you just type in. This is like 30 minutes of thinking and breaking pencils and things like that.
Oh, you know, I just remembered something, speaking of that. I donít know that itís been announced yet, but I Ė itís just a suggestion. A suggestion actually itís not a bad thing to work on homework in possibly small groups of people depending on who you are. I mean people already probably have the sense to do this anyway, but youíre being officially encouraged to work in small groups like that. In which I case I could say so after 30 minutes of working this, breaking pencils, throwing pencils at the other people in the group and things like that.
Actually working in a group is very good because itís easy to think you understand something, very easy, when youíre by yourself. And it turns out if you have to address like two or three other people and try to explain it Ė actually, you know while youíre explaining it youíll realize that so your mouth is going like this, right, and thereíll be something in the back of your head, I donít know what part of it is, will actually come out and say, by the way Ė also just looking at the others in the group you realize, this makes no sense whatsoever.
I mean, what youíre saying makes no sense. So itís a very good [inaudible]. Whereas a lot of things you know by yourself can be totally obvious. Totally obvious, then you try to explain it to someone. Also, if you try to explain it and itís this long complicated Ė you go, look, its super simple, itís so easy. You just Ė okay, you do this, but then its, well hang on, I forgot to say this.
And then also, by the way, do you remember this and then you put that there Ė this is also a hint that this is way to complicated, your description of it, so youíre encouraged to work in small groups. All right, letís move on. So a geometric [inaudible] is concave, so this is the product of a bunch of variables, these have to be positive, although actually this works for non-negative.
So itís a bunch of variables and then the enth root of that, so thatís concave. And itís the same type of argument as for log sum exptH, same type of thing. Okay, some other connections, the idea of an epigraph and a sublevel set. So if you have any function then the sublevel Ė the alpha sublevel set is the set of points with F value less than alpha.
By the way, later in the course F will be an objective or it will be some type of objective, or something like that. And for example, if F is a design, then if F represents the power dissipated by some circuit design or something like this, this will be the set of designs that meet an alpha spec.
Thatís what this is going to mean, okay? Thatís what a sublevel set will often mean. By the way, if itís an estimation problem, an F is a measure of implausibility, like negative log likelihood in a statistical estimation problem. This will be the set of points, which are at least alpha plausible values, okay?
So something like that. All right, now if you have a convex function then the sublevel sets are convex. By the way, the converse of that is false, but itís a very important thing and weíll have a name for it very soon, a function whose sublevel sets is convex.
The real correct connection between convex function, convex set, because weíve overloaded the word convex now to mean two things; it applies to sets and it applies to functions. The real connection is through something called the epigraph. So I guess epi means above, and so epigraph means everything about the graph.
The graph of a function of course, is the set of pairs, XY. So itís an RN plus one, itís the actual graph. The epigraph is everything above it. So if I have a function, the epigraph is this shaded region like this. And hereís the real connection between convex set and functions.
A function is convex if and only if its epigraph is a convex set. So Ė and in fact, it turns out you should just be thinking about convex functions in terms of their epigraphs just always. So, for example, when I say hereís a function like that, in fact letís put a domain restriction on it from there to there, you should just immediately visualize this set drawn in, and itís a convex set.
Actually this is going to be important because weíre going to do a lot of calculus on Ė with convex functions and you want to think about what does it mean in terms of the sets? Thatís what you want to be doing. So this you should just be thinking of at all times, the epigraph.
Okay. Letís look at Jensenís inequality; so our basic inequality for convexity is this; it says if you take a point between zero and one, if you take two points and take some kind of weighted average of the two, thatís what this is, and evaluate it, itís Ė so this says that F of the weighted average is less than the weighted average of F. I said it right. I always forget this.
Iíll show you a pneumonic in a minute, which you can sneak off and do if you have something to draw on. You can even do it if you canít draw, but it takes me longer. Now thereís tons of extensions of this, for example, instead of just two points you could have a finet number of points and a bunch on fada Iís that add up to one and are non-negative. Thatís a convex combination.
And the same inequality would be true thinking of an accountable infinite number of points, some combination that way, and then you can have an infinite one. The most general is this; if F is convex, then F of an expected value is less than or equal to expected value of F of X. Now thatís called Ė thatís Jensenís inequality.
And this is where Z is a random variable, however, which is in the domain of F almost surely. So thatís this, and this is Jensenís inequality. And this basic case is nothing by this; itís a distribution on Z, extremely simple. It takes only two values, X and Y with probability theta and one minus theta, and you recover this thing.
So this is Jensenís inequality. I never remember Jensenís inequality, especially because whenever it comes up usually F is half the time convex, half concave. Of course if itís concave it goes the other way. And so I actually never remember it and I usually have to just draw a picture and remember this thing. Thereís another way to say it though, I know what that is; so if you want a general pneumonic about how it works, it basically says that for a convex function dithering actually hurts.
And let me explain what that means; it means that if I have a function Ė a point here, and hereís a convex function, like that, okay? Letís imagine, thatís actually my target point in some process, but now, actually when I manufacture it, I actually get a distribution of values like that, okay? Whose mean is this point, everybody cool on this?
So thatís the Ė this happens, so this is Ė and then this tells you the cost, okay? And this could be the power, I donít really care, something like that. Or the speed of some Ė lets make it the power. So this is the power, and it basically says, you know, look, these points, that are where the manufacturing [inaudible] they were on your side. The threshold voltage went up or whatever, something happened, and it actually dissipates less power than your nominal design.
Everybody see that? And this is where it worked out badly. And now I ask, whatís the expected value of the power? What is it? Well the first thing you would do is youíd say, well look, itís around here, because if I approximated this by an affine function and I propagated this distribution through an affine function, then its mean would be the same. So the first quick answer is something like this; well itís the same, if this is like one watt, and you have these manufacturing variations, sometimes youíre less than a watt, sometimes youíre more.
And it says that the average is going to be on the order of watt, you know, like a watt. Everybody got that? But the fact is its worse Ė the average is worse than a watt. Itís like 1.05, it canít be better. And the reason is this; itís true that when you go up and down here,
The first order approximation, in that case sometimes youíre less than a watt, sometimes more, but because the curvature is upward you get Ė letís see, you pay more when itís bad then you recover when itís good. Did that make any sense? And thatís entirely due to the curvature. So that my pneumonic for Jensenís inequality. It basically says manufacturing variation generally isnít good. Yes?
Instructor (Stephen Boyd):Thatís your question? So this is a good time for me to announce this. There will be periodically times, Iíd say multiple times per lecture where Iíll go in something and Iíll make no sense whatsoever. Its best Ė if this starts happening like three or four times a lecture then you can let me know, but thatís good feedback.
Itís best at least for now in the spirit of just rushing forward, weíll just move on. Did anyone understand what I said? A handful of people, theyíre just probably just being polite. Okay. Youíll learn to just move on. Now, let me Ė actually, let me Ė this is a good point to kind of explain Ė give a sign post and say where we are and how this works.
What youíre gonna have to do is youíre gonna have to look at functions and figure out if theyíre convex or not, thatís what you have to do. So the methods I just looked at involving lines and all that, if you have to resort to that, if you actually write out a hessian, I mean this is to be avoided to be honest with you.
This is Ė you do this only in a last resort. Of course every now and then you have to; weíll arrange that you will have to because everyone should have to do this once or twice or something. But the point is the right way to do it is using a convex calculus. And so the right way to do it is this; whatís gonna Ė this method three, just sort of just general method is gonna work like this, youíll learn a bunch of atoms.
Youíve already seen a bunch, affine function, powers, log sum exp, X squared over Y, norms, quadratic functions, thing like that, where once you know it, minus log det X you know itís convex, okay? Then weíre going to look at a calculus. And a calculus meaning methods to combine these and rules for showing itís convex. And you saw this with convex sets last lecture.
But here thereís gonna be a bunch Ė there are gonna be some simple ones here. Now in this rule set, these divide into what I would call sort of the really obvious basic ones and then thereís sort of that intermediate tier, and then you get into the advanced ones and the ultra advanced ones and things like that, thereís sort of no limit on these things.
So I will tell you when we move into different levels of esoteric on this, but Ė okay. So these are extremely basic sense of following. If you have a convex function and you scale it by a non-negative scalar, itís convex, thatís totally obvious. If you add two functions that are convex, itís convex, okay? And that extends to adding five functions and it goes to an integral or even an expected value of convex functions would be convex.
Composition with an affine function; so if you pre-compose with an affine function, so in other words if you apply an affine function then a convex function, you get something thatís convex, and itís Ė many ways to check this. Actually, just directly is simple enough, just with the old theta in there. Okay? Very simple to show these things.
Letís look at some examples; we can make some examples now. F of X is minus log sum VI minus AI transpose X, and this of course is defined on the region where these are strictly positive. Thatís the interior of a polyhedron. So I have a polyhedron defined by AI transpose X less than VI. The interiors where thatís strictly less, and thatís where this makes sense.
This is by the way called the log barrier for this polyhedron, and you have a minus sign here. So you know, thatís kind of a very complicated highly non-linear function of X. You could work out the gradient; you could work out the hessian. This one wouldnít be too bad because if you work out the hessian, which I would not recommend doing by just getting in there and slogging it out, calculating partial squared F, partial squared XI XJ, but using some of the rules for calculating hessians, some of these are in the Ė these are in the appendix by the way.
It would work, but the easiest thing by far is to say this, hereís my function, ready, Iím going to apply Ė here, Iím going to take this function which is fi of Z is sum minus log ZI, okay? So thatís it, it takes a bunch of variables, takes their logs, takes minus sum. Thatís a convex function. Why? Well each of these is a convex function and a sum is convex. Thatís a convex function. Again, nothing stunning yet.
Now weíre going to simply pre-compose this function with this affine mapping. Iíll call it B minus AX, thatís an affine mapping. And that gives you this function and then here, I just applied this, this composition rule. Hereís another one; is the norm of any affine function, so norm AX plus V is gonna be affine Ė convex, sorry. Itís going to be a convex function of X.
Okay. Hereís one that maybe not totally obvious; so hereís one convex function and hereís another one, like that. The point wise maximum is this function here; it looks like that, okay? So at each point itís the maximum of the two Ė of the functions, okay? It looks like that.
By the way, in terms of epigraphs, what does this correspond to? Precisely. So calculating point wise maximum of functions, in fact you can even write some silly formula for it, you know something like this; epi of max over I FI is equal to the intersection over I of epi FI, something like that, okay?
So thatís the correspondence here. That preserves convexity, okay? And you know, it means for example, here this function which is the max of a bunch of affine functions as a Piecewise linear function, thatís convex. Obviously not all Piecewise linear functions are, but any Piecewise linear function expressed in this way is convex.
Hereís one, this is again, not obvious. The sum of the R largest components of a vector. So take a vector in R50 and the sum of the top three elements. Thatís a very complicated Piecewise linear function, itís convex.
Why is it convex, because itís the maximum of A transpose X or A Ė this is for the sum of the top three, is any vector with three oneís and 47 zeros. Now thereís a giant pile of those, thereís 50, choose three. But the point is, itís the maximum of 50 choose three linear functions done. Itís convex.
So you have to watch out here because you slowly sneak up on this and youíll find out after a while youíve actually done something and some of these things are not obvious. Proving this by another method would really Ė could be very, very painful. I mean, this one, hessian doesnít even exist, itís not even differentiable, which would save you the horrible trouble of getting in there by hand and working out a hessian.
But just proving that directly would be a real pain. Now this maximum business extends, and it extends to an infinite max or a point wise supremum. So hereís the statement; if you have a function of two variables, X and Y, and suppose itís convex in X for each Y in some set, you donít even know what the set is, totally irrelevant what the set is.
Finide, infinite, makes no difference whatsoever. Then it says that if you take Ė again, you can leave this as max if you havenít seen this before, this is simply Ė if you simply take the maximum over this possibly infinite collections of functions point wise you get a new function, thatís going to be convex.
And hereís some quick example; lets look at a couple of Ė letís start with this one, letís take the maximum eigen value of asymmetric matrix. Now we discussed that before, that is a really complicated function. For a matrix that is six by six or bigger, there is no formula for this, none, because thereís no formula for the roots of a sixth degree and order and higher order polynomial.
I mean, you donít need to know that, but itís a good thing to know. Well itís not useful; itís just a cool thing to know. This is really a very complicated function, the maximum eigen value of a matrix, but watch this. If the supremum of Y transpose XY over all Y that have Ė over the units sphere. The units sphere is the set of all points whose norm is one.
Now letís check me and see how the argument goes. Look at Y transpose XY. Now by the way, when you look at that you are Ė at this point you are trained and wired, all by the quadratic part of your brain should be lighting up. This has been proved in FMRI studies, okay? But the point is actually here the variable is capital X. What kind of function of capital X is this? Remember, you have to suppress the flashing quadratic neurons. Itís linear, thatís a linear function of X. This is sum, XIJ YI YJ, itís linear in X.
Okay? So for each Y thatís a linear function. This is a supremum of an infinite collection of linear functions. In fact, thereís one for every point on the units sphere in our end. Supremum of a bunch of linear functions, linear functions are convex. Supremum over these things is convex, thatís a convex function. And now you know something thatís not totally obvious.
I mean, itís not that unobvious either, but itís Ė this would be Ė since thatís like a two line proof or something like that, thatís not bad. And of course itís a disaster if you actually try to write out what lambda max is. I mean, just Ė you write down square root of Ė or thatís not lambda max, but if you write down the largest of the lambda Iís for which the character [inaudible] vanishes, itís all over, youíll never recover, this is an example.
Let me hit the next one, is composition. So under certain circumstances composition preserved convexity. So letís see what that is; if you have H of G of X, -- so the rule goes like this, and Iíll Ė the famous one is this, it says that a convex increasing function of a convex function is convex. Okay?
So if Ė this thing will be convex if you have a convex Ė if the outer function is convex and increasing, which Ė I mean non-decreasing, okay? So thatís the condition. And the way to derive these as other ones, for example, itís convex if this thing is concave and H is convex and non Ė and decreasing, roughly is what it is.
Now the way to check these is simply to write out the chain rules. So if you take Ė you imagine that these functions are differentiable and of one variable and you work out the second derivative and you get something like this. And from things like this, this is how you read off the rules, by the way, these things can Ė unless youíre doing this all day long these rules, you can easily forget them.
What you should remember is there are composition rules, and you have to go back and [inaudible]. Whenever Iím somewhere and I have to figure this out I actually quietly write this out and then figure out the rules myself. By the way, the rules hold even when these are non-differentiable. You donít need any derivatives. This is just to check.
And lets actually Ė maybe we can make up a new rule, letís make one up for fun, ready? All right, letís see, letís try here, lets say that G Ė Iím not gonna remember what Iím saying, so Iím gonna write it down. Suppose I told you Ė I have to get this right Ė suppose I told you that G prime is positive. So G is going to be increasing, and I told you Ė do I even need G prime? No, sorry. I take that back.
Iím going to tell you that G is concave. So that means this here, and lets make Ė lets see if I can get it right, and lets Ė let me ask, what are the conditions on H to make F concave? Does that make sense? So what we want is Ė weíre gonna say that thatís Ė weíre gonna assume that thatís less or equal to zero, and I need thing to be less than or equal to zero. Well that means this has to be positive. So I have to have H prime positive. So H has to be increasing.
And then I look over here, thatís positive no matter what, so I need this to be less than or equal to zero, and I just made up my very own new composition rule, and itís this Ė I hope I get this right. Okay, let me go very slowly, if H is concave and increasing and G is concave Ė okay, ready, so a concave increasing function of a concave function is concave. Isnít that right?
Anyway if I got it wrong, I donít care, you get the principle. Okay, itís a 300 level class, I can mess up minus signs all I like, thatís your job to fix them and stuff like that. So I think I said that right. So letís look at some example Ė by the way, the only one I actually remember, to be honest with you is a convex increasing function of a convex function is convex.
Then I have Ė I remember one other thing, there are composition rules and then I have a note attached to that, thereís lots of them and they get very confusing, although there are people, Iíve noticed who just know them immediately. Theyíll say, oh yeah, not thatís a concave increasing function Ė no, decreasing function of a convex function, thatís con Ė something or another and Iím like, really? I donít know. Then they have some internal pneumonic or something for doing this, but I donít know them just to tell you the truth.
Letís look at some examples; the exptH of a convex function is convex. And thatís by the way is very interesting. It says that the exponential actually can only Ė it preserves positive of a curvature. If a function curves up, [inaudible] also curves up. Thatís what itís saying geometrically.
Inverse is interesting. It says the inverse of Ė itís not the inverse, itís the reciprocal, there you go, thatís the English word, the reciprocal of a positive concave function is convex. So, for example, letís see if I can get an example of that. One over the square root is a Ė that would be a convex function, and indeed it is, itís X minus one half, and it goes kinda like that Ė sorry, for you, like that.
Actually, by the way, thatís fine because flipping the axis doesnít change the curvature property, so I donít have to draw it for you. If it looks convex for me it looks convex for you. So thatís the composition one. There are then vector compositions ones and Iíll say Ė Iíll give some examples here. So here you have vector composition, so here I have a function of Ė a multi-argument function of a bunch of other functions.
And now, of course, the possibilities just explode because you have horrible things where you have a function thatís sort of increasing in some components, decreasing in others, the arguments themselves are either convex, concave, and it gets very complicated.
But here you have something like this; X is convex if all of the functions are convex, thatís the easy case, and H is convex and decreasing in each argument Ė sorry, non-decreasing, roughly speaking increasing. By the way, thereís one subtlety here that I want to point out, there a tilde here.
Tilde is the extended value extension. And Iím not going to spend time in lecture going over this, but you want to read that part of the book at some point about this tilde because thatís not a typo, itís very important.
Let me give a couple of examples here. By the way, what this shows is that actually the earlier rules that you learned are actually Ė they can be derived from this. Let me give an example; how about the sum of convex functions? So hereís a function H, H of Z is one transpose Z, itís the sum of the Zís. Okay?
This function is Ė well letís work it out. It is certainly convex, right, in Z? Itís also increasing in each argument, do you agree with that, because it just sums the I. So itís obviously increasing in each argument. Therefore, by this composition rule, it says that if I compose this with a bunch of functions, each of which is convex, the result is convex.
So I Ė so from this rule Iíve rederived the simpler rule which is the sum of convex functions is convex, everybody see that? Letís try one more. Lets try H of Z is the max of Z. Well the max function is itself convex. Itís also Ė itís increasing in each argument, I mean thatís clear, right?
If you increase any element of a vector the max doesnít go down, so itís non-decreasing. Therefore, this subsumes actually several of the earlier ones. Actually it turns out thereís only two rules. Thereís this rule Ė in fact there are only two rules, thereís this rule and thereís the affine pre-composition.
But itís good for a human being at least to think of them as eight or 10 rules or something. Quick question?
Student:Increasing [inaudible] do you mean also that when you Ė that they increase from vector to vector, or just that Ė [Crosstalk]
Instructor (Stephen Boyd):Nope, I do not mean that. I mean this Ė I mean, hold all element fixed except one. Increase one the function cannot go down. Thatís what non-decreasing in each argument means. Okay, so I think weíll quick Ė this covers, these are the basic rules. By the way, this was very fast, [inaudible] weíll cement this, donít worry, and weíll continue next time. And then maybe even by next week itíll get interesting.
[End of Audio]
Duration: 77 minutes