Convex Optimization I - Lecture 04

Instructor (Stephen Boyd):Great, I think we're on. Let me start today with a couple of announcements. Let's see, the first one -- you can go down to the pad -- is that we have posted homework 2, which I'm sure you'll be very happy to hear about.

And homework 2, by the way -- I promise -- is sort of the last one like homework 1, the kind that's a slog through interminable math. So it'll be the last one. Starting with homework 3, I promise it'll actually start getting interesting. So that's our promise.

Let's see, the other announcement -- I guess this one's important: next Monday is a holiday, which is normally when the section is. There is going to be a section, and we are trying to figure out when it's going to be.

We've been offered two, well, not optimal times. I think one is next Tuesday at 11:00 a.m.

Student:[Inaudible]

Instructor (Stephen Boyd):Wait till you hear the other one. Wednesday -- oh, I don't remember this one -- I can't remember anymore. Kwangmoo, do you remember? Wednesday at 2:00 p.m.?

Student:2:15.

Instructor (Stephen Boyd):2:15.

Student:Tuesday.

Instructor (Stephen Boyd):Now Tuesday, what do you mean?

Student:[Inaudible]

Instructor (Stephen Boyd):Wait, well we have random -- okay, does anyone have a violent objection, or can articulate a very good reason, why it shouldn't be either Tuesday or Wednesday? Our thought was that Wednesday was pretty close to the homework due time or something like that.

So Tuesday -- Tuesday at a crappy time. It's also in a crappy room, by the way, so Tuesday: crappy time, crappy room. Sure, okay. Hopefully that hasn't been taken now. We'll announce this on the email list, and we'll also of course put it on the website once it's fixed, but there will be a section; the one that would have been Monday will be Tuesday.

No, wait a minute; I can announce it in class -- which is strange. We're going to repeat this when there's another Monday holiday later in the quarter, since we don't feel like missing sections, so whatever happens next week is going to happen one other time during the quarter.

Okay, another announcement: my office hours. I had promised to expand them, modulo quals, and in fact -- you have to check, though -- it looks like I will actually have some time today after class, and then also after 3:30 today, so I'll be around if people want to come by.

And then sometime over the weekend I'll figure out when my actual office hours will be. There'll be something in the afternoon, maybe Tuesday afternoon or something like that, and I'll coordinate that with the TA office hours.

And then that'll of course be announced. Okay, any questions about last time, our whirlwind tour -- really more of a sprint -- through convex functions? If not, we'll finish that up today. There's a ton of material here, and as I said last time, the important part is just to get the basics down; some of the more advanced stuff you'll pick up when you're forced to use it.

Student:I have a question.

Instructor (Stephen Boyd):Yes?

Student:When we were talking about matrix space and restricting it to a line to determine the [inaudible] --

Instructor (Stephen Boyd):Right.

Student:How do you ensure that when you add a multiple you stay in that same space?

Instructor (Stephen Boyd):Oh, well -- right, so the question had to do with this: we were analyzing convexity of a function of matrices, which I forget what it was. I could look back here, but I won't. A function of symmetric matrices, and we had to generate an arbitrary line in symmetric matrix space. We did that by picking a symmetric base point, some X0 -- I don't remember how I labeled it -- and then a symmetric direction V, and the line was X0 + tV.

Now it's a subspace, so quite clearly if you add to a symmetric matrix a scalar multiple of another symmetric matrix, you get a symmetric matrix. So X0 + tV is a symmetric matrix. There we also assumed, without loss of generality, that the base point X0 was positive definite, because I think it was log det we were looking at, or something like that.

X0 was assumed to be positive definite, but V was not; V was just a direction. I don't know if I answered your question -- I think I just said what I said last time. How's your cognitive state? Is it improved?

Student:So what you're saying is when you add matrices of the same type as that one, you stay in this?

Instructor (Stephen Boyd):It's a subspace, absolutely. The set of symmetric matrices is a subspace, and if you add a multiple of one to another, you're still in the subspace. Okay, let's move on. Let me just remind you where we are and what the big picture is.

The big picture comes down to this: you're going to want to do two things. You're going to want to construct convex functions, but you're also going to want to deconstruct convex functions. What I mean by that is that there'll be some function that comes up in some problem -- I don't know what it is; medical imaging or optimal control.

It doesn't really matter. And you're going to want to know: is it convex or not? In very rare cases, you will be forced to do this by hand. By hand means you whip out your theta between 0 and 1, two arbitrary points, and try to argue the defining inequality.

That's going to occur occasionally. If the function is twice differentiable, the equivalent of doing it by hand -- and by the way, every now and then you do have to do things by hand -- is to work out the Hessian, and it might or might not be easy to establish that the Hessian is positive semidefinite.

But in fact, the way you really want to do things is to reserve those two for when there is no other option. What you really want to use is the calculus of convex functions to establish convexity -- and, for that matter, to decompose the function in front of you into a sequence of operations, with the leaves of the tree being atoms, an atom being a function known to be convex or concave or something like that.

For that you want to know the operations that preserve convexity, and we looked at the really obvious simple ones: nonnegative weighted sum, that's kind of obvious; composition with an affine function, that's also very easy to show; pointwise maximum and supremum -- we looked at examples there too, and by the time you get to supremum, you can start showing convexity of some things that at first glance look pretty complicated.

Even with just maximum, for example, you can show that the sum of the five largest absolute values of the entries of a vector is convex. First of all, it's not an obvious fact, and it's not that easy to show directly. It's like one line if you use a pointwise maximum argument.

Then we looked at composition. That's where we are now, and I'll finish that off now. I pointed out that this is not a minimal list; in fact, the minimal list, as far as I know, of the things covered here is basically two operations, because composition actually subsumes nonnegative weighted sum and pointwise maximum and supremum.

All right, so let's look at composition. The most basic case, and the only one I actually remember, is the scalar case: you compose one function with another, and the question is, when does that preserve convexity?

And the answer -- this is the only one I remember -- is that a convex increasing function of a convex function is convex. And here I do have to warn you: there is a subtlety. It's not a subtlety that's interesting for its mathematical content or anything like that; it's a subtlety that comes up in practice and will mess you up if you're not aware of it.

And the subtlety is this: when you say an increasing convex function, the increasing actually refers to the extended-value function. Okay? For example, take x squared. Now x squared on R is obviously not increasing.

However, x squared restricted to R+ -- you would say, well, that's increasing. In fact, any normal person would say it, and it is indeed increasing. However, remember that if you simply define it to be undefined for negative x, then the extended-value version actually looks like this.

It assigns the value plus infinity for negative x, and that function is by no means increasing -- not remotely. It starts out at plus infinity and then dives down to zero, so that's not increasing. Now there is another version of this, and it's this: you take the square, like this, and you define it to be zero for negative x, okay?

This one actually is increasing -- the extended-value version of this function is increasing. So these are things you unfortunately do have to know; they do come up, you can get deeply confused, and it can actually have real consequences if you're not sensitive to this.

There are some irritating things here; let me just show you one. Here's one: let's suppose we want to look at something like the norm of Ax - b to the 1.62 power. Don't ask me why, but we do. And we want to argue that that's a convex function of x, okay?

Well, you can work from the inside out: Ax - b is an affine function, the norm is a convex function of an affine function, and therefore norm of Ax - b is convex. Now what we want to do is say something like, well, this is a convex function to the 1.62 power, and I'd like to say a convex function to the 1.62 power is convex. That's sort of true, but also sort of not true -- you have to be very careful now.

All right, I can't just define -- let me get my notation consistent, h -- I can't just say h(z) = z^1.62 for nonnegative z. I mean, I can say that -- and by the way, that's an informal way of saying that the domain is the nonnegative z's -- but the problem is with this function.

You cannot apply the composition theorem to it, absolutely cannot. The correct way to do it is to define the function that looks like that. I know this is a little bit silly, but it's very important, and it will mess you up if you're not sensitized to these things.

This is the function which is 0 if you pass it a negative argument, and is the argument to the 1.62 power if you pass it a positive argument. That function actually is increasing and convex -- its extension is. And now I can apply the composition theorem, and then I can make another argument: I can say, oh hey, look at that.

What I'm passing to the function h is actually always nonnegative -- it's a norm. Therefore this case, this switch in the definition of h, never occurs, and so indeed we find that the whole expression is convex. So I don't know if that makes sense, but that's it.
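To make the composition argument above concrete, here is a minimal numerical sanity check -- not a proof -- that norm(Ax - b)^1.62 satisfies the defining convexity inequality; the data A, b and the number of random trials are arbitrary choices for illustration.

    import numpy as np

    # Check f(t*x1 + (1-t)*x2) <= t*f(x1) + (1-t)*f(x2) on random points,
    # where f(x) = ||A x - b||_2 ** 1.62.
    rng = np.random.default_rng(0)
    A = rng.standard_normal((5, 3))   # arbitrary problem data
    b = rng.standard_normal(5)

    def f(x):
        return np.linalg.norm(A @ x - b) ** 1.62

    for _ in range(10000):
        x1, x2, t = rng.standard_normal(3), rng.standard_normal(3), rng.uniform()
        lhs = f(t * x1 + (1 - t) * x2)
        rhs = t * f(x1) + (1 - t) * f(x2)
        assert lhs <= rhs + 1e-9, "convexity inequality violated"
    print("no violations found")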

So I didn't mention that last time, but I thought I'd just quickly mention it now. So let's look at composition; the sophisticated version is the vector composition version. With vector composition, the inner function g actually consists of multiple functions.

So let's say it's g1 through gk; I'm going to take a function of many arguments and compose it with a bunch of functions. And again, the base composition theorem, the canonical one that I remember, is that a convex increasing function of a convex function is convex.

And by increasing I just mean nondecreasing; it's informal. So a convex increasing function of convex functions, I should say, is convex, and that looks like this. Now when you prove these -- it's three lines -- you make no assumptions whatsoever about differentiability. But the way to sort of understand what happens is to write out the formulas.

The same for the scalar composition case. So here we take n = 1, and the second derivative of f = h(g1(x), ..., gk(x)) is -- this is just the chain rule for the second derivative -- f''(x) = g'(x)^T ∇²h(g(x)) g'(x) + ∇h(g(x))^T g''(x). The first term is g'(x), that's dg/dx, which is a vector, times the Hessian of h evaluated at g(x), times g'(x) again, so that's a quadratic form; and then plus this second term.

And the second term is nothing but a sum: the partial of h with respect to its i-th argument, evaluated at g(x), times g_i''(x), okay? And now we can check things out.

If h is convex, this first term is nonnegative no matter what. Then for the second term: if all the g_i's are convex, the vector of second derivatives (g_1''(x), ..., g_k''(x)) is elementwise nonnegative, and you take its inner product with the gradient of h.

If h is increasing, that gradient is a nonnegative vector, and so the second term is nonnegative too. But now you can see that there are actually other ways of constructing this. For example, a more sophisticated version would go something like this -- let's see if I can get it right: a convex function of a bunch of functions is convex provided, for each argument, either that argument is convex and h is increasing in that argument, or that argument is concave and h is decreasing in it.

I think that came out right. By the way, there are people who know these things -- not memorize them, but they have a very good mnemonic or something like that. I'm not one of them; I just rederive it. Whenever I have to deal with it, I go somewhere quiet, or find a little piece of whiteboard or paper, and just write out formulas like this.
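Here is a small numerical check of that chain-rule formula -- a sketch, with the particular h, g_i, and test point chosen arbitrarily: h is log-sum-exp (convex and increasing) and the g_i are convex, so f should be convex and the formula should match a finite-difference estimate of f''.

    import numpy as np

    # f(x) = h(g1(x), g2(x)), h = log-sum-exp, g1(x) = exp(x), g2(x) = x**2.
    def h(z):      return np.log(np.exp(z).sum())
    def grad_h(z): p = np.exp(z - z.max()); return p / p.sum()   # softmax
    def hess_h(z): p = grad_h(z); return np.diag(p) - np.outer(p, p)

    def g(x):   return np.array([np.exp(x), x**2])
    def gp(x):  return np.array([np.exp(x), 2*x])    # g'(x)
    def gpp(x): return np.array([np.exp(x), 2.0])    # g''(x)
    def f(x):   return h(g(x))

    x0, eps = 0.7, 1e-4
    formula = gp(x0) @ hess_h(g(x0)) @ gp(x0) + grad_h(g(x0)) @ gpp(x0)
    finite_diff = (f(x0 + eps) - 2*f(x0) + f(x0 - eps)) / eps**2
    print(formula, finite_diff)   # should agree to several digits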

So this is vector composition, and we looked at some examples last time. One more operation pretty much finishes off the basic operations that preserve convexity of functions, and that's minimization.

Now, by the way, we've already seen that there's a sense in which maximization preserves convexity: that's the pointwise maximum and supremum. Notice that in the case of maximization, there is no requirement of any kind on how the variable you're maximizing over enters the function. Everyone know what I'm talking about here?

The point is that if I have a function of two variables, f(x, y), and I want to look at something like this -- I'll try to get my notation right -- and define g(x) as the supremum over y, the only requirement for convexity of g is the following: for each y, this is a convex function of x, period.

That's the only condition. For that matter, the set you take the supremum over does not have to be convex; it doesn't even have to be in R^n or even be a vector space -- it could be an arbitrary set. So you can take a supremum over a completely arbitrary family. When you maximize -- assuming you can -- there's no condition; nothing matters as to how y enters the function.

Now minimization also works, and that's a bit weird, because generally what you're going to find with all these things are weird asymmetries. In other words, you can do one thing but not the other. The cool part -- well, why we have this class -- is that it turns out the thing you can do is often the thing you want to do.

Otherwise this would be interesting but not useful. What's interesting here is that it's very rare that you can both maximize and minimize something like this, because the two have opposite flavors. But in fact you can -- it just requires a much stronger condition. Here's the condition. Some people call this partial minimization, by the way, because full minimization would be to minimize f(x, y) over both x and y.

You can minimize over a variable, and the result will be convex, provided -- this is very important -- f is jointly convex in x and y, and the set C over which you minimize is convex. Okay? And then this is easy to show and all that sort of stuff.

Now let's look at some examples. Here's a very basic one, and it's one you may have seen before anyway. Take a general quadratic form and block the variable into two groups, x and y. So that's a general quadratic form; it's the quadratic form associated with the block matrix with blocks A, B, B transpose, C.

All right, we know that f is jointly convex in x and y provided this full two-by-two block matrix is positive semidefinite.

By the way, when is this function convex in x for any y -- what's the condition on A, B, C?

Student:A is positive [inaudible].

Instructor (Stephen Boyd):Precisely. It's only that A is positive semidefinite. There's no condition whatsoever on B and C. So if I were to take this function here and maximize f(x, y) over y, the result would be convex provided only that A is positive semidefinite. Yes?

Student:[Inaudible] on what condition on A could you then state that that function was not convex?

Instructor (Stephen Boyd):On what condition on A -- oh, which function? The --

Student:That one.

Instructor (Stephen Boyd):F, okay. In this case, because it's a quadratic form, the condition is necessary and sufficient. So if this matrix here has one negative eigenvalue or more, it's not convex, period. And not only that, we can easily produce points -- I can make you a line along which, instead of curving up, the function curves down. Actually, you tell me the line. You tell me the line?

Student:It needs to go like that?

Instructor (Stephen Boyd):Yes. You have a quadratic form with one negative eigenvalue.

Student:[Inaudible]

Instructor (Stephen Boyd):Yeah. Take the eigenvector; you sail along that line and the function will curve downward. And that pretty much ends the discussion of convexity for that function. Now, if I minimize this function over y -- and what we'll do is assume C is positive definite; that's a technical condition, and this actually works with C only positive semidefinite.

In fact, I'll tell you at some point what happens when C is only positive semidefinite. When C is positive definite, you can minimize over y analytically, because you just take the gradient with respect to y, set it equal to 0, and all that.

And you end up with this: you find the optimal y, back-substitute it into this form, and what you get is x transpose times (A - B C^{-1} B^T) times x. Now A - B C^{-1} B^T is the Schur complement, which you may have seen before; if not, I assure you it's on homework 3 -- is that right? Yes, homework 3, you'll see it, because it's just a good thing to know about.

So that's called the Schur complement -- the Schur complement of C in the big block matrix. And that function is convex, and you know what that tells you: it tells you this matrix A - B C^{-1} B^T has to be positive semidefinite. You may have seen this somewhere in some other context, maybe estimation -- I don't know where, but it comes up. It comes up in mechanics and in all sorts of things, this Schur complement. Yes?
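Here is a small numerical illustration of partial minimization and the Schur complement -- a sketch with randomly generated data; the dimensions and the way the positive definite block matrix is built are arbitrary.

    import numpy as np
    from scipy.optimize import minimize

    # f(x, y) = [x; y]^T [[A, B], [B^T, C]] [x; y] with the block matrix
    # positive definite.  Minimizing over y should give
    # g(x) = x^T (A - B C^{-1} B^T) x, the Schur complement quadratic.
    rng = np.random.default_rng(1)
    n, m = 3, 2
    M = rng.standard_normal((n + m, n + m))
    M = M @ M.T + 0.1 * np.eye(n + m)      # random positive definite matrix
    A, B, C = M[:n, :n], M[:n, n:], M[n:, n:]
    S = A - B @ np.linalg.solve(C, B.T)    # Schur complement of C

    def f(x, y):
        z = np.concatenate([x, y])
        return z @ M @ z

    x = rng.standard_normal(n)
    g_numeric = minimize(lambda y: f(x, y), np.zeros(m)).fun
    print(g_numeric, x @ S @ x)                 # the two should agree
    print(np.linalg.eigvalsh(S).min() >= 0)     # Schur complement is PSD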

Student:[Inaudible] the graph is convex?

Instructor (Stephen Boyd):Joint convexity means that if you plot the function against x and y together, the whole graph is bowl shaped.

Student:Okay.

Instructor (Stephen Boyd):Convexity in x means that if you fix y and look at a slice, the slice curves up -- and the two can be quite different. Here's another example: the distance to a convex set is convex. By the way, that's a rather complicated function; it's not simple.

If I draw some convex set like this, let's just get a rough idea -- the set is everything in here; let's work out what the distance to it is. What's the distance in here? Zero. Now I can draw the level curves of the distance function. I won't do this exactly right, but you get the idea; it would look something like that, right?

These would be all the points at a certain distance from the set, and as you move out, the level curves get smoothed out -- if you have sharp corners in the set, the distance level curves get smoother, eventually looking almost circular. Let's see if I can draw it right -- yeah, so I think that's it.

That's just to give you a rough idea. So this function, though complicated, is convex. And the argument goes something like this: take a look at norm of x - y. This is a very important step -- norm of x - y is jointly convex in both x and y.

That's the critical part that allows me to do this, okay. So the distance is going to be convex.

Student:[Inaudible]

Instructor (Stephen Boyd):Oh, here, yeah, let me give you the full proof, okay? I mean, I could get out thetas and stuff and do it by hand, but I don't want to, so I'll show you the snappy proof. The snappy proof goes like this: I have two variables, x and y. What kind of function of (x, y) is x - y?

Student:[Inaudible]

Instructor (Stephen Boyd):Yeah, it's affine -- sure, it's actually linear, which is stronger than affine, but affine is good enough; you're absolutely right. Therefore the norm of x - y is convex: the norm of an affine function of x and y, so it's jointly convex. Then the distance is the minimum of that over y in the set, which is convex, so partial minimization applies. That's better than whipping out the theta, wouldn't you say?

Student:[Inaudible]

Instructor (Stephen Boyd):Whatís that?

Student:[Inaudible]

Instructor (Stephen Boyd):It's actually linear, but affine is good enough to establish convexity. We'll look at one last example. This one is sort of on the borderline between the basic methods and the more advanced ones, although it comes up in more applications than you'd think. That's the problem with these things: you look at something, think it's esoteric, and then by the time you've looked at enough problems you realize it actually does come up.

So that's the so-called perspective of a function. The perspective of a function is a transformation. It takes a function from R^n to R and produces a function with one more argument, so it's a function from R^{n+1} -- or, if you want to make it a pair explicitly, from R^n cross R -- to R, and it's this: g(x, t) = t f(x/t).

So it does a weird scaling: it scales x by 1/t first and then multiplies by t. And that's a function jointly of x and t. Obviously, for any fixed positive t, this is a convex function of x, because x/t is linear, therefore affine, f is convex, and then you multiply by a positive constant; that's clearly the case.

Maybe it's less obvious that when you fix x, t times f of a vector divided by t is convex in t -- maybe that's less obvious, but showing it is not hard. By the way, as you might guess, the perspective of a function is related to the perspective of a set; you remember that probably the only not-obvious transformation of sets that preserves convexity was the perspective transformation.

So, not surprisingly, the perspective of a function is related to that through epigraphs, but I won't go into the details; we'll just look at some examples. Take x transpose x, the sum of the x_i squared. That's obviously convex, but the perspective says that t times (x/t) transpose (x/t) -- which, when the t's cancel out, is x transpose x over t -- is convex, okay?

So there you go: your quadratic-over-linear function is convex, and we got it by a quick perspective operation on just the squared norm. So we took something that was completely obviously convex and got something where convexity is less obvious.

Okay, another one: you start with the negative logarithm, and you form t f(x/t), and you end up with t log t minus t log x, and you get that this is jointly convex in x and t. Some of it you know: if you fix x, for example, the t log x part is linear in t, so that hardly matters, and you get the negative entropy t log t over here.

It's not so obvious that this is jointly convex in both x and t, and by the way, from this you can derive various things like negative entropy, the Kullback-Leibler divergence, and all that kind of stuff. Another example is this: if f is convex, then g(x) = (c^T x + d) f((Ax + b)/(c^T x + d)) -- f of a linear fractional transformation, multiplied by the denominator, which is required to be positive -- is going to be convex. That's on the set where c^T x + d is positive and the image point (Ax + b)/(c^T x + d) is in the domain of f, so that's going to be convex.
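As a quick check of the joint convexity claim for t log t - t log x -- a numerical sketch, not a proof -- the Hessian works out to [[t/x², -1/x], [-1/x, 1/t]], whose determinant is exactly zero and whose trace is positive for x, t > 0, so it is positive semidefinite; the grid of test points below is arbitrary.

    import numpy as np

    # g(x, t) = t*log(t) - t*log(x): check the Hessian is PSD for x, t > 0.
    def hessian(x, t):
        return np.array([[t / x**2, -1.0 / x],
                         [-1.0 / x,  1.0 / t]])

    worst = min(np.linalg.eigvalsh(hessian(x, t)).min()
                for x in np.linspace(0.1, 5.0, 50)
                for t in np.linspace(0.1, 5.0, 50))
    print(worst)   # about 0, never significantly negative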

I recognize fully that none of this is motivated yet and all that kind of stuff, so don't think I don't. We'll look at a couple more concepts; one is the conjugate function, which is going to get heavy use later in the class -- actually not for a couple of weeks, but then it will get very heavy use.

So this is just going to be your first glimpse of it [inaudible]. Oh, I should add that all the things we're doing are completely standard; these are not weird esoteric things. The conjugate -- everyone knows what it is. It's got slightly different names; some people call it the Fenchel conjugate, and there are other names for it, depending on how fancy you want to sound when you use it, but these are completely standard concepts.

I'll let you know when we deviate into something that's not. So take any function, convex or not, and define the conjugate function: that's f*(y), and it's the supremum over x of y transpose x minus f(x). Let's see if we can figure out what that looks like graphically.

It looks like this: you draw the graph of f here, and then you draw the linear function yx -- we'll just do this in one variable. So I draw a line whose slope is y. And then it says: I want to maximize the deviation between these two functions.

Well, one sense of deviation, anyway -- I want to see how much lower f gets than this line. And here you would see that occurs right about here. That's going to occur, by the way -- if f is smooth -- where a line of slope y is tangent to the graph, like that. And then this intercept here gives you the conjugate function -- its negative, minus f*(y).

So this gives you a very good way of working out the conjugate function graphically. For example, if I decrease y slightly, what will f*(y) do -- or, to make it more convenient, let me ask what minus f*(y) will do for this function? This just requires you to visualize. If I decrease y slightly, what happens here? Well, I change the slope, okay?

And then I go down here and find the point of tangency; it shifts down a little bit, and to the left -- well, down and to the left. And this intercept creeps up a little bit -- everybody agree?

So in fact minus f*(y) increases slightly, and so f*(y) decreases. Now, by the way, what happens as this slope keeps changing? There's a critical moment here, and the critical moment occurs right there -- everybody see that?

At this critical moment something interesting happens. I draw a line here, and that gives me that value of the conjugate there -- the negative of it. Now as y decreases further, what happens? This gets interesting. The point of contact leaves here, shifts over to here like this, and this intercept now starts going down. Everybody see this?

This is your first chance -- when we do duality you'll get tons and tons of chances to do visual exercises like this, but this is your first exposure to the idea. So this is the conjugate. Now here's an interesting thing about the conjugate function: f*(y) is convex in y, completely independent of whether f is convex or not -- totally independent.

And it's really simple; look at this: what kind of function is the thing inside the supremum, as a function of y? It's affine -- it's affine in y for each x. Hey, that's a supremum over affine functions. That's convex, period.

So this function is convex, okay? Actually, I can tell you something fun -- I'll mention it now, it's fun. You've seen conjugates, right? You've seen conjugates of complex numbers, and you've probably seen conjugates in a bunch of other places.

But when someone uses the word conjugate in a mathematical context, there's some cultural background there. A conjugate is usually something which, when applied twice, does what?

I mean, we're just talking roughly, right? If you do a conjugate twice, what do you get?

Student:The original.

Instructor (Stephen Boyd):The original thing back, or something related to it -- if you've seen this in a lot of other contexts, it might be something related to the original, like its closure or something like that. That would not be surprising for something called a conjugate.

So you might guess that the conjugate of the conjugate of a function would be an interesting thing, and you might even guess it was the original function. I'll tell you what it is, and it's super cool -- this is just for fun. You take a function, and the conjugate of the conjugate is not the function; it's something called the convex envelope of the function.

And it is -- let's see if I can say it right -- the largest convex function that fits under this one. I think I said that right. In fact, I can describe it to you in terms of epigraphs very easily, ready? We have to have a name for the convex envelope, so f_env is going to be the function which is the convex envelope of f, and I think I can write it -- I hope this is going to work -- there we go, it did work.

I don't know if that's clear. This says that you take the epigraph of the function, form the convex hull of it, and that's the epigraph of the envelope. So let's sketch here the convex envelope of this function. Well, it's going to look like this: it's the function itself out here.

It turns the corner, then goes straight across like this -- ignore that, my little slip -- and then it hugs the function again. Makes sense? So that's what it is, just if you're curious. It'll come up later, you'll see. By the way, for a convex function, roughly speaking, f** is f.

Roughly speaking means there are some technical conditions and things like that. Okay. Let's look at some examples of the conjugate; this is actually a calculation you're going to have to do a bunch of times. It's going to come up in duality and things like that, so you're going to have to do conjugate calculations.

So I'll just do one and then I'll quit, and then you'll do some, I promise, and then that'll be enough. After that you'll do it on demand when you need to.

The negative logarithm is convex; let's calculate its conjugate. Of course I could calculate the conjugate of the logarithm if I wanted -- I can take the conjugate of anything I like -- but we're interested mostly in conjugates of convex functions.

So let's take a look at this function. The negative log looks like this, something like that. That's the negative log, and I want to work out the sup, or maximum, over x of xy + log x. Now log x is of course unbounded above, so obviously if y is 0 or positive, this thing is unbounded above. So the supremum is plus infinity, okay? That much is clear.

Now let's assume that y is negative. This is interesting, because now the log is growing, but then you subtract a linear thing, and a linear thing always wins out over a log. To find out exactly where the maximum is reached, you just take the derivative -- these are smooth things.

So you take the derivative of xy + log x and set it to zero: y + 1/x = 0. That is the x that maximizes xy + log x, provided y is negative, okay? So x = 1/(-y). Then you plug this x back into xy + log x and you get -1 -- that's the xy part -- plus log(1/(-y)), okay? And that's the conjugate, for negative y.
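Here's a quick numerical check of that calculation -- comparing a brute-force maximization over x with the closed form f*(y) = -1 - log(-y) for y < 0; the particular y values and the bracketing interval are arbitrary.

    import numpy as np
    from scipy.optimize import minimize_scalar

    # f(x) = -log(x), so f*(y) = sup_x (x*y + log(x)); for y < 0 this should
    # equal -1 - log(-y), and for y >= 0 the sup is +infinity.
    for y in [-0.5, -1.0, -3.0]:
        res = minimize_scalar(lambda x: -(x * y + np.log(x)),
                              bounds=(1e-8, 1e3), method="bounded")
        print(y, -res.fun, -1 - np.log(-y))   # numeric sup vs. closed form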

And it's important when you work out a conjugate to get the domain right, so, okay. If you have a strictly convex quadratic, one half x transpose Q x -- by the way, the one half is there because it often simplifies formulas or makes them look prettier.

In other cases, where you're not calculating gradients, you leave the one half off; it's sort of your choice. Actually, this is one of those Fourier-transform-like things where you can stick the 1/(2 pi) anywhere you want, and of course everybody does it differently, which is a pain. So it's one of those stories, but if you put the one half here you're going to get something pretty.

So if you work out the supremum of y transpose x minus one half x transpose Q x -- set the gradient equal to 0, solve for x, and all that -- you get a beautiful thing: the conjugate of the quadratic function associated with a positive definite matrix Q is the quadratic form associated with Q inverse.
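Spelling out the calculation being described here, which is the standard one:

    f^*(y) = \sup_x \bigl( y^T x - \tfrac{1}{2} x^T Q x \bigr), \qquad
    \nabla_x \bigl( y^T x - \tfrac{1}{2} x^T Q x \bigr) = y - Qx = 0
    \;\Longrightarrow\; x^\star = Q^{-1} y,

    f^*(y) = y^T Q^{-1} y - \tfrac{1}{2}\, y^T Q^{-1} Q\, Q^{-1} y
           = \tfrac{1}{2}\, y^T Q^{-1} y.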

So that's how that comes up. The conjugate, by the way, has all sorts of interpretations, in economics and things like that, but I'll put those off until we get to duality, and then we'll cover all of this.

Quasiconvex functions -- I think we have only one more of these and then you're done with your first dose of convex functions. Oh, I should let you know there's a whole industry, in fact almost a little subculture -- it's been going on for like 40 years -- where people work out little variations on convexity. So there's quasiconvexity, there's pseudoconvexity, and I forget what else; it goes on and on. Anyway, it keeps otherwise dangerous people off the streets, so the net benefit to society, I believe, is positive.

And it's fine, they're harmless people, nice enough, but just to warn you: you'll find tons and tons of this, and they of course get all bent out of shape and make little diagrams showing what implies what, and they'll say, did you know that a strictly quasiconvex blah, blah, blah is actually pseudonormally convex? Anyway, they'll draw diagrams.

And if you don't adequately say, wow, that's really interesting, then you'll be in trouble. So that's the best thing to do; that's my recommendation when you're confronted with someone who pesters you about whether something is quasiconvex or pseudo-quasi-normal strictly convex.

I don't even know what they do; you can look on Google to find all the different things. We have looked at all of these things, the 27 different extensions, and identified two that seemed above threshold. This is one of them: quasiconvexity.

So quasiconvexity is this: a quasiconvex function is a function whose sublevel sets are convex. Now of course any convex function satisfies that, but that's not enough, and here's an example which is obviously not convex, but every sublevel set is convex. What happens is, to get a sublevel set you just put a threshold across -- I really should draw it above, like that, because you want to look below it, I don't know.

You need a fishing line to pull across here. And you pull it across, and you can see, for example, that the alpha-sublevel set is an interval, and the beta-sublevel set -- assuming the function continues indefinitely -- is actually a half-infinite interval, but it's convex, so this is fine.

And you call something quasiconcave if its superlevel sets are all convex. So let's look at some examples to see how this works. The square root of the absolute value of x -- that's a function that looks like that; it's quasiconvex on R, and it's obviously not convex.

It's quasiconvex because you simply put a line anywhere and you get an interval. Here's an interesting one: the ceiling of a number. The ceiling of a number is the smallest integer that's bigger than or equal to the number; that's the ceiling.

Now this is pretty weird, because that's integer valued, okay? Now, integer-valued functions -- well, let's have a discussion about integer-valued convex functions. Who would like to start the discussion? What can you say?

Let's discuss the theory of integer-valued convex functions. Do they exist?

Student:[Inaudible]

Instructor (Stephen Boyd):Constant, okay. For example, here is a function: 3. Convex. Let's keep going, anything else?

Student:[Inaudible]

Instructor (Stephen Boyd):Four, thank you. It's going to be a long discussion at this rate. Okay, what do you think? I'm not asking for a proof -- we're talking eyeballs here.

Student:[Inaudible]

Instructor (Stephen Boyd):Yeah, okay, so these are the only ones -- the only integer-valued convex functions. You couldn't have two values, like 3 at one place and 4 at another, okay. So you are now in total command of the extensive theory of integer-valued convex functions. Basically there hardly are any; they're otherwise known as constants, actually.

So that's why this is a bit interesting: the ceiling, which is obviously integer valued, is going to be quasiconvex. And actually, what's kind of cool about it is that we will be able to optimize things involving these. You will, I absolutely promise; there'll be problems involving, say, FIR filters, again if you're in EE, and we'll ask questions like: what's the minimum order required to have one decibel of ripple over this [inaudible] and a minimum of 40 decibels of rejection over here?

Now when you just hear that question, the convex part of your brain is not lighting up, because you heard "order of a filter," and that's an integer. Being familiar with the theory of integer-valued convex functions, and knowing that it's a very short theory, you might say, well, there's no way you can optimize something involving an integer -- we'll get to this.

That one's quasiconvex, and it's easy enough to check; these are just simple things -- every sublevel set of the ceiling function is just an interval, so it's very easy. Log is quasilinear. The product x1 x2 is a quadratic form that is neither convex nor concave; it's the quadratic form associated with that matrix.

That matrix -- just take a look at it -- is not positive definite and it's not negative definite; it's got split eigenvalues, so it's neither. It's certainly neither.

Student:[Inaudible]

Instructor (Stephen Boyd):Quasilinear means quasiconvex and quasiconcave. And by the way, these functions do not have to be linear, not even remotely. A linear fractional function is quasilinear, meaning it's both quasiconvex and quasiconcave -- this is the most famous example, by the way, because this function here is absolutely not convex.

If you have a linear fractional function, it looks something like this -- well, there might be a place where it goes off to minus infinity or something like that, but it's going to look like this, like that. I drew the wrong one; it doesn't have to look like that. All right, I take that back, it can, depending on A, B, C and D, sorry. But in general it doesn't matter.

A linear fractional function is going to be quasiconvex and quasiconcave. There are famous problems that have this form. Here's an interesting one: the distance ratio, the ratio of your distances to two points, on the set of points where you're closer to one than to the other -- which is a halfspace, by the way. So as long as you're closer to one point than to the other, the ratio of your distance to that one over the distance to the other is actually going to be quasiconvex.

By the way, all you need to do is take this function and raise it to a power gamma -- say, if you're in wireless and you have the SIR from a transmitter you want and an interferer. Then the inverse SIR is quasiconvex, provided you're closer to the transmitter you want to hear than to the interferer.

Sorry, I have to throw these in, because I'm bored. Sorry, go on?

Student:Is it the same set of convexity-preserving operations that works on these quasiconvex [inaudible]?

Instructor (Stephen Boyd):No, that's what's cool: it's not. It's a more general set. For example, a monotone increasing function of a quasiconvex function is quasiconvex, because quasiconvexity basically says that all that matters is your sublevel sets. So if I apply a monotone transformation, it works just as well. So it's a different set of operations.

Oh, and it's neither a restriction nor a generalization of the other calculus. For example, the sum of two quasiconvex functions is generally not quasiconvex. Let's just look at a quick example to show that: here's a quasiconvex function, here's another one, and the sum, for sure, is not.

Oh, there's another name for quasiconvex: it's called unimodal, that's another name. What reminded me was that I added these two together in my head, and that's what we would call not unimodal.

Okay, so let's look at one that's nontrivial, just for fun: the internal rate of return. Here I have a cash flow sequence, x0 up to xn, where xi is the payment in period i, and positive means a payment to us.

We'll assume that x0 is negative, which means we pay out at period t = 0, and we'll assume that the sum of all of these things is positive, so I guess that's what we'd call a good investment. If the later xi's are positive, they're payments we receive, and the sum total of everything outweighs our initial investment.

So minus x0 is the investment. Then if I work out the present value of this cash flow stream with an interest rate r, it's nothing more than this: I sum from i = 0 to n of xi times (1 + r) to the minus i.

So that's my present value assuming an interest rate of r. Now the internal rate of return is defined -- usually completely incorrectly and ambiguously, in essentially every finance text I've looked at. They say it's the interest rate that makes the present value 0. Well, let's see; somewhere in middle school you would learn that a function like that can have more than one zero -- there are many values of r for which it can be 0 -- but this apparently hasn't stopped people from defining it that way.

So the correct definition is that it's the smallest interest rate which makes the present value 0; that's the definition of the internal rate of return. For example, an internal rate of return of 8% means that the income derived from the cash flow stream is exactly the same as if you had invested the initial amount at an interest rate of 8%. So that's the idea.

Now, this is a rather complicated function of x. If you had a whole bunch of different cash flows and things like that, it's quite complicated to visualize. The way you'd compute the internal rate of return of a cash flow stream is -- well, you really have to form this sum.

That's a function of r; note that it's a function of powers of 1/(1 + r), so it's actually a polynomial in 1/(1 + r). Then you'd find the roots of the polynomial -- these are the values of the interest rate at which the investment is neutral -- and then you'd find the smallest of those.
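Here's a small sketch of the computation just described -- treating the present value as a polynomial in u = 1/(1 + r), finding its roots, and taking the smallest nonnegative rate; the cash flow numbers are made up for illustration.

    import numpy as np

    # Internal rate of return of a cash flow stream x0, ..., xn:
    # PV(r) = sum_i x_i (1+r)^(-i) is a polynomial in u = 1/(1+r).
    x = np.array([-100.0, 30.0, 30.0, 30.0, 30.0])    # made-up cash flows

    def irr(x):
        roots = np.roots(x[::-1])                   # roots in u, highest degree first
        u = roots.real[np.abs(roots.imag) < 1e-9]   # keep (numerically) real roots
        u = u[(u > 0) & (u <= 1)]                   # u in (0, 1]  <=>  r >= 0
        return None if u.size == 0 else 1.0 / u.max() - 1.0   # smallest r

    r = irr(x)
    print(r)                                              # about 0.077 here
    print(np.sum(x * (1 + r) ** -np.arange(len(x))))      # PV at the IRR: ~0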

This is complicated; it's like an eigenvalue computation or something -- it's not simple. But it's quasiconcave. And by the way, this is your first hint that something cool is happening, so I'm going to mention it, because this is the first time we've hit it.

We're going to find out that quasiconcave functions are things we can maximize. You'll be able to maximize this -- five lines of code, with complicated constraints -- so you'll be able to maximize this. We can minimize convex and quasiconvex functions, and we can maximize quasiconcave ones.

Now, it turns out it's hard to minimize internal rate of return; that's a very difficult problem. What this says is that it's actually easy, at least algorithmically and mathematically, to maximize internal rate of return.

Now, if you think about it carefully, in practice we're not neutral: problems of maximizing and of minimizing internal rate of return do not occur with equal frequency, right? Actually, pretty much you never want to minimize internal rate of return. You want to maximize it, and -- wow, look at that -- the thing we want to do turns out to be the thing we can do.

I just thought I'd mention this, because it comes up. I mean, it's kind of funny: you don't mention this in, like, 263; you say, isn't it amazing, we can minimize norm of Ax - b. Every now and then you want to maximize it. Just thought I'd mention it.

So how do you show this? To show something is quasiconcave, you look at the superlevel sets. By the way, what you're doing is saying: let's study the set of cash flow streams that have an internal rate of return exceeding 8%. That's a rather complicated set.

That's like the set of symmetric matrices whose maximum eigenvalue is less than 7. It's not impossible to imagine what it is, but it's quite complicated. So what is it? To say that your internal rate of return is bigger than R means that for all interest rates up to R, your present value is actually positive.

So that's this: an infinite collection of inequalities, one for each interest rate up to capital R -- and each present value is a linear function of x, so the superlevel set is an intersection of halfspaces, hence convex. I'll let you check the equivalence, but that's the quick way to see it. By the way, if you tried to do this for a sequence of length 2 or 3 using the formula for the roots of a polynomial or something, it would not be fun.

Now, for quasiconvex functions there's a modified Jensen's inequality, and let me just draw it, because it's better to draw than to -- I don't know what's here, so. Jensen's inequality says something like this: you have two points like this, and let me just draw a convex function first.

And it basically says that if you draw this line, the chord lies above the graph. That's what Jensen's inequality says. For example, f at the midpoint is smaller than the average of f at the two points. That's basically what it says; that's convexity.

Quasiconvex is cool. Let me draw -- there's a nice quasiconvex function, and I'll pick some points. I'll pick this one and I'll pick that one. Oh, no -- well, here the chord lies above it anyway; it doesn't matter. In this case the chord would lie above it. But I'll draw the picture of what you have to have for a quasiconvex function.

Instead of the chord, you draw a horizontal line at the maximum of the two values. It says the function has to lie below that, which it does. Another way to say it: if you have a quasiconvex function, and you have one point x and another point y, and you form a convex combination of x and y and evaluate f, it's not worse than the worse of the two endpoints -- where worse means larger in f. Does that make sense?

So this is the picture, and the way you write it is f(theta x + (1 - theta) y) is less than or equal to the max of f(x) and f(y): when you form a blend and evaluate f, you can't be worse than f at the endpoints. That's the picture.
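A tiny numerical illustration of the difference between the two inequalities, using the sqrt(|x|) example from earlier; the random sample points are arbitrary.

    import numpy as np

    # f(x) = sqrt(|x|) is quasiconvex but not convex: the max-form (modified
    # Jensen) inequality always holds, the ordinary chord inequality does not.
    rng = np.random.default_rng(2)
    f = lambda x: np.sqrt(np.abs(x))

    chord_violations = 0
    for _ in range(10000):
        x, y, t = rng.uniform(-5, 5), rng.uniform(-5, 5), rng.uniform()
        z = t * x + (1 - t) * y
        assert f(z) <= max(f(x), f(y)) + 1e-12       # quasiconvexity: never fails
        if f(z) > t * f(x) + (1 - t) * f(y) + 1e-12:
            chord_violations += 1                    # convexity: fails sometimes
    print(chord_violations, "chord violations out of 10000")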

And you have some things like this: if f is differentiable, there's a first-order characterization of quasiconvexity, and it basically says that the sublevel set of f at a point lies on one side of the hyperplane defined by the gradient. So if you take the gradient like this and form a halfspace, it says everything over here has a higher or equal value of f; that's what it says.

Or another way to say it: for a quasiconvex function, every point that's better lies on this side of that hyperplane, period.

Okay, that's another way to say it. By the way, that's a hint that we can minimize it, because when you evaluate a gradient, it gives you a halfspace and says: I don't know where the minimum is, but it's got to be in this halfspace.

You can think of that vaguely as a query -- this is very vague and weird, but by evaluating a gradient you got one bit of information, because it basically said the minimum is in that halfspace, not the other one.

We'll get to that later, but that's why this works. And finally, in our whirlwind tour, the second of the two extensions and variations that were above threshold. This one is very simple, but it's important enough.

It's not really an extension or anything; it just happens a lot. It's log-concavity. To say something is log-concave just says that log f is concave, that's all it says. And equivalently, it says that if you evaluate the function at a point on the segment between x and y, it's bigger than or equal to the weighted geometric mean of the values at x and y.

That's the same thing: if you just take logs here, you get concavity of log f. Now, for some reason, applications come up overwhelmingly log-concave, not log-convex. I don't know why; it doesn't matter, I'm just telling you. Log-concave things come up all the time; log-convex things almost never come up.

It could be that in all the applications where this comes up -- which is statistics; it's everywhere in statistics, and we'll see it's in lots of other things -- the conventions are that people want to maximize things, like maximizing likelihood ratios and maximizing posterior probabilities and things like that.

So here are some examples: powers x^a on the positive reals are log-concave for nonnegative exponents and log-convex for negative ones. And this is very, very important: lots and lots of probability densities are log-concave, okay? That's very important, and that's called a log-concave probability distribution.

An obvious example would be a Gaussian, because if you take the log of the density and look at it as a function of x, you get a negative quadratic.

Another one would be an exponential random variable, or a Laplacian variable with a two-sided exponential distribution. But actually tons and tons of distributions are. So if you're in statistics or something like that, all the various families -- probably all the ones you can think of right off the bat -- are log-concave.

Things like beta distributions, chi-squared -- pretty much everything; all of these things are going to be log-concave. Log-concavity is going to be the basis of a whole bunch of bounds in statistics, so if you're in statistics this comes up, and it may even have been mentioned explicitly, although I doubt it.

Actually, who here -- I know there are people in statistics here; they're not going to self-identify, though. I'll find out; I can find out who's in -- anyway. Has log-concavity ever been mentioned in the statistics courses -- yes, cool, what class?

Student:[Inaudible]

Instructor (Stephen Boyd):In seminars, oh, okay. Not in, like -- how about 310? No, it wouldn't be in 310; it would be in the statistics courses.

Student:In 300 it's mentioned in passing.

Instructor (Stephen Boyd):Cool, good, okay, there we go. Here's one that's totally non-obvious and immediately has amazing applications: the cumulative distribution function of a Gaussian. It's got different names -- I think in EE and communications they decided that wasn't good enough [inaudible] and they called it the Q-function or something. Is that correct? Isn't that what that is?

Student:[Inaudible]

Instructor (Stephen Boyd):One minus -- it's what?

Student:[Inaudible]

Instructor (Stephen Boyd):One minus Phi, okay, great -- that's just great, my colleagues introducing new notation for something that's been used for hundreds of years by many other people. Sorry, let's move on.

So the cumulative distribution of a Gaussian -- and I suppose the Q-function as well -- is log-concave, okay. And you know, this is not so obvious. The cumulative distribution of a Gaussian looks like this: it goes like that and then comes up like that. It looks like that. It's obviously not concave.

You take the log, and the part where the CDF is near 1 -- well, the log is near 0 there, obviously -- and this part curves down like that. In fact, out here the log is asymptotic to something like a negative quadratic, which is the asymptotics of the Gaussian: small values of the CDF are well approximated by the Gaussian density itself.
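A quick numerical look at that claim -- a sketch that checks concavity of log Phi through its second differences on a grid; the grid range and spacing are arbitrary.

    import numpy as np
    from scipy.stats import norm

    # log(Phi(x)) should be concave: its second differences should be <= 0
    # (up to roundoff).  Far in the left tail it behaves like -x**2/2.
    x = np.linspace(-8, 8, 2001)
    logcdf = norm.logcdf(x)
    print(np.diff(logcdf, 2).max())   # should not be significantly above 0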

Student:[Inaudible]

Instructor (Stephen Boyd):No, that's false. I'll tell you very shortly which ones are.

Student:[Inaudible]

Instructor (Stephen Boyd):All which?

Student:[Inaudible]

Instructor (Stephen Boyd):In one dimension, that's true, because they're monotone, so they're definitely quasiconvex. We'll get to this. So let's look at some properties of these things. Well, just work out what the Hessian of the log of a function is; it's something like that [inaudible] or something.

You get this, and it's actually quite interesting; we can look at it. First, if you're talking about a log-concave or log-convex function, obviously the function itself has to be positive, or nonnegative at least. It can be 0, by the way, which the log maps to minus infinity, and that's perfectly okay, because for a concave function minus infinity is essentially our out-of-domain token.

I gave it another name the other day -- out of domain; what did I call it before, NID? Anyway, it's one of those tokens for being out of the domain: minus infinity. So f is nonnegative, and the condition is actually quite interesting. I guess you could put the f(x) over on the other side or something like that.

Plain concavity would say that the Hessian is less than or equal to 0. This condition says instead that f(x) times the Hessian of f has to be less than or equal to the gradient times its own transpose, which is a rank-one matrix, a dyad. So this actually says you're allowed one positive eigenvalue in the Hessian of a log-concave function.

That's cool: one positive eigenvalue is allowed. It means there is one direction along which your curvature can actually be positive. We needed that -- look at the Gaussian CDF, okay. But it's got to be overwhelmed -- it's got to be no more than this rank-one thing involving the gradient.

I don't want to give too many interpretations of this. Here are some obvious things: the product of log-concave functions is log-concave -- totally obvious, because when you take the log of the product you get a sum, and that's a sum of concave functions.

The sum of log-concave functions is not always log-concave; that's easy: make a mixture of Gaussians, move them apart, and you get two bumps. Oh, by the way, if it's log-concave it's quasiconcave, because log is monotone. And you can't have two bumps in a quasiconcave function, because then a superlevel set would be two isolated intervals, so it wouldn't be convex.

Now we get to something which is highly, highly, highly non-obvious, and it's this: if I have a function of two variables that is log-concave jointly, and I integrate over one of the variables, the result is log-concave. It's true.

For literally years I have looked for a simple proof of this, and simple means simple: assume everything you want, x and y can each be one variable, f can be differentiable as many times as you like. I'm not talking about a proof that would please our careful mathematician friends; I'm talking about one for a normal person, where you just write out the first and second derivatives, look at some stuff, end up with this, and say: there it is.

I'm just talking about some bunch of calculations that would establish it. I have failed, always. By the way, I've put this challenge out to a bunch of people, and several times people have come up with -- by the way, I have a four-page one which is very complicated, and anyway.

So if anyone ever figures out a simple one for normal people, one that just uses calculus and a couple of arguments here and there, it'd be great. After which I will say that it was obvious; but at the moment the simple proof is obvious but unknown to us.

It's certainly not known to me, that's for sure. It's not obvious. The implications, though, are stunning, and I will show you some of them. Convolution preserves log-concavity. That gives you a lot of stuff right there. For example, it means that if you have two random variables, each with a log-concave density, and you add them, the density of the sum is log-concave -- that's the kind of thing you get right off the bat.

So convolution preserves log-concavity, and here's a fascinating application: if you have a convex set C and a random variable Y with a log-concave PDF, then the probability -- as a function of x -- that x + Y lands in C is log-concave.

And the way to show this is to apply that one integral result. The integral theorem is the big hammer: you reduce everything to the, as of now, non-obvious fact that if you integrate a log-concave function over one set of variables, the result is log-concave.

So roughly speaking, integration preserves log-concavity. And you apply that by making a function of x and y which is the indicator of the set evaluated at x + y -- it's 1 if you're in the set and 0 if you're outside -- times the density, work out the integral over y, and it's that probability. And this is very cool; it immediately has amazing applications, one of which I'll show you right now.

So here's the way it works. I have some set of acceptable values in some multi-dimensional space, like that; you should think of these as acceptable values of what comes out of a process. Each axis could be -- it doesn't really matter; say I'm setting the target process recipe for making ICs.

One of these could be a threshold voltage, which if it's too high means the chip is too slow, and if it's too low means it dissipates too much power. By the way, how many people here are in circuit design? Now come on, does anyone care about it? Maybe they just sleep late; they were in the clean room until very late last night -- I know they're in this class.

But if no one is here, then I will remove these examples. All right, so this is the acceptable set, and what's going to happen is you're going to set a target value, like that, and you'll adjust all your manufacturing to hit that target value.

By the way, when you do manufacturing runs and you look at what comes out on Monday, Tuesday, Wednesday, Thursday, and things are biased away from the target -- if the mean of what comes out is away from that point -- you need to get new people running the manufacturing, because you would just offset the target.

So we can assume that when you aim at the target point, the distribution of values that come off the manufacturing line has that mean. If it doesn't have that mean, you shift it, or whatever.

So you have a mean, and let's say the manufacturing variations are given by this random variable W. It could be some horrible thing, some joint distribution, but we assume the following: it has a log-concave PDF.

A Gaussian would work instantly, but that's the assumption. What's going to happen now is this: when you aim at a target here, you're going to get values scattered around it like that, and every now and then you'll land outside. If you're outside, that's a part that doesn't perform; that's something that's no good, and it's a waste.

And your yield is the probability that you actually end up in the set. This all makes perfect sense. Now, imagine taking the target and moving it around. As you move it around, your yield changes, obviously, right?

So for example, if you put the target here, what would the yield be? I'm not endorsing that as a choice, by the way, but what would the yield be here? Would it be 0? No, probably not, because every now and then you would accidentally manufacture an acceptable part.

I mean, as long as the distribution doesn't have finite support, every now and then, accidentally. So this would be a poor choice, and it's totally obvious what a good choice is: you want something deep in the set. And that's got all sorts of names.

It's called design centering -- that's one of the names for it -- and yield optimization; it's got all sorts of other names, although yield optimization means something different in other contexts. Everybody got this? And now you should also be able to visualize the yield as a function of the target point.

So it's high here, low here, less low here, and highest at some point in the middle. So here's the point: that yield function is log-concave. Just very roughly speaking, let me tell you what that means: it means we can optimize it; we can maximize it. So you put these two facts together, and what it says is you can actually directly maximize the yield.
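Here is a toy sketch of the design-centering idea, under assumptions made up for illustration: the acceptable set is a box and the manufacturing variation is an independent Gaussian, so the yield has a closed form in terms of the Gaussian CDF and can be maximized numerically.

    import numpy as np
    from scipy.stats import norm
    from scipy.optimize import minimize

    # Acceptable set C = [l, u] (a box); variation W ~ N(0, diag(sigma^2)).
    # Yield(x) = P(x + W in C)
    #          = prod_i [Phi((u_i - x_i)/sigma_i) - Phi((l_i - x_i)/sigma_i)],
    # which is log-concave in x, so we maximize its log.
    l = np.array([0.0, -1.0])        # made-up box and noise levels
    u = np.array([2.0,  3.0])
    sigma = np.array([0.5, 1.0])

    def log_yield(x):
        per_coord = norm.cdf((u - x) / sigma) - norm.cdf((l - x) / sigma)
        return np.log(per_coord).sum()

    res = minimize(lambda x: -log_yield(x), x0=np.array([0.1, -0.9]))
    print(res.x, np.exp(log_yield(res.x)))   # centered target and its yield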

Can you make a note? We'll make a homework problem on that -- not for homework 2; later, don't worry. We'll do a yield maximization problem. Okay, so that's it. One last topic on convex functions: convexity with respect to generalized inequalities. Generalized inequalities allow you to overload the ordinary inequality.

All you need to define convexity of functions is -- well, in the very general case you need the ability to talk about the line segment between two points; we work in vector spaces, so for us that's fine. And then you need an inequality, to be able to say that f of theta x plus (1 - theta) y is less than or equal to theta f(x) plus (1 - theta) f(y).

So you need that inequality. The function values can be vector valued, and then you can talk about convexity with respect to a generalized inequality. The only case where this really comes up, and is actually used, is matrix convexity. There it just says the same inequality holds, but the inequality has to be interpreted in terms of matrices.

So -- now I think we'll quit here. This sort of finishes our whirlwind tour through, I guess, the [inaudible] of convex sets and functions. Next time, half the lecture will be a little bit boring, because it's just the nomenclature of optimization, but by next week I actually think things will get interesting, so we'll quit here.

[End of Audio]

Duration: 77 minutes