Instructor (Jacob Mattingley):Okay. Good morning and welcome to EE364-a. This is the second lecture, and in case you havenít guessed, I am not Professor Boyd. He is currently on his way back from India, and heíll be here in time for the lecture on Tuesday. You donít need to worry. It will be a proper professor from Tuesday onwards.
A couple of things before we move on. First of all, the first couple of weeks are going to be relatively dry mathematical material. Thereís a bit of ground that we have to cover before we can do the really interesting parts, so some of that starts today. As part of that, weíre going to be going quite fast indeed, so hopefully lots of you have read or scanned part of chapter two, because weíre hoping to get through most of that if not all of that today.
If you do find that Iím going quite fast today, thatís hopefully because I am. You mind find that itís helpful if you go back and read chapter two once youíre done with this lecture. Before we dig right in, another thing is that we have posted homework one. Itís not due until the 17th, which is next Thursday. Homework 1ís already out. Office hours for TAs have been posted, and there will probably be more to come.
Today, weíre going to be talking about convex sets. Iíll start off by just describing what an affine set is and then a convex set. Weíll go through some examples, then weíll talk about how you make more complicated convex sets from simpler ones and talk about a few other topics towards the end. This should be review to start off. We can form from any two points Ė we can parameterize the line going through those two points using a parameter such as Theta. In this example, ? can vary over the whole real line, and it should trace out the whole line.
If weíve got a point X1 here and a point X2 Ė when ? is zero, weíre sitting right at X2. When ? is one, weíre sitting at X1. It traces out the whole line like this. More generally, an affine set can be described as a set where any two distinct points and the line through them are all inside the set. So take any two points from an affine set, trace the line through them, and that line better be in the same set.
One example of an affine set is a solution set of linear equations, which youíve all seen by now. There are lots of different ways of writing it. We can write it with a [inaudible] parameter. We can write it using an equation on the left-hand side here, but at the end of the day, this is one of the ways you can represent any affine set. A convex set, which is going to be of considerable interest to us this quarter, is one where instead of being able to very theta over the entire real line, weíre instead constrained to having ? be between zero and one. It can take on any value between zero and one.
If we find two points in our set, instead of before where we could have any point on the line through these two points, the line segment described with ? here has ? varying over zero, one. So to start off with, line segment ? between zero and one and these are the points traced out by ? as we vary it.
The convex set is a set which takes any two points in the set Ė so take X1 and X2 from your convex set, and the line segment between those two points had better be in the set. That means that an affine set is, in fact, a convex set, because an affine set contains the entire line through X1 and X2. Of course, it does contain the line segment between X1 and X2. Our first example of a convex set is the affine set from before.
This is just a formal way of expressing it here. We take X1 and X2 out of this convex set C. We vary ? between zero and one, and the convex combination of points here has to lie in the set C. Here are some examples. The set here Ė obviously, if I had picked these two points here, every point along this line lies inside the set. This here is a problem, because if I pick these two points here, all of these points here do not lie in the convex set. That clearly violates this set membership here, and this cannot be a convex set.
This one here is a more technical example. This one looks like it should probably be convex but for some of the points where the boundary is missing. If I had picked a point here and a point here, this line actually is not inside the set, because itís only got a partial boundary around here. These are some very simple examples described graphically constrained to just our two. Youíll see quite quickly that convex C can make sense in quite bizarre spaces.
Following on from convex combinations, before, we had just two points involved in our determination of whether a set was convex. The thing is that we can actually extend this concept to a convex combination. If we picked points X1 through XK to start with, we can describe a convex combination of those points as being a set where we weight each of the points, X1 through XK, and the weights all have to be nonnegative and they have to add up to one. This is just a definitional thing at this stage. We define a convex combination as something like this.
If we had three points like this, weíll see that the set of all convex combinations actually looks something like this. If you do the math, youíll find out that as you vary ? 1 and ? 2 and ? 3, and you always have them nonnegative and you vary them so that they sum to one, the set of points that the sum traces out is called the convex hull of the set. We can take any set of points or just a set and we can form the convex hull by finding all of the convex combinations of points within that set.
Letís take these black points here. Supposed someone just handed you these black points and they said what is the convex hull of these points? At that stage, that sounds very unlikely, but later on, weíll see why you might be interested. We take these points and we want to know all of the convex combinations. This is the smallest convex set that contains all of those points. Itís formed by basically taking the points at the edges of the set and adding the line segments involved in those points and try to make them as straight as possible. Then, you fill in the rest of the set.
So if you asked, for example, how I got to this point here, you could form it as a convex combination between these three points here. Or, in fact, this one lines just directly on a line between these two. For any set, we can find the convex hull of that set. Hereís another example. The kidney shaped figure that we had before Ė to find the convex hull of this set, we need to just basically fill in this region here. Thatís the problematic region before. If we found two points like this, the line segment did not line entirely within the set. Any questions so far on convex combinations and convex hulls?
The main difference is on what you can do with ?. An affine combination of points that we had before is you have three variables. A linear combination Ė this would be a linear combination if we didnít have this restraint here. This thing makes it a convex combination instead of linear. Any other questions?
Thatís exactly right. If you already have a convex set and you want to know the convex hull, thatís simply the convex set itself, because that set is required to contain all convex combinations.
The use of a convex hull Ė if you already have a convex set, the convex hull is useless to you. But sets are by no means likely to be convex in some sense, so itís useful when we want to make a convex set. But certainly if you call some function conf and you [inaudible] on a convex set, it would simply return the set without trouble. Yes, a convex hull has to be convex by that definition.
Letís have a look at convex cones now. Letís introduce a conic combination of points. This is again a specialized version of a linear combination of points. We take two points. We can have more than two. We have nonnegative weights on these points. Thatís described as a conic combination of these two points, X1 and X2. Here, we have a figure where we have a point X1 and a point X2, and this is the set of points ? 1 X1 plus ? 2 X2 where ? 1 and ? 2 are nonnegative.
Obviously, itís a cone, and thatís why we chose the name convex cone. It also has to be convex. The reason for that is if we look back to our definition of convex, there was a special case of these. We added another constraint on top of this. If we merely wanted to insure convexity, weíd only have to check that X was in this cone when ? 1 and ? 2 added up to one. Any convex cone has to be a convex set.
Now we have a look at hyperplanes and halfspaces. Hopefully, youíve seen these many times before. We start off with a definition. A hyperplane is a set where A transpose X is equal to some constant B. A is nonzero, because otherwise, we could have any point in the whole series. This is an example hyperplane. We parameterize it with a point, say, X, 0 and then we need to know that A transpose X0 is equal to B, and then any other point that also has an inner product with A of B is also lying on this hyperplane.
This is a hyperplane in two space, which is a line. In three space, you can think of it as looking something like this [inaudible] here. This would be a hyperplane in three space. It need not go through the origin, by the way. Following on from that, we have a halfspace, and this is similar to a hyperplane, but we just take the area on one side of that plane. We can have it either open or closed. This can be a strict inequality or a non-strict inequality. This is an example here Ė same hyperplane, and itís the half space on the lower, I guess, in this case.
Likewise, the non-shaded region you could describe as another halfspace with A transpose X greater than or equal to B. The direction of this inequality indicates what side it falls on. A few things Ė halfspaces are not vector spaces. If you take two points from here, it should be closed. If itís a vector space, it would be closed under scale of multiplication. Thatís not going to be the case for halfspace, so a halfspace is not a vector space. People often get that confused, so donít fall into that trap.
Thatís a hyperplane and a halfspace. They are both convex. A hyperplane is also affine, which ensures that itís convex, but a halfspace is merely convex and itís not affine. Itís not affine because we would have to be able to contain the line through any two points. Letís pick these two points here and then draw a line between them.
Student:Is there displacement from the origin and A and B that are written?
Instructor (Jacob Mattingley):There is, and itís basically if you find the point thatís nearest the origin Ė I think this is part of the homework exercise. If you found that X1 was the point nearest the origin, then you could use A to find out Ė Iím going to leave that to you, because I think it is part of the homework. There is a relationship. Those are some very simple convex sets.
Some more sets that weíre interested in Ė first of all, a Euclidean ball. At the moment, Euclidean is going to mean that we use the two norm in the ball description. We parameterize a ball by picking a center point XC and a radius for the ball of lowercase R. Itís the set of any point that is [inaudible] in the Euclidean sense to XC then R. Hereís one way of expressing that. Hereís another way of expressing that. This is a free parameter representation where you have a ball U and itís just a unit ball. You scale it by R, and then you center it around XC.
This is the free parameter. Here are two [inaudible] descriptions of the same thing. We can get more interesting than a Euclidean ball and take an ellipsoid, which is a generalization of the Euclidean ball, and thatís points Ė we take a symmetric positive definite matrix P and we use this expression here. You can actually go back and forth between this representation for an ellipsoid and this representation for an ellipsoid. This is just a sample ellipsoid. Ellipsoids and Euclidean balls are convex sets. One of the ways you could prove it is you could find two points and show that if those two points are in the set, then their convex combinations have to be in the set as well.
Again, if you havenít read chapter two and you havenít seen these things before, Iím going to be going quite fast. More generally, weíre seeing two norms a lot. Weíre seeing the Euclidean norm. But more generally, we can define a norm as being any function that satisfies these things here. First of all, this is something like definiteness and non-negativity. We need to know that Ė these are just a bunch of things that you should look at that define a norm.
You can verify quite quickly that the Euclidean norm does in fact satisfy these, and there are various other ones as well. Just a little point is that this notation here is [inaudible] of the absolute symbol, which is a norm in the space R. If you take the absolute value of a number, itís going to satisfy all of these things in the space R.
In 364a, quite different from 263, if we simply have the norm symbol without any subscript, it does not necessarily mean the two norm, and in fact, itís usually used to represent a general norm. Weíve kind of moved up a level of sophistication, if you like, that no longer is it simply enough to just say this is the Euclidean norm of X, but we want to specifically say we are after the two norm. Youíll find and use often different kinds of norms in this course.
We can define as we saw previously similar to the Euclidean ball here, for general norms, we can define a norm ball. This ball doesnít necessarily look like a ball. Itís a ball in the general sense. We define them very much the same as we define a Euclidean ball, but theyíre going to look quite different. Letís take an example. We have the one norm, which is the sum of absolute values. Can anyone tell me what the norm ball with the one norm looks like in R2?
Instructor (Jacob Mattingley):Okay. It looks similar to a square. Itís actually a diamond like this if I recall correctly. The one that is a square, however, is the infinity norm. Thatís the maximum absolute value. That does indeed look something like this in R2. These are two other norm balls. They donít look much like balls, but thatís their name. You could call this the one norm ball and this the infinity norm ball. There are others.
We can also define a norm cone by adding another variable here and saying the norm cone is any point X, T where the norm of X is less than T. This is our favorite norm, the Euclidean norm, and itís the norm contrast out. Itís got a couple of names. Itís called an ice cream cone for obvious reasons, and itís also called a [inaudible] cone as well. This is actually a solid figure. It might look like a wire frame, but itís actually filled in. Itís any point where X is in R2 here along the bottom and T is up here.
One of the things we know about norm balls and norm cones is that theyíre convex. Yet again, weíre adding another building block, which will become very useful to us later. It doesnít necessarily mean a whole lot now, but just store away that norm balls, norm cones, some of these other things like halfspaces Ė theyíre all convex, and thatís going to be very useful to us later.
Before I go on, Iíll mention that if you want to know what the one norm cone and the infinity norm cones look like, you can think of these here as a cross section when you look down from the top of the cone. If you looked down on this cone, the level sets of the cone would look something like this. If you looked down on the level sets of these cones here, the cross sections are going to look like this. We could fill this in and similarly for the infinity norm. You can think of them as a pyramid, and the one norm is rotated and the infinity norm is a [inaudible] pyramid.
Here, we have as lightly more interesting set of shapes that are also convex. A polyhedron is the solution set of finitely many linear inequalities and equalities. Hereís one of the ways that we can represent it. We can say a polyhedron is any X such that AX is less than B and CS equals D. That would be a polyhedron. We note here that thereís actually a slightly different symbol for less than here, and the reason for this, and youíll often see this both ways. Youíll see an ordinary straight symbol here. Youíll sometimes see this curvy one.
The reason is because this here is what we got from R. If we say three is less than or equal to five, this is what we use. Here, weíre actually overloading this notation to say that AX, which is a vector, is component wise less than B, and to make this completely explicit, weíre using this slightly different symbol here.
During this lecture and other times, I might get a little bit sloppy and the curves might not be as obvious, but the reason for this is because itís actually not quite the same as the ordinary R ordering, and weíll see a generalization of this quite soon. Note that itís different, and the reason is just to make it very clear that weíre doing something different to what we were doing before in R.
Positive semidefinite weíll see in a second. This is one example of a polyhedron here. This is an intersection of a finite number of halfspaces and hyperplanes, and so in this particular case, weíve got five halfspaces defined by these normal vectors A1 through A5, and you can think of this here as the halfspace defined by A1. If you find the intersection of these five, youíll discover that itís this shape here.
A polyhedron is convex, and this is something that you need to prove. Iím not going to go through it at the moment, but you should be able to show fairly easily that when you find an intersection of a finite number of halfspaces and hyperplanes you get a convex set, and itís called a polyhedron. An affine set is a special case of a polyhedron, and so is a hyperplane from before.
Student:This one only uses the inequality, so could you give a quick example of one that actually uses equality?
Instructor (Jacob Mattingley):Okay. Hereís one. Pretend that this is actually an R3, and weíve got five inequalities like this. Weíve got an equality which constrains us to being in this plane. They can be anything like that.
Next, we have a look at the positive semidefinite cone, and thatís what you were asking about before. We say that a matrix is Ė you should know about positive semidefinite matrices by now, but we define a couple more sets. First of all, Sn is the set of symmetric [inaudible] matrices. Letís have a quick look and see. Do we suppose this is a convex set? Any suggestions? Yes, itís a convex set. How would we go about proving it?
Letís use the definition. Weíll pick an X, which is a symmetric matrix. Weíll pick a Y, which is also a symmetric matrix. Our requirement for something to be convex Ė if this is a convex set, then if we take a ? which is greater than or equal to zero and also less than or equal to one, and we take ? X plus one minus ? Y, the set here is convex if each of these things here is also in the set. Is that true for a symmetric matrix if you take a sum of symmetric matrices? Is it still symmetric? Yes, it is. That means that this is a convex set.
It is, in fact, affine, yes. Again, youíre quite right. The reason for that is because we never used the fact that ? was between zero and one. ? could be anything and we would still have a convex set. Yeah, itís all of these things. Letís look at a more interesting set. Weíll define and use this notation extensively that S sub plus sup N is the set of positive semidefinite N by N matrices. Weíll assume and especially when we say is, weíre assuming that theyíre symmetric.
This is a set like this. If youíve forgotten what positive semidefinite means, it means that when we take a quadratic form such as Z transpose XZ is greater than or equal to zero for all Z. Hereís the claim, that S plus to the N is a convex cone. Looking back to convex cones, that means that if we take a nonnegative combination of two of these items from the set, it also needs to be in the set. You should be able to see quite easily that if we had X that was positive semidefinite and Y that was positive semidefinite, if we took a conic combination, ? 1 X plus ? 2 Y, this, too, is going to be positive semidefinite. Can someone tell me how I can make that really explicit quite quickly?
Student:Z transpose on the left and [inaudible] right use the distributive property to just Ė
Instructor (Jacob Mattingley):Exactly. Iíll do it like this. Iíll say Z transpose ? 1 X plus ? 2 YZ like this, and then Iíll expand it out and Iíll get ? 1 Z transpose XZ plus ? 2 Z transpose YZ. We know that this is positive or nonnegative because X is positive semidefinite. We know the same about this. We know that ? one and ? two are both nonnegative, and so the whole must be nonnegative. Iíve just shown that if we take a conic combination of two positive semidefinite matrices, the result is positive semidefinite. This means that S plus to the N is a convex cone and furthermore, itís also convex.
We can do a similar thing for positive definite matrices, and here, we just add an extra plus to indicate that itís strictly positive definite and not just positive semidefinite. Hereís an example of what the cone looks like in S2++. This is a cone of positive semidefinite matrices. You do write something like this. Exactly how you do it in MatLab, Iím not sure, but if you want, you could pick a point in the space Ė say we pick this point here.
You could find out its X, Y, and Z coordinates, plug it into the matrix and you see what the minimum [inaudible] value is. If itís greater than or equal to zero, then itís in this cone. If itís less than or equal to zero, itís Ė well, if itís less than zero, itís outside the cone. If you did a grid search in here, you could build up a set of points which are inside the cone.
Student:Can you have a positive definite matrix thatís not symmetric?
Instructor (Jacob Mattingley):Yes, you can. This particular picture, it does depend on it because weíve explicitly said that. Basically, itís bad form to give somebody a non symmetric matrix. This proof here nether used the fact that that was symmetric, so yes, itís true that the matrices donít have to be symmetric, but thereís never really any reason why we would [inaudible]. Thereís no harm in taking any matrix said to be positive semidefinite and immediately symmetrizing it without loss of generality.
We could actually find an explicit form for the determinate of this matrix. It also needs the fact Ė whatís that? Anyway, we can generate this plot. Feel free to look into how you do it. Iím going to move on for now because we do have a lot to get through.
Why did we use this particular sign here rather than an ordinary one? Thatís a good question. We have specifically used this curvy sign, and that is again because weíre not using the ordinary ordering on R. This is now a matrix that weíre saying is greater than or equal to zero in some sense, and itís the sense of positive semidefiniteness. We have again very deliberately chosen to use this different sign to avoid confusion with the pure, ordinary symbol like this. You will find some people who use that symbol, and thatís okay.
So now weíve looked at some sets that are by themselves convex sets, but we also want to know how to make convex sets out of other convex sets. There are certain operations that when we apply them to the set, they will preserve convexity. What it means is that youíll be able to build up more interesting sets from these simple ones that we started with. If we were constrained to one set, we could explicitly write equations and explicitly prove weíre convex. Weíd have a lot of work to do if we cannot with an unusual set.
First of all, letís have a look at some of these things. One of the ways we can see if a given set C is convex is we can just apply the definition. Sometimes, thatíll be easy. Sometimes, it wonít be easy. Another way we can do it is we can take operations that are known to preserve convexity. We can take these little [inaudible] that we know are convex, and we can put them together and quite quickly make a fairly interesting looking set. Hereís one. Itís an intersection.
This is actually any number of intersections, and I think we look at that on the next slide. An affine function will preserve convexity. If we take a set C, put it through a function F which is affine, itís going to preserve convexity. A perspective function and linear-fractional Ė weíll see those in just a few moments.
Hereís another way. This is the crude and nasty MatLab approach. Say I want to know if a set is convex, and say Iíve got some oracle which tells me whether a given point is in the set or not in the set. Hereís what I can do. I can pick a couple of points, say, X1 and X2 completely at random from the space that Iím in at the time. Then, I can test my oracle and I can see if X1 and X2 are in the convex set. Suppose they are. The requirement of convexity is that when I find this convex combination of these two points, that point itself has to lie in the convex set.
If somebody gives you some random set and you just want a very first pass of is this even remotely likely to be convex, hereís what you do. You make a tiny little window in the corner of your screen. You run MatLab and you start generating random points from the set. You take a convex combination and you test to see if itís still in the set. Now, as soon as you find some ? which is between zero and one where ? X1 plus 1 minus ? X2 is not in the set and X1 and X2 are, you immediately return and say itís obvious. Of course itís not convex. You can see that when youíre looking at it. Itís quite easy to do that.
Hereís another thing, which I wonít go into at length. Basically, midpoint convexity where you just take ? equal to 0.5. Thatís actually enough for some technical conditions. If you want to run a whole bunch of tests, you could just find X1 and X2 and take their average. You test to see if the average is in the set. If it isnít, you return and say no way itís convex. If it is, you donít necessarily know much. Itís only useful if itís not a convex set.
But if you did, say, 100,000 points, you could give a very rough first pass. Note the quotation marks Ė itís very rough. Itís a heuristic method. Nevertheless, it can be a very fast way of finding out whether something is convex. It fails in situations where Ė itís to do with boundary issues. Iím not going to talk about these much, and the focus of this class is not really on the technicality, but if we took these points here Ė in this particular case, it would work. Say that point there was in the set. Then these other points wouldnít be anything [inaudible].
Letís have a look in more detail at intersections. Hereís the claim Ė the intersection of any number of convex sets is convex. Fortunately, I can just refer you to your homework, because I think you have to show this in your homework for a couple of sets. When I say any number, that means finite, infinite and unaccountably infinite. This is a very robust way of finding out whether somethingís convex.
Hereís an example. Here, itís an unaccountably infinite set, I think. We just take this random thing here and we know that the individual elements are convex. We put them all together and we put up with this shape here. I donít want to dwell on the shape. Itís colored in lots of detail in the book, and in fact, let me just take a moment and mention the book.
The convex optimization book is available, and these slides are based very much on the book. Everything here is covered in detail in the book. The book is also correct from a mathematical point of view. If Iím tossing around boundary issues and there are little technical differences, weíre not going to worry about them in lecture nearly so much. The book is correct. If you have done analysis and you are concerned about all kinds of things like closure and compactness, then you can go to the book and it will be correct. Weíre not going to worry so much about that now.
Hereís a set where we have an infinite intersection of convex sets, and it is itself convex. Letís have another look at a way that we can preserve convexity. Letís have a look at affine functions. Suppose we have some function F that takes RN into RM, and itís affine. One way of looking at this is saying that if F of X is AX plus B, A is a matrix and B is a vector. This is actually not the most general way of expressing it, but this is pretty good.
The image of a convex set under the affine mapping F is convex. If youíre given some set S which is just a set at this point and itís RN, we can use the mapping F and all of the points from X that are mapped into FX Ė the new set is a convex set, and likewise for the inverse image. Just to make clear what the difference between these is Ė suppose we have some set S here, and this is a convex set. We take this under a mapping F to some other set here, this is the set F of S. This is a convex set if F is an affine mapping.
Likewise, we can do the same for an inverse mapping. If we had another set Ė these are not very interesting looking sets, but it could be any set. We can take this back via F and F is a convex set. After this mapping, itís also a convex set. If we have an affine function, we can actually create quite a lot of damage. When we have a known convex set, we can make other ones out of it.
Student:Which notion of adversity Ė does it have to be one to one?
Instructor (Jacob Mattingley):Actually, we donít really run into those issues. This is the definition here. Itís simply the set of any points where the function value of that point matches to the set C. Itís the inverse image of the set. The function Ė it is affine, so is it one to one? It probably is. It doesnít have to be. Itís the inverse image, which is a well-defined thing. Itís always convex if the set C is convex.
Some examples of this Ė we have simple scaling by two. We have translation where we move around, and we have prediction. Letís briefly mention that the converse is not true. We might have a set F of C thatís convex when C wasnít convex. Weíre more interested in this, anyway. Here are some simple examples.
Prediction could be really simple. We might have a set X1, X2. This might be a point in a convex set C, and if this is convex, then the set of X1 given that the sum Ė the sum of X1 Ė for sum X2, the set of just X1s might be a convex set. Thatís a very simple prediction. A more interesting one is whatís called a linear matrix inequality or an LMI. This is an affine mapping. If we have a set of matrices AI and theyíre all symmetric matrices, then the set X where this weighted sum of symmetric matrices and the positive semidefinite sense is less than B, that is also a convex set.
Iíll talk about that one for a second. If we take the cone Ė this function F of X Ė weíll make it equal to B minus A of X where A of X is equal to X1, A1 up to XM, AM. This is an affine function, and it needs a little bit of hammering to get it in the same form as we saw above here. It is an affine function.
We already know that the positive semidefinite cone Ė we already know that this is a convex set. If we take the inverse image of the set here with this function F, what we get is that the inverse image is equal to any X and RN such that B minus AX is in the positive semidefinite cone. This is just a fancy way of saying that B minus A of X is less than or equal to zero in the positive semidefinite sense.
Weíve taken a very innocent looking set, S plus of N. We talked about how that was a convex set earlier. We took also a fairly innocent looking affine function, F of X, and immediately, weíve got this fairly unusual looking LMI, which is actually a set of weighted symmetric matrices. Thatís whatís on the left-hand side here. This is a convex set, and there were just two very simple ingredients. You can see that we can cause a lot of damage.
This comes up a lot in control, and itís a very useful set. Another one here is the hyperbolic cone. This one you can get in a similar way. We already saw that the set ZU, which is a cone when we restrict the U. We saw this set before. This is a norm cone, and we showed that this was a convex set. We know this is convex, and we want to get to something more interesting. You can see that if we apply the affine function F of X equal to P one half X and C transpose X, the inverse image of this mapping is also a convex set.
If you take this cone here and you map it through here, you get a hyperbolic cone. Again, thatís something that will be interesting. You should verify the details and as always, details in the book if you need them as well. Hereís some quite bizarre sets that are definitely not obviously convex, and you can show quite simply how they are convex if you know how to do it.
Departing even more from obvious sets, weíve got a perspective function. This oneís a little bit unusual to start with. If I have some function P that reduces RN plus 1 to RN by taking the last component and dividing Ė you can think about it as taking some vector like this and we pick off the last component. We divide each of these elements by that component and then we throw this away. Itís a somewhat unusual operation, but [inaudible] one, two, three, Iíd pick the three, and my new vector would be one third two thirds. Thatís a perspective function.
The images and the inverse images of convex sets under the perspective function are convex. That also is going to be useful to us later. Remember, weíre just building up a box of tricks that weíre going to find useful later. If itís a bit dry, bear with us for the moment. Thereís a demand constraint here that this last entry has to be positive.
A generalization of the perspective function is when we have a linear-fractional function. If we have some matrix A, a vector C and vectors B and D, then again, under both the image and the inverse images of a convex set under this transformation are themselves convex. There is a domain constraint on positivity of the denominator here as well. Just to see one example of this because itís probably quite a mystifying concept if you havenít seen these before, we take just a set C here like this.
Hereís a linear-fractional function, which you can think of as similar to a perspective function or a generalized perspective function. If we hold this figure on the side Ė if I hold it like this and I look at this figure from a different angle, Iíll find that these parts here are closer to my eye. If I look at this, Iíll find that these things loom closer in my vision than the other parts of the figure. This is quite a strange transformation, and C here is obviously not convex. In fact, why is C not convex?
The line segment is very much not in C. If this were a convex set, though, under this transformation, it would remain convex. The reason itís not is because you wouldnít be able to see it too well if it wasnít convex. Iím actually not exactly sure what the dotted line is about. Actually, probably what it is Ė itís probably a line under the same transformation. I suspect thatís what it is.
Another thing you can say about functions is if they transform line segments into line segments, it actually is a convex function. I suspect thatís what it is, actually. Youíd have to check me, but I think this line moves to this line, and because this line segment moves to this line segment, you can also say that itís a convex function. There are some slightly weird functions. Have a look at them later and put them away as things that weíre going to find really useful to prove that sets are convex.
Letís have a look at generalized inequalities. Before, we saw moving from a simple inequality in R to a component wise inequality. It turns out that itís quite useful to consider other kinds of inequalities as well. Letís introduce proper cones first. A proper cone is a convex cone with another bunch of constraints.
First of all, the cone has to be closed. Iíll just mention that some of you will have an analysis background where you know exactly what closed means and you have read many things about closedness. Some of you donít, and thatís fine for the purposes of this course. The five-second introduction to what closed means is if we have a shape and the set includes the border of that shape, then itís a closed set. If it doesnít include the boundary at every point, then itís not a closed set.
If you know what closed means, that was probably an insult to you, but thatís actually probably enough. Itís an intuitive idea of what closed means. Itís probably enough for this course. Thatís the first requirement to make a convex cone into a proper cone. We also say that itís solid. It actually has a nonempty interior. Again, interior is not a concept that some of you will know precisely what it means. Others of you probably havenít heard it before. You can think of interior from an intuitive point of view as the inside of the set. The set not including its boundary.
Tell me about this here. If I just have this ray here, is that a proper cone? Why not? This has an empty interior even though itís getting thicker by the second, but if I just have a ray out from the origin, it has no interior, so itís not a proper cone. Thereís one more requirement, and that is that the cone cannot contain a line. That means that this set here Ė this is a cone, and itís also a convex cone, but itís not a proper cone because it contains this line here through the origin.
These are the requirements. It has to be convex, closed, solid and pointed, and that gives us a proper cone. Weíll find that thatís pretty useful. If we didnít have these constraints, when we go to generalized inequalities, itíd fall to pieces quite quickly. Youíre saying that any convex cone includes a line. So what about this cone here? This is a convex cone, and it does not include a line. It includes a ray out from the origin, but it doesnít include a line through the origin.
Here are some examples of proper cones. First of all, the one youíre most familiar with is the nonnegative orthant. If I have R2, for example, the nonnegative orthant is this portion of R2. Thatís a proper cone. Itís convex. Itís closed because it includes these boundaries. Itís solid because thereís certainly plenty of stuff in the interior. Itís pointed because it contains no line, and so itís a proper cone.
Hereís another example. The positive semidefinite cone is [inaudible]. Thatís probably slightly more tricky to prove, but all of these concepts generalize into this cone here. Theyíre not a problem. Hereís another one, the set of nonnegative polynomials. Weíre going to find that this is extremely useful and used all the time. Weíre going to use this a lot. This one is a more interesting example, but we probably wonít see it all that often. Still, itís a proper cone.
This one here? Itís a cone because it contains Ė the definition of a cone was just that if I have a point X in the cone, ? X for ? positive has to be in the cone. I put any point in here, and I scale it with a linear vector, and itís still in the cone. I pick this point here. I can scale it positively and itís still in the cone. I take this point here. I scale it positively. I stay in this area. You canít find me a point which wonít scale and remain in the cone.
This is not a convex cone. This isnít convex, and so neither is it a proper cone. Letís have a look now at generalized inequalities. Before, we had this symbol with the curvy lines. Now we add a K to it. If we take a proper cone K, we can parameterize a generalized inequality with this proper cone K. We describe it as being less than Y. If Y minus X is inside this cone K, we can [inaudible] strict inequalities in a similar way by saying that X is strictly this if Y minus X lies in the interior of the cone K.
Letís have a look at a quick example. The component wise inequality that weíre familiar with Ė this says that X is less than Y in the component wise sense if each of the components XI is less than or equal to YI. Letís take a couple of points Ė this one here and this one here. Letís call this X and this Y. X is described as being less than or equal to Y in this proper cone K if Y minus X lies in the cone. Letís find Y minus X, and itís this thing here. Y minus X is not Ė X minus Y. Is this backwards?
Okay. Y minus X Ė whatís going on here? Y minus X is inside here. If I had a vector 3,3 and a vector 1,1. Subtract them and I get 2,2. It points to Y, and this is inside this proper cone here. Thatís a component wise inequality. We can say the same about a matrix inequality. Weíve seen these before. X less than Y is this PSD cone here means that Y minus X is positive semidefinite. These are so common that we drop the subscript. Not only that, but we often also drop the special symbol and just use an ordinary symbol like this.
For the purposes of clarity, itís a good idea to stick with this. If itís obvious what youíre talking about, you can drop the K and sometimes people even just use an ordinary symbol. As youíd expect when weíre using the same symbol generalized, a lot of the things that we saw before carry over, and the properties are analogous. We have things like this here, and this property will carry over. Be careful, because not everything does.
One of the things that doesnít carry over is we donít have a linear or a total ordering. That means that if we find X and Y, X may be neither less than Y or greater than Y. Hereís another way of expressing that. This generalized inequality here is not in general a linear ordering, so like I just said, we may have neither of these holding. You can see that quite easily in our usual case here. If I have a point here and a point here, then this point in the sense of component wise is neither less than this point nor is it more than this point. Itís not a total ordering.
We now introduce another concept in this lecture, that X in a set is described as being the minimum element with respect to this generalized inequality if any other point inside the set is actually more than X in this case. Thatís one concept, and weíll have a look at a picture in a second. A second concept is a minimal element. We describe an element X as being minimal if any point that is less is actually the same point. Thatís a little bit bizarre to start with. Letís have a look at that.
Weíre going to use the cone K is equal to R plus to the 2, so component wise inequality in R2. This point X1 here is the minimum element of S1. What that means is if I picked any other point Y in S, then I can explicitly say that X1 is less than or equal to Y. That will hold for any point in here. Letís have a look at the difference here in S2. What we need here is that I picked X2 and S2. X2 is a minimal element. That means that if I pick some other point here, I know that if Y is less than or equal to X2, then it means that Y is equal to X2.
Hereís a point, for example. Make that Y. X2 is not the minimum element, because for this way, I cannot write X2 less than [inaudible] to Y. I canít actually do that because itís not a linear ordering. Thatís not allowed. But I can say that any point that is less is actually the same point. Thatís because Ė hereís another way of thinking about it. If I take this cone K, this set here is the set X1 plus K, and thatís the set of all points that are unambiguously more than X1. They can always be described in reference to the cone K as being more than X1. X1 is unambiguously least.
In this set, if I drew the same figure here, these are all of the points that are unambiguously more than X2. This point here is not Ė this point Y is not in the set, so I canít say that X2 is unambiguously less than Y. What I can say is I can form this other set X2 minus K, and this is the set of points that are unambiguously less than X2. Any point in here, I can say with full authority Ė this is a point thatís less than X2. That means because I find that there are no points inside the set S2 that are unambiguously less, I can describe X2 as being a minimal element.
Itís a slightly confusing terminology here, but hereís how it goes. A minimum is something that you cannot under any circumstances claim that thereís a least point. All points are more is what you say for minimum. Itís very confusing, even when youíve seen it before. Minimum Ė all points are more. Minimal Ė no points are less in this unambiguous sense.
We could have a different cone. We could have this cone here. This is a proper cone. We could call this K2. Then, we could again look at this and we could say Ė we could talk about this relative to K2. How would you describe X1 relative to the generalized inequality parameterized by the cone K2? How would you describe X1 in terms of this one? Itís minimal, so itís no longer minimum because the set doesnít lie inside this cone. There are points that are incomparable, if you like. There are points that you could argue are actually no more than this in an unambiguous sense.
Thatís going to be useful to us. Letís introduce another concept while weíre on a roll. This one is the separating hyperplane theorem, and it says that if we take two general convex sets, C and D, and if theyíre disjoint Ė they have no intersection and C intersect D is the empty set. If C and D are disjoint, then there is some A, not zero, and thereís a B so that the halfspace A transpose X in C lies in a halfspace A transpose X less than or equal to B. The opposite holds for X and D.
Hereís a couple of sets. Here is a hyperplane between the sets, and if F is in R2, then itís a line. You can also think about it as looking down Ė a prediction of R3. This says that we can find some hyperplane so that D lies entirely in this halfspace here, and C, if you like, lies entirely in this halfspace here. Itís called a separating hyperplane for quite obvious reasons. This theorem, which I think is proven in the book, says that any two sets like this that are disjoint Ė there has to be a separating hyperplane.
Strict separation requires more, so I wonít go into the details here, but strict separation would be the hyperplane that actually passes through neither set. We can use this separating hyperplane theorem and turn it into another one called the supporting hyperplane theorem. A supporting hyperplane we define as being one where Ė this is described as being a supporting hyperplane if the only point that it touches is Ė if it goes through X0 and all of the set is on one side. You can say that if C is a convex set, then at every boundary point of C, there will be a supporting hyperplane.
Letís try to connect these two new ideas together. We have a separating hyperplane theorem which says if we have two sets and theyíre disjoint, we can find the hyperplane that separates them, and one which says that if we have a convex set and a boundary point, we can find a supporting hyperplane. We can actually use the separating hyperplane theorem to prove this one. We can actually say Ė weíll form a set X0, and weíll also form a set interior of C.
We take a convex set C. Weíll assume that itís convex by looking at this part of it. Weíll say that X0 is one set and itís convex because itís just a single point. The interior of C is a convex set if C is convex. The interior of C and X0 do not intersect if X0 is on the boundary, and then we can find a separating hyperplane between X0 and the interior of the set. That will be a supporting hyperplane.
How did I use the convexity? Itís actually used in the proof. This is a non-convex set, and if I had a point here, I cannot find a supporting hyperplane, because any line through here is actually going to cut through the set. The theorem is that if C is a convex set, then at any boundary point, I can find a supporting hyperplane. The other thing I used is that the interior of C is convex if zero is convex, and so C and D are disjoint convex sets. I used convexity of those sets and the fact that theyíre disjoint to show that there is a separating hyperplane and I used that to show that a supporting hyperplane exists.
The square we saw before Ė its interior was convex, but nothing special. This holds for any convex set. If the set was the interior of C, all of these theorems would hold. Itís just a convex set. In fact, if we look at this one here, I could have a hyperplane though here. This also is a separating hyperplane. Hereís a convex shape, and this is a supporting hyperplane. Soís this. These are not unique supporting hyperplanes and separating hyperplanes.
Hereís another one. We are powering through the material, and again, I encourage you to have another look through chapter two. What we can do now is introduce a dual cone. Dual cones again are somewhat strange when you first see them, but theyíre going to be very useful to us later. Youíll be seeing the world dual, duality and other things like that every week throughout the course. This is defined for any cone K Ė it doesnít have to be convex. It surely doesnít have to be proper. We define the dual cone, K*, as equal to any point Y with inner product greater than or equal to zero with a point X in the cone K.
Letís have a look and see what that means. If I have a cone that looks something like this Ė Iíll call this K. It happens to be proper, but thatís fine. K* in this case is any vector Y where Y transpose X for all X and K is nonnegative. Letís take this line here. Draw out a right angle here. Thatís a right angle right there. I take this one here and I do a similar thing. This set here is actually the dual cone of K. This one here is K*. Thatís the set of Ė if I take any vector in here, its angle to all elements in the set here is less than or equal to 90 degrees.
If I look at this vector here or any vector in this dual cone K*, its angle to all elements of K is less than or equal to 90 degrees. This is the dual cone. A few examples Ė if I look at the nonnegative orthant from before Ė thatís the set here Ė the dual cone of the nonnegative orthant is the nonnegative orthant itself. Itís described as being self-dual. You can see that quite easily if I have two nonnegative vectors Y and X, Y transpose X has to be less than or equal to zero, and you should be able to go through a proof of that quite easily. We could just do YI, XI and so if the components are nonnegative, so is the product and so is the sum.
The positive semidefinite cone Ė thatís also self-dual, so in this particular case, we actually use a slightly different definition of inner product, and again, details are in the book. If you find the set here, itís going to be the set itself. Euclidean norm cone Ė itís self dual, but finally, we get to one thatís not self dual, and in fact, most of them arenít. A lot of the useful ones are, but hereís one that isnít. Itís the one norm cone. The dual cone of a one norm cone is actually the infinity norm cone.
The other thing that we note is that the dual cone of a dual cone is actually the cone Ė the initial cone that we were after Ė if itís a proper cone. Thatís with a big if, if K is proper. It may also hold if K is simply convex. Youíd have to check me on that. What we can say now is that the dual cones of proper cones are proper, and hence, we can define generalize inequalities. If you give me some inequality with K, I can define a generalized inequality with K*.
We talked about minimum and minimal elements before, and here is one of the ways that those come into play. We have some set S. We know nothing about the set. Letís say that X is the minimum element of S. That is true with respect to the generalized inequality. X is the minimum if and only if for all ? that are strictly in the dual cone if X is the unique minimizer of ? transpose Z over S, then X is the minimum element.
This is just touching on the very first parts of duality, which is going to come in lots in this course. Itís a little mysterious at first, perhaps, but hereís a definition. Itís going to be quite fascinating what the ? means exactly. For now, thatís what a minimum element is. A minimal element is slightly different.
If X minimizes ? transpose Z over S and there is some ? Ė if we can find any ? thatís in the dual cone, then X is minimal. Weíll see later how useful this is. Finally, we can say that if X is minimal, there is some ? such that ? transpose Z has a minimizer X. These things Ė Iím really starting to run out of time, so Iím just tossing them out there, and weíll see duality in lots more detail.
Last thing today is what we call the optimal production frontier. Youíve probably seen this before, especially in 263. We talk about it with a very simple production frontier. Hereís what it says. P is a set of possible production methods. On this axis, we have fuel, and on this axis, we have labor. Say weíre testing out different production methods in our factor. If a point is in the set P, it says that there is some production method, which, say, produces 100 milk bottles and takes this amount of labor and this amount of fuel.
If X is in P, there is some possible production method that uses that amount of labor and this amount of fuel. This is a very simple example in R2. This set we could find. Some marketing guy might tell us what products are available. We might do some research. We might do some testing. There are lots of different ways we could find this set. What we say is that an efficient method is one that is minimal with respect to R plus to the N.
First of all, what would it mean if we had a minimum element? Letís say we had X and P, and weíre using this generalized inequality with R plus to the N. If X was a minimum in the set, it would mean that if I had X here, say this was in the set. The whole of P would lie within this. It would say that any production method requires more labor, more fuel or both than X. That would be X being minimal. Minimum is a slightly different quality.
If I had X1 here, this is a minimum element. What it means is that there is no method that requires simultaneously less fuel and less labor than X1. Itís efficient. Any vector along here Ė there is no method that is unambiguously better. Thereís none that simultaneous require less of both. Labor might be more expensive than fuel, so we might want to move along the curve and trade off between them, but we donít want to move off this curve. Likewise, this part of the curve is also efficient methods.
The problem with X5 is we can produce at X5, but we could produce at X2 and we would use less labor and simultaneously less fuel. Thereíd be no point in picking the method X5 if these were all the parameters involved. X1, X2, and X3 are efficient, and X4 and X5 are not. You can think about these as being minimal with respect to R plus to the N. There are other ways. We might be interested in a different cone, and there might, in fact, be a minimum for a different set, but this is just tying at least one of these things into a practical application.
Any final questions? We are done, and Professor Boyd will be back on Tuesday.
[End of Audio]
Duration: 77 minutes