NaturalLanguageProcessing-Lecture16

Instructor (Christopher Manning): Okay, hi, everyone. So today I'm going to go through in some detail some of the ideas that I started on last week of how you can actually go about building a compositional semantic representation, to provide some kind of handle on how you can start with meanings of words and some syntactic structure, and actually put together a meaning for a complex sentence. And so most of what's going to be talked about here is how can we have enough mathematical machinery (essentially what we're going to be using is lambda calculus to glue things together, and first order predicate logic is the output form) so that you can actually have a complex meaning being built compositionally out of the parts. And so that'll be today, and then for the last two lectures next week: on Monday we'll talk about, well, how can we learn about and represent the meanings of words, so that'll be the lexical semantics part. And then in the last lecture, I'll talk about some applications like question answering and textual inference. [Inaudible] say a sentence about quizzes. So again, we have precisely one more quiz, and, sort of shifting the schedule between having had the holiday on this Monday and lectures ending next Wednesday, our goal is to hand out the final quiz by the end of this week, Friday, and then for it to be due on the Wednesday at midnight, the day of the last lecture. Okay. Okay, so first of all, for the first couple of slides, essentially I'm going to recap in a slightly more formal way the kind of material that we talked about last time, and then go on and talk about how we can get further than this. So this sort of thing should look familiar from last time. And so we have two people, "Kathy" and "Fong", we have a verb "respects", which we're representing with lambda calculus taking two arguments, and an intransitive verb "run". And then here is our grammar, and so I've stopped writing definite clause grammars, so our grammars are effectively just written as before: before the colon it's just like a context-free grammar as before, and after the colon we're writing how meanings are combined to represent the whole sentence.

And the form of what happens with the semantics in our grammar rules is always going to be incredibly simple, so what we're doing here is we're saying our composition rule is, "Take the meaning of both halves and do function application." Do function application. And when [inaudible] there's only one thing, say a verb phrase rewrites as a verb, you just pass the meaning up. So essentially, we're always going to be either just doing function application or passing the meaning up. And so the action then comes in: well, how can we have lexical representations of meanings that'll allow the right kind of combination? Okay. So having this is then enough that we can do the same kind of example that I showed last time, so we have, "Kathy respects Fong." We write down the meanings of the words next to their base categories, and then build a verb phrase where we do function application with lambda calculus just like this. So we've got the meaning of "respects" and the meaning of "Fong". When we do function application we say, "Okay, 'Fong' substitutes in for the value of y, and therefore the return value is lambda x, respects x Fong." And then we do the same again at the top node. We say, "Okay, here's the meaning of the verb phrase. We apply it to the argument of the subject 'Kathy'," and so we get out, "Kathy respects Fong." And so note in particular we kind of do that swapping here, so that internally the verb is taking its object first. So the outside argument here is representing the object argument, not the subject argument, and then it's kind of swapped around inside. So we do get out the right meaning of "Kathy respects Fong", and not the other way around, that Fong respects Kathy. Okay, so the general scenario that I've kind of used as my main motivating example at the moment is that essentially, what we're going to be doing is building a database interface. There are other things that you could imagine. You could imagine that these are meant to be commands that your pet robot understands or something like that, but as a kind of concrete way to have a semantics that we can map onto, I'm going to assume that we've got databases. Now, I'm not actually going to really spend any time talking precisely about the mapping onto SQL; I'm just going to show bits of SQL that you could imagine mapping onto and take that as motivational. So I'm really going to show the mapping to first order logic, but for those of you who've seen databases, following the kind of datalog tradition of databases, it should be fairly straightforward to see that if we can represent things in first order logic, we can then do a further deterministic transformation into an SQL query.
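
As a minimal sketch of that composition step in Python (the toy fact base and the lower-case names here are invented for illustration, not from the lecture slides), where the verb takes its object first and the grammar rules only do function application:

    # Lexical meanings as Python lambdas; the verb is curried, object first.
    respects_facts = {("kathy", "fong")}            # Kathy respects Fong
    kathy, fong = "kathy", "fong"
    respects = lambda y: lambda x: (x, y) in respects_facts

    vp = respects(fong)        # VP -> V NP : apply the verb to the object meaning
    s = vp(kathy)              # S -> NP VP : apply the VP property to the subject
    print(s)                   # True, i.e. "Kathy respects Fong" holds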

Okay, so in particular what I'm going to assume essentially is that we've got database tables that represent our predicates of interest, so in that previous example we had a respects relation between people. And so when we have a respects relation in our predicate logic, I'm going to assume that we have a respects table, and the respects table is going to have two columns. It's going to have the IDs of people who respect other people: the respecter column and the respected column. So if we have Kathy and Fong as our two IDs for people in the database, then if we take "Kathy respects Fong" as kind of a new information statement, we can interpret that as, "Okay, what we want to do is insert into our respects table a pair of respecter/respected, where Kathy is the respecter and Fong is the respected." Alternatively, and perhaps slightly more interestingly, we could also look at the goal of answering the question, so we want to be able to answer the question, "Does Kathy respect Fong?" And so then we'll be wanting to do a query of the database and work out whether it holds a particular row. So at this point we use a little SQL trick where we say, "Select 'yes' from respects where respects dot respecter equals k, and respects dot respected equals f." So we're wanting to ask yes/no questions like, "Does Kathy respect Fong?" And we kind of fake that up in SQL by saying, "Select 'yes'." And so then if this relation is in the database, SQL will return "yes", and if it isn't in the database, SQL will return "no rows returned", and we interpret "no rows returned" as just meaning "no". Okay. Okay, so that'll be our goal, to be able to build that kind of interface. And one other note I should've made: in the examples I show there, and the semantics I make, I'm assuming that for every predicate in the semantic form, we have exactly one relation in the database, a database table, that exactly matches that relation. That kind of makes it convenient and easy. In practice that may not be true, and the example that you could think of there is the example from last time with Chat-80: Chat-80 actually only has one database table for countries. It has all the information about countries, but what it has is predicates, which you could think of as equivalent to database views, where it has a capital predicate, which takes just two columns out of the countries table, country name and capital name. And so I'm assuming in the same sort of way that I've got database views that map precisely onto the predicates of interest. Okay, so let me go through a little bit more formally the kind of representational base that we're going to use for this inference. So the model I'm going to use here is going to be the higher order lambda calculus, where "higher order" means that you can have variables that aren't only individuals; the variables themselves can be functions, like last time. And in particular the version of things that I'm going to present assumes that we've got a typed lambda calculus. So traditionally when you're presented a lot of things in either the lambda calculus world or the logic world, things are completely untyped. You just have variables x and y, and do things with them, but there's absolutely no reason that that has to be the case. And just like most people have decided that programming languages tend to work better when you have types on variables, you can also put types on variables in logical systems.
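
Here is a sketch of that interface in Python with an in-memory SQLite database; the table and column names follow the slide, while the IDs 'k' and 'f' are made up for illustration:

    import sqlite3

    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE respects (respecter TEXT, respected TEXT)")

    # "Kathy respects Fong" as a new-information statement: an insert.
    db.execute("INSERT INTO respects VALUES ('k', 'f')")

    # "Does Kathy respect Fong?" as a query, using the select-'yes' trick.
    rows = db.execute(
        "SELECT 'yes' FROM respects WHERE respecter = 'k' AND respected = 'f'"
    ).fetchall()
    print("yes" if rows else "no")    # no rows returned is read as "no"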

And I think there are good reasons, both of convenience and foundationally, why once you're working with higher order logics it can be very convenient to have typed logics, and I'm not going to go on about that for a long time, but we're going to assume that all of our variables have types. So we're going to have Booleans, zero and 1; we're going to have individuals, who are the people and things in our model. And then we're going to have functional types where we can talk about properties of individuals. So the basic property is something that takes an individual and says whether something is true or false about them, and then we have more interesting ones like binary relations, or ternary relations. So in terms of the kind of model we have in mind, this picture that I put on the blackboard is the kind of picture of our model. So we have our world, which is the outer circle, and inside the world we have various individuals, which I've denoted with x's, and some of them have names, and here's Kathy and here's Fong, and here are some other individuals in our world. And particular individuals can be objects; they don't have to be people. Okay, so then if we have properties of an individual, properties can be represented as subsets, effectively: the subset where the property is true. So this is the car property, and the things inside of it are the cars, so A and B are cars, and the other four things in my world aren't cars. Similarly, this thing here is in Palo Alto, and that thing there is in Palo Alto, so I've got this [inaudible] "in Palo Alto" property. Red things [inaudible], so those are my various properties I can have in my models, as effectively my model of the world that I can do inference on in a model theoretic way. It's a little bit more complex when you then have predicates that have more than one argument, because then I can't draw the same convenient circles in the same way. But you can think of a two-place predicate like "likes" or "respects" as being like this. So it takes a first argument, which I've only partially represented here so you can read it. So the first argument should be able to be any individual, but I've only shown two of them. And what it then does is return another function, so the function it returns is then a property. So the property is then saying whether the "likes Kathy" property holds of these... I'm sorry, you know what? I drew this the wrong way around. Let me just quickly change this. The whole point was that it was meant to take the objects first, and then take the people second, which is what I just illustrated with my previous example. Let me just quickly change this. A, B, C, [inaudible]. Okay, sorry. Okay. This is one [inaudible]. So the first argument it takes is the object, so if you have the predicate, so if you have "likes", and its object is A, then what's returned is a property, a function that takes a single individual argument and returns a Boolean.
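
A sketch of that model-theoretic picture in Python, where the domain members a through d (plus Kathy and Fong) and the facts are all made up for illustration; properties are just subsets, or equivalently functions from individuals to Booleans, and a two-place predicate is curried object-first:

    world = {"a", "b", "c", "d", "kathy", "fong"}
    car = lambda x: x in {"a", "b"}
    in_palo_alto = lambda x: x in {"b", "c"}
    red = lambda x: x in {"b", "d"}

    # Two-place predicate, object first: likes(A) is itself a property.
    likes_facts = {("kathy", "a"), ("kathy", "b"), ("fong", "b")}
    likes = lambda y: lambda x: (x, y) in likes_facts

    print({x for x in world if likes("a")(x)})   # {'kathy'}: the set that likes object A
    print({x for x in world if likes("b")(x)})   # {'kathy', 'fong'}: the set that likes B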

So this property here is the set of people that like object A. And so that set there is "Kathy", as I've represented it here. And here's the property of individuals that like object B, and the set for which that is true is "Kathy" and "Fong". Okay, and you can apply that same general idea if you want to have functions that take three arguments: you build these higher order functions, which return lower order functions, until they ground out in either individuals or Booleans. Okay. Okay. Something to note that I will use sometimes, though it tends to confuse people, so I won't use it all of the time: the conventional way of writing untyped lambda calculus is things like this, lambda y, lambda x, respect x y, where this shows that this is something that takes two arguments. Now once you have a typed lambda calculus, all functions inherently have a type. So it's just like Java: if you have a method, it has a particular type that is its signature, and you can know that a function has a particular type. So you actually don't have to write all these lambda arguments, you can just write "respect", because it's just a fact about respect that it is a function that takes two individual arguments and returns a Boolean; that's just the nature of the function. And so because of that we can then write things in the shorter form, which I will just show. So this is referred to as the long form, where you're showing lambda abstractions of all the arguments and then how they're being applied, but the short form is just to write the function without the excess lambdas.

And so I mentioned very briefly, and I'll come back to in a minute, the idea of having quantifiers. So when you have a quantifier like "every", it's a relationship between two properties, so this is, "Every kid runs." And so you're evaluating the property of being a kid and the property of things that run, and what you're asking is whether for everything for which this property holds, this other property also holds. So this is writing out a generalized quantifier in the long form. But actually, we can just write it out in the short form like this, because "kid" is a property, a function from individuals to Booleans, "run" is a function from individuals to Booleans, and the generalized determiner "every" is then a function that takes two properties and returns a Boolean as to whether it holds. And so we can write it out in the shorter form. And so these are the kind of general rules for lambda calculus, which I'm not going to spend a lot of detailed time on the fine points of, but the general things for lambda calculus are that you can do lambda abstraction, so you can always pull out lambdas like this if it's useful to. You can do function application, which gets called beta reduction, so you can learn from this that Alonzo Church was really fond of Greek letters and named absolutely everything with a different Greek letter just to keep things complex. Then there are eta reductions, so eta reduction just lets you do the reverse of lambda abstraction, and then alpha reduction is used for renaming of variables, so you don't get chance collisions between variable names. Okay. And so the central idea is that we can effectively come up with good semantic types that correspond to syntactic categories, or at least common pieces of syntactic categories. We'll see that there are various complications in various places. So in general when you have nouns or verb phrases, they will be properties. So something like being a car is a property, so there's a certain set of things that are cars. But also a verb phrase, so "liking A" is also a property, because then you're picking out the set of subjects for which liking A is true. The simplest case of noun phrases are individuals, so when we have things like the "Kathy respects Fong" example, Kathy and Fong we both represent as individuals. That kind of very quickly stops working when you want to deal with more complex cases like, "Every car is red," and so I'll come back to how people deal with that in a moment. What about if you want to build a meaning for "red car"? Well, it seems fine to say that red is a property at the end of the day, the set of things that are red, but we have this kind of idea that we want to be able to build up the meanings of phrases by doing function application. And so if we simply have the meaning of car as a property and the meaning of red as a property, well, we kind of can't combine them by function application any more. And at this point there's actually a choice as to how you go about designing your system. One way you could go about designing your system is to say, "Yep, gee, function application isn't going to work, so instead what we're going to do is say, every time we write a syntax rule down, we're then going to write special rules as to how to generate the meaning according to that syntax rule." So for example I could say that car's a property and red's a property.
And if I have the syntax rule "noun phrase goes to adjective noun", what I say is the way you translate that is: if the adjective meaning is the property p, and the noun meaning is the property q, you translate that as lambda z, p of z and q of z. So I'm kind of building a translation rule into my syntax rule. And that's referred to in logic as a syncategorematic translation, where you're introducing extra stuff in your syntax rule.
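
Looping back to "every kid runs" from a moment ago, here is a sketch over a finite toy domain (an assumption for illustration; real first-order models need not be finite) of the generalized quantifier as a relation between two properties, plus the syncategorematic adjective-noun rule just described:

    domain = {"a", "b", "c"}
    kid = lambda x: x in {"a", "b"}
    run = lambda x: x in {"a", "b", "c"}

    every = lambda p: lambda q: all(q(x) for x in domain if p(x))
    print(every(kid)(run))                    # True: "every kid runs"

    # The rule-level translation for "NP -> Adj N": lambda z. p(z) and q(z).
    red, car = lambda x: x in {"b"}, lambda x: x in {"a", "b"}
    adj_noun = lambda p, q: (lambda z: p(z) and q(z))
    red_car = adj_noun(red, car)
    print({x for x in domain if red_car(x)})  # {'b'}: the red cars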

The general trend that people have followed when doing computational semantics has been to really avoid doing that, because you get a lot more simplicity and power if you can just have automatic translation by function application. But the cost of that is you then have to encode the extra complexity down in the lexicon, and so that's what's shown on this slide. So if we just want to be able to make a meaning for "red car" by function application, then what we have to do is say that one word is a function that takes the other one as an argument. And we say it's the adjective that does that. So we say that, well, red takes the meaning of car as an argument and returns something. Well, what does that mean red should be? Well, the meaning of car is a property, something from individuals to Booleans, and when we have a red car, that's also a property, so this set here is the set of red cars, and that's also a property. And so therefore, the meaning of red should be something that takes a property and returns a property, and so its type is what you see here. It takes a property, individual to Boolean, and returns another property. Okay, and once you start going down that route, you just kind of keep on heading down that route and you get into higher and higher order functions. So if you want to have an adverb like "very" in "very happy", you say, "Well, what does 'very' do? It takes an adjective like 'happy' and it returns something that has a meaning that's kind of like an adjective, right?" "Very happy" is sort of like an adjective meaning, except that it's describing a more extreme adjective meaning. Okay, so that means the type that we're going to give to adverbs that modify adjectives, at least, is that it takes as its argument the adjective's type, and it returns something that is of an adjective type. So it's (Ind to Bool) to (Ind to Bool), to (Ind to Bool) to (Ind to Bool), which is just about incomprehensible, but in practice these things are fairly straightforward if you just think of them in terms of higher-level notions. So once you think that these are properties, and an adjective is something that maps from one property to another, and this is something that maps from one adjective meaning to another, then things sort of fall into place in a straightforward enough way. Okay, so here is the grammar fragment that I'm going to start off doing some examples with, and I'll add to it a bit more so I can handle questions. So this is a very straightforward CFG grammar. The only way it's different from the kind of ones that we've seen in the Penn Treebank is that if you look at the noun phrase structure, I'm actually producing a binary noun phrase structure rather than a flat noun phrase structure. If you want to build up semantics of noun phrases, it really helps to have a binary noun phrase structure, a different reason than making parsing go fast. So a noun phrase goes to a determiner and then an N-Bar. An N-Bar can have any number of adjectival or PP modifiers and then goes to a noun.

But apart from that, note that the semantics is in all cases trivial, so the semantics of our rules is simply function application: you have something that's the head, and it's going to be the head [inaudible], and if there are other things, they just become arguments of it, as I'm denoting there. And that means that all the interesting stuff happens in the lexicon. And so this shows some of the lexicon, and I'll go through some of this with examples, but just to go through a couple of things at the start. So we have "Kathy", "Fong", and "Palo Alto", and they're things that are individuals. I guess when I did this I didn't actually put "Palo Alto" in as an individual, but we could have somewhere here "Palo Alto" also represented as an individual. Okay, then you have things like "car", where "car" is a property, so it's something from individuals to Booleans, just like here. For "red", red is a little bit more complex, since what I've represented here as a property is actually "red prime". So "red prime" is the property of things that are red. The "red" here is then this red that's of the (Ind to Bool) to (Ind to Bool) type, where what it does is it takes a property "p", so this is the higher order part of our lambda calculus. And inside its lexical entry, it does the meaning combination. So it says, "Okay, if you're going to make up something like 'red car', I take the property as an argument, and then I build, as my lexical entry, a returned property for which p is true of the thing and the red property is true of the thing." And so we kind of do the clever bits of meaning combination down in the lexical entries. Okay. And then, maybe more straightforwardly, down at the bottom we have the verbs, which are much like before. So "run" takes an individual and returns a Boolean. "Respects" takes two individuals and then says whether it's true or not, similarly for "likes". So "likes" is just like here, and an example like "runs" is effectively another property here, so there are the things that run, and so maybe this runs or that car doesn't run. Okay.
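
For concreteness, here is a sketch of those higher-order lexical entries as Python lambdas; the sets, the lower-case names, and "paloalto" are invented stand-ins for whatever is in the model:

    red_prime = lambda x: x in {"b", "d"}                    # red': Ind -> Bool
    red = lambda p: (lambda x: p(x) and red_prime(x))        # Adj: (Ind->Bool) -> (Ind->Bool)

    in_prime = lambda y: lambda x: (x, y) in {("b", "paloalto"), ("c", "paloalto")}
    in_ = lambda y: lambda p: (lambda x: p(x) and in_prime(y)(x))   # P: Ind -> Adj meaning

    car = lambda x: x in {"a", "b"}                           # noun: a property
    run = lambda x: x in {"c"}                                # intransitive verb: a property
    likes = lambda y: lambda x: (x, y) in {("kathy", "b")}    # transitive verb, object first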

Okay, so these are the ones in large print; when I get on to the more complex sentences, the print gets even smaller. And I think to some extent, to work through some of the details of this you have to look more carefully and see that it all makes sense, more than I can do in lecture, but nevertheless I want to try and point out some of the high level points of what's going on. So what we want to do is make a meaning for "red car in Palo Alto". And first of all, to think about it conceptually, well, what should the meaning of "red car in Palo Alto" be? Well, that seems like it should still be a property. It should still be something that picks out a set of individuals that satisfy the property "red car in Palo Alto". In particular, in terms of my model that I have right here, what I'm wanting it to do is pick out precisely this set here, as this is the "red car in Palo Alto" set of individuals. So the end result of the semantics that we make should be that we're getting out a property. So how do we actually get all that out? This is the bit that I've talked about already. So "car" is a property from individuals to Booleans, and "red" has this more complex meaning where it takes a property and returns a bigger property. So we can combine in here "car" as p, and what we get back is lambda x, car x and red prime x. So that's already created a property, which corresponds to the things that are cars and red, so that's that sort of bigger property up here. Okay, and then we want to combine in "Palo Alto". And so that's the kind of scary lexical entry I've written next to it, which I didn't stop to dwell on at the time. But in principle you can kind of reason through from first principles what kinds of types you have to put on words to be able to get things to work, and then essentially build those types. So if you think about "in Palo Alto": "in Palo Alto" should itself be a property, right? That seems like it's something that can be true or false of individuals. At least at a certain point in time, they're either in Palo Alto or they're not in Palo Alto. So here I have this property of things that are inside Palo Alto. Okay, so that means that when we build this PP up here of "in Palo Alto", its meaning should be a property. Well, I should complexify that once more. The idea of "in Palo Alto" should be a property, but just like an adjective, we're then going to want to have the PP modify the meaning of "red car", just like an adjective, so I'll explain that in a moment. Okay, so we want to build up a property meaning, but what we actually have is an individual, "Palo Alto", and we have the word "in". So that suggests that we want to have something that takes an individual and returns something. So the preposition's meaning starts off by taking an individual y.

And so, well, what's it going to return? It's going to return, for "in Palo Alto", something that's then an adjective meaning, something that will take a property and build a more complex property, where it satisfies the previous property and "in Palo Alto". And so that means, at the end of the day, that what you get for a word like "in" is that it takes an individual, which is the object of the preposition, and then it returns one of these adjective meanings. You can have "in" as having that kind of type. Okay, so it takes "Palo Alto" as an argument, so "Palo Alto" becomes y and substitutes into "in" to give this meaning, okay. And so now this is something that works just like an adjective: it'll take a property as an argument and return a more complex property. And so here it takes as its argument the things that are "red cars", and so all of this becomes the p, and so you substitute in, and then you get lambda x, car of x, and red prime of x, and in-prime Palo Alto x, which is then the meaning of the whole thing. Does that sort of make sense? I know there's a lot of notation stuff, but the basic idea is we can take these things of various functional types, we can write them so that they take arguments of the appropriate functional type, and then they return a result of the appropriate functional type. And so in this sort of way, if we model all noun phrase modifiers as things that map from properties to properties, we can then have successive noun phrase modifiers combine together, and the resulting meaning will still be a property. So by extending this we can have sort of "the large leather green couch in my bedroom, which my uncle sold me", and we can have each of those modifiers be something that maps from a property to another property, and we can combine them together and make up the meaning of the whole. It's worth noting finally, the way I've written the grammar, if I go back a few slides, that when you're at this N-Bar level you can just take modifiers. So you could take adjectives on the left, or you could take prepositional phrases on the right, and if I had other things like relative clauses, you could take them on the right as well, so you can just take any number of modifiers. And so that actually means that the grammar has a syntactic ambiguity, where you can take the modifiers in either order. So here I built "red car" first, and here I built "car in Palo Alto" first. And interestingly, that's then what people call a spurious ambiguity, where there are things that are licensed by the syntax that don't actually make any difference to the meaning that's generated for the sentence. So regardless of which one I combine with first, I get the same meaning of lambda x, car x, in Palo Alto x, and red x. Okay.
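
A sketch showing that the two bracketings of "red car in Palo Alto" come out the same because the modifiers are intersective; the toy sets and the "pa" label are assumptions made for illustration:

    things = {"a", "b", "c", "d"}
    car = lambda x: x in {"a", "b"}
    red = lambda p: (lambda x: p(x) and x in {"b", "d"})
    in_ = lambda y: lambda p: (lambda x: p(x) and (x, y) in {("b", "pa"), ("c", "pa")})

    reading1 = in_("pa")(red(car))      # [[red car] in Palo Alto]
    reading2 = red(in_("pa")(car))      # [red [car in Palo Alto]]
    print({x for x in things if reading1(x)})                 # {'b'}
    print({x for x in things if reading1(x)} ==
          {x for x in things if reading2(x)})                 # True: the ambiguity is spurious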

And precisely the reason why I get an ambiguity that's spurious there is that the meanings of the different modifiers I'm putting on just turn into extra conjoined terms. So things that just turn into extra conjoined terms are referred to as intersective modifiers in linguistics, and that's exactly what I was doing here. I had the set of cars, I intersected it with the set of red things, and then I intersected that with the set of in-Palo-Alto things, and I come down to this intersection, and that gives me intersective semantics. And lots of the things that we think of as modifiers have intersective semantics. So "green leather couch": you have the set of couches, you have the set of green things, you have the set of leather things, you can intersect them, and you have the set of green leather couches. And providing we're in that world of intersective semantics, it's very easy to see how to do this with database queries, because effectively you can have a table for each one of these properties, and then you can just join the tables and find the individuals, the database IDs, which are in all three of these tables. So that's kind of the easy case, and the case that we can actually practically work with most of the time. It's important to note, though, that there actually are (and this is something that's talked about quite a bit in linguistics) nonintersective modifiers, where you get a meaning for the whole thing which you can't represent as an intersection of terms. So the most famous example is words like "alleged". If you say that someone is an alleged murderer, that doesn't mean that they satisfy the property of being a murderer and they satisfy the property of being alleged. It means that they satisfy the property of being an alleged murderer, where "alleged murderer" is somehow a property meaning that you have to put together out of the meanings of the parts. And so somehow if you want to be able to deal with "alleged murderer", you want to be able to deal with that, but it's kind of not so straightforward how to do that, because effectively you want to be working out a new functional form out of an old functional form. Here's the other example of this that I had, which isn't quite so convincing, but I think illustrates the same point. So this example here is "overpriced house in Palo Alto". And although it's not quite as dramatic as "alleged", I think it's reasonable to say that overpriced isn't a term that has absolute reference.

So you can have one semantics where you take "overpriced house" and then it's in Palo Alto, and that's kind of a global scale of what's an overpriced house, and using that semantics, well, every house in Palo Alto qualifies as an overpriced house by global standards. But if you make the modification the other way, you then have the set of houses that are in Palo Alto, and then you're saying overpriced out of the houses that are in Palo Alto, and so that seems to have a different semantics: houses that are overpriced even relative to the standards of Palo Alto. Okay, so you can get tricky things with nonintersective adjectives, but I'm not going to deal with that further today, and I'm just going to stick with the intersective adjectives. Okay. So all's good until now, but unfortunately natural languages just get a bit more complex than what we've done so far. Okay. So I don't know if this ever troubled any of you guys when you took a baby logic class, but when you think about how people translate English into baby logic, something fundamentally weird happens, right? So if you want to translate "Kathy runs", you translate it into "runs(Kathy)". That's easy. But if someone then tells you to translate "No kid runs" into predicate logic, well then, you kind of have to know to do this really funny thing where you write down "not there exists x, kid x, and run x". So here the noun phrase subject was an argument of the verb, just like it seems to be in the syntax of the sentence "Kathy runs": Kathy is an argument of running. Syntactically this example doesn't seem any different. "No kid" is now the subject of "runs". "Runs", a predicate, has taken a subject, that's its argument, but the translation is kind of completely different. We've done this really funny thing where we've wrapped some weird logical structure around the outside of the whole thing, and "runs" is buried down here somewhere. So that looks kind of awkward, and so the question is how we can produce those kinds of translations in our system. Well, here's another example of how things go wrong. So we have, "Nothing is better than a life of peace and prosperity. A cold egg-salad sandwich is better than nothing," and if you don't look too closely it seems that you should be able to deduce from that that a cold egg-salad sandwich is better than a life of peace and prosperity.

Why is that? Well, it seems like "is better than" should be a transitive relationship, right? If you have that a Blackberry Pearl is better than a Motorola Razor, and an iPhone is better than a Blackberry Pearl, you should be able to conclude that an iPhone is better than a Motorola Razor, right? "Is better than" is a transitive relation. And so that's what I'm attempting to do here; I'm trying to do inference using the fact that "is better than" is a transitive relation. But with sentences like this, it just goes totally wrong, and well, why does it go wrong? Well, the reason is, again, that "nothing" is a quantifier. It's just like this "no kid runs": as soon as you have things that are quantifiers, it just doesn't work if you're treating them as arguments to a relation, and you seem to have to do something special and different. Okay, so somehow we need to be able to model in our system how we represent the meaning of these quantifiers and deal with them. And so, well, how can we do that? Well, we've done pretty well with things that are properties. We have "red car" and "in Palo Alto", and things like that. So we could build everything about noun phrases as complex properties, except when they have a determiner out the front of them like "the", or "every", or "no", and things like that. Once we see those, it seems like we have to do something very special. So what are we going to do for the meaning of "the red car in Palo Alto", or "every red car in Palo Alto"? Well, first of all, let's mention what we're going to do with "the". For "the", the easiest thing to do is to say, well, we want to map down to an individual. Just like we had Kathy and Fong as individuals, "the red car in Palo Alto" should be an individual. It should be this guy right here; that's the meaning of "the red car in Palo Alto". And so a possible semantics for "the", and this is actually precisely the semantics for "the" that was proposed by Bertrand Russell, is to say that the meaning of "the" can be represented as the iota function. So iota is the semantics of "the", which was back in the lexicon if you look back ten slides. The meaning of "the" is that this is a partial function. So if there's precisely one thing in the model for which p of x is true, then the function returns that thing. So if I ask for "the red car in Palo Alto", what's returned is the individual b. If there are multiple things for which a certain property is true, so for example if I just say "the car", then because this is only a partial function, the return value is undefined. So if the function doesn't apply, it's got no return value. So that's our meaning of "the". It will return an individual if one is uniquely specified, and if one isn't uniquely specified, you don't get any value in return.
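
A sketch of that Russell-style iota operator as a partial function, together with the "no" quantifier from the "no kid runs" example, over a made-up domain:

    domain = {"a", "b", "c"}
    red_car = lambda x: x == "b"
    car = lambda x: x in {"a", "b"}

    def iota(p):
        # Return the unique satisfier of p, or fail: "the" is a partial function.
        matches = [x for x in domain if p(x)]
        if len(matches) == 1:
            return matches[0]
        raise ValueError("'the' is undefined: no unique satisfier")

    print(iota(red_car))     # 'b': "the red car" denotes the individual b
    # iota(car) raises an error, since two things are cars and "the car" picks out nothing

    kid = lambda x: x in {"a"}
    run = lambda x: x in set()
    no = lambda p: lambda q: not any(p(x) and q(x) for x in domain)
    print(no(kid)(run))      # True: "no kid runs" = not exists x. kid(x) and run(x)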

Now, that's kind of true. In practice, though, it's not quite that simple. Basically we use "the" when we're talking about things that we can identify. So we talk about "the president of Stanford University" because there's only one of them, and so it can be uniquely identified, but we don't say "the department chair of Stanford University" because that sort of feels wrong: it doesn't identify an individual, and that's Bertrand Russell's core insight. In practice it's a little bit more complex than that, because people will interpret the meaning of "the" relative to some situationally defined context, so if you say something like, "Pass me the phone," you're not making a claim that there's only one telephone in the world. The claim you're making is that in this situationally defined context of the room that we're sitting in, there's one phone, and that's the one you're meant to pass me. Okay, so that nevertheless gives us a semantics for "the", so now we could actually work out a semantics for a sentence like, "Kim likes the red car in Palo Alto," because the meaning of "the red car in Palo Alto" would just be this individual b, and then we can look across and say, okay, "likes b" returns this function, and if we evaluate it with respect to "Kim", we get "true" as the answer, and so we can evaluate that kind of sentence. Okay. And so we can then translate that kind of sentence into our SQL database form, so "red car in Palo Alto" is a property where we make a query of a database, and we return the set of objects that satisfy it, just like in a regular database query. So this'll be "select cars dot obj" (for properties I'm just assuming it's a one-column table where the field is called obj): "Select cars dot obj from cars, locations, red", so I've joined my tables, "where cars dot obj equals locations dot obj, and locations dot place equals Palo Alto, and cars dot obj equals red dot obj." Okay, so then if I want to do "the red car in Palo Alto", what I'm wanting to say is that the return set from that database query is a single row, and I can actually write that in my SQL by being a little bit clever and writing a having clause, and I can say, "Select all of the same stuff, having count equals one." And so this having clause will precisely implement my Bertrand Russell semantics of "the". If there's a single thing that satisfies it, I will get returned that one row of the database, and otherwise I don't get a return. Okay. Well, that's good up to there. If I just say "the red car in Palo Alto", well, it turns into this individual, and we know how to evaluate individuals with respect to something like "likes", but what if I want to say, "every red car in Palo Alto"? Well, then it seems like we have to fundamentally change things around, where with words like "every", these determiners, they have this quantificational force where they take other things as their arguments, and so that's the thing that we're going to represent. So what we're going to take as the type of a generalized determiner, a word like "every" and "no", is to say, "This is something that takes one property." So for "every car" it'll take the property of being a car. For "every red car in Palo Alto" it'll take the property of "red car in Palo Alto", and then what it's going to return is something that maps from properties to truth-values.
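
A sketch of a generalized determiner of that type, taking the noun restriction first and returning a function from properties to truth values, and then applying it to a verb phrase property; the students s1, s2 and the car b are invented for illustration:

    domain = {"s1", "s2", "b"}
    student = lambda x: x in {"s1", "s2"}
    likes = lambda y: lambda x: (x, y) in {("s1", "b"), ("s2", "b")}

    every = lambda p: (lambda q: all(q(x) for x in domain if p(x)))

    the_red_car = "b"                 # suppose iota has already picked out b
    vp = likes(the_red_car)           # the property "likes the red car"
    print(every(student)(vp))         # True: "every student likes the red car"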

Okay, and so I can have semantics for generalized determiners in terms of these, what I'll call generalized quantifiers, that take a single argument. So for "some kid runs", you take the two properties as arguments, you can join them together, and then the "some" predicate says: is there at least one individual for which this property is true? And the "every" predicate says: is it the case that this is true for all individuals? Okay, just what's expressed there. Okay, so now I'm kind of just about in business in the following sense. Okay, here's kind of how I want to do "every student likes the red car". Okay, so I have "car", I build up the bigger property of "red car", and then I combine it with the determiner to get "the red car", so this is this iota operator here. And when we evaluate the iota operator of "the red car", its value is just an individual. It's the individual b in our model over here. Okay, so because its value is an individual, I can apply the predicate "likes" to it, and then I get something that is a property of "liking the red car". In particular, the property that I get out is precisely this property here of "liking the red car". Okay. And so then what I want to say is that, "Every student likes the red car." And, well, I'd really be in business at this point if I could do the following trick. Up until now I've treated the verb as the head of the sentence and said, "Do function application." That's how I handled "Kathy likes Fong": "likes" took two arguments, "Kathy" and "Fong", and it works, but it seems like that's not going to work for me for "every student". But what if I just changed my mind and said, "Well, let's take 'every student' as the head in the semantics, and have it take 'likes the red car' as an argument"? How would things work out then? Well, "every" is something that takes two properties and then returns a truth-value, so it combines with one property here to give us "every student". And, well, look, the meaning of the verb phrase is a property. We've made precisely this property that I put a box around on the board, so "every student" could take that as an argument, and then we'd precisely get the "every" predicate evaluating those two properties. And so it would be saying, "Is this property true of every individual?" And at least for the partial representation I've shown over here, it is true of every individual, and so I'd get as my answer "yes". Okay, that looks really cool apart from the one problem that I played this mucky trick: whereas before the verb phrase was the head of the sentence, I now made the subject noun phrase the head of the sentence, and that might then get you worried as to, well, what happens after that. Like what if I put a quantifier in object position? What if I said, "Every student likes every car"? Well then, I'm in worse trouble, because then if the meaning of "every car" is something that takes a property and returns a truth-value, it sort of seems like I can't combine that with "likes" at all. So I need to build some more technology to be able to deal with quantifiers. So the person who initiated this enterprise was Richard Montague. Richard Montague was a philosopher in the late 1960s and early 1970s. He actually committed suicide, which is a long story I will not digress into in too much detail at the moment.
But the central idea of his work was to notice the fact that modern logic had developed all of these rather nice tools but had completely abandoned the goal of actually trying to provide semantic representations for human languages, whereas the original goal of logic, as I mentioned before, was actually to help assess human arguments. The ancient Greeks actually believed that logic was a tool, so that when you listened to George Bush say something on television you could evaluate whether what he was saying was sensible or not.

So Richard Montague had the idea that maybe now that there are all of these good formal logical tools, maybe they actually could be applied to natural language if we have the right technology for doing it. And so he introduced these kinds of techniques of using higher order logics and lambda calculus. And so the idea that Montague came up with to solve the problem of how to incorporate quantifiers was to say, "Well, it just doesn't work to say that noun phrases refer to individuals, because we want to be able to say things like, 'Every student likes every car,' or, 'Every student likes some car,' or something like that. And these quantifiers we have to represent differently." So in his model all noun phrases got turned into these higher order things. So any noun phrase meaning got turned into something that took a property and then returned a Boolean. And so the meaning of "Kathy" is then no longer an individual Kathy. The meaning of it is this thing that takes a property and returns that property evaluated with respect to Kathy. Slightly confusing, but it sort of makes sense: once you have this, if you want to have the semantics of "Kathy runs", you've got the property of running things, and you stick that in as p, and you're evaluating whether it's true of Kathy. So you've fundamentally swapped around which one is the function and which is the argument. A lot of modern work has seen that as rather ugly and unnecessary, and so has instead gone in this direction that essentially follows Montague but says you want to allow flexible shifting between types. So Kathy is just an individual, but if you want to, for semantic combination, you can certainly turn her into something that takes a property and evaluates whether it's true of her. We had properties like "man" and "car", and, well, it proved handy for "the" to have this iota operator, which could map from a property to an individual that satisfies it, as a partial function, which is what those dots are meant to show. And so noun phrases are commonly these days represented by allowing these kinds of flexible type shifting operations. Okay. Now at this point time is getting short, and I'm going to skip a little bit, but earlier on we talked about how there are quantifier scope ambiguities, and I will assure you, if you read the handout that goes with this class really carefully, it explains this beautiful way in which you can model quantifier scope ambiguities in terms of doing different amounts of type shifting, but I'm not going to go through it right now. But believe me, here are the two different representations for "some kid broke every toy". And if you look very carefully as I flip them back and forth, you see that in one the "some" quantifier is on the outside, and in the other the "every" quantifier is on the outside.
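
A sketch of that type-raising move: lifting an individual to the generalized-quantifier type, so that the noun phrase applies to the verb phrase property rather than the other way around (the names and the running set are toy examples):

    run = lambda x: x in {"kathy"}

    lift = lambda individual: (lambda p: p(individual))   # Ind -> ((Ind->Bool) -> Bool)

    kathy_raised = lift("kathy")
    print(kathy_raised(run))     # True: "Kathy runs", with the raised NP applied to the VP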

Okay, let me just for a couple of minutes show a couple more examples of how you can then start to do questions like this. Okay, so we want to be able to do yes/no questions, so we want to check on a database something like "is Kathy running?", what things are true of the world. And perhaps more interestingly, we might want to do content questions like, "Who likes Kathy?" What the right semantics for questions is, is something people still write papers about in philosophy and linguistics. The model I've got here is that a question is itself just going to be a property. So a question is, I'd say, a property like "red" or "in Palo Alto": the meaning of the question is just that property, and that's going to be interpreted as, what is the set of things for which that property is true? And that's the answer to the question. Okay. So this is how I then extend my grammar to deal with questions. And there's one kind of tricky bit here, which is that I use this idea of gap threading for questions. The complex thing about questions in English is that they have these long distance dependencies. So if you have something like, "Who did Bill report that John said Sue likes?", the "who" at the front is actually the object of the liking, and somehow you have to connect those two positions together. And there are various ways that it could be done, but a common way in some formal grammars, and the method that I use here, is this idea of gap threading, where you can introduce a variable as an empty node, and then kind of shift it up through the syntax until you discharge it. So here I will have an empty introduced with a meaning of a variable z, and then I will allow rules to pass up that z until I have my whole sentence, and then I will discharge the z as being an argument for that predicate. I will show an example. And then here are my syntax and semantics for question words. So a question word like "what" kind of doesn't have any meaning in this model; it just takes a property and returns that property. So if we have something like "what is running", we have the property of "is running", and "what" will take it as an argument and just return it, does nothing to it. But some of the other wh-words do more interesting things, so for "who is running", "who" takes the same property as an argument, but it makes a more complex property, which says the argument property has to be true, and the thing has to be human. And if I asked, "How many birds are running?", then I'm taking the property of being a bird, and then I'm going to be taking the property of "is running", and then what I'm going to be returning is the count of the individuals for which that combined property is true.

Okay. So if I take that and put it all together, I can actually now make quite interesting queries and translate them into something that can be evaluated over a database. And again, I'll probably go through this a little quickly, but you can peer closely at the examples and see that it really does work. So here's an easy case, which I'll use to illustrate this gap threading idea. So, "What does Kathy like?" The idea is we actually have empties in our grammar now, so there's a missing object of "like", which is introduced with this variable z, an individual meaning, so this has a meaning of kind of "like z", but we don't actually know who z is yet. So here's "like z" as the meaning, then we get "Kathy likes z", which has the meaning of "like z, Kathy". And then what we do is we kind of pass up a gap as these slash categories, so this is a verb phrase which has inside it an empty noun phrase whose meaning was z. And so now here we have a sentence which has inside it an empty noun phrase whose meaning was z. So at any point we want to, we can then discharge our gap-threaded empty and say, "Okay, this sentence's meaning is actually 'lambda z, likes z, Kathy'." So we've turned the thing that wasn't actually present, that was missing, back into a property. So this is kind of like in predicate calculus if you do natural deduction, where you assume something and then you later discharge the thing that you assumed. Okay, so now I have a property of "lambda z, likes z, Kathy", and so for, "What does Kathy like?", "what" introduces no new semantics of its own, and so the meaning of the whole thing is just "lambda z, likes z, Kathy", i.e., here's a property, and the set of rows that are returned from the database for that property is the answer to the question. So I get "select liked from likes, where likes dot liker equals Kathy". Cool. Okay, but we can go on from that and do much more complex examples. So, "Who does Kathy like?" works essentially the same, apart from "who" having this more complex semantic form, so that the result is that you've got this property of "Kathy likes the thing and the thing is human". So now we have "select liked from likes, humans, where likes dot liker equals Kathy, and humans dot obj equals likes dot liked", so we're restricting it to being a human thing. Okay, but we can put in more complex question words like, "Which cars did Kathy like?" So now for "which cars", "which" is taking the property of being a car, and it's taking the property of "Kathy likes", and asking that both of them be true at the top, which gives us this database translation.
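
A sketch of the question-as-property idea, where "What does Kathy like?" denotes the property lambda z. likes(z, Kathy), and the answer is the set of individuals satisfying it; the fact base and the set of humans are made up for illustration:

    domain = {"a", "b", "kathy", "fong"}
    human = lambda x: x in {"kathy", "fong"}
    likes = lambda y: lambda x: (x, y) in {("kathy", "a"), ("kathy", "fong")}

    what_kathy_likes = lambda z: likes(z)("kathy")                 # "What does Kathy like?"
    who_kathy_likes = lambda z: likes(z)("kathy") and human(z)     # "Who does Kathy like?"

    print({z for z in domain if what_kathy_likes(z)})   # {'a', 'fong'}
    print({z for z in domain if who_kathy_likes(z)})    # {'fong'}: restricted to humans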

And at that point we can just keep on using all the stuff that I've tried to present today, and it really does all fit together and work right, so I can ask, "Which cars did every student like?" And so we have the same empty z. We can make a meaning for "every student likes". So now we have this property, the set of z such that every student likes z. And then we can combine that with something like "which car", and make a much more complex meaning. And we can keep on going through examples like this. "How many red cars in Palo Alto does Kathy like?" Or, "Did Kathy see the red car in Palo Alto?" At this point I'm starting to be in the space where the trees that you have to build are too large to fit in my narrow margin, so you'll just have to believe me that the proofs, or the derivations, work out okay, but they do. So this is, "How many red cars in Palo Alto does Kathy like?" So we kind of make up the "does Kathy like" part here. And then for the "how many red cars in Palo Alto", we make this property of red cars in Palo Alto just like before, and then "how many" takes first this property, and then it takes the "does Kathy like" property, and so it's returning the count of how many things satisfy this big composite property, and then we can translate that into our piece of SQL where we're doing a count operation on the things that satisfy it. Now, [inaudible]. So I thought then, just for the last few minutes, I'd mention a little bit about some work that's been happening recently that could actually connect these things together. I think it's fair to say that until about 2005, I guess, 2004, 2005, there was really kind of a complete disconnect between work that was using statistical, probabilistic, machine learning methods in NLP, and logically oriented, deeper meaning representation work in NLP. That was something that there'd been a lot of work on in the 70s and 80s, and there were still occasional little bits of work going on in the 1990s, but effectively the two didn't have anything to do with each other. By and large the people doing probabilistic machine learning stuff were doing more surface work, so they were doing things like building part of speech taggers, [inaudible] recognizers, and statistical parsers, all the things that you guys have been doing, but really were doing nothing significant in the way of full sentence meaning representations. They were doing some meaning tasks, but low-level meaning tasks like named entity recognition, or extracting particular semantic relations.

But just in the last couple of years, there have actually started to be some nice threads of work where people are starting to return to how you can learn and use much deeper semantics while still fitting in probabilistic and machine learning ideas, and I thought that I'd just mention a couple of things of this sort. So one question is, "Well, could we possibly learn representations of this sort?" And that's something that a couple of groups have been working on recently. So Luke Zettlemoyer and Michael Collins at MIT have done a couple of papers on that, and then also Ray Mooney and students at Texas have been doing this sort of work. And I can't really go through the details here, but this is, in one very short slide, the outline of the Zettlemoyer and Collins approach. So they assume that they have some stuff at the beginning. They assume that they have an initial lexicon. The initial lexicon mainly knows about some function words, so for things like prepositions you have a meaning of what that preposition means. They also assume that they know a few domain-specific words, so they might assume that they know the meaning of "Texas", for example. They then start off with category templates, and category templates are effectively possible semantic forms words could have. And so that's kind of like I was saying, that there are these dominant patterns where we have properties and two-argument things and things like that. They have a set of category templates. So they start off with a little bit of stuff, not nothing, but primarily, beyond that, what they assume they have is pairs of sentences and meaning representations. That may seem like a fair bit, and in some sense it is a fair bit, but a prominent belief of how human language acquisition works is that humans don't actually acquire stuff about sentences until they understand what they mean, right? That basically people are working out the meaning of things from situational inference, so that after the first ten days when mommy is holding the orange juice and saying, "Do you want juice?", you start to understand what that means, and then you map between the words and semantics. So you have a set of examples like this, and the goal of their papers is that you want to do combined learning of both lexicon and syntactic structure. So although you have kind of a bootstrap for the lexicon, for most words you don't know what they mean, and you have no information whatsoever about syntactic structure and semantic combination rules. So this isn't like the Penn Treebank where you're given all of the trees; you've been given no trees whatsoever, and no basis on which to put together compositional semantics. And so what they do is build it as a kind of iterative bootstrapping process where you're postulating possible lexical forms, so that's where you're using these category templates, and you're trying to use them in a parser, and the parser is a maxent parsing model. And effectively you're then learning lexical representations for words that seem to work, in the sense that they're able to combine together with words that are in your initial lexicon, and work to produce the final semantics for sentences that you'd like to see, and effectively through each iteration of the bootstrap, what you hope to do is figure out a successful semantic representation for some more words, and then start building up your lexicon so it grows over time.
But it is a kind of a cool joint learning task, because at the start there's absolutely no parsing model whatsoever, so they're simultaneously trying to postulate possible lexical entries, and then optimize the parameters of a maxent parser to try to parse things in a particular way as it goes along.

There's then also been work on reasoning with such representations. Now, in particular domains, one possibility is just to go with logical reasoning. And I think that can actually be quite successful in certain domains of business rules, laws, and things like that, where you can do logical reasoning. But I think in general it is the case, just as the general argument for probabilistic methods goes, that knowledge of the world is in general so incomplete and uncertain that we just can't get very far with strict logical inference. In particular, a lot of the time things only go through under various plausible assumptions. All the time we're assuming sort of plausible things will happen in the world, and that allows our inferences to go through, even where they're not actually strictly guaranteed to happen. Most of the time we just assume that a really big earthquake won't happen in the next ten seconds, and so therefore it's reasonable to make assumptions about what the world will be like in ten seconds' time. So that then leads us into the work on doing probabilistic models of knowledge representation and reasoning. And so that's also been an area that's been actively pursued. I mean it's the case that all the current work on doing probabilistic inference still assumes something more restricted than first order logic. So first order logic is in general very difficult to deal with, because you have this open domain of individuals [inaudible] in first order logic: you have an infinite space of individuals and you can quantify arbitrarily over them, whereas all of the existing models of probabilistic knowledge representation and inference in some way use a somewhat more restricted model where you can still do certain kinds of quantification, but it's not quite open [inaudible] like in first order logic. And so one thread of that is definitely Daphne Koller's work here on probabilistic relational models. Another thread of work that's been going on recently is Pedro Domingos and colleagues doing Markov logic, and they differ in various ways. Part of how they differ is that Daphne's model was a Bayes net, a directed graphical model, whereas this is an undirected Markov network style model.

Actually, just in the last week, someone who's been doing an undergrad honors project with me has been trying to apply some of the Markov logic ideas to doing mappings of sentences into Markov logic and then doing inference on that, with at least reasonable success on a range of sentences; it's not that it's a full coverage system. So I think that this is an area where it's now possible, and there are starting to be a lot of exciting ideas, where you can kind of start to combine machine learning and statistical methods with using these richer representations, and then do richer semantic inference. Okay. And that's the end for today.

[End of Audio]

Duration: 75 minutes