Instructor (Christopher Manning):Okay. Hello. And so today for 224n, Iím gonna start into the topic of looking at semantics. That is the assured meaning on how to understand text. I think itís fair to say that, in general, when people think of natural language processing and what it can do for them really what they want is okay, text is gonna come in and weíre gonna understand it, the meanings are gonna come out. Itís good that Iím eventually getting to the part of the course to talk about that and say something about this. Thereís some kind of technical class organizational reasons for why I donít say so much about natural language understanding in this class, which is partly that every second year Dan Jurafsky and me have been co-teaching a separate class on natural language understanding.
Itís also partly a reflection of the fact that if you actually look at whatís been happening in the world over the last decade that a lot of the work thatís actually being used and being successful hasnít really been trying to do high level natural language understanding. A lot of the time is actually being focused much more down, doing the kind of lower level analysis tasks that weíve spent a lot of the time on. Itís also clearly the case that those tasks are, sort of, useful perquisites for being able to do deep and natural language understanding.
Nevertheless, thinking globally, I mean, what everyone wants is to watch their science fiction movies and to be able to have their pet robot that will talk to them and that seems to involve being able to do natural language understanding. A lot of the other applications you can think of for natural language are doing summarization, translation, question answering, spoken user interfaces. All of these you actually have to understand whatís being said and whatís being talked about. We have to act intelligently. It still seems to be the case that if youíre thinking ahead to what can be done in the next decade as the new kind of intelligent applications that can be delivered. It seems like itís crucial to be able to deliver more intelligent applications than we have at the moment and that they actually have to understand more of the material thatís available. So something like web search, it works pretty well a lot of the time, but if you think about how could I offer a qualitively better experience of whatís going on it seems like at some point you actually have to have a system that can understand more of whatís on those web pages, so it can do more intelligent things and be able to take actions. On the other hand, I mean, thatís kind of being a viewpoint thatís time has, sort of, never quite come.
If you look back on the history of AI it seemed completely obvious to people in the 1970ís that if youíre gonna do anything useful with natural language understanding and processing then you had to have understanding in deep semantic representations and be able to process those to do natural language understanding. I really think that the most dramatic thing, in summary, of the last decade is effectively that people have shown that you can just get amazingly far in producing useful artifacts by understanding almost nothing. That thatís really been the breakthrough in the deployment of natural language technologies in recent years.
But, nevertheless, I donít think that that buys you everything. A kind of low level processing techniques are appropriate when you want to deal with speed and volume and you can get away with doing a very, very limited amount of semantic understanding to be able to address some tasks. You know, there are lots of good tasks out there. I mean, lots of things that people would like to have done for them of having their spelling corrected, finding things on the web, extracting who works at what company, processing peopleís resumes. There are tons of good tasks out there that people are now working on and can do a very good job with very little understanding.
There are also just lots of tasks in which you just have to understand much more to be able to attack them sensibly. I think, actually, one of the key distinctions will effectively be when you want to be able to have computers form tasks where they have independent responsibility for doing low level actions, which might build up to a higher level action. So for something like web search the computer finds candidate documents, but the human is right there to look at the documents and decide which documents are good ones and which ones are bad ones and they read the documents and get whatever information there is to be gotten out of them.
Whereas if you have your computer thatís responsible for higher level tasks like the computer is responsible for going out and investigating what kind of an espresso machine that you should buy and then placing the order for the best price. Well, then the computer really has to understand a lot more about whatís being said on pages to understand whatís a good espresso machine to buy. Okay.
People are starting to do deeper forms of natural language processing and natural language understanding in the context of the kind of information access thatís been so important in recent years. One obvious venue for that, and one that Iíll spend some time talking more about in the later class, is the idea of question answering. The idea of question answering originally came out of the information retrieval community as whatís a way in which we could give a richer form of information access. The standard information retrieval is an incredibly dumb form of information access. That people have something that theyíre interested in, information on, and the system says I can shovel you some pages that I think are relevant to that information need, but clearly there are much more profound and focused ways of delivering information to a person in response to an information need and one way it seems to cover a bunch of cases is people have some fairly specific question in mind and you can actually give them the answer rather than just shoveling them pages. People have worked a lot on that in recent years and in various times people have surfaced some of that in search engines. So hereís a screen shot from about three years ago, I think, of MSN Search where I can ask it the question of which is the largest African country and it actually just gives me an answer. Answer Sudan, republic in northeastern Africa, the largest country of the African continent, blah, blah, blah, blah, blah. Then one of the questions is how big a need does that fill. I mean, when MSN Search was first announced that this was one of the things that they featured as one of the things they distinctively did. If I skipped forward to a more recent screen shot of Live Search and I give exactly the same query of which is the largest African country Iím now just getting web search results. Itís not the case that theyíve completely turned off that functionality because if I ask what is the capital of Sudan? I am still getting this Sudan capital, Khartoum, relation extraction kind of answers coming back. Though you might wonder about immediately after that then is this a useful link? Which is, I guess, Iím doing a little bit of research on the side to decide whether this is actually the kind of functionality that people want to have or not.
At any rate, I mean, itís very easy to then move into spaces where you just ask slightly richer questions and these systems arenít giving you answers that are kind of real answers. Which countries does the Danube flow through? Thatís then not being recognized as any kind of special question and itís just giving me web search results. One of the interesting things, and this is why web search engines have just been so, so successful, is it turns out that providing youíre in a situation where the human can just look at things and interpret the answers rather than the machine having to automatically take the next step itself.
A surprising amount of the time that web search just works, right? Because if we look at these results, I mean, here we go. The Danube flows through nine countries, Germany, Austria, Slovakia, Hungary, and while the snippet could be a bit more perfect, but it seems like if I click through right there from the first result I can get the list of answers. This is just an aside, but if you look down a little at the results you see that the third answer says the Danube flows through ten countries. So thereís some interesting issues on information provenance there as to who you want to believe when youíre going on the web, but, at any rate, youíre clearly very directly finding answers inside the web search engine. Okay.
Since I wanted to be keeping up with the latest thatís going on, I also tried that this morning in power set and what is the largest African country? And it turns out that Iím actually getting just no particular value out of that. I mean, itís just giving me web search results as well and it seems like the web search results itís giving are actually worse than the ones that I was getting from Live Search in telling me the answer to the question in its snippets. Okay. Okay. So the comic version of what happens with our current natural language understanding systems corresponds to this comic of what we say to dogs. ďOkay, Ginger, Iíve had it. You stay out of the garbage. Understand, Ginger? Stay out of the garbage or else.Ē And what our computer system understand is ďblah, blah, Ginger, blah, blah, blah, blah, Ginger, blah, blah, blah, blah, blah.Ē
But the funny thing is that although our computer systems understand kind of in a sense nothing that thereís this funny sense in which a lot of the time with a lot of the kind of bag of words representations that we deal with they actually do a surprisingly good job at answering these things. What I thought Iíd just, sort of, do for a few minutes then is go back in time and tell you how the world used to be. For itís, sort of, a funny way in which in recent times thereís been all this exciting work in doing machine learning and building systems up from large copper and lots of probabilities and abilities to do interesting things at a large scale and that work very well over large amounts of data.
At the same time does that work? I mean, thatís kind of led to a sort of everybody forgetting how the world used to be and how in the old days people did used to build systems which were much more hand built systems that actually could in various ways do interesting semantically precise processing. So Fernando Pereira is these days a well-known machine learning probability in OP researcher. Heís one of the people centrally involved in conditional random fields and I briefly mentioned doing MEMMís and heís actually working at Google right at the moment.
But way back when, in 1980 Fernando Pereira was working on his Ph.D. dissertation and what he worked on then was this prolog system, which he called Chat-80 because it was written in 1980. Its goal was to do full natural language understanding in a restricted domain, so it can do question answering. There have been several kind of famous systems of that era that have been kind of key, kind of natural language understanding systems that were developed, but it was one of the very well-known ones. Which actually had a surprising longevity of usage. I mean, the most recent use I know of is I was sitting in a conference in 2000, 20 years after this was written, and these people were presenting a paper about a system where you could ask questions about train routes and schedules and it turned out that what they were actually using on the back end to do their processing was still a version of Chat-80 and that they had sort of done development of a different Lexicon and grammar for that system. Okay.
So Iíll just show you Chat-80 for a few minutes. This is something where just real soon now Iím gonna have to do some remedial work because it turns out the version of prolog I have for this only runs on Spark machines and weíve just about reached the point where Stanford has gotten rid of their very last Spark machine. I guess for next year Iíll have to get hold of a version that works on Linux, but fortunately for me Iíve still got until June 16th to be able to give this demo on a Spark machine so Iíll do that. No, wait, I have to load it. Thatís a good idea. Okay.
So this has a database of geography. The idea is that you can ask various kinds of questions about geography and it will give the answers. So this is kind of America. Which countries border Iraq? Told about how bad geography knowledge is in the United States. Which countries border Iraq?
Student:Syria and Kuwait.
Instructor (Christopher Manning):Syria, Kuwait.
Instructor (Christopher Manning):Iran.
Instructor (Christopher Manning):Turkey. Was that a vote for Jordan? Any more? Weíre up to five. Okay. Iran, Jordan, Kuwait, Saudi Arabia, you missed Saudi Arabia. Okay. And then you can ask more difficult questions. You can say which countries that border Turkey border Iraq. Okay. Whatís the answer to this one?
Instructor (Christopher Manning):Syria, yeah. Any others?
Instructor (Christopher Manning):Iran was that? Yeah, Iran and Syria. Okay. There are lots of other kinds of questions that you can ask it. I mean, you can ask it what is the total population of countries south of the equator. 409 million. Itís probably grown a bit since 1980. So one of the things that youíll note for this is itís not very up-to-date on modern world stuff. This is still in the era when the Soviet Union existed. All right. So I can ask which countries border the Soviet. And it also has a little cheat that for multi word things it doesnít identify as multi word things and you have to use an underscore. You can ask all the obvious stuff. What is the capital of England? It knows about that. You can kind of make them more complex.
So I could take my earlier example. Total population of countries south of the equator and not in South America? Right. No, it didnít like that one. What is largest country in South America? Brazil. Yeah. I can ask that. Iím not quite sure why the one I just asked got and I donít understand, but thatís absolutely a property of the system that I will not deny for a moment. This system has got a fix grammar and a fix Lexicon and providing you stay within the boundaries of that you can ask quite complex things and it really does manage to work them out okay, but if you do anything that goes outside the bounds of that grammar by some means then itís just no dice and you get I donít understand and thatís the end of the story.
So you can ask it to give more information about how it does all of this. You can ask for a trace, but maybe before I do that Iíll just show you a teeny bit of whatís there. So effectively, what does this system run on? That thereís little databases where the little databases are data log for those of you who have done your 145 stall class. So here are facts about countries. So you have these predicates about countries where you have the country, the regency in latitude, longitude. Thatís why you can ask about things like south of the Equator. Area, population, capital, and currency. These are facts about countries and then there are some other databases that tell you facts about other things and thereís a top level one of world zero dot PL where various relations define and various other things.
In particular, what you can see here is some of the kind of access of predicate. These access of predicates are kind of like having views on the database. If you want to know whether something is a country youíre basically seeing whether it fills the first slot of the country with relation or not. You can use adjectives like African and thatís then being realized as an incontinent relation and you can work that out from those relations. Relationships are worked out like that. Top level is then Lexicons and grammars. Get the files going. Okay.
So that the actual Ė okay. Hereís essentially where itís working out a Lexicon of different words and this then. A sentence can be a declarative sentence with a period, a WH question, imperatives. You then have how to expand out a WH question. Then down here we had structures and down phrases, so itís got a very explicit grammar like that. Okay. So once I have trace on it if I then ask it a question like which countries border Mexico? I mean, what we get is that what the system does is it builds up a paths tree, so this is showing a paths tree. Itís an indented list representation. Itís a WH question, which has the WH variable out the front here. The thing that weíre gonna want to find is the which and then thereís a sentence with a noun phrase, so itís plural.
Which countries and hereís the country. And then itís the verb border and then you have the argument MP thatís the object Mexico. It builds up a form of syntactic paths tree like that. Then what it does is converts that into a semantic representation answer. Itís a country and it borders Mexico. It actually has a little query planner in there like a database system, which decides whatís an efficient way to answer this query. It works out how to plan the best query result and then eventually here we get our answer back again. Okay. Maybe I should just show one other type of question. You can actually ask list questions in it as well.
You can ask something like what are the capitals of the countries bordering the Baltic. Itíll do all that stuff and then we can actually get back a list here. So Denmarkís capital Copenhagen, East Germanyís capital East Berlin, etcetera. I guess theyíre also yes/no questions. Does Afghanistan border china? Is it a yes or a no?
Instructor (Christopher Manning):No, thatís wrong. Itís a yes. Yeah. Thereís a little bit of China that sticks off. Okay. So thatís the Chat-80 system. Iíll do one more. Which is a long one that I knows work providing I can spell it correctly. Which country bordering the Mediterranean borders the country that is bordered by a country whose population exceeds the population of India? Wait, I didnít get it right. Eventually I can answer that question. Maybe Iíll just leave it now and go on. Okay.
Provided you stay inside the grammar of quite complex questions and that kind of structure it can give answers to it. Okay. This is just, again, showing the kind of predicates there and then how you can have one of these effective views on the database where you can get facile about things. Okay. Thatís then kind of the picture of how you can build semantically precise systems, which actually take sentence structures and understands stuff out of them. Some weíll spend today and then in more detail next Wednesday after the day off on Monday. Talking about how people build these kind of deep understanding full semantic systems.
In a way, this is a step away from as to what we do for the rest of the course since really the kind of methods that are used here are mainly logical symbolic methods of being used to develop these systems. Thereís just recently started to be some interesting work where people have started trying to combine some of the ideas of these kind of logical meaning representations with using machine learning methods to acquire them and infer them, but to some extent that thatís still very recent. I mean, if youíre actually at the level of saying, ďOkay, I want to build a system that can actually do full sentence understanding in the way that Chat-80 delivered.Ē Itís not really that thereís a lot of off the shelf technology to do that thatís come out of the last 20 years of what people have been doing a lot of machine learning.
There are obviously ways in which technology has developed since Chat-80. People might give you a Gulliant face and people might have the system hooked up to a website that has facts on it rather than using a little prolog database, but thereís kind of, in a sense, in terms of actually building a system that does full understanding of sentences like that. Thereís not a huge amount thatís very fundamentally changed since Chat-80 was built. Okay.
The kind of space weíre in is really similar to the space of when people were building compilers and doing interpretation of formal languages, except weíre gonna take the same ideas and then apply them for natural languages. If youíre building a compiler and you want to understand arithmetic expressions, well, what do you do? You path up your arithmetic expressions by having a syntax of those arithmetic expressions and then what you want to do is to be able to build up semantic interpretations for these arithmetic expressions. How do you do that?
Well, the conventional way that people do that is they have a grammar that will build up these kind of syntax trees and thatís what you write in a tall like bison or yak or something like that. Then hanging off the walls of those grammars, you then have semantic attachments that say how to interpret the various pieces. You would be looking at this kind of syntax tree and here what we have is just constants 3+5*6. You have these semantic attachments, which would then say how to interpret these things. This means multiplication, this means five and six, and, therefore, here we can then evaluate this modification of five and six. The meaning that we evaluate to here is 30 and then weíd add three and the meaning that we evaluate to is 33. Hopefully that seems familiar.
Thatís the kind of easiest case where we have constants and so we can evaluate a compile time and just work out the meaning of that is 33 and put 33 into our system. I mean, the more typical cases that we write in arithmetic expression is something like this 3+5*x. Then what we want to do is say weíre still gonna have semantic attachments off our syntax rules and those semantic attachments are gonna tell us how to work out the value of the whole thing. Now the meaning of the node canít be a fully evaluated answer and instead has to be a little piece of code that will tell us how to calculate the answer. Now, what weíre gonna have is the meaning given by semantic attachment here. Youíre going to multiply by five whatever the value of this is thatís determined when the program is run and then youíre going to add three to that.
We can still work out a meaning thatís a semantic attachment going up, but itís unspecified because we donít know the values of things. Okay. The general idea of what youíre doing here in natural language processing, this is referred to as rule-to-rule translation. The idea is we have syntax rules, so hereís our syntax rule. Expression can consist of an expression, a function, another expression, and then we have a semantic attachment attached to this syntax rule saying suppose you know the meaning of these three things. Iíll tell you how to calculate the meaning of this bigger thing. All right. Thatís the semantic attachment and thatís referred to as rule-to-rule translation because itís knowing these rules meanings. We then give a rule to method of calculating this rules meaning and you work compositionally up the tree. Okay.
Essentially the picture that weíre gonna build is letís work out how to do that same thing for natural language sentences. In some sense, weíll refer to that as understanding the sentence. I mean, this concept of what counts as understanding is actually a very tricky philosophical issue. Iím not gonna spend more than one slide on it, but Iíll just mention it. This means that we understand this arithmetic expression and weíve calculated doing the semantics for it as to what these arithmetic expressions mean. Thereís a, sort of, sense in which that seems to be true, but in another sense you could say, well, no, youíve taken one bunch of symbols and youíve converted them into another bunch of symbols. Why is that understanding anything?
Thatís this question of then what counts as understanding. So, I mean, there are various ways that you can think of what understanding means. One way of thinking of semantics is operational semantics. If you can carry out actions that means that you understand things. If the person can type throw axe at dwarf and that happens, well, that means that youíve understood it. In logical philosophical work a very common idea is this notion of true conditional semantics. If you can determine the truth of the statement relative to a world that means that youíve understood it. Okay.
Similarly, if you can answer questions that means that youíve understood it. For many years, people have suggested in natural language processing that ability to translate is proof of understanding. If you can take something and translate it into Chinese that means that youíve understood it. If you start thinking about some of these notions theyíre very, very slippery, right? Because while we worked on machine translation and, okay, the machine translation systems that you guys built were very good, but hopefully your model two output of the decoder was getting to be almost intelligible sometimes. The bigger peoples statistical empty systems and sometimes they can translate sentences perfectly into different languages, but if youíre thinking about what you built for programming assignment two you probably donít feel in your heart of hearts that your system understood the language.
It was just doing these probability tables and sometimes something okay came out at the end of it. Itís very unclear what then counts as translation, but, nevertheless, the goal that weíre gonna work with at the moment is this traditional goal of understanding means translating into formal logic. Essentially a first order prepositional calculus version, predicate calculus version. The picture that Iím then gonna show you next Wednesday is that, at least in certain circumstances, will let us do this top high level thing of being able to perform tasks. We can then fairly readily convert those representations into something where we can quarry a database. Weíll put stuff into a database, so that we have an operational task semantics. Okay.
Yeah. Today Iím gonna say some of the general ideas of compositional semantics as an introduction and then Iím gonna briefly mention at the end semantic grammars that are an alternative that have sometimes been used. Next Wednesday Iím gonna talk in detail about how you can build compositional semantic representations. Then for the final week of the Ė and then Iíve sort of been saying nothing about representation reasoning. Then for the final week of the class on Monday Iím gonna talk about lexical semantics, which is how do you actually work out what words mean because as well as you have to combine meanings you have to know what words mean. Then on the final class, Iíll talk a bit about some of the high-level semantic applications that people have been working on of doing question answering, semantic search, textual entailment, and problems like that. Okay.
I know thereís this problem that people who do computer science and similar subjects often end up feeling like they sit through five different introductions to logic, so Iím not really gonna do much of introduction to logic and assume that youíve seen logic before. Basically what I want to work towards is how can you construct logical representations? Which, in general, in logic courses no one ever says anything about. Often in logic classes in the first two weeks there are exercises where youíre meant to take sentences in English and convert them into logic. The way you do that is you use your brain in serious ways and write down whatís the correct logical interpretation. Then for the rest of the semester you work with predicate calculus P implies Q of X or YZ or something and no one talks about human languages ever again.
Which is, in some sense, wrong. Historically why formal logic was invented was it was meant to help people understand human reasoning and human languages. That was actually the motivation, but itís sort of tended to be lost in the logical tradition when people have worked with formal representations. In particular, most of the time people just donít actually work out how can you do any kind of automatic translation from human language sentences to logical sentences. Thatís what weíre gonna look at. Okay. Nevertheless, what are we gonna have underlined here? What weíre gonna have underlined here is weíre gonna Booleans, you know about those, we have individuals, so there are things like people, might be other things like times, and then weíre gonna have functions of various times our predicates, so frog(x), green(x), and, in particular, weíre gonna cause predicates once, which have a Boolean return value.
So is it green is the predicate. Okay. Predicates define sets who are the individuals that satisfy the predicate. If the predicate has only one argument, as a lot of the ones we look at are, Iím gonna call them properties. Essentially we can make more complex functions out of simple functions. In particular, with the kind of techniques that people use to represent natural language meanings and the construction of natural language meanings tend to make quite a lot of use of higher order functions. Higher order functions are then functions that take functions as arguments and maybe return functions as their values.
To be able to do that we use lambda calculus. How many of you guys have seen lambda calculus in another class like programming languages? Some, but not all. Okay. When we write functions or methods in Java, right? That we give them names, so we have a name and it takes some arguments and then it maybe returns the value. Okay? We have a function name, right? But in some sense the key thing is the behavior of the function and we should be able to have functions that can do some things without giving them names. The idea of lambda calculus is itís a notation for being able to produce and process functions that donít happen to have names. The advantage of having functions that donít have to have names is that you can just invent them on the fly as you need them. All right.
You could write down a function thatís a squaring function and you say itís sort of square of X and itís value is that it returns X*X, but you should also be able to just write a function that doesnít have a name. Hereís the lambda function for swearing function. So youíre writing a lambda before each argument of the function, then you commonly write a dot or a square bracket or something like that, and then youíre saying the value of the function after that, which is whatíll be returned. This takes one argument and calculates P*P and thatís the value that gets returned.
What weíre gonna be often doing for our natural language understanding is constructing these kind of lambda functions. Once we have a function like lambda P, P*P we can then take this function and do function applications, so we can give it an argument three. The way that we then evaluate this function is we substitute three for the argument P and then we evaluate this 3*3 and we return that as the return value of the function. Okay. That make sense? I can just hit spacebar and go on? Okay.
The crucial thing to note is in reality lambda functions only take one argument at a time. There isnít any inherent notion of functions that take multiple arguments in lambda calculus. What you do instead is you fake functions that take multiple arguments. The way you fake functions that take multiple arguments is you make use of this notion of higher-level functions. So suppose I want to have a function that multiplies two numbers. Well, the way I can have a function that multiplies two numbers is I can write a function that takes a number and what itís gonna do is return as its value a function. This function is gonna be a function that takes a number and then returns times XY, does the multiplication. If I take that function and then give it two arguments times(5,6) whatís gonna happen? Well, whatís gonna happen is Iím gonna take this function and apply it to the value five and itís gonna return as its value lambda Y*XY where X has become five. So that will be lambda Y*5Y and then Iím going to apply that function to the value six. So Y will become six and Iíll return times(5,6). Okay.
I can make use of the idea of higher order functions to simulate multiple argument functions. Okay. That process is referred to as currying after a logician whose name was Curry, maybe not surprisingly. Okay. For our individuals we have some of them are going to give names as constants, right? So that we can have a constant thatís a person, which is George Bush. We can have a constant that is a named predicate, so whether something is red that canít be a constant. And then we can have constants that are kind of predicates and arguments. We have love predicate, where itís a known predicate of two arguments and we have facts in the world about whether people love each other.
Essentially weíre going to assume that for things that we know about we have little database tables where we can look up who loves each other. Okay. Then we can go on from there and we can also then make representations of more interesting kinds of function. Hereís the kind of interesting function that we can build. We might want to build a most function. Well, what weíd like to be able to say is most elephants are big. When weíre saying most we have two predicates that are involved in the most representation. So we have the elephant predicate, which is what weíre talking about, and we have the big predicate.
In both of those properties there are things that limit the set of individuals. Theyíre the set of things that are elephants and the set of things that are big. We want to then take those two functions and then make a most function that operates over them. Here, again, weíre into the space of higher order functions. Our representation of most is going to be that most is, I think itís kind of easiest to see here, but most is gonna be something that takes two functions, two properties, and then its going to itself be returning a Boolean value, which is saying is it true that most pigs are big? Okay.
So if we want to evaluate that how do we do it? Well, the way we do that is weíre gonna have this. If we look at this function it picks out in our world a set of things that are pigs and this function picks out a set of things that are big and what we want to know is if you look at the set of pigs and then you ask which ones of those are big that over half of the things that are pigs Ė I had elephants, sorry, before. Whatever. Pigs, elephants. Over half of those things are big. Okay. Just like in our example of 3+5*6, we can write a little way to calculate whether things satisfy the most representation.
The crucial thing to note here, a slightly different way of doing things, is normally you donít see this when you do a predicate logic course. What you see are these two quantifiers, which I hope you remember well. Hereís the four quantifier and thereís the exist quantifier. That you are using this categorimic notation where youíre introducing all this special stuff to represent the two quantifiers of their existence for all. When weíre doing natural language translation by and large we donít want to do that. We want to be able to treat everything as a function. We want to have an all function and an exist function. The reason why we want to do that is because building functions is something weíll easily be able to do in our translation. Then stuff like this will be how we evaluate a function to see whether itís true.
The existing pig big function will be evaluated by looking at the set of big things and seeing if thereís one thing or more in it thatís a pig thing. Itíll be a method of evaluating this function, so thatís equivalent to this, but will just be seen as a method of evaluation. As well as the fact that we can generate these things to translate, the other reason we want to do this is because natural languages donít only have their existence for all. Thereís, sort of, a sense in which those are fundamental for the purposes of logic, but natural languages have all kinds of quantifiers. They have ones like most and a few, but they also just have an infinite space of quantifiers. At least seven with a quite a large number for a small country.
You can make quantifiers of all kinds of sorts. Okay. The next thing you need to know about natural languages is as soon as you have quantifiers, quantifiers in natural languages have scope. In the semantic level itís kind of just like what we saw at syntax level with prepositional phrase attachments. We also have the semantic level you get these ambiguities of interpretation. Okay. Hereís the funny Groucho Marx example. ďIn this country a woman gives birth every 15 minutes. Our job is to find that woman and stop her.Ē This sentence has two possible readings. Okay. Which is the funny reading? The first one, right? There exists a woman and she gives birth every 15 minutes. And this is the kind of normal reading where for every 15-minute interval thereís some woman who gives birth in it.
Itís exactly the same situation of just like our PP attachments. The funny thing is, this is just the way natural languages are. That you get these scope ambiguities. Most of the time they donít cause people trouble, but if youíre thinking for the formal understanding perspective basically whenever you have quantifiers you get a number of readings, which is exponential in the number of quantifiers and thatís just how the world is. Iíll show another example of that later. Okay.
But what I want to talk about for a moment is this method of constructing compositional semantic representations. Today I just want to sort of do a very simple example that gives the general idea and makes life look very easy, which makes the connection between the programming language semantic interpretation and what we want to do for natural language. Then next week weíll start looking at the cases where things get rather more complex and you actually have to be able to do tricky things with lambda calculus to make it work.
Our basic model of what weíre gonna do is we get a sentence, we pass it through to get a syntax tree, we have a Lexicon where we can look up a meaning representation for each word, and then what we want to do is walk up our syntax tree and calculate a meaning for each bigger thing. The way weíre gonna be able to do that is attach to each syntax role thereís gonna be a semantic attachment saying how you can combine the meanings of the lower level parts to make the meaning of the parent load. If we start at the bottom and work up weíll be able to apply those rules and come up with a meaning for larger level units.
The idea that you can do that is referred to as the principle of compositionality. By in large, basically everyone believes in the principal of compositionality because if it werenít true natural language understanding would be impossible. We can construct new different sentences every day and people understand them. How can they understand them? They know what the words mean, they know what it means when you put them together, and if youíre saying more complex sentences like Sue claims that Bill said that Fred thinks that Joe acknowledges that Amanda claims that blah, blah, blah, we can just compositionally work it out in terms of the pieces.
On the other hand, compositionality isnít quite true in all places. There are various places from idioms onward in which there seem to be meanings for units, which arenít easily expressed in terms of meanings of the parts. Okay. So how can we realize this notion of having semantic attachments to syntax rules? One famous early way of doing this was the representation of definite clause grammars. Definite clause grammars are getting us back to prolog technology and what we saw for Chat-80. One of the cool things about prolog, if youíre a natural language person, is that prolog came built into it as a syntactic sugar where you could write context free grammars and you just write down the context free grammars and then you could just pause lists of terms in terms of those context free grammars without actually having to do any extra work for yourself.
This is how you wrote down a context free grammar in prolog, which is fully equipped with what weíve seen the context free grammars. Sentence goes to noun phrase, verb phrase. Noun phrase goes to proper noun. Noun phrase goes to its determiner noun. Very phrase, verb, noun phrase. Hereís how you wrote the Lexicon. The items of the Lexicon were written as one element lists. If you just wrote that into prolog and then you asked the question sentence [list of John ate a lion] or something like that it would pause it and you get a paths tree. The thing that was slightly cooler than that about what you could do with these definite clause grammars that the definite clause grammars were actually being translated into terms, so that you had a term like sentence of X with arguments.
Rather than just having a plain CFG where your symbols were atoms, you could actually give your symbols arguments. Hereís a slightly better CFG and this is a CFG that actually enforces subject verb agreement. Iím not sure if this worried anybody, but itís something that might have worried you when you were building your parses for the third assignment was the grammars that we were using didnít actually do anything to enforce agreement, right? If you gave them sentences like the man eat or lions eats they paths them perfectly well. They werenít attempting to check for agreement in any sense.
You might like a grammar that can check for agreement and see that youíve got correct subject verb agreement. Hereís how you can do this with a definite clause grammar. I can change my sentence rule to be adding now an argument, so I can say that the sentence consists of a noun phrase, which has some number and a verb phrase that has some number. Just by writing the same number there those two variables will be equated in prolog and both the noun phrase and the verb phrase will have to have the same number. Okay?
Then for the rule for the noun phrase and the verb phrase I essentially just inherit numbers. Iím saying the number of the noun phrase will be the number of the proper noun or the number of the noun and if thereís a determinate it should share number as well. The verb phrases number will be the verbs number, doesnít matter what the noun phrases number is. Then down in my Lexicon Iím then saying facts about the number of things. Iím saying that Mary is a singular proper noun, but can be either a singular or a plural determinate. Lion is a singular noun. Lions is a plural noun. Okay.
With just this grammar, I can now send it back into my prolog system and if I say the lion eats it will paths correctly because eats is singular, lion is singular, those will get inherited, and, therefore, this will match. Whereas if I try and give it the sentence the lions eats then this will be singular, that will be plural and so the top level S rule just wonít match. Itíll return that that sentence isnít accepted by the grammar. Okay. So thatís kind of cool and so that leads us down to this space of writing feature based grammars because I can then start thinking of other things where I want to put features into my grammar and have those features checked as I paths.
One particular kind of feature that you can put into your grammar is you can actually then say, well, why donít I make use of the fact that I can put variables into my grammar and why donít I put variables in that actually have a meaning of the sentence. This is the idea of these semantic attachments. So here I now say Iíve got the sentences meaning and Iím gonna do rule-to-rule translation and how am I gonna do that? Well, Iím gonna have the noun phrase has some meaning and the verb phrase has some meaning and then the one other thing I could do with the DCG is put something after it in these curly braces and that meant this is the semantic attachment for this role.
The semantic attachment of this rule said combine the noun phrase meaning and the verb phrase meaning to give the sentences meaning. The way I do this combination is simply I do function application. Iím gonna take the meaning of the verb phrase as a function, apply the meaning of the noun phrase to it as an argument, and Iím going to return the result as the sentences meaning. Prolog was all written in relation late. There werenít actually any functions in prolog. Everything was a relation, but the applied predicate is saying take this function, apply this argument to it, and that sits return value and thatís what Iím returning. Okay. So my grammar is just using function application everywhere to combine meanings here and then my Lexicon now has meanings for terms and my Lexicon is being written using lambda calculus.
John and Mary here are just constants, but for jumps Iíve got my lambda expression, so itís a function. Lambda X jumps X and for loves Iíve then got this higher order function that takes an argument and turns another function, which takes an argument, which returns loves as its value. If Iíve got this, now I can actually work out the meaning of the sentence. If you look at the meaning thatís constructed from the sentence in my DCG and I want to have a meaning for John loves Mary, well, the Lexical items have some semantic translation. The name of the noun phrase just say our meaning is the same as our one childís meaning. This goes up and simile the verb. The first place something interesting happens is with the verb phrase.
The verb phrase of semantic attachment says take this function and apply this argument to it and my meaning is the result of that. You take Mary, Mary becomes Y, the return value is lambda X loves X Mary, and thatís returned as the value. Then you do the same thing with John and you get loves John, Mary as then being returned as the meaning of the sentence. That seems kind of hopeful. Now, we can work up the meaning of the two sentences John loves Mary Ė well, more sentences. John loves Mary, Mary loves John, and Mary jumps and John jumps. Okay.
Weíve got a four-sentence language and everything gets worked out correctly. This is essentially what Chat-80 was doing and the kind of technology weíre gonna do. However, this case here is very easy. Iíve just taken names and represented them as constants and then theyíre just combined in the obvious way by a verb phrase as the function. All of the trickiness comes in because of the fact that a lot of the time natural languages just arenít that easy. Now, I want to say a word about that, but before I do there is this slide.
I showed a little bit of DCG notation there just because it connects up to Chat-80. To do this you donít have to use DCG notation. You can just say Iíve got a context free grammar rule just like before. Iím gonna put next to it a semantic attachment, which is saying how Iím gonna work up the meaning corresponding to the parent in terms of the meaning that it corresponds to the children. Just like my little baby example, normally the form of these semantic attachments for natural language grammars is inherently trivial. We identify one argument as the semantic head and it takes other children as arguments and thatís essentially all they do. The semantic attachment is normally just function application.
All the trickiness actually comes down in the Lexicon of coming up with appropriate semantic representations for different words, so that you can combine the meanings together by function application. Okay. All right. Mary loves John is really easy, but the world gets more complex very rapidly. Hereís just one more example of how the world gets more complex. A few years ago we actually worked a little on this system that still seems to me a cool idea. In the olden days, in the GRE exam there used to be an analytic section, which was multiple choice, which had various kinds of questions. One kind of which was logic puzzles.
Theyíve now gotten rid of that and now have this incomprehensible analytic essay that I think no one knows what to do about in graduate school admissions. I think in the LSAT they still have these logic puzzles. At any rate, commonly computer scientists are the kind of people who when there were young did logic puzzles for fun at some point, so most of you have probably seeing these kind of logic puzzles. Here we have six sculptures. C, D, E, F, G, H are to be exhibited in rooms 1, 2, and 3 of an art gallery. Sculpture C and E may not be exhibited in the same room, dot dot dot. Then for each scenario there are sequence of questions, which in the GREís start is the easiest one and got a bit harder.
I thought this was a cool problem. For the GRE, the hard part of this is meant to be solving the logic puzzle. That the natural language understanding is just a God given skill that all human beings have, but solving logic puzzles that is hard. Whereas if you think about this as a computer science problem, if you can represent the problem of the logic puzzle, solving it is trivial. All right? So all of these questions are over such small domains that you donít have to do anything smart. You can just use brute force. You can just evaluate every possible assignment and see whatís possible. If you paid attention in 221 you might think, well, this is like a constraint satisfaction problem. Maybe I can use ford changing or R consistency or one of those things and itís true you could.
You could come up with the answer in .001 seconds instead of .01 seconds, but really that part just isnít hard. The hard thing is how could you possibly take this natural language description and turn it into something that you can actually construct a model from, so that you could run your model checker or your R consistency algorithm to see whether it holds. This is, again, the kind of place in which we get into all of these troubles and translation to logic becomes difficult for all sorts of reasons actually. Just to mention one of them again that I all ready mentioned, this notion of quantifier scope and interpreting quantifier scopes.
Now, the people at the ETS who build GRE questions try really hard to make them unambiguous. Thatís part of their job because if the questions are perceived by regular people as being ambiguous then upset parents sue them and things like that, right? They really, really, really want to have unambiguous questions. The fact of the matter is that natural language you just canít do that because there are latent ambiguities. This is just illustrated for that example. The first sentence is at least one sculpture must exhibit in each room. Well, there are two quantifiers. At least one sculpture each room, so there has to be multiple readings, right? And there are, right?
Thereís one reading in which there exists at least one sculpture and for that sculpture necessarily in every single room that sculpture is exhibited. Now, that oneís kind of ruled out because itís just impossible to satisfy in the real world and human beings donít pay that one that much attention. That means that somehow we have to be able to consider the semantic interpretations and work out which readings to exclude and settle on the right reading. If we then go on to the next sentence of no more than three sculptures may be exhibited in any room it turns out that we get more than two readings here because you also get extra scope ambiguities when you have elements like may.
May and various other modal operators, that they also introduce scopes. There are at least three readings that you can get for this sentence. One reading is for every room itís the case that there are no more than three sculptures that are exhibited in it. Thatís the reading that youíre meant to get, but you can get other readings as well. These arenít readings that you can kind of rule out as impossible to satisfy in the world. Another reading is that, in total, at most three sculptures are exhibited. Thatís saying in any room whatsoever there can be no more than three sculptures exhibited and all the rest of them have to be kept in the attic or whatever. Thatís a possible state of the world. Thatís not impossible to satisfy. Itís not whatís intended.
A third possible reading for this sentence is to say that thereís somehow a special set of three or less sculptures and that special set of sculptures can be exhibited in any room whatsoever, but for all the other sculptures there are restrictions on which rooms youíre allowed to exhibit them in. Again, thatís a possible state of the world, which canít be sort of ruled out categorically. Somehow you have to be able to build and consider these representations and work out how to possibly evaluate them and so that starts to make the natural language understanding problem look much harder. In particular, none of those questions about scope were represented whatsoever in our syntactic representation of the sentence.
If we take the first sentence, at least one sculpture must be exhibited in each room, and we give it to our paths, the path is going to give a syntactic representation to that sentence. As we build our PCFGís, itíll just return the best one. Now, admittedly, there are other paths representations for that sentence, but the different paths representations arenít meant to be representing different semantic structures. All of these semantic differences are being represented at a level beyond the syntax. The syntax isnít really specifying all the details of the semantics. There are other things yet to figure out.
In general, itís the case that traditional syntactic grammars donít really represent the semantics of a sentence in a straightforward way. That they show the syntactic building blocks of that sentence, but itís just not their job to say, well, what is the scope of a quantifier? Whatís the scope of negation? Which things are predicated? Which things have the long distances? Theyíre not doing that. Because of that, taking a syntactic grammar and then coming up with an interpretation in terms of semantics is a difficult thing. If you want to do that task, the way you do that difficult thing is you write complex lambda expressions and you write complex combination and imprint print rules, so you can get out semantic meanings even though that they can be quite distant from the syntactic form.
Next time Iím gonna talk about how to do that, but briefly just for the end of this time I want to mention an alternative. Some people think that thatís much too much work to do. The alternative thing that you could do is just change your grammar, so that the grammar is trying to directly reflect the semantics. At that point you kind of give up on your grammar being a good representation of the syntax because youíre effectively mucking with the grammar to make it look more like your semantic representations. This is known as semantic grammars.
Now, semantic grammar is a dumb name Iíve always thought because, I mean, a semantic grammar is just a grammar like any other grammar. Youíre writing a context free grammar. The semantic part is all in the person who wrote the grammars head. Theyíre saying Iím gonna try and write the grammar to make it easy to get out the kind of meanings that Iím interested in. Where these grammars have been very widely used is where you want to do some limited kinds of semantic processing normally in restricted domains. They can be used in things like spoken dialogue systems.
Maybe Iíll just go straight on and show the examples. Hereís a famous old example of that from 1978 on the Lifer system, which was another system for natural language understanding. Its goal was to be able to answer questions about US Navy ships, so you can figure out who was funding this one. In the general space that this was in, and I guess also the space that Chat-80 was in, again, an interesting thing in the way the world has gone. In those days doing natural language interfaces to databases was seen as a hot area and a place that natural language could be used. Clearly thatís just not something thatís taken off at all.
In practice, environments came along and everyone uses quarry builders and Microsoft Access or something like that and basically no one uses natural language findings to databases. Hereís the idea of the semantic grammar. At the end of the day we have a task of weíre going to be answering questions like what is the length of Kitty Hawk class ships? We could write a regular grammar that could path sentences of these kinds, and thatís what Chat-80 did. Another way of doing things is to more directly look at the kinds of questions you want to answer and just write a grammar around those.
Here the grammar says a sentence is rewritten as present the something of ship, right? Present is you can say what is can you tell me or just tell me? Iím making no attempt to really understand, give a good syntactic structure to sentences. Iím just saying, well, thereís various kind of ways you can ask questions about things. What is can you tell me, tell me. Iíll just write them down. Attributes say things like link, beam, and clasp. Ship is the ship name. Here are some ship names. I have this kind of grammar thatís specific to answering the restricted space of questions about this application.
Most of my categories arenít things like noun phrase anymore. Most of my categories are things like class name or attribute or this present thing. Words are recognized largely by their context. Itís not really saying that theyíre nouns, but theyíre a kind of ship. You have this strongly directed recognition of semantic categories, but you donít have any kind of general reusable grammar. These systems are still commonly used. I mean, if you want to just do a restricted domain recognition task, writing these kind of semantic grammars for a restricted range of English is just the easiest way to do it. It comes at a high cost, since you have this very specific unreusable thing thatís very directed to a particular application. Okay.
Iíll stop there and next time come back to the general problem of how you can kind of translate syntax from the semantic forms.
[End of Audio]
Duration: 76 minutes