Instructor (Julie Zelenski):That it felt it was too long, I asked too much of you, and it decided the best thing I could do for you was just to throw a page out of the list. So you will notice the handout that we gave out on Wednesday, it goes straight from Page 4 to Page 7 I think or something like that. It doesnít really quite make sense when you read it because itís in the middle of talking about one problem, and then suddenly itís talking about something else. We photocopied the missing page and itís in the lobby on the way out. You can also just get it from the web. The PDF thatís up thereís complete. You can print the middle page out by yourself or just grab the whole PDF to take a look at, but you do wanna have that handy when you start working on the assignment because it will be quite mysterious if you try to read it as is with that missing page.
What are we going on to talk about? Iím gonna talk about functions as data a little bit and client callbacks, and hopefully a little bit of introduction to recursion. We wonít get very far in recursion today, just enough to get some terminology and see a couple simple examples, but recursion is gonna be sort of the entirety of next week, so weíll have lots of time to soak this in and become masters of it. And that reading that matches with that is basically Chapter 4, 5, and 6 in order, which is all the recursion chapters in the text.
As you can probably tell, Iím still working on getting rid of my strep. Technically, Iím not contagious. Apparently, once youíve been on antibiotics for 24 hours, youíre no longer spreading the germs, but Iíd just as soon not want you to hang out with me today more than you need to. Iím not sure itís good for you or for me for that matter, but I will be actually in my office for a while after class, so if you were hoping to come see me on Wednesday, and you couldnít because I canceled my hours or you just have something on your mind, feel free to walk back with me and talk a little bit today. Anything administratively? Way in the back?
Student:[Inaudible] that handout [inaudible]?
Instructor (Julie Zelenski):Oh, it will. Heís asking about solutions to Section 2, and weíre Ė I bet you Jason could have it up before the end of the Ė he could put it up right now. How about that? We try to hold them back until actually after people have had section, just because it helps to kind of increase the mystery, keep the suspense, but then sometimes we forget to kind of patch together. So if we ever do that, send us an e-mail. Jasonís actually your section go-to guy actually, so if you actually have questions about the section materials, he is the most on top of that, more so than I am. Anything else? Okay, so thank you for coming out in the rain. I appreciate you trudging over here and risking your health and life. So Iím gonna do a little diversion here to explain this idea of functions as data before I come back to fixing the problem we had kind of hit upon at the very end of the last lecture about set. So what Iím just gonna show you is a little bit about the design of something in C++ that is gonna helpful in solving the problem weíd run into. What Iím gonna show you here is two pieces of code that plot a single value function across the number line. So in this case, the function is that itís the top one is applying the sine function, the sine wave as it varies across, and then the square root function which goes up and has a sloping curve to it, and that both of these have exactly the same behaviors.
Theyíre designed to kind of go into the graphics window, and theyíre just using a kind of a very simple kind of hand moving strategy. It starts on a particular location based on what the value of sine is here, and then over a particular interval at 0.1 of one tenth of an inch, it plots the next point and connects a line between them. So it does a simple kind of line approximation of what that function looks like over the interval you asked it to plot. The same code is being used here for plotting the square root. And the thing to note and I tried to highlight by putting it in blue here was that every other part of the code is exactly the same except for those two calls where I need to know whatís the value of square root starting at this particular X. So starting at Value 1, what is sine of one? What is square root of one? As I work my way across the interval, whatís a square root of 1.1, 1.2, 1.3 and sine of those same values? And so everything about the code is functionally identical. Itís kind of frustrating to look at it, though, and realize if I wanted to plot a bunch of other things Ė I also wanna be able to plot the cosine function, or some other function of my own creation Ė across this interval that I keep having to copy and paste this code and duplicate it.
And so one of the things that hopefully your 106A and prior experiences really heightened your attention to is if I have the same piece of code in multiple places, I ought to be able to unify it. I ought to be able to make there be a plot function that can handle both sine and square root without actually having to distinguish it by copy and pasting, so I wanna unify these two. And so there is Ė the mechanism that weíre gonna use kinda follows naturally if you donít let yourself get too tripped up by what it means. Just imagine that the parameter for example going into the function right now are the start and the stop, the interval from X is one to five. What weíd like to do is further parameterize the function. Weíd like to add a third argument to it, which is to say and when youíre ready to know what function youíre plotting, hereís the one to use. Iíd like you to plot sine over the interval start to stop. Iíd like you to plot square root over that. So we added the third argument, which was the function we wanted to invoke there. Then we would be able to unify this down to where we had one generic plot function.
The good news is that this does actually have support features in the C++ language that let us do this. The syntax for it is a little bit unusual, and if you think about it too much, it can actually make your head hurt a little bit, think about what itís doing. But you can actually use a function, a functionís name and the code that is associated with that name, as a piece of data that not just Ė you think of the code as weíre calling this function, and weíre moving these things around, and executing things. The things that we tend to be executing and operating on you think of as being integers, and strings, and characters, and files. But you can also extend your notion of whatís data to include the code you wrote as part of the possibilities. So in this case, Iíve added that third argument that I wanted to plot, and this syntax here thatís a little bit unusual to you, and Iíll kind of identify it, is that the name of the parameter is actually FN. Its name is enclosed in parentheses there. And then to the right would be a list of the arguments or the prototype information about this function, and on the left is its return value. So what this says is you have two doubles, the start and stop, and the third thing in there isnít a double at all. It is a function of one double that returns a double, so a single value function that operates on doubles here.
That is the syntax in C++ for specifying what you want coming in here is not a double, not a ray of doubles, anything funky like that. It is a function of a double that returns a double. And then in the body of this code, when I say FN here where I wouldíve said sine, or square root, or identified a particular function, itís using the parameter that was passed in by the client, so it says call the clientís function passing these values to plot over the range using their function in. So the idea is that valid calls to plot now become things like plot Ė and then give it an interval, zero to two, and you give the name of a function: the sine function, which comes out of the math library, the square root function, also in the math library. It could be that the function is something you wrote yourself, the my function. In order for this to be valid though, you canít just put any old function name there. It is actually being quite specific about it that plot was to find [inaudible]. It took a double, returned a double. Thatís the kind of function that you can give it, so any function that has that prototype, so it matches that format is an acceptable one to pass in.
If you try to pass something that actually just does some other kind of function, it doesnít have the same prototype, so the get line that takes no arguments and returns a string just doesnít match on any front. And if I try to say here, plot the get line function of the integral two to five, it will quite rightfully complain to me that that just doesnít make sense. Thatís because it doesnít match. So a little bit of syntax here, but actually kind of a very powerful thing, and it allows us to write to an addition to kind of parameterizing on these things you think of as traditional data, integers, and strings, and whatnot. Itís also say as part of your operations, you may need to make a call out to some other function. Letís leave it open what that function is and allow the client to specify what function to call at that time. So in this case for a plot, what function that youíre trying to plot, let the client tell you, and then based on what they ask you to plot you can plot different things. All right. Way in the back.
Student:Is there a similar setup for multivariable functions [inaudible]?
Instructor (Julie Zelenski):Certainly. So all I would need to do if this was a function that took a couple arguments, I would say double comma double comma int. If it returned void, [inaudible] returned. Sometimes it looks a little bit like the prototype kinda taken out of context and stuffed in there, and then those parens around the function are a very important part of that which is telling you yeah, this is a function of these with this prototype information. Behind you. No? Youíre good now. Somebody else? Over here?
Student:Is that FN a fixed name [inaudible]?
Instructor (Julie Zelenski):No. Itís just like any parameter name. You get to pick it. So I couldíve called it plot function, my function, your function, whatever I wanted. Here in the front.
Instructor (Julie Zelenski):So Java doesnít really have a similar mechanism that looks like this. C does, so C++ inherits it from C. There are other ways you try to accomplish this in Java. It tries to support the same functionality in the end, but it uses a pretty different approach than a functions as data approach. [Inaudible]?
Student:Can you pass like a method, like an operator?
Instructor (Julie Zelenski):So typically not. This syntax thatís being used here is for a free function, a function thatís kind of out in the global namespace, that level. There is a different syntax for passing a method, which is a little bit more messy, and we wonít tend to need it, so I wonít go there with you, but it does exist. It just as it stands does not want a method. It wants a function. So a method meaning member function of a class. Okay, so let me Ė that was kind of just to set the groundwork for the problem we were trying to solve in set, which was set is holding this collection of elements that the client has stuffed in there. Itís a generic templative class, so it doesnít have any preconceived notion about whatís being stored. Are the strings? Are they student structures? Are they integers? And that in order to perform its operations efficiently, it actually is using this notion of ordering, keeping them in an order so that it can iterate an order. It can quickly find on the basis of using order to quickly decide where something has to be and if itís present in the collection.
So how does it know how to compare something thatís of unknown type? Well, what it does is it has a default strategy. It makes an assumption that if I used equals equals and less than that would tell me kinda where to go. And so itís got this idea that it wants to know. Weíll give it two things. Are they the same thing? In which case, it uses kinda the zero to show the ordering between them. If one precedes the other, it wants to have some negative number that says well this one precedes it, or some positive number if it follows it. So it applies this operation to the [inaudible] that are there, and for strings, and ints, and doubles, and characters, this works perfectly well. And so thatís how without us going out of our way, we can have sets of things that respond to the built in relational operators without any special effort as a client. But what we can get into trouble with right is when equals equals less than donít make sense for type.
So let me just go actually type some code, and Iíll show you. I have it on the slide, but Iím gonna Ė wow, this chair suddenly got really short. [Inaudible] fix that. Okay. Weíll go over here because I think itís better just to see it really happening, so Iím gonna ignore this piece of code because itís not what I wanted. But if I make some student T structure, and itís got the first and last name, and itís got the ID number, and maybe thatís all I want for now Ė that if down in my main Ė canít find my main. There it is. Iím gonna need that piece of code later, so Iím gonna leave it there. If I make a set, and I say Iíd like a set of students Ė my class Ė and I do this. And so I feel like I havenít gotten [inaudible]. I made this structure. I say Iíd like to make a set of students, that each student is in the class exactly once and I donít need any duplicates, and I go through the process of trying to compile this, itís gonna give me some complaints. And its complaint, which is a little bit hard to see up here, is thereís no match for operator equals equals in one, equals equals two, and operator less than in one equals two.
So itís kind of showing me another piece of code thatís kind of a hidden piece of code I havenít actually seen directly, which is this operator compare function. And that is the one that the set is using, as it has this idea of whatís the way it should compare two things. It says well I have some of this operator compare that works generically on any type of things using equals equals and less than. And if I click up here on the instantiate from here, itís gonna help me to understand what caused this problem. The problem was caused by trying to create a set who was holding student T. And so this gives you a little bit of an insight on how the template operations are handled by the compiler, that I have built this whole set class, and it depends on there being equals equals and less than working for the type.
The fact that it didnít work for student T wasnít a cause for alarm until I actually tried to instantiate it. So at the point where I said Iíd like to make a set holding student T, the first point where the compiler actually goes through the process of generating a whole set class, the set angle brackets student T, filling in all the details, kinda working it all out, making the add, and contains, and whatever operations, and then in the process of those, those things are making calls that try to take student T objects and compare them using less than and equals equals. And that causes that code to fail. So the code thatís really failing is kind of somewhere in the class library, but itís failing because the things that weíre passing through and instantiating for donít work with that setup. So itís something weíve gotta fix, weíve gotta do something about. Let me go back here and say what am I gonna do. Well, so thereís my error message. Same one, right? Saying yeah, you canít do that with a [inaudible].
Well, what we do is we use this notion of functions as data to work out a solution for this problem that if you think about kinda whatís going on, the set actually knows everything about how to store things. Itís a very fancy efficient structure that says given your things, keeps them in order, and it manages to update, and insert, and search that thing very efficiently. But it doesnít know given any two random things how to compare them other than this assumption it was making about less than and equals equals being a way to tell. If it wants to have sort of a more sophisticated handling of that, what it needs to do is cooperate with the client Ė that the implementer of the set canít do it all. So thereís these two programmers that need to work in harmony. So what the set does is it allows for the client to specify by providing a function.
It says well when I need to compare two things, how about you give me the name of a function that when given two elements will return to me their ordering, this integer zero, negative, positive that tells me how to put them in place. And so the set kind of writes all of its operations in terms of well thereís some function I can call, this function that will compare two things. If they donít specify one, Iíll use this default one that maps them to the relationals, but if they do give me one, Iíll just ask them to do the comparison. And so then as its doing its searching and inserting and whatnot, itís calling back. We call that calling back to the client. So the client writes a function. If I wanna put student Ts into a set, then I need to say when you compare two student Ts, what do you look at to know if theyíre the same or how to order them. So maybe Iím gonna order them by ID number. Maybe Iím gonna use their first and last name. Whatever it means for two things to be equal and have some sense of order, I supply Ė I write the function, and then I pass it to the set constructor.
I say hereís the function to use. The set will hold on to that function. So I say hereís the compare student structure function. It holds onto that name, and when needed it calls back. It says Iím about to go look for a student structure. Is this the one? Well, I donít know if two student structures are the same. Iíll ask the client. Hereís two student structures. Are they the same? And then as needed, itíll keep looking or insert and add and do whatever I need to do. So letís go back over here. Iíll write a little function. So the prototype for it is it takes two elem Ts and it returns an int. That int is expected to have a value zero if they are the same, and a value that is negative if the first argument precedes the second. So if A is less than B, returns some negative thing. You can return negative one, or negative 100, or negative one million, but you need to return some negative value. And then if A [cuts out] some ordering, so theyíre not equal [cuts out] later, it will return some positive value: 1, 10, 6 million.
So if I do [cuts out] say I use ID num as my comparison. Based on [cuts out] are the same, if they are I can return zero. And if ID num of A is less than the ID num of B, I can return negative one, and then in the other case, Iíll return one. So it will compare them on the basis [cuts out] figuring that the name field at that point is nothing new. And then the way I use that is over here when Iím constructing it is there is [cuts out] to the constructor, and so thatís how I would pass [cuts out] add parens to the [cuts out] as Iím declaring it, and then I pass the name. Do I call it compare student or compare students? I canít remember. Compare student. Okay. And this then Ė so now thereís nothing going on in the code Ė causes it to compile, and that if you were to put letís say a see out statement in your comparison function just for fun, you would find out as you were doing adds, and compares, and removes, and whatnot on this set that you would see that your call kept being made. It kept calling back to you as the client saying I need to compare these things. I need to compare these things to decide where to put something, whether it had something, and whatnot.
And then based on your ordering, that would control for example how the iterator worked, that the smallest one according to your function would be the one first returned by your iterator, and it would move through larger or sort of later in the ordering ones until the end. So itís a very powerful mechanism thatís at work here because it means that for anything you wanna put in a set, as long as youíre willing to say how it is you compare it, then the set will take over and do the very efficient storing, and searching, and organizing of that. But you, the only piece you need to supply is this one little thing it canít figure out for itself, which is given your type of thing, how do you compare it.
For the built in types string, and int, and double, and car, it does have a default comparison function, that one that was called operator compare. Let me go open it for you [inaudible] the 106. So there is a compare function dot H, and this is actually what the default version of it looks. Itís actually written as a template itself that given two things it just turns around and asks the built in operators to help us out with that. And that is the name thatís being used if I open the set and you look at its constructor call. I had said that I would come back and tell you about what this was that the argument going into the set constructor is one parameter whose name is CMP function that takes two elem type things Ė so hereís the one in the example of the two argument prototype Ė returns an int, and then it uses a default assignment for that of operator compare, the one we just looked at, so that if you donít specify it, it goes through and generates the standard comparison function for the type, which for built-ins will work fine, but for user defined things is gonna create compiler errors.
So you can also choose if you donít like the default ordering works. So for example, if you wanted to build a set of strings that was case insensitive Ė so the default string handling would be to use equals equals and less than, which actually does care about case. It doesnít think that capital [inaudible] is the same as lower case. If you wanted it to consider them the same, you could supply your own. A compare case insensitively function took two strings, converted their case, and then compared them. And then when you establish a set of strings, instead of letting the default argument take over, go ahead and use your case insensitive compare, and then now you have a set of strings that operates case insensitively. So you can change the ordering, adapt the ordering, whatever you like for the primitives, as well as supply the necessary one for the things the built-ins donít have properties for. So then thatís that piece of code right there.
All right. So does that make sense? Well, now you know kind of the whole range of things that are available in the class library. All right, so we saw the four sequential containers, the vector, stack, queue, and the grid that you kinda indexed ordering, and kinda allowed you to throw things in and get them back out. We went through the map, which is the sort of fancy heavy lifter that does that key value lookup, and then weíve seen the set, which does kind of aggregate collection management, and very efficient operations for kind of searching, retrieving, ordering, joining with other kind of sets, and stuff that also has a lot of high utility. I wanna do one quick little program with you before I start recursion just because I think itís kinda cool is to talk a little about this idea of like once you have these ADTs, you can solve a lot of cool problems, and thatís certainly what this weekís assignment is about. Itís like well here are these tasks that if you didnít have Ė So ADTs, just a reminder Ė I say this word as though everybody knows exactly what it means Ė is just the idea of an abstract data type. So an abstract data type is a data type that you think of in terms of what it provides as an abstraction. So a queue is this FIFO line, and how it works internally, what itís implemented as, weíre not worried about it at all. Weíre only worried about the abstraction of enqueue and dequeue, and it coming first in first out.
So we talk about ADTs, we say once we have a queue, a stack, or a vector, we know what those things do, what a mathematical set is about. We build on top of that. We write code that leverages those ADTs to do cool things without having to also manage the low level details of where the memory came from for these things, how they grow, how they search, how they store and organize the data. You just get to do real cool things.
So you probably got a little taste of that at the end of 106A when you get to use the array list and the hashmap to do things. This set kinda just expands out to fill out some other niches where you can do a lot of really cool things because you have these things around to build on. So one of the things that happens a lot is you tend to do layered things, and youíll see a little bit of this in the assignment youíre doing this week where itís not just a set of something, itís a set of a map of something, or a vector of queues, a map of set. So I gave you a couple of examples here of the things that might be useful.
Like if you think of what a smoothie is, itís a set of things mixed together, some yogurt, and some different fruits, some wheatgrass, whatever it is you have in it. And that the menu for a smoothie shop is really just a bunch of those sets, so each set of ingredients is a particular smoothie they have, and then the set of all those sets is the menu that they post up on the board you can come in and order. The compiler tends to use a map to keep track of all the variables that are in scope. As you declare variables, it adds them to the map so that when you later use that name, it knows where to find it. Well, it also has to manage though not just one map, but the idea is as you enter and exit scopes, there is this layering of open scopes. So you have some open scope here. You go into a for loop where you open another scope. You add some new variables that when you look it actually shadows then at the nearest definition, so if you had two variables of the same, it needs to look at the one thatís closest. And then when you exit that scope, it needs those variables to go away and no longer be accessible. So one model for that could be very much a stack of maps where each of those maps represents the scope thatís active, a set of variable names, and maybe their types and information is stored in that map.
And then you stack them up. As you open a scope, you push on a new empty map. You put things into it, and then maybe you enter another scope. You push on another new empty map. You stick things into it, but as you exit and hit those closing braces, you pop those things from the stack to get to this previous environment you were in. So let me do a little program with you. I just have this idea of how this would work, and weíll see if I can code something up. So I have Ė letís go back over here. Iím going to Ė this is the piece of code that just reads words, so thatís a fine piece of code to have here. I have a file here Ė let me open it up for you Ė that just contains the contents of the Official Scrabble Playersí Dictionary, Edition 2. Itís got a lot of words it. Itís pretty long. Itís still loading. Letís go back and do something else while itís loading.
It happened to have about 120,000 words I think is what it would be right now. So itís busy loading, and I have this question for you. There are certain words that are anagrams of each other. The word cheap can be anagrammed into the word peach, things like that. And so I am curious for the Official Scrabble Playersí Dictionary what Ė so if you imagine that some words can be anagrammed a couple times, five or six different words just on how you can rearrange the letters. Iím curious to know what the largest anagram cluster is in the Official Scrabble Playersí Dictionary.
So Iíd like to know across all 127,000 words that they form little clusters, and Iíd like to find out whatís the biggest of those clusters. Okay. Thatís my goal. So hereís what Iíve got going. Iíve got something thatís gonna read them one by one. So letís brainstorm for a second. I want a way to take a particular word and kinda stick it with its other anagram cluster friends. Whatís a way I might do that? Help me design my data structure. Help me out. [Inaudible]. Iíve got the word peach. Where should I stick it so I can Ė
Student:You could treat each string as like a set of Ė
Instructor (Julie Zelenski):So I have this string, which represents the letters. Iíve got the word peach. I wanna be able to stick peach with cheap, so where should I stick peach in such a way that I could find it. And youíve got this idea that the letters there are a set.
Theyíre not quite a set, though. Be careful because the word banana has a couple As and a couple Ns, and so itís not that Iíd really want it to come down to be the set BAN. I wouldnít wanna coalesce the duplicates on that, so I really do wanna preserve all the letters that are in there, but your ideaís getting us somewhere. Itís like there is this idea of kind of like for any particular word thereís the collection of letters that it is formed from. And somehow if I could use that as an identifier in a way that was reliable Ė anybody got any ideas about what to do with that?
Student:If you did that a vector, each letter and then the frequency of each letter in the word?
Instructor (Julie Zelenski):So I could certainly do that. I could build kind of a vector that had kinda frequencies, that had this little struct that maybe it was number of times it occurs. Then I could try to build something on the basis of that vector that was like here is Ė do these two vectors match? Does banana match apple? And youíd say well no. It turns out they donít have the same letters. Iím trying to think of a really easy way to represent that. Your ideaís good, but Iím thinking really lazy. So somebody help me whoís lazy.
Student:Could you have a map where the key is all the letters in alphabetical order and the value is a vector of all the Ė
Instructor (Julie Zelenski):Yeah. That is a great idea. Youíre taking his idea, and youíre making it easier. Youíre capitalizing on lazy, which is yeah Ė I wanna keep track of all the letters that are in the word, but I wanna do it in a way that makes it really easy for example to know whether two have the same Ė weíll call it the signature. The signature of the word is the letter frequency across it.
If I could come up with a way to represent the signature that was really to compare two signatures quickly to see if theyíre the same, then I will have less work to do. And so your idea is a great one. We take the letters and we alphabetize them. So cheap and peach both turn into the same ACEHP. Itís a nonsense thing, but itís a signature. Itís unique for any anagram, and if I use a map where thatís the key, then I can associate with every word that had that same signature. So letís start building that. So let me write Ė I wanna call this the signature. Given a string, itís going to alphabetize it. Iím going to write the dumbest version of this ever, but Iím just gonna use a simple sorting routine on this. So smallest is gonna small index Ė weíll call it min index. Thatís a better name for it. Min index equals I, and then Iím gonna look through the string, and Iím gonna find the smallest letter thatís there and move it to the front. Thatís basically gonna be my strategy. Iíve got the wrong place to start, though. Iím gonna look from I to the one.
So if S sub J is less than S sub min index, so it is a smaller letter, then the min index gets to be J. So far, what Iíve done is Iíve run this loop that kind of starts in this case on the very first iteration and says the first character in Slot 0, that must be the min. And then it looks from there to the end. Is there anything smaller? Whenever it find anything smaller, it updates the min index, so when itís done after that second loop has fully operated, then min index will point to the smallest alphabetically character in the string that we have. Then Iím gonna swap it to the front. So S sub I with S sub min index, and Iíll write a swap because swap is pretty easy to write. See about how we do this, okay. So if I Ė I like how this works, and then let me stop and say right here, now is a good time to test this thing. I just wrote signature. It probably works, but it potentially could not. And Iím just gonna show you a little bit of how I write code. This is good to know. As I say, I put some code in here that I plan on throwing away. Enter word Ė and then I throw it through my function. I think I called it signature, didnít I? Signature S Ė so the idea being if it doesnít work, I wanna find out sooner rather than later. It doesnít like my use of get line. Is that because itís not included? Yes, itís not. So letís go get the right header files. This is half the battle sometimes is figuring out what headers are needed to make your compiler happy.
So now itís up here at enter word, and I say what does cheap come out as? It goes into the debugger. Thatís good. So it wasnít right. Now weíre gonna get to find out what does it not like about that. It says Ė did we forget to return something? Letís go look at our code. So itís complaining about Ė oh yeah, I can see where weíre gonna get in trouble here. Itís complaining that the return Ė it was saying that Iím trying to print something and it looks like garbage, that what Iím trying to print didnít make sense at all. I could say well thatís funny. Letís go look at signature.
Student:[Inaudible] pass the [inaudible].
Instructor (Julie Zelenski):Thatís probably a good idea. We ought to fix that while weíre there. Let me leave that bug in for a minute because Iím gonna fix my first bug, and then Iíll come back to that one. So it turns out what itís really complaining about though is it has to do with Ė I said well what does signature return. Somehow, whatís being returned by signature is being interpreted as total crap when it got back to main, and thereís a very good reason for that because I never returned anything. So maybe if I had been less cavalier about the fact that it was giving me a warning over here that said control reaches the end of non-void function, but I was being lazy and I didnít look Ė was that I didnít pay attention to the fact.
So letís leave my other bug in that youíve already pointed out because this is exactly what itís like when youíre doing it. You enter words and you say cheap, and it says cheap, and youíre like no. And then youíre like about how about an. Hey look, thatís in alphabetical order. And then you spend all your time thinking about words for a while. Well, tux. Thatís in alphabetical order. It seems to work. You could do that for a while, but then youíre like this whole cheap, not good. Okay, so I come back here and my friend whoís already one step ahead of me has pointed out that my swap is missing the all important pass by reference that as it is it what swapping the copies that got passed to the function, but of course nothing was happening back here in signature land. So if I fix that, and I come back in here, Iím gonna feel better about this. Oh, my gosh. And even tux still works. And if I say banana, I should get a bunch of As, a bunch of B and an N, so thereís various cases to test out if you have multiple letters, and if you have letters that are the same letters like apple so that it doesnít lose your duplicates, and it seems to come out right. So given this, the word peach should come down to the same signature as cheap, and so that seems to indicate weíre on the path toward building the thing we wanted to build. So I have this read file thing that I have left over from last time that reads each word, and I wanna change my vector into a map of set of string. What capitalization do you not like?
Instructor (Julie Zelenski):Yeah. So it turns out this file happens to all be lower case, but thereís no harm in doing this. That way even if they werenít, itíll take care of that problem for us if we wanted it to.
Instructor (Julie Zelenski):I want lower case. Oh yeah, I do. Well, your case. You guys are so picky. All right. Hereís my deal. Iím gonna take my map, and this is gonna be a line of code thatís gonna make your head spin. Just go with it. This is the do all craziness. Okay. So Iíve got this map, and what I wanna do is under the signature of this word, I want to look up the set of strings thatís associated with it and tack this one in. And the add in this case with the set, I know thatís gonna do non-duplication. In fact, the file doesnít contain duplicates, but if it did, I certainly donít want it to record it twice anyway, so I might as well do this. Now this form of this is a heavy lifter for a small piece of code. The signature then went and converted into the ACEHP form. I used that as the key into the table. If it was already there, itís gonna retrieve me an existing set that Iím gonna just go ahead and add a word onto. If it wasnít there, the behavior for the brackets is to create kind of a new empty value for that. So itíll use the key and create a default value for type.
Well, the default value for set of string Ė so if you just create a set without any other information in the default constructor will always create you a nice clean empty set. So in fact, it will get me exactly what I want which is to put it in there with an empty set that I will immediately add the word into. So after I do this, I should have this fully populated map. And then Iím gonna do this just as a little test. Iím gonna say num words when Iím done to feel a little bit confident about what got put in there. How about I call it Ė thatís a good idea. Itís so fussy. C++ never does what you want. I think I called this thing OSPD2.txt that has the words in it. And then I need to declare the variable that Iím sticking all this stuff into is a set. So go in, load stuff, doing itís thing, the number of words 112,882. Okay. Thatís close enough. I canít remember the number thatís in there, but that sounds like a fine approximation of it. So I feel like it did sort of manage to do something for it. And I can actually do this if I just wanna get a little glimpse of it is to use my iterator to look at something thatís in there. Wait, is map a set of string? Iterator -- file iter.hasNext. Iím gonna say key equals iter.next, and Iím going to print that key, and Iím going to print the size of the set because Iím at this point Ė I should see gobs of printing come out of this thing.
It takes a little bit of a while to process the thing, and then see gobs and gobs of stuff going by. It looks like a lot of things are ones if you can imagine reading them as they go by because a lot of words are really just not anagrams of anything else. But some of the shorter ones have sort of a better chance. So you can find out here at the end that there are EEIKLPST. I donít know what that is. Leakiest? No. I donít know what that word is. I should write it in for any of them. This dictionary has a bunch of really crazy words in it too, so it makes it especially challenging. What is that? I donít know. That one almost looks like beginners, but itís got an F in it. Itís the F beginners, a very famous word. You guys have probably heard of it. So Iíve seen that, and now I wanna do the thing where I will pick the largest one. Iíd like to know. Somebody should tell me. Int max size [cuts out] max key, so Iíll set this to be zero, and max key is that. And then Iím gonna do this. If the size of this key is greater than my max size, then itís gonna get to be the new key. So after I did all of that then [cuts out] key. And then I probably wanna see what they are, so why donít I go ahead and take a look. Ah, I have to go to the type, though. [Inaudible] to equals M of key Ė max key, I guess, dot iterator, [inaudible] IT has next, CLIT.next [inaudible]. So it went back. It found the maximum, in this case using the size of the sets as the distinguishing feature. And then max is AEPRS, which itís got a big old list of about 12. I think thatís 12, actually. Maybe 13.
So now you know. You can impress your friends at parties. This is the kind of thing you can win bar bets on. Oh, yeah. Whatís the size of the largest anagram cluster? Everybody wants to know this kind of stuff. I canít believe you guys can sleep at night without actually knowing this. And whatís neat to know though [inaudible] just to point out a couple of things that Ė you can use a little decomposition on this code, but thereís kind of a very small amount of things weíre having to do. For example, one of the things thatís really powerful, things like the map where we can just figure out how to key the things to store the collection right under that key, then looking it up and adding something is a very efficient sort of direct operation, just building on these things and it going through and doing all the work of storing them, sorting them, making it efficient for us to retrieve them and look them up such that I can process 100,000 words in the blink of an eye, and then go back through, look at them all, find the biggest one, get my information out.
Student:When you make the call to M.size, is that the number of words? [Inaudible].
Instructor (Julie Zelenski):That is the number of keys.
Student:Keys, right. So thatís not actually [inaudible].
Instructor (Julie Zelenski):Yeah. So it doesnít know anything about everything else that was [inaudible], but in fact thatís why itís [inaudible]. I know the dictionary has about 127,000 words. It turns out they form about 112 unique signatures, and so thereís actually another 20,000 words that are clung onto some existing signature. Thatís the number of unique signatures across the dictionary, not the number of words, so thatís probably the wrong name to call it.
Student:For the M signature word thing where [inaudible] of the default just to create a new stack, that works as well for vectors [inaudible]?
Instructor (Julie Zelenski):Yeah. It works for anything if you were just to declare it on the stack and the right thing happened, so vectors, set, maps. All those things do. But the primitive types like int and double, it doesnít. So it would work for string. String is an empty string.
So for some of the fancier, more modern types tend to actually know how to just default construct themselves into a good state, but the primitives donít do that. So if you were having a map of ints and you wanted to have them start at zero, you need to really start them at zero. You can call just M sub this. It would be garbage, and it would just be operating with garbage from that way forward. All right. Well, weíre good. What Iím gonna give you is the eight minute discussion of recursion that whets your appetite for the things weíre gonna be doing next week.
So recursion is one of those things I think that when you havenít yet had a chance to explore it first hand and other people tell you about it, it has sort of an awe inspiring sort of mystery, some fear, and whatnot. So first off, I wanna kinda shake that fear off. It is a little bit hard to wrap your head around the first time you see it, but weíre gonna have a whole weekís worth of time to spend on it, so weíre gonna try to give you a lot of different ways to think about it, and different problems to see to kinda help you do it. And I think once you do get your head around it, it turns out then youíll discover how infinitely powerful it is, that there is kind of a simple idea in it that once you kinda fully get your head around, you can explore and solve lots of different problems using this just one technique again and again.
So in itself, itís a little bit mysterious at first glance, but then once you kind of master it, youíll be amazed at the kind of neat things you can do with it. So it is certainly what Iíd consider an indispensable tool in a programmerís tool kit. The kind of problems you can solve using the techniques you have so far is fundamentally limited. And part of what we need to do in this class is expose you to these new ways of solving harder, more sophisticated problems that the old techniques donít work for. One of the cool things about recursion is it actually lends very simple, elegant, short solutions to problems that at first glance seem completely unsolvable. That if you can formulate a structure for [inaudible], you will discover that the code is not long to write. Itís not tedious to write. The tricky part is to figure out how to express it, so itís more of a thinking puzzle than it is a coding puzzle. I certainly like thinking puzzles as much as coding puzzles if not more.
The general sort of idea is that you are going to try to solve a problem Ė instead of sort of breaking it down into component tasks like if I need to make dinner, I need to go to the store and buy things, and I need to come home, and chop them, and get the recipe. You think of what your standard decomposition is all about Ė breaking down your tasks into A, B, and C, and D, and then you add them all together to get the whole task done. Recursion has this kind of very different way of thinking about the problem, which is like well if I needed to get Task A done, and I had Task A prime, which was somehow a lot like the task I was trying to solve, but it somehow was a little bit simpler, a little bit easier, a little bit more manageable than the one I started out to solve, and if I had that solution Ė so if somehow I could delegate it off to some minion who works for me, and then I could use that to solve my problem, then my job would be made much easier by using that result of solving a similar problem thatís a little bit [inaudible].
Okay, that seems a little bit wacky. Let me give you sort of an example of how this might work. So your standard problem, I said yeah, itís like you do these dissimilar sub problems. Letís imagine I had this goal where I wanted to survey the Stanford student body. I donít want just like a haphazard most of the people involved. Letís say I really wanted to get input from every single person on campus whether they think having cardinal as your mascot is a ridiculous choice or not. So letís imagine I really wanna hear from all 10,000 students. Now I can stand out in White Plaza with a big note pad and try to accost people and sort of work my way down the list. And then Iíd be there for eons and never solve my problem. Instead, what I decide to do is I say well Iím gonna recruit some people to help me because Iím lazy as weíve already established, and I would like to get some other people to join in my quest to answer these burning questions and to solve the survey.
So what I do is I round up ten people letís say, and I say would you help me, and I decide to divide the campus into kind of ten partitions. And I say if you could survey all the people whose names begin with A, B, and C, that would really help. And if you could do [inaudible] Gs, and if you would do Ė and if I divide up the alphabet that way, give each of them two or three letters, and I say if you would go get the data, itíd be really easy for me to do my job then. If I just took all their data [inaudible]. Well, being the kind of lazy person that I am, itís likely that the ten people I recruit would have similar lazy qualities because lazy people hang out with other lazy people. And so the person who was in charge of A, B, C, the first thing they do is turn around and find ten more friends, and then they divide it up and say could you do the AA through AM and so on. If they divide it into these pools of one tenth of what they were responsible for, and say you can go get the information from these people, and if they did the same thing Ė so if everybody along the road. We started with 10,000. Now each person had 1,000 to survey. They asked their friend to do 100. Their friend asked ten people to do ten. And then at some point, the person who has ten says well I just need to ask these ten people. Once I get their data, we donít need to do anything further.
So at some point the problem becomes so small, so simple, even though it was kind of the same problem all along. I just reproduced the same problem we had, but in a slightly more tractable form, but then I divided it around. Divide and conquer sometimes they call this to where I spread out the work around a bunch of people to where each personís contribution is just a little part of the whole. You had to find the ten volunteers around underneath you and get their help in solving the problem, but nobody had to do much work, and thatís kind of a really interesting way to solve a problem. It sounds like a very big problem of surveying 10,000 people, but by dividing and conquer, everybody has a little tiny role to play. You leverage a lot of people getting it done, and there is this self-similarity to it, which is kind of intriguing that everybody is trying to solve the same problem but just at different levels of scale. And so this idea applies to things like phone trees rather than trying to get the message out to everybody in my class, it might be that I call two people who call two friends who call two friends until everybody gets covered. Nobody does the whole job. Everybody just does a little part of it.
Sometimes youíll see these fractal drawings where there is a large leaf which when you look closer actually consists of littler leaves, which themselves are littler leaves, so that at every level of scale the same image is being reproduced, and the result kind of on the outside is something that in itself if you look deeper has the same problem but smaller embedded in it. So itís a pretty neat sort of way of solving things. I am gonna tell you a little bit about how the code looks, and then I really am not gonna be able to show you much code today. I think itís actually even better to show you this code on Monday when we can come back fresh, but that it involves taking Ė So weíre looking at functional recursion first, and functional recursion is taking some sort of functions, a function that takes an input and returns an answers, returns non-void thing that comes back. And weíre gonna be looking at problems where youíre trying to solve this big problem, and that if you had the answer to making a call to yourself on a smaller version of the problem Ė maybe one call, maybe two calls, maybe several calls Ė that you could add those, or multiply those, or combine those in some way to answer the bigger problem. So if I were trying to solve this campus survey, having the answers to these smaller campus surveys gives me that total result. And so the Ė I really should not try to do this in two minutes. What I should do is try to tell you on Monday. Weíll come back and weíll talk about recursion, and it will be an impressive week. Meanwhile, work on your ADT homework.
[End of Audio]
Duration: 50 minutes