
WebCrow: The Multilingual Crossword Solver

Crossword puzzles aren’t just a fun language game; they are also a popular area of research in artificial intelligence. Solving them requires a combination of language comprehension, rich and extensive knowledge, and sound reasoning. This is a unique skill set for humans to possess, and it was unheard of for machines…until now.

WebCrow is the first AI software capable of solving multilingual crossword puzzles using expertise and the richest self-updating repository of human knowledge available: the web.

In this stream, Marco Ernandes discusses:

  • “WebCrow@WCCI22” – the human-vs-AI challenge, as it happened
  • A demonstration of how WebCrow works
  • WebCrow’s missions and next steps

Transcript:

Brian Munz:

Hi everybody. Welcome to this week’s NLP stream, our weekly live stream where we cover topics relevant to NLP and NLU. I wanted to give a quick plug for last week’s live stream about our hackathon, in case you didn’t see it. We already have a few hundred people signed up, so if you want to take a shot at something using our APIs, win some money in the process, and do some good along the way, please take a look at that live stream. This week’s topic is a project someone presented to us a little while ago, and it was very, very interesting. I’m kind of a crossword puzzle head myself; I do one every day. So I think it’s going to be really interesting to hear what’s been going on with this project. Without further ado, Marco Ernandes. Welcome.

Marco Ernandes:

Hello. Hello, Brian. Thank you for having me. Okay, great. Do you see my screen? Yes. A very brief introduction: I’m Marco Ernandes, and I’m currently coordinating the hybrid language technology team in Italy for Expert.ai. What we basically do is research and development on combining symbolic and machine learning, or sub-symbolic, processing in developing solutions for the company. One thing that is interesting is my background: you had a colleague of mine, Giovanni Angelini, in an earlier stream, and we did a sort of joint PhD career where we tackled this crossword problem at the University of Siena, which, by the way, is just behind my window here; the computer science department is right there. There’s a strong collaboration with the team there, where we were PhD students tackling this specific problem. Together with Professor Marco Gori, in collaboration with the University of Siena and Expert.ai, obviously, plus the Italian Association for Artificial Intelligence, we decided to restart the project and take the challenge to the next level. We’ll see a little bit of the reasons here.

Marco Ernandes:

My talk here will be divided into two parts. In the first, we’ll see what happened on a specific day, the 19th of July 2022, when we held an official crossword challenge against human participants at the WCCI venue in [inaudible 00:03:41]. In the second part, we’ll go a little bit into the details of how WebCrow works, with some ideas about the experiments we ran, performances, and all that. Very briefly: as I mentioned, we were very happy to have the honor of being invited to the WCCI conference to organize such an event. WCCI is a joint conference that happens, I think, every two years and is organized by three different conferences that join together. One of them is IJCNN, one of the largest neural network conferences in the world, and we happened to be part of it. What we did was organize this challenge in two languages, Italian and English (American crosswords), and, as you will see, we had both online and onsite participants.

Marco Ernandes:

So let’s start from a [inaudible 00:05:10] here; I think that’s the best way to give you an idea of what was happening. We had this very nice room where we organized the onsite event. In parallel, we had the online event, where people from all over the world could join, challenge WebCrow, and see how they could perform against the AI machine. By the way, you can see the t-shirts there, and I’m wearing one right now. So this is exactly the setup we had. Again, we had participation in both languages, Italian and American crosswords, and there were prizes given to winners and to participants. To give you an idea of what was going on, the actual challenge was organized using this platform I will share here, the WebCrow crossword arena, where multiple challenges are posted. For example, these two here were the official ones, opened and closed on the same day. Any one of you can jump in, look at each solution, and see how the AI and the other participants performed.

Marco Ernandes:

And, sorry, just to mention: we also still have open challenges, and you can, for example, jump into one of them. In the Italian open practice challenge, nobody has beaten WebCrow yet, but somebody did in the American version. Okay. That’s to give you an idea of how this was held. Actually, it was a little bit of a crazy moment because, as you can imagine, these large conferences have a lot of parallel sessions. So we had to win the interest of people at the conference who were also very involved in talks and poster sessions and so on. But we were very happy with the participation, and we can see a little more about it here. Okay. Let’s jump into the results. By the way, if you want, there’s a full report on this page. But let me summarize in a few words what happened.

Marco Ernandes:

Here you see the rankings, one for the Italian and one for the American challenge; I’m jumping from one to the other. We had 40 to 50 participants between onsite and online. In the onsite challenge, WebCrow had no trouble; it won by quite a wide margin. The same happened on the American crosswords, which is more understandable because, while the venue was international, with thousands of people from all over, there were probably very few American participants, whereas there was quite a strong participation from Italian researchers. But we were very happy to see that WebCrow won the onsite competition in both languages, and with a strong margin. For the online participation this wasn’t the case: in both languages we had participants who did beat WebCrow. We had mixed feelings, because we know that WebCrow can improve a lot, and I will show you what we will be doing to do so. But at the same time, we know that online participation can be a little easier for humans, because they can join together and team up, something we cannot control. So we were happy to see that kind of participation too. Okay. So this is, I guess, a little bit the outcome of the results.

Marco Ernandes:

One thing I want to stress here: for the Italian competition we had multiple versions of WebCrow running, because we did a little more work there, preparing different configurations with modules turned off here and there. I will give you a few more details in a few minutes. It’s interesting that you can see here, for example, that the full version of WebCrow was able to challenge the human experts more than versions where we did some ablation of modules here and there. Okay. That’s a little bit about the competition itself. I’ll stress again: if you just Google WebCrow, you will land on the WebCrow official page, where you can see what happened at WCCI, jump onto the arena, and see each of the practice and official challenges that were organized. We had earlier challenges that were open; this one was held at the University of Siena, I guess, and in this one I think WebCrow won. What I was just showing is what came out of the WCCI competition. By the way, if you’re interested, and I’m repeating myself a little here, you can jump into each solution and spot the errors. I think that is quite fun.

Marco Ernandes:

You can see the errors. This one is from the Italian puzzle: there’s a mistake made by WebCrow on a clue I can roughly translate as “something that more people don’t need”; the answer, in Italian, would be “hair dryer”, and it got it wrong. And you can start seeing one thing here that I think is interesting for American crossword lovers: WebCrow was missing a letter that is not constrained by a vertical word. That’s something American crosswords don’t allow in their layout, but Italian crosswords do. For this kind of word you need to get that letter right by answering the clue; there’s no other way, because that letter is not constrained. If we take a look at the American one, we can see, for example, something that trips WebCrow up. Yes, I spotted an error, things like this clue: “it is interesting signed backwards”. That’s very nice. I think WebCrow wasn’t able to answer this one, and that’s because it still lacks the specific processing of puns or word tricks that can absolutely be present in challenging American crosswords. Okay. I’ll let the audience explore more on the platform. Now, let’s see what’s under the hood: our scientific goals, technicalities, and a few experiments.

Marco Ernandes:

First of all, we always try to answer the question: why are we interested in crosswords? It is such an old language game, with so many passionate people around the world, that it’s sort of an obvious choice: it’s relevant because people love it and people find it challenging. So there are linguistic challenges that we’re trying to face here. In fact, I love this paper from Littman that provides a division between problems like chess or Go, which are closed-world problems, or problems with a closed-world assumption, and crosswords, which is really an open-world problem. Who designs a crossword? The crossword author sort of defines the rules that you need to understand in order to solve it. And just a very brief mention of how old the game is: the official inventor goes back to 1913, but there is even an Italian version from 1890 that is not regarded as an official crossword simply because it lacks the black cells. These kinds of word and letter games have been part of our interest and culture for quite a long time.

Marco Ernandes:

The three main things I would like to stress about crosswords: solving them means merging three important processes. One is understanding, one is accessing knowledge, and one is reasoning about that knowledge. So it really squeezes a lot of fascinating AI challenges into one problem. Before we dive into how it works, it’s useful to notice a few differences between Italian and American crosswords, because it highlights what I said a few minutes ago: this is an open problem. It’s difficult to say that a specific crossword has one defined set of rules. In fact, Italian and American crosswords don’t share the same rules. As I mentioned, Italian crosswords have two-letter words, they have blind cells, and inflections are very common. These are very uncommon in American crosswords; American crosswords, on the other hand, have much freer rules for creating multi-word answers. You can see examples here like OHNOES: the authors are really very free, in the sense that they can design the crossword around this kind of nuance.

Marco Ernandes:

Italian crosswords follow somewhat stricter rules, at least [inaudible 00:17:27]. So we cannot really define the American crossword or the Italian crossword as the unique version, the unique set of rules defining the crossword problem. Each culture, each region has its own rules, and I’m sure that Spanish or [inaudible 00:17:53] or German or French crosswords really do have different rules and nuances that have to be picked up by the AI. I think that is a strong point because, as we will see, we are now facing another kind of challenge as well. And it is very important to mention here that we are not the only project devoted to crossword solving. For quite a long time there has been a very strong crossword project called Dr. Fill, which was challenging American crosswords. And very recently it was even surpassed by the Berkeley Crossword Solver, published at ACL 2022, something extremely recent, I think June 2022 as a publication date. This solver, by the way, if you’re interested, uses, I think, a mixture of three different large language models; it’s truly a very interesting piece of work, and it achieved extremely high results on American crosswords.

Marco Ernandes:

As you can see here, it reports 99.7% correct letters, which is surprisingly high. The interesting thing is that it’s mostly based, at least the answering block, on six-million-plus clue-answer pairs from the historically available data, from, I think, the Los Angeles Times and so on and so forth, and it relies strongly on this database. What we are saying here is that, at least as the WebCrow team, we don’t think this is the way forward if we want to challenge the game as I mentioned before, in Italian, French, Spanish, and other languages. First of all, not all of them share this great amount of available clue-answer pairs or solved crosswords. And the games differ so much that some of the principles behind the Berkeley solver may not apply. So, on the contrary, we say: okay, let’s move to a more scalable approach where we rely less on available data and access knowledge through the web, which is our source of common knowledge, of fresh and open knowledge.

Marco Ernandes:

That is, I think, a relevant point: we think that through this approach we can scale and challenge humans in any possible language. Now, here’s the joke: we noticed that also America, for example, has crosswords of its own, just to say how culturally different the game is around the world. But our immediate aim, our mission, would be to challenge Western languages with the Roman alphabet, at least starting from those. And here comes the collaboration with Expert.ai, which provides an important source of knowledge and language tools for these languages. That is, I think, our recipe for scaling the problem up to a worldwide challenge rather than confining ourselves to American crosswords. In fact, we’ll see that we performed well in solving and challenging humans at the WCCI event even without any database. We have a specific ablation test where we said: okay, let’s remove any information we have from the history of previously solved crosswords. That is an extreme example, but we will see that it is very interesting, because the engine per se is decently accurate.

Marco Ernandes:

Okay. Just to give you an idea of what six million clue pairs means: for Italian, on the contrary, we have something around hundreds of thousands of clue pairs. Six million is something like solving three crosswords every day for 50 years (I need to recheck this). So it’s really an extremely large amount, and not only solving them but storing them in memory, every possible clue-answer pair. I think that is probably a superhuman experience being leveraged when relying mainly on a database approach. Okay. Let’s jump briefly into the solving process. The solving process is made of three different steps. The first one we call clue answering. It’s a little different from what a human would do, which would probably be to start from the top-left corner and then start answering. The engine reads all the clues and tries to find possible answers to all of them, with a specific goal: having the correct answer as high as possible on the list of candidate answers for each clue, and not missing the correct answer anywhere in the list.

Marco Ernandes:

Those are the main goals. Actually, at least for now, we consider it not really possible to always have the correct answer at the top of the list, and that’s part of the character of the game: you need to see crossing clues and crossing answers to take a decision on a specific spot. So, first clue answering, then belief propagation. Belief propagation is a second step that comes before actually putting the answers in the grid, and it means that we recalculate the confidences of the answers given the grid itself. Before placing the answers we ask: does this answer make sense for the clue we are tackling, and, at the same time, does it make sense with the grid and the surrounding possible answers? Then there’s the actual grid filling, which means: okay, let’s start putting down the answers. Here we have a little bit of a drill-down into some of the components, and you can see the three parts: clue answering, belief propagation, and grid filling. Clue answering is probably one of the most fascinating and interesting tasks for us as a project, and for Expert.ai as a partner in the project, because it’s probably the sub-problem most connected to NLP.
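To make the shape of this three-step pipeline concrete, here is a minimal, self-contained Python sketch. Everything in it is illustrative: the clues, candidate lists, and function names are invented for the example, and the real module internals (expanded in the sketches further below) are replaced with canned values.

```python
# Toy sketch of the three-stage flow: clue answering -> belief
# propagation -> grid filling. Names and data are invented stand-ins,
# not WebCrow's real API.

def answer_clue(clue, length):
    """Stage 1 stand-in: return (answer, confidence) candidates of the right length."""
    fake_lists = {
        "Opera by Puccini": [("BOHEME", 0.30), ("TOSCA", 0.25)],
        "Informed (archaic plural)": [("EDOTTE", 0.40), ("EDOTTA", 0.35)],
    }
    # Length filtering drops TOSCA (5 letters) from a 6-letter slot.
    return [(a, c) for a, c in fake_lists.get(clue, []) if len(a) == length]

def propagate_beliefs(slots):
    """Stage 2 stand-in: WebCrow re-weights candidates using crossing
    constraints here (loopy belief propagation; see the later sketch).
    This placeholder leaves the confidences untouched."""
    return slots

def fill_grid(slots):
    """Stage 3 stand-in: pick the top candidate per slot."""
    return {s["clue"]: max(s["candidates"], key=lambda ac: ac[1])[0]
            for s in slots if s["candidates"]}

slots = [{"clue": "Opera by Puccini", "length": 6, "candidates": []},
         {"clue": "Informed (archaic plural)", "length": 6, "candidates": []}]
for s in slots:
    s["candidates"] = answer_clue(s["clue"], s["length"])
print(fill_grid(propagate_beliefs(slots)))
# -> {'Opera by Puccini': 'BOHEME', 'Informed (archaic plural)': 'EDOTTE'}
```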

Marco Ernandes:

As you can see here, we are exploiting, roughly speaking, four types of knowledge. As I mentioned: common knowledge that comes from web search; knowledge about previously solved crosswords that comes from the crossword database; knowledge about the language we are playing in and the concepts the language conveys, so ontologies and lexicons; and knowledge connected to specific custom rules that may apply to a specific crossword. We can consider these last two, the crossword database and the custom rule-based module, as the ones that convey specific knowledge about, say, the experience of Italian crosswords as opposed to American crosswords. Things like the “signed backwards” example are what this kind of module tries to address. All of these modules analyze the clue on their own. They are supported by the NLP analysis, which provides an input representation of the clue, and this representation is used to access the knowledge and come back with possible answers. These answers are then merged together, because we need one unique list, and, as you can see here, the answers are also filtered.

Marco Ernandes:

We will see in the modules what happens, but mostly what you should understand is that if you are looking for, say, a plural or a singular (this is a very simple example), then this filter will try to boost the most coherent of the two options. Is number or gender something we need to consider? It depends on the clue: if we have to consider gender, or number, or any other class applied to the answer, then we need to boost the coherent answer class, as in the sketch below. At the end of this process, we have one unique list of candidate answers for each clue. What happens then is that, as I mentioned, we propagate the confidences of the candidate answers, and then we start placing the answers in the grid. I think Giovanni already touched some of these points about the different phases of the clue answering process, so I won’t go through all the details of each of them; that way we have a little more time for the remaining parts and the unique experiments we ran with WebCrow. Anyway, we can briefly go through this. We know through clue analysis that we are dealing with different types of clues that can be answered better by different modules.
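Here is a minimal sketch of that morphological filter, assuming a toy lexicon that maps each candidate to its gender and number; in WebCrow the expected features come from the NLP analysis of the clue, and the real filter also handles part of speech. The Italian words, the example clue, and the boost factor are all invented for illustration.

```python
# Minimal sketch of morphological filtering: boost candidates whose
# gender/number agree with what the clue leads us to expect.

# Toy Italian lexicon: answer -> (gender, number)
LEXICON = {
    "EDOTTE": ("fem", "plural"),
    "EDOTTA": ("fem", "singular"),
    "EDOTTI": ("masc", "plural"),
}

def filter_candidates(candidates, expected, boost=2.0):
    """Multiply the confidence of morphologically coherent answers."""
    rescored = []
    for answer, conf in candidates:
        if LEXICON.get(answer) == expected:
            conf *= boost
        rescored.append((answer, conf))
    # Renormalise so the confidences stay a probability-like distribution.
    total = sum(c for _, c in rescored) or 1.0
    return sorted(((a, c / total) for a, c in rescored),
                  key=lambda ac: ac[1], reverse=True)

cands = [("EDOTTA", 0.40), ("EDOTTE", 0.35), ("EDOTTI", 0.25)]
# Suppose the clue reads as feminine plural ("Informate sui fatti", say):
print(filter_candidates(cands, expected=("fem", "plural")))
# EDOTTE, the feminine plural form, now ranks first.
```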

Marco Ernandes:

By the way, here: fill-in-the-blank clues. This is a case where, as I mentioned, you need custom rule-based approaches. Italian crosswords do not have these; American crosswords do have fill-in-the-blank clues, and you need to know the convention that the missing word, or actually a missing multi-word phrase, is what needs to be placed there, and so on and so forth. Okay, let’s move on. This is our clue analysis at a glance. Again, we divide the clues into those that require access to experiential knowledge, so knowledge about crosswords; common or fresh knowledge about facts of the world; and lexical knowledge or ontological reasoning, so things about the language itself and its properties, concepts, more generic or more specific terms, synonyms, and so on. An example of common knowledge: things like movies or music. This kind of fresh information is perfectly accessed by web search. Then we have the lexical or ontological knowledge, which in this case is involved in finding pseudonyms that fit the slot. And, again, the experience from previously solved crosswords.

Marco Ernandes:

I think this is interesting: when we say “previously solved crosswords”, we’re not really talking about exact matching, where I have a clue and I look for that exact clue in the database. This needs to be a smart, flexible, fuzzy matching against the database. Here is an example: “cylindrical storage vessels” can be answered by accessing the database and discovering that there are similar clues that provide a common answer, which builds confidence, and that answer would be the proposed candidate in the end. Okay, I’ll move a little bit ahead. Just an overview of the different modules and what they access. We are talking about this crossword database: in our case, we’re using around 3 million unique clue-answer pairs, which means around 50K puzzles. As I mentioned, that is already three a day for 50 years, so you can imagine that the Berkeley solution uses this and more.

Marco Ernandes:

The Italian database is [inaudible 00:33:29] smaller, but already effective and important. On the bottom left, you see that we are using the Expert.ai knowledge graph to access the ontological and lexical knowledge. A few numbers: we’re talking about nearly half a million concepts, and, with the Wikidata extension, millions of concepts accessed through this technology. And then web search is our tool for accessing the really fresh knowledge that is essential to answer specific clues. Okay. Very briefly, the clue answering technique we used: as I mentioned earlier, we need to do a smart matching against seen patterns, seen portions of text, that lead to an answer. This was mainly approached with a neural QA technique that is actually published, in an article by Zugarini and Ernandes. There can be multiple approaches here, but what we followed in the current WebCrow implementation is what we call “question to question”: a neural encoding over, for example, the clues in the crossword database, where this encoding is used to smartly retrieve clues similar to the one we are trying to answer.
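As a rough illustration of the “question to question” idea, the sketch below embeds a toy clue database with an off-the-shelf sentence encoder and retrieves the stored clues nearest to a new one, proposing their answers as candidates. The model name, the tiny database, and the example answers are stand-ins; WebCrow’s actual encoder is the one described by Zugarini and Ernandes, not this generic one.

```python
# Question-to-question retrieval sketch: encode every stored clue once,
# encode the incoming clue, and fetch the nearest stored clues, whose
# answers become candidate answers.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # generic encoder, for illustration

# Tiny stand-in for the multi-million-pair clue/answer database.
database = [
    ("Cylindrical storage vessels", "SILOS"),
    ("Towers for storing grain", "SILOS"),
    ("Opera by Puccini", "BOHEME"),
]
db_vecs = model.encode([clue for clue, _ in database], normalize_embeddings=True)

def retrieve(clue, k=2):
    """Return the answers of the k most similar stored clues."""
    q = model.encode([clue], normalize_embeddings=True)[0]
    sims = db_vecs @ q                    # cosine similarity (unit vectors)
    best = np.argsort(-sims)[:k]
    return [(database[i][1], float(sims[i])) for i in best]

# A paraphrase of the stored clues still lands on the same answer:
print(retrieve("Round towers that store grain"))
```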

Marco Ernandes:

Then, after retrieving the answers, one important aspect, as I mentioned, is filtering, and filtering is especially important for Italian, which has strong inflection. As I said earlier, American crosswords don’t share such richness of inflections, either in the language or in the actual crosswords. In any case, it is important to try to remove all possible errors before grid filling. The things taken into account are gender, number, and part of speech: we try to match this information in the answer against the expected classes that can be predicted by reading the clue. Okay. Then a little check on how the belief propagation works. This is especially important because it provides a strong boost to the quality of the candidate lists coming from clue answering. Belief propagation rests on one hypothesis: that the correct answer is the one that maximizes the probability of being correct while, at the same time, fulfilling the constraints of the crossing words. The goal of this step is to increase the probability of the correct answers, making them rank higher, by involving the constraints in the process. And this is an iterative process, as described.

Marco Ernandes:

Let’s see a little bit here. You can see that, for example, “Boheme” is in 23rd place in this list of answers, and, by the fact that Boheme crosses well with “Edotte”, which is already a candidate answer, it’s going to be reinforced. You see here: the answers with an E at the beginning and Boheme with an E in the fourth position are sort of boosting each other. So, there’s a little bit… I can skip this. Sorry. I will… Just a second. Where’s my presentation gone? Oops.

Brian Munz:

You may have minimized it.

Marco Ernandes:

Yeah, here. Sorry. I’ll skip this. Okay. There’s a little bit of math here that describes how this actually works, the loopy belief propagation, from a ’99 paper by Shazeer, Littman, and Keim. Mainly, to give you an insight, it’s built on constructing a graph of the constraints, unfolding that graph, and iteratively recalculating the probabilities of each node over the graph. The constraints in crosswords are simple to define: a constraint is the requirement that two slots that cross each other share the same letter in the crossing cell. That defines the constraint network, and then we can run the belief propagation algorithm over this graph. If you are interested in the details, we can discuss them later in the comments. Okay. Let’s go. We are now at the grid filling part. Multiple approaches are possible; how does WebCrow work here? It takes a two-step approach right now. This is what gave the best performances, and it’s a somewhat new way of approaching the problem. The first step is what we call “certain characters”: detecting the characters WebCrow is very confident about and placing those as pivotal characters that must go in.
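For a flavor of what one update looks like, here is a tiny sketch of a single message pass across one crossing, in the spirit of the Shazeer-Littman-Keim formulation (not their full algorithm, which iterates over the whole constraint graph). The candidate words and confidences are invented; the crossing mirrors the Boheme/Edotte example above.

```python
# One belief-propagation message across a single crossing: a slot sends
# the crossing cell its marginal distribution over letters, and the
# other slot's candidates are rescored by that distribution.
from collections import defaultdict

def letter_marginal(candidates, pos):
    """Distribution over letters at `pos` implied by a slot's candidates."""
    mass = defaultdict(float)
    total = sum(conf for _, conf in candidates) or 1.0
    for answer, conf in candidates:
        mass[answer[pos]] += conf / total
    return mass

def rescore(candidates, pos, incoming):
    """Re-weight candidates by the crossing slot's support for their letter."""
    scored = [(a, c * incoming.get(a[pos], 1e-9)) for a, c in candidates]
    total = sum(c for _, c in scored) or 1.0
    return sorted(((a, c / total) for a, c in scored),
                  key=lambda ac: ac[1], reverse=True)

# Slot A, letter 4 (index 3) crosses slot B, letter 1 (index 0).
a_cands = [("CARMEN", 0.5), ("BOHEME", 0.3), ("ALCINA", 0.2)]
b_cands = [("EDOTTE", 0.6), ("ADATTE", 0.4)]
msg = letter_marginal(b_cands, 0)        # {'E': 0.6, 'A': 0.4}
print(rescore(a_cands, 3, msg))
# BOHEME overtakes CARMEN: slot B puts most of its mass on an 'E' in the
# shared cell, and only BOHEME has an 'E' there.
```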

Marco Ernandes:

Okay, we let WebCrow take the risk of putting a few letters here and there because it’s very sure about them. This is computed by calculating the probability mass of each character in each cell, summing over all the possible answers that share that letter in that position. So what happens is that it will fill the grid in this kind of way: there will be letters, like this one here, for example, or this one, that are not certain; they are left empty by WebCrow because it’s not confident enough about that position. Then it goes back to the candidate lists and looks for the answer that fills each slot in the best way. Okay. I think we have a few examples from the practice grids. This is a real example here: these are the letters that WebCrow would put in without even committing to the full word, and you can see there are areas where it’s not really confident.
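Here is a minimal sketch of that “certain characters” computation, with invented candidates: for each cell we sum the confidence of every candidate that places a given letter there, and pin the letter only when the mass clears a high threshold. The 0.9 threshold is an arbitrary stand-in for whatever WebCrow actually uses.

```python
# "Certain characters" sketch: pin a cell's letter only when the
# probability mass behind it is high enough.
from collections import defaultdict

def certain_letter(candidates, pos, threshold=0.9):
    """Letter to pin at `pos` if its mass clears the threshold, else None."""
    mass = defaultdict(float)
    total = sum(conf for _, conf in candidates) or 1.0
    for answer, conf in candidates:
        mass[answer[pos]] += conf / total
    letter, p = max(mass.items(), key=lambda kv: kv[1])
    return letter if p >= threshold else None

cands = [("EDOTTE", 0.50), ("EDOTTA", 0.45), ("ADATTE", 0.05)]
print([certain_letter(cands, i) for i in range(6)])
# -> ['E', 'D', 'O', 'T', 'T', None]
# The candidates agree on the first five cells, so those letters get
# pinned; the plural/singular ending splits the mass, so that cell is
# left empty for the whole-word filling step to resolve.
```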

Marco Ernandes:

But we observed that, during the real second step, while filling with the entire answers, the number of errors made during the character filling was very small, actually below 0.1%; most of the errors happen to be added once we start putting in the actual words. In this case, for example, you can see that WebCrow is wrong about a specific letter A, and this is actually understandable: [inaudible 00:42:27] population could actually have a plural or a singular answer, and in this case WebCrow got it wrong. Okay. Anyway, I’ll skip this part about the first phase of the filling, which is placing the characters. As I mentioned, the second step is finalizing the filling with the actual, entire words.

Marco Ernandes:

This can be done with multiple algorithms. Earlier, in WebCrow 1.0, we experimented with the use of A*, a well-known search algorithm. Right now we have moved to a simpler, greedy approach that simply scans through the words that are coherent with the constraints and picks the most confident one. This is probably one of the areas where a strong boost can be obtained in the future because, inspired by Berkeley’s experience, reviewing the added answers again and again can make the difference. In fact, the Berkeley project added something called a local search approach, which iteratively checks the answers placed in the grid and looks for better answers, generating alternatives that can be very close to the ones already placed.
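Here is a small, self-contained sketch of such a greedy pass, with an invented two-slot toy grid: slots are visited from most to least confident, and each takes its best-ranked candidate that doesn’t contradict letters already on the grid. WebCrow’s real implementation works on top of the pinned certain characters from the previous step; this only shows the scanning idea.

```python
# Greedy whole-word filling sketch: most confident slot first, best
# candidate that is consistent with the letters already placed.

def fits(answer, cells, grid):
    """True if `answer` agrees with every already-filled cell it covers."""
    return all(grid.get(cell) in (None, ch) for ch, cell in zip(answer, cells))

def greedy_fill(slots, grid):
    """slots: list of (cells, ranked candidates); grid: dict cell -> letter."""
    # Visit the most confident slot first (judged by its top candidate).
    for cells, cands in sorted(slots, key=lambda s: -s[1][0][1]):
        for answer, _conf in cands:               # candidates ranked best-first
            if fits(answer, cells, grid):
                grid.update(zip(cells, answer))   # commit the answer's letters
                break
    return grid

# Two crossing 3-letter slots; cells are (row, col) coordinates.
across = ([(0, 0), (0, 1), (0, 2)], [("CAT", 0.8), ("COT", 0.2)])
down   = ([(0, 1), (1, 1), (2, 1)], [("OAK", 0.6), ("ASH", 0.4)])
print(greedy_fill([across, down], {}))
# CAT goes in first; OAK is then rejected at the shared cell (0, 1),
# which already holds an 'A', so the lower-ranked ASH is placed instead.
```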

Marco Ernandes:

Our goal, and here I will disclose something we are aiming to do in the future, is something similar, but, coherently with our approach, web-assisted: checking on the web for possible alternatives to answers that are already placed in the grid. There is a very strong difference between this step and the web-based clue answering step: while in web-based clue answering we only have the clue to work with, here we would have the clue and the possible answers, and that information is very rich if you think of then going to the web and checking whether that answer is correct or needs some refinement. Okay, let’s dive into some experimental results. These are currently the outcomes of the experiments we ran. You can see on the left the Italian crosswords of our test set, and the American crosswords; this is all unseen data. Overall, the performances, in terms of correct words and correct letters, are strong enough to believe that the engine is extremely competitive, extremely close to expert level. It’s probably even beyond expert level in Italian, and getting close in American crosswords.

Marco Ernandes:

It is interesting to notice that, for example, the New York Times crosswords from Monday and Tuesday are most of the time solved perfectly, and the quantity of wrong letters is very small, pretty close to Berkeley’s experience. But there are other days where things become tougher: as you can expect, Saturday and Sunday crosswords, and also Thursday crosswords, are very challenging. Thursday and Sunday share a common feature, which is having themed crosswords. Especially on Thursday, you can see that themed crosswords really challenge WebCrow. Themed crosswords are actually a totally different story, and in fact, when I mentioned Berkeley’s performance earlier, that 99.7% was mostly measured against non-themed puzzles. Anyway, here is a comparison with the Berkeley solver. The Berkeley solver without local search was more or less comparable with our current results, if you don’t consider AVCX crosswords, and local search boosted its performances up to eighty-something percent. So we know that this final step of local search is going to provide a further boost.

Marco Ernandes:

Before I forget, I need to thank AVCX, our American editor, which collaborated by providing crosswords for the official and practice challenges. The same thanks go to Sudoku Edizioni, which provided the corresponding Italian puzzles. I didn’t mention this, but it’s important: all the puzzles used for practice and for the official challenge were unseen crosswords, unseen by the engine and unseen by the participants. Last, and I think this is my last slide, here is some pretty interesting stuff: we can see, on a reduced test set, what happens if we remove modules, how robust the engine is to the ablation of parts, and how much each part contributes to the whole. If we take this test set of 20 Italian crosswords, we see that around 99.5% of letters were filled correctly; as usual it’s a bit lower for words, something like 98.4%. Let’s see what happens if, for example, we don’t use web search at all. You can see that there’s a substantial drop in performance for correct words, around 10%, meaning that quite a substantial number of clues require web knowledge to be answered, even though the correct letters still don’t drop that much.

Marco Ernandes:

But you always have to keep in mind that a single wrong letter is enough to get two words wrong; that’s the natural way of explaining this discrepancy in the numbers. Okay. Let’s see what happens if we remove, and this is very interesting for our Expert.ai cooperation, all the language capabilities we added: the NLP analysis with the morphological filtering, the access to the lexicon, and the access to the Expert.ai knowledge graph. You can see that overall there’s again a drop of more or less 10% in performance, meaning that accessing the knowledge graph and the linguistic refinement of the lists has more or less the same impact as web search. Finally, an additional ablation we did was removing all the modules that deal with prior knowledge about the world of crosswords, like the rule-based module, with things like fill-in-the-blanks. Actually, we don’t have fill-in-the-blank handling in the American version yet, but we have similar tricks for the Italian crosswords. The rule-based module was totally removed, and the entire crossword database was removed as well. As you can see, by removing only the crossword database, the drop in performance is even smaller than when removing web search.

Marco Ernandes:

Overall, if we remove every kind of knowledge, it’s like picking a random person who doesn’t know anything about crosswords, putting them in front of a crossword, briefly explaining the rules, and letting them fill the grid; that’s the metaphor for this ablation. If you think of it this way, WebCrow is not performing badly at all: it’s at around 90% correct letters, so it’s still placing nine out of ten letters correctly, which is probably even beyond an average random person, and that I found really surprising. So this tries to show how much of the problem is a combination of different capabilities: accessing common knowledge, dealing with natural language processing, accessing lexical knowledge, and, in the end, being aware of and competent in the specific rules of the given game, in this case crosswords. So that’s a little bit of the experimental point of view. We have a number of next steps; some were already mentioned, but an important one is highlighted here: we are now aiming to start collaborations with French and German universities to take WebCrow to French and German crosswords, and probably Spanish in the medium term. So I really hope that the next time we chat about WebCrow we’ll already be addressing these languages and meeting human experts.

Brian Munz:

Okay.

Marco Ernandes:

Yeah, just a final comment. I need to thank the entire team here: our mentor, Professor Marco Gori, and Professor Marco Maggini from the University of Siena; our student, Tommaso Iaquinta; and our collaborator, Szymon Stefanek. On the Expert.ai side, the team is myself, Giovanni, whom you already met, and Andrea Zugarini. The project and the challenge wouldn’t have been possible if any one of these people hadn’t invested a lot of their time in making it happen. Thanks to all, and thanks to WCCI for inviting us there; that was a super event. And thanks again to Expert.ai for promoting this project and making it possible to create this team and move the activities forward. Brian, I’ll let you say thanks.

Brian Munz:

No, thanks. That was extremely interesting. I mean, on a lot of these streams we focus on NLP with more common language constructs, but it’s interesting to see how well it performs with wordplay and the kinds of tricks where you need to be more clever, like in a crossword; it’s actually impressive that it does that well. So thanks again for presenting, it was super interesting, and we’re going to have to keep up with you as time goes on. Everyone, please tune in next week, when we’re going to talk about how innovation in space begins with NLP, which sounds really interesting. I’m trying to think of how NLP and outer space may cross paths, but we’ll have to see next week. So make sure to join at 11:00 AM Eastern Time. And again, Marco, thanks for presenting, and we’ll see everyone next week.

Marco Ernandes:

Thank you. Bye. Bye-bye to all.

 
