Reading is not understanding.
This is a fundamental concept, but one that many choose to ignore. To read a word or phrase, you need only a basic grasp of the language and its phonetics. To understand a word or phrase, you must not only know the language in which it is written (with all of its grammatical rules, structures and conventions), but also have some knowledge of the topic at hand.
For example, we can all read the sentence “In 1998, a Hail Mary pass by the Vikings’ QB helped them defeat the rival Bears,” but we cannot comprehend it without some basic knowledge of American football.
It should come as no surprise then that when researchers began developing software that could read text, they sought to replicate the same knowledge-based approach. Thus, an artificial intelligence engine must embed as many rules and structures of a given language as possible, and it should have access to the domain knowledge covered by the text.
Advanced research continued in this direction for many years, despite the limitations of slow and expensive computers. But as computers became more powerful and cheaper to operate, attention shifted to machine learning and deep learning to optimize text analysis speeds. Symbolic AI, a more effective method of replicating human knowledge, was overlooked.
Lack of Knowledge Hinders Meaningful Natural Language Processing
Many data scientists and researchers saw the exponential growth in computing power as an opportunity to bypass the complexity of human-like interpretation when processing language. In this approach, words are transformed into “numbers” and, by simply applying statistics and mathematics (and brute force), they hoped to “teach” a computer to read and understand a document in a way that outperformed existing methods.
While great strides have been made in fields like image recognition, we still don’t have comparable results for text understanding. And, while many still believe that “it’s just a matter of time,” there is growing skepticism that this is the case, including from some who originally supported this approach (e.g., Geoff Hinton and Riza Berkan).
Machine Learning Value Capped by Lack of Knowledge
Beyond scenarios characterized by narrow domains, abundant sample data and the automation of simple tasks, it has been nearly impossible for ML to obtain relevant results. In the meantime, information overload has made the benefits of machine reading and understanding increasingly visible and relevant across every industry. This has created a renewed interest in a more traditional approach based on a human-like understanding of language that leverages a rich knowledge base.
Machine learning techniques can, in certain situations, add value. However, if we care about progress in AI-based natural language processing, it is imperative that we start by having an honest discussion about the evident limitations of this approach, and that we don’t run away from complexity when it comes to inherently complex tasks like reading and understanding.
Tasks that are complex, yet essential for humans to understand language (e.g., building a reliable knowledge base and leveraging that knowledge to understand the meaning of a text) are equally essential for the software designed to replicate this activity. Similarly, investing time and effort into building a vast and comprehensive knowledge base will improve performance of both humans and software.
Why the Knowledge Graph is Central to Natural Language Processing
In the software world, we generally discuss NLP in reference to a knowledge graph. A knowledge graph is a representation of the real world in which concepts are defined and connected to one another by different kinds of relationships.
Knowledge graphs can be broad or narrow, depending on how comprehensively they cover their domains. They can also be deep or shallow, depending on how they represent the knowledge of a certain domain from its core to its more specific elements. The more the knowledge graph expands in breadth and depth, the better it understands the language. The better it understands the language, the more useful it becomes.
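To make the idea concrete, here is a minimal sketch of a knowledge graph represented as a set of (subject, relation, object) triples. The concepts and relation names (`motor_vehicle`, `has_part`, and so on) are hypothetical examples, not part of any particular product or standard:

```python
# A tiny knowledge graph as (subject, relation, object) triples.
# Concepts and relation names here are illustrative only.
triples = {
    ("car", "is_a", "motor_vehicle"),
    ("truck", "is_a", "motor_vehicle"),
    ("motor_vehicle", "is_a", "vehicle"),
    ("motor_vehicle", "has_part", "wheel"),
}

def related(subject, relation):
    """Return every object linked to `subject` by `relation`."""
    return {o for (s, r, o) in triples if s == subject and r == relation}

print(related("motor_vehicle", "has_part"))  # {'wheel'}
```

Breadth here corresponds to adding more subjects, and depth to extending chains of `is_a` links toward more specific concepts.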
There is no such thing as a standard knowledge graph structure, as they can be used in many different ways, in many different scenarios. In this post, we will focus on the structure and content of a knowledge graph that understands text of any type and domain and is flexible enough to grow in tandem with its usage.
The Foundation of a Knowledge Graph
In a knowledge graph, each item is linked with one or more other items (where links represent the relationships between them) and each item has a set of attributes that describes the characteristics of words and concepts. The links and attributes of a concept (e.g., a motor vehicle has wheels) are transferred effortlessly to the more specific concepts in the chain (e.g., a car) in the same way humans do when they understand the meaning of a concept.
This simple feature ensures that the knowledge of the software can be extended with limited effort, because there is no need to repeatedly input this information each time a new concept is added to the knowledge graph. The broader, deeper and richer the knowledge graph (in terms of concepts, attributes and relations), the better your system understands the meaning of text straight out of the box.
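The motor vehicle example above can be sketched in code. In this hypothetical structure, attributes stated once on a general concept are collected for every more specific concept by walking up the `is_a` links, so nothing has to be re-entered:

```python
# A sketch of attribute inheritance along "is_a" links.
# All concept and attribute names are illustrative only.
graph = {
    "vehicle":       {"is_a": [], "attributes": {"can_move": True}},
    "motor_vehicle": {"is_a": ["vehicle"], "attributes": {"has_wheels": True}},
    "car":           {"is_a": ["motor_vehicle"], "attributes": {"doors": 4}},
}

def attributes_of(concept):
    """Collect a concept's own attributes plus those inherited via is_a."""
    attrs = {}
    stack = [concept]
    while stack:
        node = stack.pop()
        # setdefault: a more specific concept's value wins over an inherited one
        for key, value in graph[node]["attributes"].items():
            attrs.setdefault(key, value)
        stack.extend(graph[node]["is_a"])
    return attrs

print(attributes_of("car"))
```

“Car” is defined only by its own attribute (`doors`), yet it automatically carries `has_wheels` and `can_move` from its ancestors, mirroring how a person extends what they know about motor vehicles to cars.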
The Core Benefits of a Knowledge Graph
More traditional systems based on engines that perform deep linguistic analysis and rely on a rich knowledge graph offer several practical advantages. Two stand out:
- A knowledge graph enables any natural language software to start from a higher level of accuracy, therefore requiring much less work to reach your desired level of performance. This is true for any NLP software, including those based purely on machine learning.
- In any domain, new concepts and knowledge requirements are frequently introduced. Knowledge graphs can help your NLP system maintain its level of performance by easily and incrementally expanding its knowledge base. This enables subject matter experts to understand the structure of the repository and easily identify where to add the new concepts, rather than completely retraining the model (as you would with machine learning).
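The second advantage can be sketched as code: registering one new concept under an existing one makes it usable immediately, with no retraining step. The concept names and helper functions below are hypothetical illustrations:

```python
# A sketch of incremental knowledge-graph expansion.
# Each concept maps to its parent; None marks a root. Names are illustrative.
parents = {"motor_vehicle": None, "car": "motor_vehicle"}

def add_concept(name, parent):
    """Register a new concept under an existing one; effective immediately."""
    if parent not in parents:
        raise KeyError(f"unknown parent concept: {parent}")
    parents[name] = parent

def is_a(concept, ancestor):
    """True if `concept` is `ancestor` or a descendant of it."""
    while concept is not None:
        if concept == ancestor:
            return True
        concept = parents[concept]
    return False

# A subject matter expert adds one concept; no model is retrained.
add_concept("electric_car", "car")
print(is_a("electric_car", "motor_vehicle"))  # True
```

Because the change is a single, inspectable entry in the repository, an expert can see exactly where the new concept sits, in contrast to a statistical model whose behavior shifts only after retraining.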
As NLP use cases become more complex, the knowledge graph becomes more essential to the efficacy of the model. By providing you with an understandable, rule-based knowledge structure, the knowledge graph allows you to easily add new concepts and adjust rules as you go, so you maintain a stable level of performance, and do so at minimal cost.
Knowledge is fundamental to everything we do from both a personal and professional standpoint. More importantly, it is vital to improving ourselves and our work.
So put knowledge front and center in your natural language model. Only then can you truly realize the full potential of language.