Lately, hybrid approaches that combine statistical AI and symbolic AI for language understanding have drawn considerable media attention. Some call this neuro-symbolic AI, in which AI’s knowledge foundation augments the usefulness of its statistical foundation, and vice versa.
Gartner refers to these capabilities as composite AI, in which organizations employ the full range of AI technologies (including statistical and symbolic techniques) to solve language understanding problems, among others. This hybrid method merges AI’s two traditionally opposed approaches to natural language processing, machine learning and symbolic AI’s rules and taxonomies, so that each one’s strengths offset the other’s weaknesses.
For example, users can reduce supervised learning’s enormous training data requirements by training with knowledge graphs that contain concepts, relations, rules and taxonomies. Conversely, unsupervised learning approaches can enrich knowledge graphs with additional concepts and relations.
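As a rough illustration of how a taxonomy can stand in for some hand labeling, the sketch below assigns provisional ("weak") labels to raw text by matching taxonomy terms. The taxonomy, the concept names, and the `weak_label` helper are all hypothetical, made up for this example; a production knowledge graph would be far richer.

```python
import re

# Hypothetical mini-taxonomy: concept -> surface terms (illustrative only).
TAXONOMY = {
    "payment": {"invoice", "refund", "billing", "charge"},
    "shipping": {"delivery", "courier", "tracking", "shipment"},
}


def tokenize(text):
    """Lowercase the text and return its alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def weak_label(text, taxonomy=TAXONOMY):
    """Return the concept whose terms occur most often in the text.

    Returns None when no term matches or when two concepts tie,
    so that ambiguous examples are left for a human annotator.
    """
    tokens = tokenize(text)
    counts = {
        concept: sum(token in terms for token in tokens)
        for concept, terms in taxonomy.items()
    }
    best = max(counts.values())
    if best == 0:
        return None
    winners = [concept for concept, n in counts.items() if n == best]
    return winners[0] if len(winners) == 1 else None
```

Texts that receive a label can seed a supervised training set, while texts that return `None` remain in the manual annotation queue.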
Keys to Implementing Hybrid AI
Currently, there’s a lack of literature about how to properly implement this hybrid approach for language understanding. I’ve found there are three best practices for this undertaking.
The first is to start with the end goal in mind, which is based on the desired results, available data, and budget. The second is to maintain an expert in the loop by involving subject matter experts early and often in this process. The third is to let the complexity of the business problem determine the proportion of statistical and symbolic approaches.
Following these best practices will allow organizations to optimize their language understanding applications by combining symbolic AI and machine learning techniques.
#1: Align Data, Results, and Budget
The first step for implementing a hybrid approach to language understanding is to determine the desired results, available data, and available budget. It’s best to start without preconceptions and to keep an open mind based on these three factors.
There are two types of available data: training data for machine learning and enterprise knowledge for symbolic AI. The former involves examples of outcomes machine learning models will predict. These outcomes need annotations for supervised learning deployments. Enterprise knowledge consists of things like business rules, taxonomies, vocabularies, ontologies and business glossaries.
The results of hybrid language understanding systems typically fall into three categories.
- Cognitive process automation (CPA), which might involve extracting unstructured data from any type of document for use in downstream systems.
- Text analytics, which can help a financial analyst identify market trends in news reports, for example.
- Conversational AI, which is useful for intelligent chatbots that support customer service, for example.
Your allotted budget is critical to the practicality of implementation. Smaller budgets require the use of simple machine learning methods, which produce only incremental improvements to existing language understanding systems.
#2: Keep an Expert in the Loop
The input of subject matter experts is pivotal to any successful language understanding application. Such experts are knowledgeable in the specific business domain employing the language understanding system and do not need to be adept in AI techniques.
These experts provide invaluable assistance to these undertakings in a couple of ways. They know what resources and data are available for both statistical and symbolic approaches. Oftentimes, they’re familiar with the business use case the application is being built for. As a result, they know where to find example data or enterprise knowledge, as well as what aspects of them are most relevant for fulfilling business objectives.
Moreover, their expertise is essential for actually combining symbolic AI and machine learning. When building a supervised learning model, for example, users have to annotate the example data for training. Experts can identify what data is available and estimate how long labeling might take, which may be a significant number of days. However, by helping to write symbolic AI rules, experts can cut that effort to far fewer days while overseeing the process and approving the rules.
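One way to picture this workflow is a small pre-annotation pass in which only expert-approved rules are allowed to label data, and everything else goes to a manual review queue. This is a minimal sketch under assumed names: the `Rule` class, its `approved` flag, and `pre_annotate` are hypothetical and do not come from any particular tool.

```python
import re
from dataclasses import dataclass


@dataclass
class Rule:
    """A symbolic labeling rule drafted with a subject matter expert."""
    name: str
    pattern: str            # regular expression to search for
    label: str              # label assigned when the pattern matches
    approved: bool = False  # set to True once an expert signs off


def pre_annotate(docs, rules):
    """Split docs into auto-labeled pairs and a manual review queue.

    Only approved rules are applied. A document is auto-labeled when
    the matching rules agree on exactly one label; otherwise it is
    routed to the experts for manual annotation.
    """
    labeled, needs_review = [], []
    for doc in docs:
        hits = {
            rule.label
            for rule in rules
            if rule.approved and re.search(rule.pattern, doc, re.IGNORECASE)
        }
        if len(hits) == 1:
            labeled.append((doc, hits.pop()))
        else:
            needs_review.append(doc)
    return labeled, needs_review
```

The size of the review queue gives a rough sense of how much manual labeling remains after the rules have been applied.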
Though data scientists may be the ones to devise these systems, subject matter experts are responsible for supervising the content process that shapes them.
#3: Balance Symbolic AI and Statistical AI
Once the data, results, and budget are aligned, use case complexity is the single biggest determinant of how much machine learning and symbolic AI to use for language understanding. The more complex the business problem is, the more it will rely on symbolic AI and, correspondingly, the less it will rely on machine learning.
It is possible to start with machine learning models for basic tasks, then supplement them with symbolic AI approaches as complexity grows with respect to contextual understanding, expanded vocabularies, and the nuance of language.
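A minimal sketch of that progression follows, assuming a toy keyword scorer as a stand-in for a trained statistical model and a single hand-written rule for one kind of linguistic nuance (negation). All names here (`toy_model`, `negation_rule`, `hybrid_predict`) are hypothetical and exist only for illustration.

```python
def toy_model(text):
    """Stand-in for a trained classifier: counts sentiment keywords."""
    positive = {"great", "good", "love"}
    negative = {"bad", "terrible", "hate"}
    tokens = text.lower().split()
    pos = sum(token in positive for token in tokens)
    neg = sum(token in negative for token in tokens)
    if pos == neg:
        return "neutral"
    return "positive" if pos > neg else "negative"


def negation_rule(text):
    """Symbolic rule for a nuance the toy model misses.

    Returns a label when the rule fires, or None to abstain.
    """
    lowered = text.lower()
    if "not good" in lowered or "not great" in lowered:
        return "negative"
    if "not bad" in lowered:
        return "positive"
    return None


def hybrid_predict(text, model=toy_model, rules=(negation_rule,)):
    """Apply symbolic rules first; fall back to the statistical model.

    Returns (label, source), where source is "rule" or "model".
    """
    for rule in rules:
        label = rule(text)
        if label is not None:
            return label, "rule"
    return model(text), "model"
```

In this sketch, adding rules to the `rules` tuple is how the symbolic share grows as the language gets more nuanced, while simple inputs continue to be handled by the model alone.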
Users should incorporate symbolic AI in proportion to that complexity as it grows, which will reduce the use of machine learning commensurately. The greater the complexity, the more time firms will spend annotating and training models. Symbolic AI can hasten the annotation, model training, and feature generation processes, which in turn makes the model far more cost-effective.
In fact, this paradigm is widely used in hybrid language understanding approaches. Users leverage the business rules and domain expertise in knowledge graphs to inform statistical model building’s training data, annotations, or feature engineering.
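For the feature engineering case, one simple possibility is to add concept-level features derived from a taxonomy alongside ordinary word features, so that a statistical model can generalize across terms that share an ancestor. The `PARENT` edges and the `featurize` helper below are hypothetical, chosen only to illustrate the idea.

```python
from collections import Counter

# Hypothetical is-a edges from a small taxonomy: term -> parent concept.
PARENT = {
    "visa": "credit_card",
    "mastercard": "credit_card",
    "credit_card": "payment_method",
    "paypal": "payment_method",
}


def ancestors(term):
    """Walk up the taxonomy and return every ancestor of the term."""
    found = []
    while term in PARENT:
        term = PARENT[term]
        found.append(term)
    return found


def featurize(text):
    """Build word features plus concept features from the taxonomy."""
    tokens = text.lower().split()
    features = Counter(f"word={token}" for token in tokens)
    for token in tokens:
        for concept in ancestors(token):
            features[f"concept={concept}"] += 1
    return features
```

With these features, "paid with visa" and "paid with paypal" share the `concept=payment_method` feature even though they have no payment word in common, which may reduce the number of labeled examples a model needs to see for each individual term.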
The Best Way Forward
A hybrid approach of statistical AI and symbolic AI is the best option for numerous language understanding use cases, especially those of enterprise caliber, scale and magnitude. That much is apparent from the attention it’s received. Far fewer people understand how to properly implement this method, or that its key determinants are available data, expected results, and budgets, which must be aligned before beginning.
The critical factor is that experts in the loop are mandatory to oversee these projects, which otherwise fail. Once they’re involved, the proportion of each method required depends on the complexity of the business problem at hand, much of which those experts will innately understand and articulate. Because symbolic AI techniques involve more training and expertise in linguistic capabilities and semantics than machine learning does, they sit at the upper end of the complexity spectrum. Using this method truly allows the experts’ impact on language understanding systems to shine.