Our NLP Stream is a weekly live session where we talk about all things NLP. Last week, Jacob Berk from our consulting services team joined us to show how to use taxonomies for data discovery with the expert.ai Platform. The session, “Successful Data Discovery with Taxonomies,” showed how easy it is to create taxonomies and what they can be used for.
What Is a Taxonomy?
Before we dig into the session itself, let’s take a quick step back and answer the question: What is a taxonomy? As a concept, a taxonomy is a way of identifying and classifying information into a hierarchical structure. In the digital world, it’s a way of structuring content, and the relationships between its components, so that the content can be easily found. When combined with AI and NLP technologies, taxonomies make it easier for both machines and users to discover text-based documents, web pages, news articles, social media posts and any other language-based asset.
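To make the idea of a hierarchy concrete, here is a minimal Python sketch of a taxonomy as a tree of categories, with a lookup that returns the path from the root down to a given category. The `Category` class and the category names are hypothetical and purely illustrative; this is not a description of how the expert.ai Platform stores taxonomies.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Category:
    """One node in a hypothetical taxonomy tree (illustrative only)."""
    name: str
    children: list[Category] = field(default_factory=list)

    def add(self, name: str) -> Category:
        """Create a child category and return it so it can be extended further."""
        child = Category(name)
        self.children.append(child)
        return child

    def path_to(self, target: str) -> list[str] | None:
        """Return the category names from this node down to `target`, or None if absent."""
        if self.name == target:
            return [self.name]
        for child in self.children:
            sub = child.path_to(target)
            if sub is not None:
                return [self.name] + sub
        return None


# Build a tiny example taxonomy (hypothetical categories).
root = Category("Content")
news = root.add("News")
news.add("Business")
news.add("Politics")
social = root.add("Social Media")
social.add("Product Feedback")

print(" > ".join(root.path_to("Product Feedback")))  # Content > Social Media > Product Feedback
```

The path is what makes a taxonomy useful for discovery: a document tagged "Product Feedback" can also be found by anyone browsing the broader "Social Media" or "Content" categories.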
Three Ways to Create a Taxonomy
Now, back to the livestream. With the expert.ai Platform, there are three different methods for creating a taxonomy:
- Manual taxonomy creation
- Automatic taxonomy creation using the expert.ai magic taxonomy feature
- Importing an existing, pre-built taxonomy
For manual taxonomy creation, the expert.ai Platform uses named entity recognition to mine the language data and cluster the main topics. This method recognizes the people, organizations and geographic entities present in your documents. In the livestream, you’ll see Jacob import a corpus and watch the platform identify the relevant terms, entities and topics, which gives you an idea of the classes and categories on which to build your customized taxonomy.
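The step of turning recognized entities into a starting point for a taxonomy can be sketched roughly as follows. This minimal Python example assumes the named entity recognition step has already run and produced hypothetical (text, type) pairs for each document; it does not call any expert.ai API, and the type labels `PERSON`, `ORG` and `GEO` are assumptions. Each entity type becomes a candidate top-level branch, and the entities mentioned in the most documents become candidate child categories.

```python
from collections import Counter, defaultdict

# Hypothetical NER output: one list of (entity text, entity type) pairs per document.
# In practice these would come from an NER engine; here they are hard-coded.
annotated_corpus = [
    [("Acme Corp", "ORG"), ("Berlin", "GEO"), ("Jane Doe", "PERSON")],
    [("Acme Corp", "ORG"), ("Paris", "GEO")],
    [("Globex", "ORG"), ("Berlin", "GEO"), ("Jane Doe", "PERSON")],
]


def candidate_categories(corpus, top_n=2):
    """Group entities by type and keep the top_n most widespread entities per type."""
    counts = defaultdict(Counter)
    for doc in corpus:
        # dict.fromkeys removes duplicates while keeping order, so each entity
        # is counted at most once per document (document frequency).
        for text, label in dict.fromkeys(doc):
            counts[label][text] += 1
    return {label: [t for t, _ in c.most_common(top_n)] for label, c in counts.items()}


print(candidate_categories(annotated_corpus))
# {'ORG': ['Acme Corp', 'Globex'], 'GEO': ['Berlin', 'Paris'], 'PERSON': ['Jane Doe']}
```

Counting documents rather than raw mentions keeps a single entity-heavy document from dominating the suggested categories.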
To make it even easier, you can have the platform create a taxonomy automatically instead of building one from scratch. With the magic taxonomy feature, you import a corpus and the platform recommends classifications, sparing you most of the manual work.
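How the magic taxonomy feature arrives at its recommendations isn’t covered in this post, so the following is only a simple stand-in that illustrates the general idea of deriving categories from a corpus: a Python sketch that groups documents by their most frequent non-stopword term. The function names and the tiny stopword list are hypothetical, and real systems use far richer signals than a single keyword.

```python
import re
from collections import Counter, defaultdict

# A deliberately tiny, hypothetical stopword list; a real system would use a fuller one.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "for", "on", "is", "our", "with"}


def dominant_term(text):
    """Return the most frequent non-stopword token in `text` (ties go to the first seen)."""
    tokens = [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]
    return Counter(tokens).most_common(1)[0][0] if tokens else None


def suggest_categories(docs):
    """Group document indices under their dominant term as candidate categories."""
    groups = defaultdict(list)
    for i, doc in enumerate(docs):
        term = dominant_term(doc)
        if term is not None:
            groups[term].append(i)
    return dict(groups)


docs = [
    "Payment overdue: please send payment for the invoice.",
    "Your payment was received, thank you for the payment.",
    "Shipping delayed; new shipping date to follow.",
]
print(suggest_categories(docs))  # {'payment': [0, 1], 'shipping': [2]}
```

The output is a list of suggestions, not a finished taxonomy; a person would still review, rename and nest the proposed categories.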
You can also load an existing taxonomy onto the platform. Here, you might use a public thesaurus or another open source taxonomy. You can import RDF files in XML format, and the expert.ai Platform will create hundreds or thousands of classifications and subgroups from them. Then, you can easily add properties or relationships to customize the taxonomy for your business.
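Public thesauri distributed as RDF/XML often use the W3C SKOS vocabulary, in which each concept has a `skos:prefLabel` and points to its parent through `skos:broader`. Assuming a SKOS-based file, a minimal Python sketch using only the standard library’s `xml.etree.ElementTree` can rebuild the hierarchy. It handles only the simple form shown in the sample, takes the first label and first broader link of each concept, and is not a description of how the expert.ai Platform performs its import.

```python
import xml.etree.ElementTree as ET

# Namespace URIs for RDF and SKOS (both W3C standards).
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
SKOS = "http://www.w3.org/2004/02/skos/core#"

# A tiny hypothetical SKOS file with one parent concept and one child concept.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:skos="http://www.w3.org/2004/02/skos/core#">
  <skos:Concept rdf:about="http://example.org/finance">
    <skos:prefLabel xml:lang="en">Finance</skos:prefLabel>
  </skos:Concept>
  <skos:Concept rdf:about="http://example.org/banking">
    <skos:prefLabel xml:lang="en">Banking</skos:prefLabel>
    <skos:broader rdf:resource="http://example.org/finance"/>
  </skos:Concept>
</rdf:RDF>"""


def load_skos_hierarchy(xml_text):
    """Map each concept's prefLabel to the labels of its narrower concepts."""
    root = ET.fromstring(xml_text)
    labels, parent_of = {}, {}
    for concept in root.iter(f"{{{SKOS}}}Concept"):
        uri = concept.get(f"{{{RDF}}}about")
        label = concept.find(f"{{{SKOS}}}prefLabel")  # first label only
        labels[uri] = label.text if label is not None else uri
        broader = concept.find(f"{{{SKOS}}}broader")  # first parent only
        if broader is not None:
            parent_of[uri] = broader.get(f"{{{RDF}}}resource")
    tree = {labels[uri]: [] for uri in labels}
    for child, parent in parent_of.items():
        if parent in labels:  # skip links to concepts outside this file
            tree[labels[parent]].append(labels[child])
    return tree


print(load_skos_hierarchy(SAMPLE))  # {'Finance': ['Banking'], 'Banking': []}
```

Because `skos:broader` links point upward, the sketch collects them first and inverts them afterward, which also tolerates files that list a child concept before its parent.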
Companies can use these same techniques to normalize unstructured data for different types of analysis, including:
- Search/Knowledge Discovery
- Email Management
- Sentiment Analysis
- Contract Analytics and Policy Review
- Risk Analysis
By extension, any document- or language-intensive process can be augmented using these methodologies.
Watch the full livestream of “Successful Data Discovery with Taxonomies” on demand to learn more.