In the past few years, an unprecedented amount of information has been created. According to IDC, the digital universe will reach over 40 ZB by 2020 (that’s approximately 1.7 MB of new information created for every human, every second of every day). A large portion of this information is unstructured text. This has created a need for systems that can read and understand information in a way that is both dynamic and scalable. Enter text mining, a process that transforms unstructured information into structured data that can be analyzed in a traditional way.
What is Text Mining?
Text mining (also called text data mining or text analytics) is a method for extracting useful information from unstructured data through the identification and exploration of large amounts of text. Or, to put it another way, text mining is a method for extracting structured information from unstructured text.
Text mining applies techniques such as categorization, entity extraction, sentiment analysis and natural language processing to transform text into data that can be used for further analysis. Applied to a corpus, or body of information, it makes large quantities of unstructured data accessible and useful by extracting the knowledge hidden in text content and revealing patterns, trends and insights across large amounts of information.
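To make the idea of turning text into structured data concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the word lists, the `mine` function and the sample sentence are all hypothetical, the "entity extraction" is a naive capitalization rule, and the "sentiment analysis" is a simple word count. Real text mining tools rely on far richer linguistic analysis.

```python
import re
from collections import Counter

# Hypothetical, tiny lexicons chosen for illustration only.
POSITIVE = {"great", "good", "excellent", "fast"}
NEGATIVE = {"bad", "slow", "poor", "broken"}
STOPWORDS = {"the", "a", "an", "is", "was", "are", "and", "but", "in", "of", "to", "it"}


def mine(text: str) -> dict:
    """Turn a free-text string into a small structured record."""
    tokens = re.findall(r"[a-z']+", text.lower())

    # Term frequencies, ignoring very common words.
    terms = Counter(t for t in tokens if t not in STOPWORDS)

    # Naive entity guess: runs of two or more capitalized words.
    entities = re.findall(r"\b(?:[A-Z][a-z]+ )+[A-Z][a-z]+\b", text)

    # Lexicon-based sentiment: positive hits minus negative hits.
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    sentiment = "positive" if score > 0 else "negative" if score < 0 else "neutral"

    return {
        "top_terms": terms.most_common(3),
        "entities": entities,
        "sentiment": sentiment,
    }


record = mine(
    "Acme Robotics shipped a great update. "
    "Support in New York was fast and the docs are excellent."
)
print(record["entities"])   # ['Acme Robotics', 'New York']
print(record["sentiment"])  # positive
```

The resulting dictionary is the kind of structured output that can be loaded into a table or database and analyzed with traditional tools.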
Why is text mining so important?
These capabilities make text mining a valuable support tool for organizations: it can go deeper into information, identifying relevant business insight in content and highlighting connections within one or more texts that would go undiscovered with traditional tools or search engines.
Organizations already rely on information to fuel most of their business processes. As the amount of information they manage (and the additional information that they’d like to include) continues to grow and diversify, the pressure to take advantage of it in real time, and even to automate its use, is increasing. Text mining tools can help.
Let’s look at an example of how text mining works. To retrieve information in a typical internet search scenario, you employ keywords to find what you’re looking for, but this returns a large number of results that aren’t necessarily relevant to your query. Using the techniques described above, text mining applications actually “read” the documents in a body of information. They understand the search term or phrase at a conceptual level rather than relying on keywords alone. This allows users to perform more complex queries that can effectively evaluate or analyze a variety of documents and sources and even reveal connections and patterns hidden in information.
As a result, researchers, analysts and other users can cover a lot of ground (large knowledge bases, open sources and so on) in a single search.
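The difference between keyword matching and concept-level matching can be sketched in a few lines of Python. This is a toy stand-in, not how any particular text mining product works: the `CONCEPTS` map, the sample documents and both search functions are hypothetical, and a hand-written synonym list takes the place of real semantic analysis.

```python
import re

# Hypothetical concept map: each concept lists surface terms that express it.
CONCEPTS = {
    "acquisition": {"acquisition", "acquire", "acquired", "buyout", "takeover", "merger"},
}

# Hypothetical mini corpus.
DOCS = {
    "doc1": "The board approved the takeover of the smaller rival.",
    "doc2": "Quarterly revenue grew thanks to strong retail sales.",
    "doc3": "Regulators reviewed the proposed acquisition last week.",
}


def tokens(text):
    """Lowercase word set for a document."""
    return set(re.findall(r"[a-z]+", text.lower()))


def keyword_search(query, docs):
    """Return documents that contain the literal query word."""
    return sorted(d for d, text in docs.items() if query.lower() in tokens(text))


def concept_search(query, docs):
    """Return documents that contain any term expressing the query's concept."""
    terms = {query.lower()}
    for surface_terms in CONCEPTS.values():
        if query.lower() in surface_terms:
            terms |= surface_terms
    return sorted(d for d, text in docs.items() if tokens(text) & terms)


keyword_hits = keyword_search("acquisition", DOCS)
concept_hits = concept_search("acquisition", DOCS)
print(keyword_hits)  # ['doc3']
print(concept_hits)  # ['doc1', 'doc3']
```

The keyword search misses the document that talks about a "takeover", while the concept-level search finds it because both words express the same idea.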
Another example of text mining is available in our latest iQ Report comparing Melania Trump’s 2016 Republican National Convention speech with Michelle Obama’s Democratic National Convention speech from 2008. We used text mining to analyze the transcripts of both speeches to answer the question: From a linguistic standpoint, did they basically say the same thing? In evaluating the main topics, entities and emotions expressed in the speeches, the answer is no. (Check out the full report here.)
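For readers who want to try a comparison of their own, one simple and purely lexical way to ask whether two texts "say the same thing" is to compare their word-frequency vectors with cosine similarity. This is not the method used in the report, which looked at topics, entities and emotions. The function names and sample sentences below are hypothetical placeholders, and the real transcripts are not included.

```python
import math
import re
from collections import Counter


def term_vector(text):
    """Count how often each word occurs in the text."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two term-frequency vectors (0.0 to 1.0)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Placeholder texts; substitute the transcripts you want to compare.
text_a = "hard work and honesty matter to our family"
text_b = "our family values hard work and respect"
text_c = "quarterly profits exceeded analyst forecasts"

similar = cosine_similarity(term_vector(text_a), term_vector(text_b))
unrelated = cosine_similarity(term_vector(text_a), term_vector(text_c))
print(round(similar, 2), round(unrelated, 2))
```

A score near 1 means the two texts use much the same vocabulary, and a score near 0 means they share almost none. Shared words alone do not prove shared meaning, which is why a fuller analysis also looks at topics, entities and emotions.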
In this new scenario, text mining helps you use all of your available information. More effective and productive knowledge discovery lets you make better-informed decisions, automate information-intensive processes, gather business-critical insights and mitigate operational risks.
Used correctly, text mining can solve high-value knowledge discovery problems in many different areas of application, including R&D, competitive intelligence, patent analysis and market research using sentiment analysis and social media mining.