Decisions, Decisions: Which Text Analytics Tool Is Right For You?

It is now widely accepted that the ability to extract actionable insights and intelligence from unstructured text is crucial for most organizations. So, you have your unstructured data in hand. Now what? How do you get started? With so many tools out there, which do you use?

Venturing into the world of text mining and analyzing unstructured text can sometimes feel like walking into a car dealership – you are faced with an array of vehicle models, each with its own set of options. When confronted with these types of choices, you are likely to make trade-offs as you attempt to get the most value for your money.

Each of the available text analysis solutions also presents different strengths, challenges, and functionalities. Depending on factors such as the characteristics of the data you are looking to analyze, requirements such as set-up cost and turn-around time, and how you want to share the results, some technologies and tools might be better suited than others.

Although some hybrid options exist, each of the main text analysis tools falls into one of the following three categories:

  • Manual Coding: Manual coding is straightforward in that 100% human involvement is required throughout the entire process of defining, identifying, extracting, and classifying concepts within unstructured text. Historically, analysts have performed this rather labor-intensive task, which typically entails multiple iterations of reading through comments, mentally flagging recurring themes, grouping similar themes together to form categories and sub-categories, and then classifying each comment into the most relevant categories and/or sub-categories.
  • Linguistic/Rule-based Text Analytics: Linguistic/rule-based text analytics relies on the human analyst and a linguistics-driven engine working together. The engine automatically segments a comment into fragments (often based on parts of speech), which are then tagged, grouped based on similarity, and indexed before being extracted as concepts. The human analyst defines categories and creates rules to classify the extracted terms and concepts.
  • Machine Learning Text Analytics: Machine learning text analytics can be supervised or unsupervised. Both options rely heavily on statistical or mathematical algorithms to aid in the classification process. Supervised machine learning is similar to linguistic/rule-based methods in that human interaction is essential in the text classification phase, typically in the form of analyst-coded example comments from which the algorithm learns. Unsupervised machine learning is just that: it relies on the algorithms to classify data, with minimal or no human involvement.
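To make the rule-based category above more concrete, here is a minimal sketch of the analyst-defined classification step, written in Python. It is an illustration only: the category names, keyword lists, and the `classify` function are hypothetical, and a real linguistic engine would extract concepts using parts of speech and similarity grouping rather than the plain word matching assumed here.

```python
import re

# Hypothetical analyst-defined rules: each category maps to the
# keywords that should trigger it. These lists are made up for illustration.
RULES = {
    "price": ["expensive", "cheap", "price", "cost"],
    "service": ["staff", "rude", "helpful", "support"],
    "delivery": ["late", "shipping", "delivery", "arrived"],
}


def classify(comment, rules=RULES):
    """Return the sorted categories whose keywords appear in the comment."""
    # Lowercase the comment and split it into simple word tokens.
    tokens = set(re.findall(r"[a-z']+", comment.lower()))
    # A category matches when at least one of its keywords is present.
    return sorted(cat for cat, keywords in rules.items() if tokens & set(keywords))


print(classify("The staff were helpful but delivery was late"))
# -> ['delivery', 'service']
print(classify("Great product"))
# -> []
```

A comment that uses none of the listed keywords falls into no category, which is why rule sets of this kind usually need ongoing tuning by the analyst.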

Ipsos’ extensive experience in working with clients on unstructured data analysis has led us to the conclusion that the expertise and guidance of an analyst are needed to choose the right tool for your particular situation. The analyst will ask you several questions about your requirements and will also examine the dataset in order to determine the ultimate path of the analysis.

Following are several of the key factors to consider when making a choice of text analysis tools.

  1. Set-up labor, analysis turn-around time: Since 100% human involvement is required with manual coding, it tends to require the most set-up labor and has the longest analysis turn-around time compared with text analytics tools that use algorithms and/or technology.
  2. Big data friendly, scalability, consistency/replicability: These factors speak to the ease of processing large amounts of data and data from multiple sources, and to the ability to replicate the analysis consistently even as the data set grows. In fact, scalability is at the heart of the benefits of text analytics tools that use some type of technology. A high volume of comments ensures that non-manual text analytics approaches remain time and cost effective over time. The definition of “high” volume will depend on the tool you choose, but as a rule of thumb, several thousand or more comments are required to realize the true benefits of non-manual text analysis tools.
  3. Ongoing human involvement: Ongoing human involvement is typically needed to fine-tune the coding or text modeling. In addition to the obvious cost implications, the amount of ongoing involvement each text analysis tool requires should be considered when planning for other project management needs, such as resource allocation and any ongoing training of new resources.
  4. Transparency: Machine learning algorithms often feel like a “black box” because the inner workings of the text analysis are less transparent. This can be particularly troublesome when results from the analysis are counter-intuitive or nonsensical, as they are then more difficult, and sometimes impossible, to reverse-engineer, adjust, and explain.
  5. Flexibility, level of control: Following from the previous point, when results are counter-intuitive, the analyst will want to intervene and make improvements to the text analysis algorithm or model. In particular, unsupervised machine learning options tend to offer the least flexibility and control, since these tools depend on the algorithms used, without any up-front human intervention or training (hence the term “unsupervised”).
  6. Capture hidden patterns/identify latent dimensions: Creating rules within a text analysis tool often involves the analyst inputting explicit references and specifications. It is very challenging for linguistic and rule-based text analytics tools to deal with abstract or latent (less prominent) concepts. Both supervised and unsupervised machine learning algorithms are better than linguistic and rule-based options on this front, as machine learning techniques are designed to look at both the explicit and latent dimensions in the data.
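As a contrast to explicit rules, the sketch below shows how a supervised approach can learn word-to-category associations from analyst-coded examples instead of from a keyword list. It uses a small naive Bayes classifier, which is one common technique chosen here as an assumption, not necessarily what any particular tool uses. The training comments, labels, and function names are all hypothetical, and the example does not attempt true latent-dimension discovery; it only illustrates learning from coded examples.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def train(labeled_comments):
    """Count words per category from (comment, category) pairs."""
    word_counts = defaultdict(Counter)
    doc_counts = Counter()
    for text, label in labeled_comments:
        doc_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = {word for counts in word_counts.values() for word in counts}
    return word_counts, doc_counts, vocab


def predict(text, model):
    """Return the category with the highest naive Bayes log-score."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(doc_counts):
        total_words = sum(word_counts[label].values())
        # Start from the log prior of the category.
        score = math.log(doc_counts[label] / total_docs)
        for word in tokenize(text):
            if word in vocab:  # words never seen in training are ignored
                # Add-one (Laplace) smoothing avoids zero probabilities.
                count = word_counts[label][word]
                score += math.log((count + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical analyst-coded comments, made up for illustration.
TRAINING = [
    ("the price is too high", "price"),
    ("far too expensive for what you get", "price"),
    ("the staff were rude", "service"),
    ("support was slow and unhelpful", "service"),
]

model = train(TRAINING)
print(predict("rude and unhelpful staff", model))  # -> service
print(predict("too expensive", model))             # -> price
```

The analyst never writes a rule such as "unhelpful means service"; the association comes from the coded examples. That is also why the reasoning behind a prediction is harder to inspect than a keyword list, which is the transparency trade-off described in the factors above.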

In conclusion, the key factors above make it clear that the analyst plays a central role in setting the text analysis on the right path, carrying it out in the right way, and interpreting, validating, and contextualizing the results for the individual business questions.

The analyst is at the heart of the process in most of the projects that Ipsos runs for clients; technology is an enabler but will not give you all the answers.

We recommend that you discuss each of the factors above with your analyst in order to set your text analysis effort up for success.
