Text analytics could help bridge data divide: But is IT ready?

Mark Brunelli, News Editor

The combination of newly mature text analytics tools with advanced enterprise search and business intelligence (BI) reporting capabilities could be the key to bridging the once vast chasm between structured data and unstructured content.

But according to one expert, the mammoth task of blending the structured with the unstructured will require IT departments to adopt complex new architectures and unfamiliar ways of thinking, and some IT professionals could have a tough time making the adjustment.

“IT has to pay attention to this,” said Susan Feldman, vice president of search and discovery technologies with Framingham, Mass.-based analyst firm IDC. “And that means a profound re-education, because the skills -- the information retrieval skills, the linguistic skills -- are not what was taught to most IT guys.”

Feldman, who spoke during an interview at the Text Analytics Summit 2010 in Boston last week, said the barrier between structured and unstructured data is finally beginning to crumble, thanks mainly to the latest advances in text analytics tools. Text analytics technology helps organizations find valuable patterns within the oceans of unstructured or free-form text that reside in websites, emails and other electronic files.

“You have this great divide between the database world and the content world,” she explained, “and now it’s starting to get pierced, largely by technologies like text analytics being able to feed into BI applications as well as into search applications.”

CIOs and CTOs responding to IDC surveys have for years indicated an overwhelming desire for an easy-to-use application that offers the ability to access both data and content from a single interface. It’s a desire driven by the realization that -- according to most analyst firm estimates -- as much as 80% of the information stored within a typical organization comes in the form of unstructured and difficult-to-search content.

Organizations have historically had difficulty getting simultaneous and meaningful access to database sources and free-form text. According to IDC, search engines lack the ability to handle structured data while, conversely, BI applications fall short when it comes to unstructured content. Text analytics tools, which have matured in functionality and grown in popularity over the last few years, are the key to bringing it all together, Feldman said.

“What text analytics does is to layer understanding on top of basic search technology,” she said. “It’s the semantic understanding of the concepts of the names of people, places and things.”

Text analytics tools have evolved, but challenges remain

While several applications today claim to use text analytics to make sense of both data and content on varying levels, CIOs may want to refrain from pulling out their checkbooks just yet.

According to IDC, several challenges to adoption remain. For one thing, building the architecture to support such a product -- or a similar in-house application, for that matter -- can be extremely time-consuming and expensive. Moreover, the architecture requires expertise in specific areas that might seem alien to many IT professionals.

“It’s an entirely different thing,” Feldman said. “You’re talking about inverted indices. You’re talking about graph representations, vector space models, and a whole bunch of technical stuff that is not familiar to most IT professionals.”
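
For readers unfamiliar with two of the terms Feldman mentions, the following is a minimal sketch of an inverted index and a vector space model in Python. It is an illustration only, not a description of any product discussed at the summit. The sample documents, the function names and the particular TF-IDF weighting are all assumptions made for this example.

```python
import math
from collections import Counter, defaultdict

# Hypothetical sample documents, invented for this illustration.
DOCS = {
    "email-1": "customer complains about late claim payment",
    "email-2": "customer praises fast claim service",
    "note-3": "quarterly sales report for the marketing team",
}


def tokenize(text):
    """Lowercase and split on whitespace (real systems do far more)."""
    return text.lower().split()


def build_inverted_index(docs):
    """Inverted index: map each term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def tf_idf_vector(text, index, n_docs):
    """Vector space model: represent text as a sparse {term: weight} vector."""
    vector = {}
    for term, tf in Counter(tokenize(text)).items():
        df = len(index.get(term, ()))  # number of documents containing the term
        if df:  # ignore terms that appear in no document
            vector[term] = tf * (math.log(n_docs / df) + 1.0)
    return vector


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query, docs):
    """Return (doc_id, score) pairs for matching documents, best match first."""
    index = build_inverted_index(docs)
    n_docs = len(docs)
    query_vec = tf_idf_vector(query, index, n_docs)
    # The inverted index narrows the candidates to documents sharing a query term.
    candidates = set()
    for term in query_vec:
        candidates |= index[term]
    scored = [
        (doc_id, cosine(query_vec, tf_idf_vector(docs[doc_id], index, n_docs)))
        for doc_id in candidates
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for doc_id, score in search("late claim", DOCS):
        print(doc_id, round(score, 3))
```

The index answers the question "which documents mention this term?" without scanning every document, while the vectors allow matching documents to be ranked by similarity to the query. Production text analytics tools add linguistic processing, such as entity extraction, on top of structures like these.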

The good news, however, is that those organizations willing to take on the challenge of building an in-house architecture that bridges the information divide are likely to find that it pays off in the end. The ROI associated with text analytics is generally very high, Feldman said, because the ability to reveal patterns in structured and unstructured content can reduce the financial and opportunity costs associated with marketing research.

“Imagine being able to understand what your customers are saying,” she said. “That means you can find out what they’re complaining about, but also what they’re in favor of, and then take that data and feed it back to the product manager or the sales guys or the marketing people.”

Conference attendee Bahman Dehkordi, a research statistician with State Farm Insurance in Bloomington, Ill., spends most of his time dealing with statistical patterns but has also studied computer science extensively. He said he can understand why IT professionals might have trouble using text analytics to connect structured data with unstructured content.

“It’s a big challenge because it’s two separate lines of thinking to begin with,” Dehkordi said. “The structured side is statistical, mathematical and computer science, and the unstructured text is more about natural language and feelings of sentiment.”

But despite those barriers, Dehkordi said he thinks that text analytics can be extremely helpful in terms of decision making and developing company strategies. He added that he learned a couple of key lessons at the Text Analytics Summit.

“One is that text analytics is doable. People are doing it,” he said. “The second message is that it’s not trivial. It’s a complicated process, and it requires a lot of domain knowledge.”

With text analytics projects, don’t be afraid to hand over the keys

One way to avoid the pitfalls of building an architecture that uses text analytics to span information sources is to do something that many IT pros, especially open source developers, might find offensive -- get someone else to do it.

“Try not to develop it yourself [even though that] goes against every instinct an IT guy has,” Feldman said. “Look at those packages that are actually developed for line-of-business people to administer and to customize so you don’t have to have that kind of expertise -- because it’s very complex.”