BI SEARCH AND TEXT ANALYTICS - The Data Warehousing Institute

More documents

Recommendations

Info

BI SEARCH AND TEXT ANALYTICSanalysis) require mined and extracted data in a format (typically a flat file in recordformat) that the tool’s algorithms can understand. Many applications that rely on datafrom text analytics involve complex business models that are hard to represent in datamodels, like predictive analytic applications for risk and fraud. Each application hasnuances for modeling.• Data flow. Further processing of entity records is usually required when multiple tools (withmultiple data model needs) will operate on it. In these cases, the usual best practices of dataintegration apply, such that entity records and related data may be subject to a data flow thatstages and transforms data along the way.• Scalable storage. Technical users interviewed for this report explained that whenunstructured data is converted to structured data, the volume of storage increased by a factorof 2 to 10, due to the overhead of metadata and indexing. Be sure to run tests to quantifythis before attempting capacity planning for storage and server throughput.Entity Clustering and Taxonomy Generation as Advanced Text AnalyticsEntity extraction is fundamental to producing the structured data that a BI technology stackdemands. But text analytics tools may also perform more intelligent tasks, like entity clustering ortaxonomy generation, which have their place in managing unstructured data for a BI purpose.For example, entity clustering is very useful for the discovery of entities in unstructured data.Clustering entities shows you hot spots among entities found in the mined text sources, likerecurring customer complaints and product problems. The clusters are analytic by nature, in thatthey reveal in what proportions entities are referenced in a body of text sources. This, in turn, helpsyou decide which unstructured data sources to eventually mine via text analytics, as well as whichentities and entity relationships to parse for.Taxonomy generation is likewise useful in advanced configurations of BI search. When appliedto search results, taxonomy generation improves relevancy by organizing the result set. Entityclustering might also be applied to a BI search result set, producing an analytic view of it.Text Analytics Coupled with Predictive AnalyticsWhen interviewing users for this report, TDWI Research encountered a few users who’ve developedapplications that include both text analytics and predictive analytics, although they sometimes callthem text mining and data mining. Consider the following use case, told in the user’s own words:USER STORYAnalytics based on miningapplies to both structuredand unstructured data.“A lot of claims data is unstructured, and you can learn a lot by mining it,” said a risk managerat a large insurance company. “For example, when a loss is reported to our call center, a lot oftext is collected, mostly rudimentary facts about the loss. This is logged in a transactional system,and activity logs collect more information as the claim is processed. Eventually, all data about aclaim—whether structured or unstructured—goes into a claim master record or CMR. For years,people inside and outside the company have been able to view CMRs as they work a claim. Recently,we started mining CMRs to learn more about our customers and internal efficiencies, as well asto watch for risk and fraud. We spent a year figuring out the real value of the CMR and buildingreports. Now we’re mining the CMRs’ structured and unstructured data for predictive analytics.“Our technical team uses the same tools and similar techniques, whether mining structured orunstructured data,” the risk manager continued. “That’s because both are simply data preparationsteps that have to be done before we can run reports and analyses. Text mining—or text analytics24 TDWI RESEARCH
Best Practices in Text Analyticsor whatever you want to call it—is key to converting information in text to structured data thatour analytic tools require. The output of text analytics is often an input for predictive analytics,showing that the two work well in tandem. Without text analytics, we’d be performing predictiveanalytics on half as much data, and getting only half the picture. Our primary business goals are todecrease the funds reserved for payouts and increase the funds acquired through subrogation. Wealso hope to improve fraud models and regulatory compliance. I feel confident that text analyticsand predictive analytics will help us get there.”Text Analytics Applied to Semi-structured DataThis report focuses on unstructured data and how it can contribute to traditional BI and DWtechnology stacks. But semi-structured data is likewise useful for BI when processed by textanalytics, as seen in the following user story. The story also shows that the natural progressionfrom traditional data warehousing to predictive analytics to text analytics defines a low-risk phasedapproach to project implementation:“We depend heavily on EDI for transactions and communication both inside our company andoutside with partners,” said the managing director of business information at a mortgage securitiesfirm. “EDI documents contain a mish-mash of structured, semi-structured, and unstructured data,and all of it’s useful for determining risk, so we mine all of it. Before we got into mining, we alreadyhad a mature data warehouse practice as a foundation. We started with data mining, which toldus a lot, but not everything. So we added text mining to it, such that facts deduced from text feedinto the data mining implementation we’d already deployed. The order of steps we took—from datawarehousing to data mining to text mining—turned out to be a safe, phased approach.”USER STORYStart with a datawarehouse, thenpredictive analytics, andfi n a l l yt e x ta n a l y t i c s .Processing Unstructured Data in a DBMSLeading database management systems (DBMSs) have functions for processing unstructured datawithin a database, typically stored as text fields or binary large objects (BLOBs). Depending on theDBMS brand, these functions do keyword indexing for search, text analytics for entity extraction orclustering, and data mining for predictive analytics.A prominent use case for this configuration involves e-mail. TDWI has encountered severaluser organizations that dump e-mail text into DBMSs periodically, then index it for search andentity analysis, looking for legal and compliance infractions like insider trading and data privacyviolations. Since most DBMSs can create multiple indices of various types, the data can be accessedby various query, reporting, search, and predictive analytics tools.Hence, a DBMS may qualify as a tool for text analytics or search. Oddly enough, this makes sense,because DBMS servers have diverse indexing capabilities and scalable performance. Assumingthat these functions meet user requirements, they can be compelling to organizations that wish toleverage their DBMS investment or to manage unstructured content in a DBMS.Text Analytics and the Future of BIText analytics has the potential to double the volume and breadth of data that is applied to BI. It’snot so much the volume as the breadth that’s significant. Organizations embracing text analyticsall report having an epiphany moment when they suddenly knew more than before, which helpedthem to gain more precision in fact-based decision making. After all, without representation fromunstructured data, a data warehouse is a single truth, but not the whole truth. In another direction,BI is heading strongly into performance management, which assumes accurate measurement; textwww.tdwi.org 25
Page 1 and 2: SECOND QUARTER 2007TDWI BEST PRACTI
Page 4 and 5: BI SEARCH AND TEXT ANALYTICSAbout t
Page 6 and 7: BI SEARCH AND TEXT ANALYTICSIntrodu
Page 8 and 9: BI SEARCH AND TEXT ANALYTICSUnstruc
Page 10 and 11: BI SEARCH AND TEXT ANALYTICSIn rece
Page 14 and 15: BI SEARCH AND TEXT ANALYTICS• Dat
Page 16 and 17: BI SEARCH AND TEXT ANALYTICSa fairl
Page 18 and 19: BI SEARCH AND TEXT ANALYTICSsingle
Page 20 and 21: BI SEARCH AND TEXT ANALYTICSThere a
Page 22 and 23: BI SEARCH AND TEXT ANALYTICSHow wou
Page 24 and 25: BI SEARCH AND TEXT ANALYTICSText Mi
Page 28 and 29: BI SEARCH AND TEXT ANALYTICSanalyti
Page 30 and 31: BI SEARCH AND TEXT ANALYTICSHyperio
Page 32 and 33: BI SEARCH AND TEXT ANALYTICSSybaseD
Page 34 and 35: BI SEARCH AND TEXT ANALYTICSBI sear
Page 36: TDWI RESEARCHTDWI Research provides

BI SEARCH AND TEXT ANALYTICS - The Data Warehousing Institute

You also want an ePaper? Increase the reach of your titles

Delete template?

Save as template?