Business Intelligence Best Practices - BI-BestPractices.com

Enterprise BI Search: Implementation Considerations
The simplicity has many people excited about the prospect of bringing Google-like search into enterprise business intelligence (BI) systems.

By Rado Kotorov and Jake Freivald

Introduction

My seven-year-old daughter thinks there is a knowledge genie that her teacher “Googles” for answers. While cute, this anecdote also exemplifies how much Google’s obsession with simplicity has helped build brand awareness, making its name literally synonymous with search. I can foresee generations X and Y being followed by generation S—one that will rely on search to accomplish almost any task.

This simplicity has many people excited about the prospect of bringing Google-like search into enterprise business intelligence (BI) systems. There is something magical about asking a question using a few keywords and receiving an answer in less than a second.

Today, BI search deployments are limited to a few functional areas, mainly Web sites and departmental document directories. This may remain unchanged unless people start to approach enterprise BI search differently from the way they approach ordinary Web search. While end users still need the same simple user experience, enterprise BI search introduces a new level of complexity because of the heterogeneous architecture of applications and data stores.

Organizations must plan for more than ordinary Web search, which primarily addresses the reading and parsing of presentation-oriented file formats and URL-based site crawling. Too few search vendors stress the importance of accessing and indexing the 300-plus enterprise applications and databases that can make or break a BI search solution.

Defining the Scope of Enterprise BI Search

Misunderstanding or intentionally limiting the breadth of an enterprise BI search solution can lead to incomplete solutions, inappropriate vendor selection, and (ultimately) a compromised user experience.

Search solutions must answer all relevant questions, whether they are about detailed data or summaries. Hence, the scope of BI search extends along a continuum from unstructured documents and aggregate reports to individual records and transactions stored in applications and databases (see Figure 1). While a solution can be implemented in stages, selected technologies should enable indexing and reporting along the entire continuum. This simple approach helps map vendor capabilities and determine which offers the best fit.



Figure 1: The continuum of enterprise information and data sources.


Unstructured BI-Relevant Content

Web search engines such as Google, MSN, and Yahoo! crawl file directories to find and index upwards of 300 file formats, including presentation-oriented and unstructured files such as HTML, Word, Excel, PDF, images, and multimedia files. These unstructured documents affect BI search implementations because they provide context and substantive detail to reports; for example, court documents often supplement arrest records. Generally speaking, the support of more file formats results in a more complete index with less document preparation. While Web search engines must crawl directories blindly, without prior knowledge of the stored file formats, this feature is less important for enterprises because most standardize on content-creation tools and document formats.

Support for 300 file formats will usually suffice, but consider whether you will need an engine that can integrate proprietary parsers for unsupported file types. Most BI content—especially the structured content—can be transformed into an appropriate indexing format. Since few search engines offer robust transformation tools, BI vendors can fill this gap.

BI-Specific Content: Reports, Records, and Transactions

Reports and transactions are BI-specific content types. Their original formats don’t matter because they can be transformed into, say, HTML or XML for indexing by Google. More important is the need to access data sources and applications to extract and enrich data, making the information meaningful for natural language search. Specialized search engines have started to develop access and integration capabilities, but only BI vendors currently provide enterprise-level capabilities.

Reports (static aggregations of individual transactions) are stored in report libraries or file systems. Search engines can index reports, either on their own or in cooperation with BI tools, in the same way they index other unstructured documents. But without context, it is difficult, for example, to tell one profit report from another among the hits on the search results page.

BI vendors add value by supplying metadata in the search results that end users can use to identify the most relevant report. An integrated BI and search solution lets users retrieve reports, refresh the data, and modify the report content, which are important capabilities when up-to-date reports are required. Only BI vendors can generate entirely new reports from the hits, such as the report a user would need when searching for inventory items that might be out of stock.


Most BI vendors index only reports. While it’s tempting to think that users don’t need anything more, most questions are about the details of individual records and transactions, especially in operational BI. Experts estimate that 80 percent of enterprise data is structured and that, from a decision-making point of view, the value of structured transactional data far exceeds that of unstructured data. That implies that enterprises should focus on indexing structured data first; unstructured content is mistakenly seen as the low-hanging fruit only because it has been the core competency of search engines.

Search engines significantly expand BI query capabilities in this area. BI companies use structured queries to find or filter data in known data sources using known parameters. Search allows users to find data not only in structured (dimensional) fields, but also in unstructured (character large object [CLOB] or text) fields without prior knowledge of the data sources or the parameter values. Thus, customer records can be retrieved by names in structured fields or by customer clues recorded in the free-form text fields. Some BI companies provide the missing link through transactional indexing, which includes data access and metadata enrichment.

Transactional Indexing

Search engines can rarely index transactional data without preprocessing and enrichment—what search companies call “content aggregation”—because the raw data isn’t suitable for natural language query. For example, users search for products by names and descriptions rather than inventory numbers, so they need more than the data from a star schema’s fact table. At a minimum, indexing this content requires supplemental look-up values (natural-language descriptions) for all keys and codes.
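To make the look-up requirement concrete, here is a minimal Python sketch. It assumes hypothetical look-up tables (`PRODUCT_NAMES`, `STORE_NAMES`) and a fact row keyed by `product_id` and `store_id`; the table and field names are illustrative only. The raw keys are kept, but natural-language descriptions are added so that a user typing "trail running shoe" can match a row that originally held only codes.

```python
# Hypothetical look-up tables mapping codes in a fact table to descriptions.
PRODUCT_NAMES = {"P100": "Trail running shoe", "P200": "Road running shoe"}
STORE_NAMES = {"S1": "Downtown store", "S2": "Airport store"}


def enrich_fact_row(row: dict) -> dict:
    """Return a search-ready document: raw keys plus readable descriptions."""
    doc = dict(row)  # copy so the source row is left untouched
    doc["product_name"] = PRODUCT_NAMES.get(row["product_id"], "Unknown product")
    doc["store_name"] = STORE_NAMES.get(row["store_id"], "Unknown store")
    # One free-text field that a full-text engine can index and match on.
    doc["text"] = f'{doc["product_name"]} sold at {doc["store_name"]}'
    return doc


if __name__ == "__main__":
    fact = {"product_id": "P100", "store_id": "S2", "qty": 3}
    print(enrich_fact_row(fact)["text"])  # Trail running shoe sold at Airport store
```

Unknown codes fall back to a placeholder description rather than failing, so a missing look-up entry does not block indexing of the row.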

Transactions can be enhanced by appending data from other tables, databases, and applications, or by pre-aggregating records. Help desk applications, for example, create a new entry for each communication with a customer and relate it to a customer case using a reference key. Indexing each communication record separately will create fragmented search results; not indexing all customer communications will create an incomplete record for searching.

The solution requires enriching the incoming record with the available customer information, re-aggregating all communications into a single indexed message, and passing it to the search engine to replace the previously indexed record. This indexing process flow involves numerous steps: capturing the new incoming customer communication, creating dynamic joins with other tables and applications, running a procedure to aggregate the related case records, structuring and transforming the message into an indexing format required by the search engine, passing it to the search engine for re-indexing, and deleting the prior record.
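The help-desk flow can be sketched in Python under several stated assumptions: a plain dictionary stands in for the search index (a real engine would expose its own indexing API), a dictionary look-up stands in for the dynamic join, and the names `on_new_communication`, `customers`, and `communications` are hypothetical. Because the re-aggregated case is written under the same document id, it replaces the previously indexed version, so search shows one up-to-date document per case.

```python
from collections import defaultdict

# Stand-in for the search index: document id -> indexed document (assumption).
index: dict = {}

# Hypothetical source data: customer master records and per-case communications.
customers = {"C7": {"name": "Ana Ruiz", "tier": "Gold"}}
communications = defaultdict(list)  # case id -> list of communication notes


def on_new_communication(case_id: str, customer_id: str, note: str) -> dict:
    """Capture one new communication and re-index its whole case as one document."""
    # 1. Capture the incoming communication record.
    communications[case_id].append(note)
    # 2. Enrich it with customer data (a look-up stands in for a dynamic join).
    customer = customers.get(customer_id, {})
    # 3. Re-aggregate every communication for the case into a single message.
    notes = communications[case_id]
    doc = {
        "id": f"case-{case_id}",
        "customer": customer.get("name", ""),
        "tier": customer.get("tier", ""),
        "message_count": len(notes),
        "text": "\n".join(notes),
    }
    # 4. Drop the previously indexed version of the case, then index the new one.
    index.pop(doc["id"], None)
    index[doc["id"]] = doc
    return doc


if __name__ == "__main__":
    on_new_communication("881", "C7", "Received shoes in the wrong color.")
    on_new_communication("881", "C7", "Replacement pair shipped.")
    print(len(index), index["case-881"]["message_count"])  # 1 2
```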

Vendors have taken different approaches to transactional data indexing:

  • Crawling databases. Web search engines have adopted an approach to transactional indexing similar to document indexing—they crawl tables in databases using SQL SELECT statements. Crawling is an acceptable choice for slowly changing tables, but not for large volumes of frequently changing data that needs to be available for search in near real time. It is also ineffective for applications and highly normalized operational data stores.
  • Passing the search query to the application. This solution relies on some intelligence to determine how to match search terms with applications. It also relies on the application for data extraction and aggregation. This approach works well for simple queries such as stock price information. Implementation becomes more daunting if users can run multiple queries against the same application. In those cases, a self-service application will likely offer more robust querying capabilities and be less confusing to the user.
  • Pushing application data to the index. Instead of letting the engine crawl the records, an application pushes data into the index using a search engine–provided indexing application programming interface (API). The application makes all connections into the underlying data store and has complete control over scheduling, interfacing protocols, and data structures. The scope of effort to configure and use this method depends on the extraction and transformation complexity and the available application tools.
  • Integrating data through SOA and process flows. These same APIs can let integration tools broaden the scope of the index. This requires integration capabilities, including transformation tools, process flow capabilities, and adapters, to define and execute processes that capture and enrich transaction data in real time.

The first three methods are application-specific and work in projects with limited scope. The fourth method is generic and addresses all present and emerging search integration needs, but few traditional BI companies have the expertise in modern integration architecture to implement it.
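The crawling approach in the first bullet can be sketched with Python's built-in `sqlite3` module. The sketch assumes a hypothetical `products` table with an integer `updated_at` column that the application maintains; each crawl issues a SQL `SELECT` for rows changed since the last high-water mark. It also exposes the approach's limits: changes become searchable only after the next scheduled crawl, and deleted rows are never seen, which is why pushing data to the index suits frequently changing data better.

```python
import sqlite3


def crawl_table(conn: sqlite3.Connection, since: int) -> tuple:
    """Select rows changed after the `since` watermark; return (documents, new watermark)."""
    rows = conn.execute(
        "SELECT id, name, description, updated_at FROM products "
        "WHERE updated_at > ? ORDER BY updated_at",
        (since,),
    ).fetchall()
    docs = [{"id": f"product-{r[0]}", "text": f"{r[1]} {r[2]}"} for r in rows]
    # Advance the watermark to the newest change seen; keep it if nothing changed.
    new_since = rows[-1][3] if rows else since
    return docs, new_since


def make_demo_db() -> sqlite3.Connection:
    """Build a small in-memory table for illustration (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, "
        "description TEXT, updated_at INTEGER)"
    )
    conn.executemany(
        "INSERT INTO products VALUES (?, ?, ?, ?)",
        [(1, "Trail shoe", "Waterproof", 1), (2, "Road shoe", "Lightweight", 2)],
    )
    return conn


if __name__ == "__main__":
    conn = make_demo_db()
    docs, mark = crawl_table(conn, 0)  # first crawl picks up both rows
    conn.execute("UPDATE products SET updated_at = 3 WHERE id = 1")
    docs, mark = crawl_table(conn, mark)  # second crawl sees only the changed row
    print([d["id"] for d in docs], mark)  # ['product-1'] 3
```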

User Interface Augmentation

With search technologies, we’re used to thinking that less is more. When a BI search returns a large number of records, however, simple interfaces displaying search hits ordered by relevancy aren’t enough. Consider a bell curve, for instance: although the right-hand tail is small, it may represent a large number of records in absolute terms. No one has the time to page through hundreds of results, so BI search results must enable interactivity to supplement relevancy. This helps users avoid information overload and easily find the exact information that they need.

Search Results Classification and Categorization

Two methods enhance the filtering of search results: classification and categorization of the hits.

Both methods appear the same to end users. The underlying data is used to group the search results and then present the groups in ordinary tree controls to let the user select parameters and narrow down the hits. This interaction is referred to as guided navigation (see Figure 2).



Figure 2: Guided navigation search results with categorization displayed on the left side.
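The grouping behind guided navigation can be sketched in a few lines of Python. The sketch assumes each hit carries metadata fields such as the hypothetical `brand` and `gender`: `facet_counts` produces the value-and-count pairs a tree control would display, and `narrow` applies the user's selection as a filter. It illustrates the idea only; a search engine would normally compute such counts inside the index.

```python
from collections import Counter


def facet_counts(hits: list, fields: list) -> dict:
    """Count hits per value of each metadata field, as a navigation tree would show them."""
    return {f: Counter(h[f] for h in hits if f in h) for f in fields}


def narrow(hits: list, field: str, value: str) -> list:
    """Keep only the hits whose metadata matches the value the user selected."""
    return [h for h in hits if h.get(field) == value]


if __name__ == "__main__":
    hits = [
        {"id": 1, "brand": "Acme", "gender": "Women"},
        {"id": 2, "brand": "Acme", "gender": "Men"},
        {"id": 3, "brand": "Zoom", "gender": "Women"},
    ]
    print(facet_counts(hits, ["brand"])["brand"].most_common())  # [('Acme', 2), ('Zoom', 1)]
    print([h["id"] for h in narrow(hits, "gender", "Women")])  # [1, 3]
```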


Although they appear the same to users, categorization and classification create groups in fundamentally different ways.

Search companies, which have roots in unstructured data, typically extract categories from the unstructured text using statistical methods. This automates the grouping process, but it doesn’t give information architects control over how records are grouped. BI companies, which have roots in structured data, classify records dynamically. Information architects define metadata about the structures they want to index; this metadata can precisely control how records are grouped.
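To illustrate the classification side, here is a minimal Python sketch in which the grouping rules come from metadata rather than from statistics. The `CLASSIFICATION_METADATA` structure and its price bands are hypothetical stand-ins for whatever an information architect would define; the point is that the architect, not an algorithm, decides exactly which group each record lands in.

```python
# Hypothetical metadata defined by an information architect:
# field -> ordered list of (exclusive upper bound, group label).
CLASSIFICATION_METADATA = {
    "price": [(50.0, "Under $50"), (100.0, "$50 to $99.99"), (float("inf"), "$100 and up")],
}


def classify(record: dict, metadata: dict = CLASSIFICATION_METADATA) -> dict:
    """Assign a record to one group per metadata rule; fields it lacks are skipped."""
    groups = {}
    for field, bands in metadata.items():
        value = record.get(field)
        if value is None:
            continue
        for upper, label in bands:  # first band whose bound exceeds the value wins
            if value < upper:
                groups[field] = label
                break
    return groups


if __name__ == "__main__":
    print(classify({"name": "Trail shoe", "price": 89.0}))  # {'price': '$50 to $99.99'}
```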

The two methods aren’t mutually exclusive. Classification offers definite advantages for parameterized, searchable structured data, as well as for unstructured content that contains structured metatags (precategorized unstructured content). Given the trend toward tagging every piece of structured or unstructured content, statistical categorization appears to be more of a complement to classification than a replacement for it. If the BI search solution provides both methods, the classification and categorization can be displayed simultaneously, providing the user with a robust overview of the data.

As search emerges as the primary information access point, robust metadata will become even more important as it is used to build custom, adaptable navigation interfaces to augment or replace many current application interfaces.

Search Results Analytics

Users need to do more with search results than just filter them. Search returns a data set—one that is potentially quite large—and users will benefit from the ability to manipulate it. Expect vendors to differentiate based on this emerging requirement.

The common capability to sort results by date or relevancy provides little value on large result sets, because the first result page shows only the top or bottom hits. Sorting on metadata categories, a capability that is provided by some vendors, gives users more power to explore and organize large result sets (see Figure 3).



Figure 3: Filtering or sorting search results using metadata.
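Sorting on a metadata category rather than on relevancy alone can be sketched as a two-key sort. In this Python sketch, assuming each hit carries a relevance `score` and optional metadata fields (the `region` field is hypothetical), hits are ordered by the chosen field, ties keep the most relevant hit first, and hits that lack the field are placed last.

```python
def sort_by_metadata(hits: list, field: str, descending: bool = False) -> list:
    """Order hits by a metadata field, breaking ties by relevance score (highest first)."""
    present = [h for h in hits if field in h]
    missing = [h for h in hits if field not in h]
    # Python's sort is stable, so sort by the secondary key first, then the primary.
    present.sort(key=lambda h: h["score"], reverse=True)
    present.sort(key=lambda h: h[field], reverse=descending)
    return present + missing


if __name__ == "__main__":
    hits = [
        {"id": 1, "region": "West", "score": 0.5},
        {"id": 2, "region": "East", "score": 0.9},
        {"id": 3, "region": "East", "score": 0.95},
        {"id": 4, "score": 1.0},
    ]
    print([h["id"] for h in sort_by_metadata(hits, "region")])  # [3, 2, 1, 4]
```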


Some vendors have recently added the ability to convert the search results from the standard Google-like display with snippets to a tabular view (see Figure 4). This suits structured data, but not all tabular views are equal: most provide static data that can be sorted only by date, relevancy, and other predefined categories. Also, server-based sorting operations regenerate the tabular view on each user interaction. In these cases, the only benefit to the user is a different layout from the standard view.



Figure 4: Tabular view with server-side sorting options displayed in the dropdown box.


Other vendors convert results into a dynamic tabular view that applies calculations, visualizations, charts, rollups, and pivot tables locally in the browser. This opens a whole new perspective on search, making the result set more useful and enabling users to do reporting and ad hoc analyses; for example, comparing data along two or more dimensions, as they’re accustomed to doing with pivot tables in Excel. A user’s search for running shoes might return hundreds of results, which the user could use to compare prices by brand and gender (see Figure 5).

Since reporting and analysis of this type is often done using a data warehouse, it’s not surprising that some vendors require the creation of an intelligent data warehouse at the time of indexing. However, some vendors provide the ability to manipulate the data directly in the browser without requiring any additional technology. Keeping the data and reports self-contained provides additional advantages, such as the ability to save and share them via e-mail.

Ad hoc analytics on search results seems to be the most promising area for creating a true, search-driven BI.



Figure 5: Search results pivoted by brand and gender.
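The kind of pivot shown in Figure 5 reduces to grouping hits on two fields and aggregating a third. The Python sketch below computes an average price per brand and gender; it only illustrates the computation (the vendors described above perform it locally in the browser), and the field names and sample brands are hypothetical.

```python
from collections import defaultdict


def pivot_mean(rows: list, row_field: str, col_field: str, value_field: str) -> dict:
    """Average value_field for every (row_field, col_field) pair found in the rows."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for r in rows:
        key = (r[row_field], r[col_field])
        sums[key] += r[value_field]
        counts[key] += 1
    table = defaultdict(dict)
    for (row_value, col_value), total in sums.items():
        table[row_value][col_value] = total / counts[(row_value, col_value)]
    return dict(table)


if __name__ == "__main__":
    hits = [
        {"brand": "Acme", "gender": "Women", "price": 80.0},
        {"brand": "Acme", "gender": "Women", "price": 100.0},
        {"brand": "Acme", "gender": "Men", "price": 90.0},
        {"brand": "Zoom", "gender": "Men", "price": 120.0},
    ]
    print(pivot_mean(hits, "brand", "gender", "price"))
    # {'Acme': {'Women': 90.0, 'Men': 90.0}, 'Zoom': {'Men': 120.0}}
```

Combinations with no hits (here, Zoom shoes for women) are simply absent from the result rather than shown as zero, which avoids implying a price that does not exist.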


Search-Based Reporting

To provide BI search to the masses, you must avoid recreating all the complexities of traditional BI.

For example, if the chosen solution indexes only reports, how will you support a user who needs information that isn’t in any indexed report? In this type of solution, the report usually acts as an entry point that takes the user to the BI world to refine the request. The user may find the information by drilling down from within the report; if not, however, the user must use the regular BI tools to modify the existing report or to create an ad hoc report. Thus, the user has dropped from a simple search paradigm into all the complexities of BI that search should eliminate.

A metadata-based approach provides a different user experience. The indexed records or transactions act as the entry points to BI, and dynamically constructed, metadata-driven report links can take the user to any information resource. For example, a police-records search application can provide, directly from each criminal-offense record, links to the offense details, a summary report of all criminal records for the offender, another summary report on all criminal activities within date and geographic ranges, a crime analysis, and ad hoc reports that are structured by police activity. Any metadata associated with the hit is passed to the report or to the structured ad hoc form.

This BI search solution gives untrained users one-click access to all reporting capabilities without dropping them into any BI tool. The reporting capabilities must be as robust and simple as the search, or applications and tools will remain the preferred point of entry to BI.
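Metadata-driven report links can be sketched in Python using the police-records example. Each report in a hypothetical catalog declares which metadata fields it needs, and a link is built only when the hit supplies all of them, passing the hit's metadata as URL query parameters. The report paths and field names are invented for illustration and do not reflect any product's URL scheme.

```python
from urllib.parse import urlencode

# Hypothetical report catalog: title -> (report path, metadata fields it requires).
REPORT_CATALOG = {
    "Offense details": ("/reports/offense", ["offense_id"]),
    "Offender history": ("/reports/offender_summary", ["offender_id"]),
    "Activity in area and period": ("/reports/area_activity", ["precinct", "offense_date"]),
}


def report_links(hit: dict, catalog: dict = REPORT_CATALOG) -> dict:
    """Build one link per report whose required metadata is all present in the hit."""
    links = {}
    for title, (path, fields) in catalog.items():
        if all(f in hit for f in fields):
            # Pass the hit's metadata to the report as query-string parameters.
            links[title] = f"{path}?{urlencode({f: hit[f] for f in fields})}"
    return links


if __name__ == "__main__":
    hit = {"offense_id": "O-17", "offender_id": "P-42", "precinct": "5"}
    for title, url in report_links(hit).items():
        print(title, url)
```

Because links are derived from the catalog at query time, adding a new report to the catalog makes it available from every matching hit without re-indexing.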

Scoping Your First Projects

How can you move from theory to practice and make your search projects successful right from the start? The biggest issue isn’t technology—it’s scope. In fact, don’t talk about technology at all. Find a good application instead. New technology projects have staying power only if their initial applications provide a clear benefit, and search in particular survives only when it makes a knowledge worker’s life easier. To get started, look for two types of knowledge workers in your organization: those who are threatened with information overload, and those with too few easily available information sources.

If information overload is the problem, ask workers what kind of information would be most useful to them. You will often find this information in readily available transactions, which can be streamed into the search engine. For example, when a customer calls a contact center after receiving a product in the wrong color, a customer service representative (CSR) may need to see sales, shipping, product, and customer information, each of which comes from different places. CSRs often have to look into several applications to find the needed information; enterprise BI search helps by capturing, categorizing, classifying, and presenting all of the transactions and reports related to the specific customer question.

Interestingly, the same technology applies to people with too few information resources. They often use only a small number of applications to determine what they should do, and each application requires them to learn its own reports, search functions, data structures, and quirks. Enterprise BI search helps these users by giving them a zero-training interface so they can see all transactions related to their problem anywhere within the organization.

For example, a judicial clerk who helps a judge determine what bail should be set for a suspected criminal may have access only to judicial and police systems that provide information about arrest records, previous rulings, and so on. By incorporating feeds from other systems, such as the National Crime Information Center or the Immigration and Naturalization Service, the clerk can find much more information (suspected terrorist ties, etc.) that may be relevant to the bail decision. Clearly, you must also consider security issues; make sure that your BI search solution allows users to see only the information they’re authorized to view, regardless of its source.
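The security requirement can be sketched as a filter applied to every hit before display, whatever system the hit came from. This Python sketch assumes each indexed document carries a hypothetical `acl` field listing the groups allowed to see it, and it denies access by default when that field is missing. A production deployment would likely also enforce this inside the search engine rather than only in application code.

```python
def authorized_hits(hits: list, user_groups: list) -> list:
    """Keep only hits whose access-control list shares a group with the user."""
    groups = set(user_groups)
    # A hit without an "acl" field is treated as restricted (deny by default).
    return [h for h in hits if groups & set(h.get("acl", ()))]


if __name__ == "__main__":
    hits = [
        {"id": "arrest-1", "acl": ["court", "police"]},
        {"id": "ncic-7", "acl": ["federal"]},
        {"id": "memo-3"},
    ]
    print([h["id"] for h in authorized_hits(hits, ["court"])])  # ['arrest-1']
```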

Once you have decided which users you want to start with, find the transactions and existing reports that will satisfy their information needs. Start with just a few high-value transactions—happy users will always come back for more, which can guide your selection of transactions to add in later phases. Integrate these transactions into the search engine, add the search analytics layer on top, and deploy.

Summary

Search and BI complement each other through more than just access to data, reports, and related documents. Together, they expose a rich set of information resources to ordinary users. It remains to be seen whether combined search and BI will go mainstream; however, there are many applications that could leverage their symbiotic relationship, and if the right indexing methodology and technologies are deployed, search may help bring BI to the masses.



Rado Kotorov, Ph.D., is strategic director for Information Builders.
rado_kotorov@ibi.com


Jake Freivald is vice president of corporate marketing for Information Builders.
jake_freivald@iwaysoftware.com