Business Intelligence Best Practices - BI-BestPractices.com

Collaboration. Communication. Community.

 
 
BI Case Study: There's an (Analytic) Method to Their (March) Madness

by Linda Briggs
One thing you can say for sure: predictive analysis needn’t be boring. Just ask two university professors who applied the techniques to select competitors for the NCAA’s annual March Madness event—w

Every year, the men’s college basketball season ends with single-elimination tournaments lasting through most of March. Which teams qualify for the tournament is traditionally a subject of heated water-cooler debates and (somewhat) clandestine betting. In fact, money wagered on the men’s tournament now nearly outranks the Super Bowl. Given that, the National Collegiate Athletic Association’s selection committee comes under tremendous heat each year as it secretly selects the 65 teams that will compete, using a wealth of team-performance statistics.

For the fifth year in a row, two professors used software from SAS Institute to correctly predict which teams the committee would select. This year, the professors missed just three teams of the 35 at-large selections (30 of the teams are entered automatically based on their records). That gives them an overall accuracy record of 93.6 percent over the five years that they’ve applied their formula.

The two professors, Jay Coleman and Allen Lynch, used the same sort of predictive analysis that your company might use to forecast customer behavior for an upcoming direct mail campaign, to decide whether a given set of customers is a good credit risk, or to predict a possible problem in a manufacturing process.

To create the equation, the two used a tool called SAS/STAT, along with their own expertise in statistical modeling. Their misses this year? They chose the University of Texas at El Paso (UTEP), Air Force, and the University of Richmond, none of which made the final cut. (Instead, the committee selected Louisiana State University (LSU), Notre Dame, and Utah State.)

SAS Institute, with $1.2 billion in annual sales, is a private company based in Cary, NC and a leader in business intelligence software. SAS has more than 3.5 million users worldwide, and its customers include most of the largest companies in the world.

For the first time this year, the two professors also turned their analysis on the women’s college tournament, correctly picking 31 of the 34 at-large teams, a 91 percent accuracy rate that is in line with this year’s men’s result. Misses on the women’s side were Maryland, Missouri, and Mississippi, all of which made the tournament after all.
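The single-season rates for both tournaments are simple hit rates: correct at-large picks divided by at-large slots. Here is a minimal sketch in Python that uses only the counts reported in this article. The function name hit_rate is our own and is not part of the professors’ tool. The five-year 93.6 percent figure is not recomputed, because the article does not give the year-by-year counts.

```python
def hit_rate(correct: int, total: int) -> float:
    """Return the share of at-large picks that matched the committee, in percent."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * correct / total


# Men's field this year: 35 at-large slots, 3 misses (counts from the article).
mens = hit_rate(35 - 3, 35)

# Women's field this year: 31 correct picks out of 34 at-large slots.
womens = hit_rate(31, 34)

print(f"men: {mens:.1f}%  women: {womens:.1f}%")
```

Both results round to 91 percent, which is why the two tournaments can be quoted at the same rate even though the slot counts differ.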

The professors’ rankings don’t predict who will win the tournaments overall, or where teams will be seeded. They merely predict who will make the field. And the professors stress that this sort of software is not a substitute for creative thought; it’s a decision-making aid.

The two bring ample knowledge of statistics and economics to the problem: Coleman is an operations management and quantitative methods professor at the University of North Florida; Lynch is associate professor of economics and quantitative methods at Mercer University in Macon, Georgia.

“The statistics are almost remarkable,” Coleman says about the NCAA tournament and the tool they call the “Dance Card,” after the NCAA’s nickname of The Big Dance. “We know that analytical processes work, but we’re always a bit surprised when it works so consistently and well every year.”

So far, there haven’t been any indications from the selection committees for either the men’s or women’s tournaments that they might consider using the Dance Card tool in making their selections. “I’d love to see the committee actually use this,” Coleman says. “It could be a good tool for them. The only feedback we’ve gotten is negative, but that may change over time. We would love to see it as a decision aid.”

Coleman suggests that software could help the committee in last-minute cases, such as surprise upsets near the end of the process. He cites a situation in 2003 when the University of Nevada at Las Vegas (UNLV) lost a surprise game at the very end of the selection process, then didn’t make the cut. The professors’ equation clearly selected the team for inclusion, Coleman says. “That’s a flaw in [the selection committee’s] systems: there’s not a lot of time to re-evaluate,” he speculates. “Maybe that’s why we get some odd picks sometimes.” In a case like UNLV’s, a software tool could help by rapidly reshuffling huge amounts of data at the last minute.

Predictive Analysis Is Everywhere
Using predictive analytics on something as everyday and popular as the NCAA basketball tournament can help people realize how pervasive data analysis techniques are, according to Anne Milley, director of analytical strategies for SAS. Milley worked closely with the professors on the analytics side.

“I love it when things like [the NCAA tournament] happen; it lets people know what can be done,” Milley says. “People can relate to it. The tournament is entertainment for most people; this helps them realize that this sort of thing touches us in our everyday lives.”

SAS customers traditionally have been among the largest in the world—firms that use tools such as data mining and predictive analysis to make large, dollar-laden predictions about how customers might act in given situations. But smaller companies are increasingly interested in doing this sort of intensive data analysis, Milley suggests.

“A lot of companies are realizing that their data is a strategic asset: the more they can do with it, the better they’ll be,” she says. “Formerly, SAS has been well-penetrated into the Global 500, the largest companies. We’re seeing interest in this more from smaller companies.”

The Supermarket/Basketball Connection
Sports in general are a data-rich environment where ample statistics are gathered at every game, Coleman says, but by business standards, it’s not nearly as rich as you might find in other applications. As an example, he cites supermarket loyalty cards, in which stores record every item sold at every transaction, along with a wealth of associated customer data. “You might have tens of millions of observations annually,” Lynch says. “That’s when you need this type of software.”

In the NCAA project, Coleman and Lynch weighed approximately 420 teams with a “reasonable chance” of making the final cut. They then collected the selection committee’s decisions since 1994, and looked at 42 pieces of information about each team (including conference records, number of wins and losses, and RPI, or Ratings Percentage Index, a publicly available set of statistics about each team).

The trick, Lynch says, is to estimate which pieces of information were most important to the selection committee, and use that information to make a good forecast. Interestingly, only six pieces of data per team ended up being important in predicting the at-large bids, a determination that the SAS software helped them make. “We’ll make some guesses on the front end on what variables will matter,” Lynch says. “[The software] determines which really are.”
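To make the idea of letting software decide which variables matter more concrete, here is a minimal sketch in Python. It is not the professors’ model: the article does not say which procedure they ran in SAS, so this swaps in a plain logistic regression fitted by gradient descent and ranks candidate variables by coefficient size. The data are synthetic, and the variable names (signal_a, noise_c, and so on) are hypothetical stand-ins, not any of the 42 real inputs.

```python
import math
import random


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(rows, labels, steps=300, lr=0.5):
    """Fit logistic regression by batch gradient descent; return (bias, weights)."""
    n, k = len(rows), len(rows[0])
    bias, weights = 0.0, [0.0] * k
    for _ in range(steps):
        grad_b, grad_w = 0.0, [0.0] * k
        for x, y in zip(rows, labels):
            err = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x))) - y
            grad_b += err
            for j in range(k):
                grad_w[j] += err * x[j]
        bias -= lr * grad_b / n
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
    return bias, weights


# Synthetic "selected / not selected" data (an assumption for illustration only).
# Two candidate variables drive the outcome; two are pure noise.
rng = random.Random(0)
names = ["signal_a", "signal_b", "noise_c", "noise_d"]  # hypothetical names
true_weights = [2.0, -1.5, 0.0, 0.0]

rows, labels = [], []
for _ in range(300):
    # Features are drawn with unit variance, so coefficient sizes are comparable.
    x = [rng.gauss(0.0, 1.0) for _ in names]
    p = sigmoid(sum(w * xi for w, xi in zip(true_weights, x)))
    rows.append(x)
    labels.append(1 if rng.random() < p else 0)

bias, weights = fit_logistic(rows, labels)

# Rank candidate variables by the absolute size of their fitted coefficient.
ranked = sorted(zip(names, weights), key=lambda nw: abs(nw[1]), reverse=True)
for name, w in ranked:
    print(f"{name:10s} {w:+.2f}")
```

Run as a script, it prints the candidate variables from most to least influential. On this synthetic data the two signal variables should rank ahead of the two noise variables. A real analysis would also need held-out seasons to check that the chosen variables keep predicting the committee’s picks.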

Both professors are big fans of SAS, having used it for data manipulation since graduate school. “What SAS allows us to do with large data sets is almost astounding,” Lynch says, such as handling millions of customer observation files, then merging and truncating data very quickly. “I don’t want to sound like a commercial… but the market advantage that SAS offers is captured by its data sets: what you can do with data.”

“When you’re looking at huge data sets,” Coleman concurs, “just looking at numbers doesn’t tell me a darn thing. There are thousands of observations I might have. One thing that this type of analysis brings to the table is the ability to get your arms around all that data, as opposed to a bunch of numbers.”

The right data provides a chance to look back and trace your steps, Lynch says. “Where did we go wrong; where did we go right? It’s the only real record of business performance. You have to analyze it in order to understand it.”

“If you don’t,” Coleman says, “someone else in your industry is going to, and you’re going to be left behind.”

User-Friendly Data Analysis
In the future, Milley, Coleman and Lynch all predict that advanced software packages will let users do more without having to be statisticians. Convenient graphical interfaces will help remove users from the most rigorous aspects of data analysis, making it “integrated behind the scenes and more for the average person,” Coleman says.

That, Milley argues, is a good thing, since it brings the users who actually know the data best closer to the process.

Future software packages will also see more integration of data analysis tools, Lynch says. Convenient graphical interfaces, he predicts, will let users do more complex manipulations “without having to know the hardcore stuff. They’ll be more removed from the rigorous aspects.”

If the software is well-conceived and designed, he maintains, “it will tell you when you’re heading in the wrong direction or the right direction.”

What does the future hold for the Dance Card tool? “Actually,” Coleman says, “there are a couple of [other] things that we could do with basketball. We haven’t done it because of random variability. We could predict winners. We could try to predict seeds, once we know the field.”

Coleman has also looked at college football rankings, perhaps in the interest of finding a better system than the notorious BCS rankings that drew huge criticism late last year. But finding a better way to rank college football teams isn’t easy, according to Coleman. “Actually, it’s a difficult mathematical problem. There’s a lot of circularity in there.” He has developed a model for a football equation, however, and now needs to pull data together from past years to try it out.

That means that you might want to keep an eye on the professors: they could help you walk off with next year’s office football pool as well.


Linda Briggs is the Founding Editor of Microsoft Certified Professional Magazine and a former senior editorial director at 101communications. Based in San Diego, she writes about technology in corporate, education, and government markets. She can be reached at lbriggs@lindabriggs.com.