
February 05, 2006

In Defense of Data Mining

NP: party shuffle

Yeah, I'm being prolific today. Something about pushing melancholy introspection off of the top of the page, and the whole business of not being able to really shut my brain off when I get on a roll.

The far left spits out the words "data mining" the same way the right does "liberal," and that's just not fair. Data mining is not only a useful tool for detecting interesting relationships, it's pretty much the core of what I'm good at in the professional sense, so I can't sit by idly and watch it get pilloried. The problem with the program being decried by the DailyKos isn't that it's a data mining program, it's that it's a bad data mining program.

Any sort of advanced behavior modeling like this relies very heavily on one thing in order to be effective, and that is training. In order to train a model, you need a significant quantity of desired (or, in this case, not-so-desired) outcomes so that you can recognize the correlations that might give you a leg up in future detection. If you're trying to predict whether or not someone is going to fly a plane into a building, to take the most obvious example, you only have 19 cases out of however many million individual records on which to base your conclusions.
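To put rough numbers on how lopsided that training set is, here's a back-of-the-envelope sketch. It assumes a hypothetical population of 300 million records (the post only says "however many million") and a hypothetical, generously optimistic detector that catches 99% of real cases while wrongly flagging just 0.1% of everyone else. None of these figures describe the actual program; they're only there to show the shape of the problem.

```python
# Back-of-the-envelope sketch of training on 19 positives among millions.
# ASSUMPTION: TOTAL_RECORDS is a hypothetical population size, and the detector
# error rates used below are hypothetical; neither comes from the real program.

POSITIVE_CASES = 19           # the 19 known cases mentioned in the post
TOTAL_RECORDS = 300_000_000   # hypothetical number of individual records


def base_rate(positives: int, total: int) -> float:
    """Fraction of all records that are actual positive cases."""
    return positives / total


def always_negative_accuracy(positives: int, total: int) -> float:
    """Accuracy of a 'model' that predicts 'negative' for every single record."""
    return (total - positives) / total


def flagged_counts(positives: int, total: int, sensitivity: float,
                   false_positive_rate: float) -> tuple[float, float]:
    """Expected (true alarms, false alarms) for a detector with these error rates."""
    true_alarms = positives * sensitivity
    false_alarms = (total - positives) * false_positive_rate
    return true_alarms, false_alarms


if __name__ == "__main__":
    rate = base_rate(POSITIVE_CASES, TOTAL_RECORDS)
    naive = always_negative_accuracy(POSITIVE_CASES, TOTAL_RECORDS)
    # Hypothetical detector: 99% sensitivity, 0.1% false-positive rate.
    hits, false_alarms = flagged_counts(POSITIVE_CASES, TOTAL_RECORDS, 0.99, 0.001)
    precision = hits / (hits + false_alarms)
    print(f"base rate: {rate:.2e}")
    print(f"always-negative accuracy: {naive:.7%}")
    print(f"true alarms: {hits:.1f}, false alarms: {false_alarms:,.0f}")
    print(f"share of alarms that are real: {precision:.4%}")
```

Under those assumptions, a model that never flags anyone is already more than 99.99999% "accurate," and the optimistic detector produces roughly 300,000 false alarms for fewer than 19 real ones.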

As we in the business like to say in these sorts of situations, your results are only going to be directional, at best.
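One way to see why 19 cases only gets you "directional" results: suppose (hypothetically) some behavior shows up in 12 of the 19 known cases. A standard 95% Wilson score interval on that proportion is enormous. The sketch below is a textbook Wilson interval, not anything specific to the program in question.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p_hat = successes / n
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width


if __name__ == "__main__":
    # Hypothetical: a trait seen in 12 of the 19 known cases.
    print(wilson_interval(12, 19))
    # Same proportion, but with a thousand times as many cases.
    print(wilson_interval(12_000, 19_000))
```

With 12 of 19 the interval runs from roughly 41% to 81%, which tells you almost nothing about how strongly that trait predicts anything. The same proportion over 19,000 cases narrows to about plus or minus 0.7 percentage points.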

The obvious rejoinder is that the intelligence community actually has more data points to work with based on the number of known terrorists that exist in the world, and not just the ones who have done the most dastardly things. But if they know who the terrorists are well enough that they can train the predictive model, then they should also know enough to, oh, I don't know, catch them?

This is why it's so easily and, in my view, so accurately portrayed as a fishing expedition that will yield very little value, no matter how many times the White House insists that it would have increased the chances of stopping 9/11. Which, in itself, is typical Rovian semantics, because even an increase of 0.001% is an increase, and therefore the statement is not technically wrong.




All content on this website (including text, photographs, audio files, and any other original works), unless otherwise noted, is licensed under a Creative Commons License.