Thursday, May 7, 2015

Kind of pissed now about the training we never got. Language lesson: Forecasting, Machine Learning, and Statistics.

Last night I had an interesting "moment" when one of the meetup organizers informed me that by using "machine learning" algorithms, he was able to successfully predict exhaust on his nodes and move traffic before there was customer impact.

Uhm... yeah, so? That was my first thought. My second thought was trying to decode what he meant by machine learning, since traffic migration has been a part of my life since the beginning of time. Was there now a newer, sexier term I should rewrite my freakin' resume for YET AGAIN? I thought machine learning was about figuring out how machines knew each other in the wild. Nope.

And being out here in the wild, it just really got to me last night while drinking a pitcher (or three) with the boys (and one girl). I was forced to take a shitton of useless training, and after the late '90s I couldn't "get out" to the conferences, conventions, and other professional events that really mattered for sharing ideas and learning. </rant>

I've been asking around how people define machine learning (because I'm used to being an idiot), and I find that the term "Machine Learning" gets flung around quite a bit. There's a lot of vagueness around exactly what it means, yet everyone I've asked is very sure they understand how it differs from plain old statistics. One person told me this past week that the difference is that Machine Learning (ML) uses statistics (who doesn't?) and produces probabilities, not final quantities. A discussion on StackExchange somewhat defies that notion by hauling out this little phrase:

Simon Blomberg:
From R's fortunes package: To paraphrase provocatively, 'machine learning is statistics minus any checking of models and assumptions'. -- Brian D. Ripley (about the difference between machine learning and statistics) useR! 2004, Vienna (May 2004) :-) Season's Greetings!
When I saw that, I 'bout busted a gut, because bottom line: if you're buying hardware, you're buying a model. If you're sizing a network, that's all model-based. So if nobody checks the model or its assumptions, how do you know you're getting what you paid for?

Another thing that to me is a "big whoop" (all lower case, notice, please) is that ML uses historical data. Erm... okay. Whoop-de-fucking-doo. That raises the question of retention periods, granularity, and the quantity of, and access to, other data (variables).

Then they bring in the "data miners":
Add a third culture: data mining. Machine learners and data miners speak quite different languages. Usually, the machine learners don't even understand what is different in data mining. To them, it's just unsupervised learning; they ignore the data management aspects and apply the buzzword data mining to machine learning, too, adding further to the confusion.
One person summarized the damned hoopla:
The biggest difference I see between the communities is that statistics emphasizes inference, whereas machine learning emphasizes prediction. When you do statistics, you want to infer the process by which the data you have was generated. When you do machine learning, you want to know how you can predict what future data will look like w.r.t. some variable.
Uhm, so we've come sorta full circle, and ML really looks like plain old "forecasting" to me, minus marketing's involvement.
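For the record, here's roughly what "predicting exhaust" can look like as plain old forecasting. This is a minimal Python sketch, not whatever the meetup organizer actually ran: it assumes utilization is sampled hourly, that growth is roughly linear, and that 90% counts as "exhausted" (all my assumptions, as are the function names `fit_line` and `hours_until_exhaustion`). The fitted slope is the "inference" half (how fast is the node filling up?) and the crossing time is the "prediction" half (when does it hit the wall?).

```python
# Sketch only: a straight-line trend fit by ordinary least squares, then
# extrapolated to a capacity threshold. The hourly sampling and the 90%
# "exhausted" line are hypothetical assumptions, not anyone's real setup.

def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct sample times")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                  # "inference": growth rate per hour
    intercept = mean_y - slope * mean_x
    return slope, intercept


def hours_until_exhaustion(hours, util_pct, threshold_pct=90.0):
    """Hours after the last sample until the trend crosses the threshold.

    Returns None when the trend is flat or falling (no exhaustion predicted).
    """
    slope, intercept = fit_line(hours, util_pct)
    if slope <= 0:
        return None
    crossing = (threshold_pct - intercept) / slope  # "prediction": when it hits
    return crossing - hours[-1]


if __name__ == "__main__":
    # Made-up samples: utilization climbing 5 points per hour.
    print(hours_until_exhaustion([0, 1, 2, 3, 4], [50, 55, 60, 65, 70]))  # 4.0
```

With those made-up samples, the fitted slope is 5 points per hour and the sketch says you've got about 4 hours before you hit 90%, which is when you'd start moving traffic. Real traffic is seasonal and bursty, so a straight line is the dumbest model that could possibly work, which is sort of Ripley's point: check the model before you trust it.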

Lord, I'm rolling my eyes now b/c I'm trying to figure out what the fuckin' differences are, why everyone's so hung up on semantics, and whether "forecasting" is even still a word people use. Apparently not. We are so outdated.

Also, don't follow me on Twitter. I can't figure out how to change the text of my tweets yet. I feel like such a feckin' idiot.
