March 31, 2008
Joltin' Joe DiMaggio's 56-game hit streak, re-examined - I've heard it said several times that DiMaggio's 56-game hit streak was the most unlikely, and one of the most unbreakable, records in baseball. That claim has always rung false (or at least, unexamined) to me. Samuel Arbesman (a grad student from Steven Strogatz's group) actually sat down with the numbers and wrote a nifty little article for the NYT.
The data/baseball nerd in me is fascinated, but even more so I wish this kind of science reporting were more frequent. The result here is new and interesting, but the scope and method don't extend much beyond a breakfast table experiment. It is no knock at all on this article to point out that blogs such as Baseball Prospectus consistently feature original work of many times the statistical and methodological depth.
Instead of a Newspaper, I want a NewsInterestingsOrImportantsPaper, and especially so for the science section. It just isn't that interesting to read a breathless dumbed-down relay of a Nature article whose impact can't yet be judged -- not to mention they're all too often chosen by quality of Press Release. I think reader and publisher would both benefit from considering "what's new to the reader" as much as "what's new to the world".
Yes, and I'd also like to know if it says anything about the shape of the actual distribution (interesting to note the high number of 1890s streaks, as predicted). Baseball stat analyses have an unfortunate tradition of using maxima rather than any kind of sensible centroid to characterize outcomes. Drawing conclusions about the bulk of a distribution based on examination of its tails is dodgy.
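To make the maxima-versus-centroid point concrete, here is a minimal sketch. It is a toy Monte Carlo, not necessarily the method used in the NYT article: each game is treated as an independent coin flip with a fixed chance of at least one hit. The 154-game season length, the 0.80 per-game hit probability, and the function names are all illustrative assumptions, not values fitted to any real player.

```python
import random
import statistics


def longest_streak(games, p_hit, rng):
    """Longest run of consecutive games with at least one hit."""
    best = current = 0
    for _ in range(games):
        if rng.random() < p_hit:   # this game has a hit (assumed independent)
            current += 1
            best = max(best, current)
        else:                      # hitless game ends the current streak
            current = 0
    return best


def simulate_seasons(n_seasons, games=154, p_hit=0.80, seed=0):
    """Longest streak in each of n_seasons simulated seasons.

    games and p_hit are illustrative assumptions, not real data.
    """
    rng = random.Random(seed)      # fixed seed so reruns are reproducible
    return [longest_streak(games, p_hit, rng) for _ in range(n_seasons)]


streaks = simulate_seasons(10_000)
summary = {
    "median": statistics.median(streaks),  # a "centroid" of the distribution
    "mean": statistics.mean(streaks),
    "max": max(streaks),                   # the tail that record books report
}
print(summary)
```

Comparing the median with the maximum over many simulated seasons should show how far the tail can sit from the bulk, which is why a record on its own says little about the typical streak.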
The best way to look at the psychological effect is to use the incredibly powerful Win Probability Added (WPA) and Leverage tools the community has evolved (replacing the clumsy traditional "Close and Late" proxy). I'd expect a ton of scatter -- it's a necessarily small sample, and WPA is I think often a 0-or-lots thing, and you'd only expect to start seeing an effect late in a streak. (I think a streak reaches news interest about every couple years, and usually premieres with a quote like "Yeah, I didn't even know I had a streak going until Smitty told me during BP the other day." Also of course the MLBPA-mandated "God Willing" and "I don't care about the streak, I'm just here to play the game the right way and help my teammates".)
If you could define a good proxy for "psychological detriment" based on WPA it would be interesting to correlate it with the changes in sports reporting and public attention.
I haven't looked to see what if anything Arbesman/Strogatz have published in the scientific or baseball literature, but I might.
Freakonomics Q&A may be relevant -- Flip, you get a question in here?
A chain of links from pablo's link led me to the Bill James College Basketball Safe Lead Calculator which is pretty awesome.
AND if mrflip did read pablo's article he probably clutched his head and moaned upon reading this exchange:
Q: What statistical software do you use?
A: Just Excel.
It's funny -- a lot of the baseball guys seem to not use too much more than this (of course, many of them do use way way more than this). The troubles I had with data extraction and interconnecting the different available datasets are part of what got me started on the infochimps.org project.
Are the results enough to show the strong psychological pressure that can affect hitting streaks as they approach a record length? Look at the hitters' batting averages near the ends of tight games that they're losing, or something.
posted by McD at 05:26PM CST on March 31