March 31, 2008

Are the results strong enough to show the psychological effect that can weigh on hitting streaks as they approach record length? You could look at the hitters' batting averages near the ends of close games that they're losing, or something.

Yes, and I'd also like to know whether it says anything about the shape of the actual distribution (interesting to note the high number of 1890s streaks, as predicted). Baseball stat analyses have an unfortunate tradition of using maxima rather than any kind of sensible centroid to characterize outcomes. Drawing conclusions about the bulk of a distribution from an examination of its tails is dodgy.
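To make the maxima-versus-centroid point concrete, here is a minimal Monte Carlo sketch. It assumes, purely for illustration, that every game is an independent trial in which the hitter gets at least one hit with a fixed probability (0.75 here, roughly a .300 hitter with four at-bats, since 1 - 0.7^4 is about 0.76). This is a toy model, not necessarily the one used in the study being discussed, and the names `simulate_career` and `summarize` are hypothetical. Running many replicate "careers" shows the median streak length barely moving while the longest streak jumps around from replicate to replicate.

```python
import random
import statistics


def streak_lengths(hits):
    """Return the length of every run of consecutive games with a hit."""
    lengths = []
    run = 0
    for got_hit in hits:
        if got_hit:
            run += 1
        elif run:
            lengths.append(run)
            run = 0
    if run:  # close out a streak still alive at the end of the record
        lengths.append(run)
    return lengths


def simulate_career(p_hit_game, n_games, rng):
    """Toy assumption: each game independently yields a hit with probability p_hit_game."""
    return [rng.random() < p_hit_game for _ in range(n_games)]


def summarize(p_hit_game=0.75, n_games=2000, n_replicates=20, seed=1):
    """Collect the median (a centroid) and the maximum (a tail) of streak lengths per replicate."""
    rng = random.Random(seed)
    medians, maxima = [], []
    for _ in range(n_replicates):
        lengths = streak_lengths(simulate_career(p_hit_game, n_games, rng))
        medians.append(statistics.median(lengths))
        maxima.append(max(lengths))
    return medians, maxima


if __name__ == "__main__":
    medians, maxima = summarize()
    print("median streak per replicate: ", sorted(medians))
    print("longest streak per replicate:", sorted(maxima))
```

Comparing the spread of the two printed lists is the whole argument in miniature: the tail statistic is far noisier than the centroid under the same underlying process.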

The best way to look at the psychological effect is to use the incredibly powerful Win Probability Added (WPA) and Leverage tools the community has developed (replacing the clumsy traditional "Close and Late" proxy). I'd expect a ton of scatter -- it's a necessarily small sample, WPA is, I think, often a 0-or-lots thing, and you'd only expect to start seeing an effect late in a streak. (I think a streak reaches news interest about every couple of years, and it usually premieres with a quote like "Yeah, I didn't even know I had a streak going until Smitty told me during BP the other day." Also, of course, the MLBPA-mandated "God Willing" and "I don't care about the streak, I'm just here to play the game the right way and help my teammates".)
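The bookkeeping behind WPA is simple: each plate appearance is credited with the change in the batting team's win probability from before to after it, and a player's WPA is the sum over his plate appearances. The sketch below shows only that arithmetic. The win probabilities in the usage example are made up; real ones come from a win expectancy table built from historical play-by-play data, which this sketch does not include, and Leverage is not modeled at all. The `PlateAppearance` class is a hypothetical container, not any site's actual API.

```python
from dataclasses import dataclass


@dataclass
class PlateAppearance:
    wp_before: float  # batting team's win probability before the PA (0..1)
    wp_after: float   # batting team's win probability after the PA (0..1)


def wpa(pa):
    """Win Probability Added for a single plate appearance."""
    return pa.wp_after - pa.wp_before


def total_wpa(pas):
    """A batter's WPA over a game (or a streak) is the sum of per-PA changes."""
    return sum(wpa(pa) for pa in pas)


if __name__ == "__main__":
    # Hypothetical game: early single, late strikeout with the tying run on, walk-off hit.
    game = [
        PlateAppearance(0.50, 0.54),
        PlateAppearance(0.30, 0.22),
        PlateAppearance(0.35, 1.00),
    ]
    print([round(wpa(pa), 2) for pa in game])  # [0.04, -0.08, 0.65]
    print(round(total_wpa(game), 2))           # 0.61
```

The made-up example also illustrates the "0-or-lots" worry: one walk-off plate appearance swamps everything else in the game, which is exactly why a small sample of late-streak at-bats would be so noisy.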

If you could define a good proxy for "psychological detriment" based on WPA it would be interesting to correlate it with the changes in sports reporting and public attention.

I haven't looked to see what if anything Arbesman/Strogatz have published in the scientific or baseball literature, but I might. 

Freakonomics Q&A may be relevant -- Flip, you get a question in here? 

A chain of links from pablo's link led me to the Bill James College Basketball Safe Lead Calculator, which is pretty awesome.
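As I understand James's published rule, it goes: take the leading team's margin in points, subtract three, add a half point if the leader has the ball (subtract a half point if not), treat anything below zero as zero, and square the result; if that exceeds the number of seconds left, the lead is safe. A minimal sketch of that rule follows. The function name and signature are my own, not anything from the calculator itself.

```python
def is_safe_lead(lead_points, leader_has_ball, seconds_left):
    """Bill James's safe-lead rule of thumb for college basketball (as summarized above)."""
    x = lead_points - 3
    x += 0.5 if leader_has_ball else -0.5
    x = max(x, 0)  # numbers below zero become zero
    return x * x > seconds_left


if __name__ == "__main__":
    # 20-point lead with the ball and 5:00 left: 17.5^2 = 306.25 > 300 seconds.
    print(is_safe_lead(20, True, 300))   # True
    # 10-point lead without the ball and 1:00 left: 6.5^2 = 42.25 < 60 seconds.
    print(is_safe_lead(10, False, 60))   # False
```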

AND if mrflip did read pablo's article, he probably clutched his head and moaned upon reading this exchange:

Q: What statistical software do you use?

A: Just Excel.  

It's funny -- a lot of the baseball guys don't seem to use much more than this (though of course many of them use way, way more). The trouble I had with data extraction and with interconnecting the different available datasets is part of what got me started on the infochimps.org project.
