Apropos of our discussions of the four Cs, and of correlation

posted Feb 2, 2015, 4:44 PM by Jen Mankoff

"Pornhub (which is apparently the third most-popular pornography site on the Internet) was approached by Buzzfeed (which is probably the most-popular animated GIF distributor on the Internet) to analyze its traffic and determine whether “blue” states that voted for Obama in the last election consumed more pornography than “red” states that voted for Romney. And so, that’s what the statisticians at Pornhub did, pulling IP addresses from their website’s traffic logs, geocoding their likely locations and deriving a figure of total traffic for each state. They then divided the total hits from each state by that state’s population to derive a hits-per-capita number for each state. As a result, they were able to report that per-capita averages for each state and that blue states averaged slightly more hits per capita than red states. ... Unfortunately, the study and the subsequent reporting derived from the Pornhub data serves as a vivid example of six ways to make mistakes with statistics:

    • Sloppy proxies
    • Dichotomizing
    • Correlation does not equal causation
    • Ecological inference
    • Geocoding
    • Data naivete"
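
To make the arithmetic in the quoted passage concrete, here is a minimal Python sketch of the two steps it describes: dividing each state's hits by its population, then averaging those per-capita figures within the "blue" and "red" groups. The state names, hit counts, populations, and the helper names `hits_per_capita` and `group_averages` are all hypothetical and chosen only for illustration; none of the numbers come from the actual study.

```python
from statistics import mean

# Hypothetical toy figures, NOT real traffic or census data.
states = [
    {"name": "State A", "hits": 1_200_000, "population": 4_000_000, "color": "blue"},
    {"name": "State B", "hits": 900_000, "population": 3_000_000, "color": "blue"},
    {"name": "State C", "hits": 500_000, "population": 2_000_000, "color": "red"},
    {"name": "State D", "hits": 1_500_000, "population": 5_000_000, "color": "red"},
]


def hits_per_capita(state):
    """Total hits attributed to a state divided by that state's population."""
    return state["hits"] / state["population"]


def group_averages(states):
    """Average the per-state rates within each color group (the dichotomized comparison)."""
    groups = {}
    for s in states:
        groups.setdefault(s["color"], []).append(hits_per_capita(s))
    return {color: mean(values) for color, values in groups.items()}


averages = group_averages(states)
for color, value in averages.items():
    print(f"{color}: {value:.3f} hits per capita")
```

Two things stand out even in this toy version. Averaging per-state rates gives every state equal weight regardless of its population, and collapsing each state into a single "blue" or "red" label throws away how close the vote was. A small gap between the two group averages therefore says little about individual voters, which is the kind of problem the list above is pointing at.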