
The importance of asking the right question

posted Mar 27, 2014, 5:39 AM by Jen Mankoff
This MIT Tech Review article on Facebook sharing summarizes a project that uses machine learning to explore the conditions under which a shared photo will be reshared many times rather than petering out after few or no shares.
"Cheng and co come to their conclusions by analysing the way photographs were shared on Facebook over a 28 day period following their initial upload in June 2013. The looked over 150,000 photos which were together reshared over 9 million times."

Unlike past work, which tries to determine the features of a photo that predict it will be shared a large number of times, Cheng et al. start with a photo that has already been reshared a certain number of times, say k. They then estimate the likelihood that this photo will go on to be shared twice as many times, that is, reach 2k reshares.
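To make this framing concrete, here is a minimal sketch of how the labels for the doubling question might be built. The function name `doubling_labels` and the reshare counts are made up for illustration; they are not taken from the study, and the authors' actual pipeline may differ.

```python
def doubling_labels(final_sizes, k):
    """Keep only photos that reached at least k reshares, then label
    each one True if it eventually reached at least 2k reshares."""
    reached_k = [size for size in final_sizes if size >= k]
    return [size >= 2 * k for size in reached_k]


# Hypothetical final reshare counts for ten photos (illustrative only).
final_sizes = [0, 1, 3, 5, 6, 9, 10, 12, 25, 40]

labels = doubling_labels(final_sizes, k=5)
print(labels)  # [False, False, False, True, True, True, True]
```

Photos that never reach k reshares drop out of the task entirely, which is what removes most of the skew found in the raw data.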

The reason this is such a good way to phrase the question is that the baseline accuracy (just guessing) is 50%. In contrast, the question that people have asked in the past involves highly skewed data (most photos are not shared that many times). A baseline of 50% makes a machine learning task more tractable, and their results are substantially better than 'just guessing'. If instead the baseline (always guessing the majority class) is 99%, as in a skewed data set, it is very hard to improve on. So we see here how the problem being solved and the approach being used (machine learning) both need to be considered very carefully when crafting the question being answered.
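The baseline argument can be checked with a few lines of arithmetic. The sketch below computes the accuracy of always guessing the majority class on two made-up label sets, one skewed and one balanced. The counts are illustrative assumptions, not figures from the study.

```python
from collections import Counter


def majority_baseline_accuracy(labels):
    """Accuracy obtained by always predicting the most common label."""
    counts = Counter(labels)
    return max(counts.values()) / len(labels)


# Made-up label sets, purely for illustration.
skewed = [1] * 10 + [0] * 990     # "will this photo be shared widely?"
balanced = [1] * 500 + [0] * 500  # "will this photo double its reshares?"

print(majority_baseline_accuracy(skewed))    # 0.99
print(majority_baseline_accuracy(balanced))  # 0.5
```

On the skewed set a classifier has only one percentage point of headroom above the baseline, while on the balanced set it has fifty, so any real predictive signal is much easier to see.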