Wed 23 May 2012
Bad data in Learning Analytics
If some companies succeed because they combine good algorithmic prediction systems with an abundance of data (Amazon, etc.), it is only logical that others fail because of bad algorithms and incomplete or rotten datasets. This question bugged me during an interesting discussion here at OUNL: is the potential for failure also built into the algorithms and datasets?
The critical pressure point for learning analytics (LA) and educational data mining (EDM) seems to be the accuracy of the analysis. In a commercial context, if Amazon's book recommendations are accurate 50% of the time, this is pretty good: every second suggestion you receive as a customer is something relevant that you may want to buy. In an educational context, however, the glass may be half empty! A prediction of a student's performance that is only 50% accurate may lead to dramatic, life-changing consequences.
Bad predictions often result from bad data being included in the calculations. Apart from the common “test” instances that lurk in the darkness of system databases and, if undetected, may shift the picture, there are other factors that distort the results. Data collection for analytics and prediction often ignores “enmeshed identities”. A dataset typically cannot distinguish between a single individual and a shared presence in the learning space (group work on a single device). Students who often work together with others on shared devices (laptops, smartphones, lab space, etc.) produce enmeshed fingerprints in their educational data. As a result, behaviours may be attributed to a logged-in identity when they actually originated from an “invisible” partner.
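To make the two data problems above concrete, here is a minimal sketch of a cleaning step that drops test accounts and sets aside events that cannot safely be attributed to one learner. Everything in it is an assumption for illustration: the `Event` fields, the `test`/`demo` naming convention, and the idea that a course designer marks group tasks. It also only catches devices that see more than one login; a partner sitting next to a single logged-in student stays invisible, which is exactly the limitation described above.

```python
from dataclasses import dataclass

# Hypothetical event record; real LMS logs will look different.
@dataclass
class Event:
    user_id: str         # logged-in identity
    device_id: str       # assumed device identifier
    is_group_task: bool  # assumed flag set by the course designer

# Assumed naming convention for test instances.
TEST_PREFIXES = ("test", "demo")

def is_test_account(user_id):
    """Return True if the identity looks like a test instance."""
    return user_id.lower().startswith(TEST_PREFIXES)

def clean_events(events):
    """Drop test accounts, then split the rest into
    (individual, ambiguous) lists of events."""
    real = [e for e in events if not is_test_account(e.user_id)]

    # Collect the distinct identities seen on each device.
    users_per_device = {}
    for e in real:
        users_per_device.setdefault(e.device_id, set()).add(e.user_id)

    individual, ambiguous = [], []
    for e in real:
        shared_device = len(users_per_device[e.device_id]) > 1
        if shared_device or e.is_group_task:
            # Attribution to a single learner is doubtful.
            ambiguous.append(e)
        else:
            individual.append(e)
    return individual, ambiguous
```

Keeping the ambiguous events in a separate list rather than deleting them leaves the decision with the educator, who can choose to weight them lower or review them by hand.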
In a commercial environment like Netflix or Amazon it may not matter much whether you watched “Pirates of the Caribbean 7” to keep your kids happy, or whether you bought that book for a colleague’s retirement farewell (Amazon actually does have an option to indicate whether an item is for yourself or a gift). These factors of the personal ecosystem may influence future recommendations, but their consequences are generally not dramatic. I’d argue, however, that they do matter in an educational environment, where learning path algorithms may be calculated on similar fingerprints.
While there seems to be no immediate cure for this ailment, especially given the increase in team work and collaboration among students, awareness of the issue should influence the rigidity and rigour with which educators consult the data when assessing the performance of individual learners.