Fri 15 Jun 2012
Web analytics is often held up as the model for Learning Analytics (LA): a demonstration of what can be done with lots and lots of data, and of how many new insights can be gained from analysing it. In fact, there isn’t much difference between web analytics and LA other than the domain in which the statistical output of web behaviour is used.
I have said many times on this blog that statistics hardly ever lead to more clarity or agreement, neither in politics nor in economics, where stats are juggled on a daily basis. In simple words: there is no simple truth in data!
On this blog, I run several analytics engines to see the visitor flow I get. The collectors I use are Statcounter, Google Analytics, and a WordPress extension called StatPress. They nicely illustrate how much of a black box the issue of collecting and presenting data can be.
One would expect three stat engines to come up with the same results when measuring visitors to my site, but far from it. The numbers diverge wildly. During the month of May (1-31 May 2012), Google Analytics captured 391 visitors, while Statcounter counted 533 in the same period. Even added together (924), the two fall well short of the 1612 visits that StatPress counted! The latter doesn’t distinguish between unique and returning visitors, but the former two diverge here as well: Google shows a 61/39% split between unique and returning visitors, whereas Statcounter gives me a 76/24% split.
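To put the size of the gap in proportion, here is a small sketch of the arithmetic using the three May figures quoted above. The function name and the choice of Google Analytics as the baseline are my own assumptions for illustration; nothing here comes from the tools themselves.

```python
# Visitor/visit counts for 1-31 May 2012, as quoted in this post.
counts = {"Google Analytics": 391, "Statcounter": 533, "StatPress": 1612}


def ratio_to_baseline(counts, baseline):
    """Express every count as a multiple of the chosen baseline engine."""
    base = counts[baseline]
    return {name: round(value / base, 2) for name, value in counts.items()}


# Google Analytics and Statcounter added together.
combined = counts["Google Analytics"] + counts["Statcounter"]

# Arbitrary choice of baseline: the lowest of the three counts.
ratios = ratio_to_baseline(counts, "Google Analytics")

print(combined)  # 924
print(ratios)    # Statcounter is about 1.36x, StatPress about 4.12x Google's count
```

Seen this way, StatPress reports roughly four times as many visits as Google Analytics for the same month on the same site.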
We are talking basics here: simply counting visits to a site, i.e. a digital agent sitting with a clicker and clicking every time someone comes to visit. Yet the numbers are such that it’s hard to read any meaning into them, other than perhaps in relative terms of growth over time, although I haven’t done this comparative calculation across the three engines. It shows how much depends on what the different agents consider a “hit” or a “visitor”. Automatic filtering of spiders (and maybe also spam bots) is one consideration that could explain the different counts. Still, as an end-user, I have no means of understanding how the figures emerge, nor what the definitions of “hit” and “visitor” entail.
It’s clear that the more complex the analytics process, the more uneven the results may be. Even at the primitive level described above, though, they are already pretty puzzling, and they carry a loud warning not to stake too much on such figures.
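To illustrate how much the definitions alone can matter, here is a toy sketch in which one and the same request log yields different totals depending on what is counted. Everything in it is an assumption made up for the example: the log entries, the naive bot test, and the 30-minute inactivity timeout. It does not describe how Statcounter, Google Analytics or StatPress actually work, which is precisely what I cannot see.

```python
from datetime import datetime, timedelta

# Hypothetical request log: (visitor id, timestamp, user agent).
LOG = [
    ("a", datetime(2012, 5, 1, 9, 0), "Mozilla/5.0"),
    ("a", datetime(2012, 5, 1, 9, 10), "Mozilla/5.0"),
    ("a", datetime(2012, 5, 1, 15, 0), "Mozilla/5.0"),
    ("b", datetime(2012, 5, 1, 10, 0), "Mozilla/5.0"),
    ("c", datetime(2012, 5, 1, 11, 0), "ExampleBot/1.0"),
    ("c", datetime(2012, 5, 1, 11, 1), "ExampleBot/1.0"),
]


def is_bot(user_agent):
    # Naive assumption: anything with "bot" in its user agent is a crawler.
    return "bot" in user_agent.lower()


def count_hits(log):
    # Definition 1: every single request counts.
    return len(log)


def count_unique_visitors(log, filter_bots=True):
    # Definition 2: each distinct visitor id counts once.
    return len({vid for vid, _, ua in log if not (filter_bots and is_bot(ua))})


def count_visits(log, timeout=timedelta(minutes=30), filter_bots=True):
    # Definition 3: a new visit starts after `timeout` of inactivity.
    last_seen = {}
    visits = 0
    for vid, ts, ua in sorted(log, key=lambda row: row[1]):
        if filter_bots and is_bot(ua):
            continue
        previous = last_seen.get(vid)
        if previous is None or ts - previous > timeout:
            visits += 1
        last_seen[vid] = ts
    return visits


print(count_hits(LOG))                                 # 6
print(count_unique_visitors(LOG))                      # 2
print(count_unique_visitors(LOG, filter_bots=False))   # 3
print(count_visits(LOG))                               # 3
print(count_visits(LOG, filter_bots=False))            # 4
```

Six log lines, and depending on the definition the "number of visitors" is anywhere between 2 and 6. None of these answers is wrong; they simply answer different questions.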