Data and Learning Analytics


This presentation by Richard Palmer at the JISC Digifest is a provocation, not a promotion, for learning analytics! There is so much bias in this piece that it’s hard to decide where to begin taking it apart. It is full of empty generalisations and predictions, so I start by issuing a WARNING – this read might damage your views on human teachers.

The argument that computers don’t come to work hung over, stressed or loaded with bias has been used many times, especially during a period of technological evolution led by the illusion that computers are infallible and perfectly neutral. We have known for a long time that neither is the case. Quite apart from this idealistic view, computer systems and networks pose some of the greatest security risks to our societies (from petty criminals to cyber warfare). Why then should we, or would we, entrust them with education?

The vision presented is that computers can improve attainment, progression and the educational experience of students, thus leading to better grades and retention rates. This is naive at best! Computer-led education, too, is human managed – by humans who are not pedagogues but programmers, statisticians and profit makers, mostly ignorant of human psychology and development. The implication is that human teachers wouldn’t know how to improve attainment, progression and experience, but that’s wrong, for it is the human collective (a.k.a. the education system) that sets these quality indicators of what it means to be successful. They are shaped by micro-to-macro economic thinking about the labour market, politics, parents’ ambitions, value systems, social needs, and so forth. Not so long ago, humanistic education and educational selection were seen as the quality benchmark; now it is market education (employability, entrepreneurship, civic compliance) and massiveness that characterise it. Can machines set these values? Should they?

Another dystopian element is the promotion of services built around the idea of being “all watched over by machines of loving grace”, where systems survey your every move and then pester you with messages asking “can I help you…?” This is followed by the thought that machines would judge the work effort and “intelligence” of students and then pester them with support spam if they perform below their algorithmic prediction. The mention of “objective criteria” makes me laugh in this context, in a world where fake news and post-truth knowledge dominate the headlines every day. The age of objectivism is long over.

A criticism of humans expressed by Palmer is that they are sluggish to change and cling to the ways things have been done in the past. I disagree with such a prejudice. Human history has always been polarised between progress and conservation. We tend to cling to the “known” because routines of the familiar help us be efficient in terms of brain power and energy consumption. Innovation is always connected to risk assessment, but it isn’t fair to say that there has been no evolution beyond the natural. And, after all, we invented machines to change production processes (like the looms he mentions).

Comparing the guidance of a young student with driving a car in complexity actually answers itself – a car is just another machine with simple mechanical responses. A student embedded in society is a complex system that has more to do with chaos theory than with algorithms. True, computers don’t turn up hung over or stressed, but they do crash frequently, and network failures cause entire workplaces to stand still. What’s better? Well, can you communicate with a crashed computer? Turning it around, with a hung-over student I can still communicate on some level – even if it is just to buy him time to recover! Can a computer do the same, and understand why he does it?

Here’s a good thing about humans that’s not in the paper: we can think flexibly and in a context-specific way, which means we can accommodate special needs and wishes. Compare this to the experience of a wifi-governed ordering system in a restaurant (when asking for rice instead of fries) or of a computer till at the supermarket – will it tell you that you can get two for one if the algorithm is geared to maximising profits for the company? Will it send you to the shop opposite because they have a better offer? Such things happen every day between humans. They are not regular, programmed events; they are social interactions.

Yes, technologies will improve. Yes, there is a danger that machines will replace human teachers (at least in certain functionalities), but it is preposterous to anticipate that this will lead to a better world!


Happy were the days when righteous Hollywood heroes persisted in their search for the truth despite all obstacles and deterrents.

The reality is, of course, very different. We live in the post-truth era, where (true) information is declining in value and belief is everything!

The bizarre thing is that we have more data than ever before, yet fewer facts can be deduced from it with certainty. Why is this? Let’s start with the economy, where data has long been used to find the best solution to any problem or investment. Economic data, despite its abundance, holds whatever answer someone searches for. Algorithms dictate the value of stocks, prices and futures. They change in milliseconds and interact with one another – they are connected. People cannot keep up with the speed of machines and have no way of understanding what triggers events in machine agents. People then resort to plausibility, trust and beliefs (such as brand value). Companies can no longer be valued in real worth or assets, only in virtual billions that could be halved the next day.

News is another information channel that has become unreliable as a source in the search for facts and “truth”. News media are profit-making organisations that feed on popularity, not on truthfulness. What’s popular doesn’t have to be true. Any scandal is better than the truth. News media love Trump because he provides them with popular stories. Trump loves the media because they give him publicity, and he doesn’t have to admit to anything they write about him, no matter how scandalous. Post-truth holds no proof. Even if the intelligence services were to publish the Russian link to the election hack, it could easily be dismissed as fake news or a dark plot against him. Secret services are not employed to serve the truth, but to serve the enacted politics of the government (and their own interests). The electorate knows this. People only believe what they want to hear to confirm their beliefs!

Even in criminal courts, finding an objective truth isn’t as easy as one would expect with all the forensic tools now available. Why else would we find this proliferation of criminal and terrorist activities with comparatively few or no convictions for lack of proof? The risk for criminals or extremists of getting caught and convicted is minimal – which itself seems contradictory in the age of Big Data, ubiquitous surveillance, and integrated systems.

Far from finding an objective truth, even subjective truths are an endangered species. In a post-truth society an almost religious belief in perceived realities has developed that finds confirmation and amplification wherever data is abundant and where sources are better connected. Proof can no longer be provided.


There have been persistent calls for transparency and algorithmic accountability in learning analytics. Quite recently, there was a discussion at an LASI event in Denmark on that topic.

There are good arguments for more transparency in developing and delivering learning analytics products. Presumably, teachers can derive better informed interventions from visualisations of learning data when they understand what goes in, how it is weighted and processed, and what comes out.

However, the discussion also moved very much in the direction of “personalised learning analytics”, with questions like “at what point of comprehension educators are happy to trust products”, or whether this might be achieved if “the analytics system demonstrates to your satisfaction that it is attending to the same signals that you value”. It goes on to challenge vendors (and researchers) that “information should be available and understandable to different kinds of learning scientists and learning professionals”. Ulla Lunde Ringtved asks: “do we need a kind of product declaration and standardization rules to secure user knowledge about their systems?”

I think this is going too far, with little hope of success and little value to end users. After all, we are talking about “products”, i.e. ready-made things. Vendors would not and could not deliver out-of-the-box, build-your-own, tweak-the-data, customise-the-algorithmic-process learning analytics tools. And data consumers would not want them! Teachers and students are surrounded by black boxes of all kinds, including Google, Blackboard and other VLEs, Facebook, etc. There is evidence that lack of transparency has no correlation with trust. In our lives, we don’t understand most of the tools that we use: the digital camera, the electronic alarm clock, and so forth. And we don’t have to! As long as they work.


There is much uncertainty about ethics and privacy in learning analytics, which hampers wider adoption. In a recent article for LAK16, Hendrik Drachsler and I tried to show ways in which trusted learning analytics can be established in compliance with existing legislation and the forthcoming General Data Protection Regulation (GDPR) of the European Commission, which will come into force in 2018. In short, four things need to be established:

  • Transparency about the purpose: Make it clear to the users what the purpose of data collection is and who will be able to access the data. Let users know that data collection is limited to what is needed to fulfil the intended purpose effectively.
  • Informed consent: Get users to agree to data collection and processing by telling them what data you are collecting and how long the data will be stored, and by providing reassurance that none of the data will be open for re-purposing or use by third parties. According to the GDPR, approval can be revoked, and the data of individual users must then be deleted from the store – this is called “the right to be forgotten”.
  • Anonymise: Replace any identifiable personal information so that the individual can no longer be retrieved. In collective settings, data can be aggregated to generate abstract metadata models (see the sketch after this list).
  • Data security: Store data, ideally encrypted, in a (physically) safe server environment. Monitor regularly who has access to the data.
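
As a minimal sketch of what the anonymisation and “right to be forgotten” points could look like in code (the field names, the salted-hash pseudonymisation and the in-memory store are my own illustrative assumptions, not part of our paper or of the GDPR text):

```python
import hashlib
import secrets

# Assumption: each raw record carries a direct identifier plus activity data.
# A salted hash replaces the identifier (strictly speaking this is
# pseudonymisation rather than full anonymisation, since whoever holds the
# salt could re-identify), and records are deleted when consent is revoked.
SALT = secrets.token_hex(16)  # keep separate from the analytics store

def pseudonymise(student_id: str) -> str:
    """Replace an identifiable ID with a non-reversible pseudonym."""
    return hashlib.sha256((SALT + student_id).encode()).hexdigest()

def reduce_record(record: dict) -> dict:
    """Keep only the fields needed for the declared purpose (data minimisation)."""
    return {
        "pseudonym": pseudonymise(record["student_id"]),
        "course": record["course"],
        "logins_per_week": record["logins_per_week"],
    }

def forget(store: list, student_id: str) -> list:
    """'Right to be forgotten': drop all records of a user who revokes consent."""
    pseudonym = pseudonymise(student_id)
    return [r for r in store if r["pseudonym"] != pseudonym]

raw = [{"student_id": "s123", "name": "Alice", "course": "LA101", "logins_per_week": 4}]
store = [reduce_record(r) for r in raw]
store = forget(store, "s123")   # consent revoked: Alice's data is deleted
print(store)                    # -> []
```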


This is an interesting thought: Tore Hoel and Weiquin Chen, in their paper for the International Conference on Computers in Education (ICCE 2016), suggest that the forthcoming European data protection regulation (GDPR), which is to be legally implemented in all member states by 2018, may actually drive pedagogy!

As unlikely as this may sound, I think they have a point. The core of the GDPR is about data minimisation and use limitation. This restricts data collection to specified purposes and prevents re-purposing. It puts a bar on the random collection of users’ digital footprints and on sharing (selling) them for other – not clearly declared – purposes. This restriction to minimisation and specific use will in turn (perhaps) lead to more focus on the core selling point, i.e. the pedagogic application of analytics.

I have previously articulated my concern that most institutions intending to use LA applications will have to rely on third parties, where, at present, it isn’t obvious that they comply with the demanded Privacy by Design and by Default principles. In addition to making their case to educational customers about protecting the data of learners and teachers, there is now more pressure on them to provide tools and services that actually improve learning, not the revenue from advertising or data sharing. So, yes, I am optimistic that Tore and Weiquin are right in saying that this presents “an opportunity to make the design of data protection features in LA systems more driven by pedagogical considerations”!


If, like me, you are on several scholarly social networks simultaneously, you have probably asked yourself the same question: why do my analytics diverge so greatly between these platforms?

I have one article that has been cited 208 times on Google Scholar, 106 times on ResearchGate, and only 7 times on Academia.edu. Another more recent one shows a different distribution with only 1 citation on Google Scholar, 0 on ResearchGate, but 17 on Academia.edu.


There are several possible reasons for this. Firstly, although I cannot exactly remember, I might have uploaded the papers to the different platforms at different times, so there may be a time-lag issue involved. Secondly, my social networks (following and followers) on these platforms vary, despite a large overlap. Thirdly, the platform audiences might differ, as some might be more prominent in one country or language than in another. Fourthly, however – and that’s my point for writing this post – the analytics involved in each tool vary and send out different messages, as I have contemplated in this other post.

The remaining question is what to do with this “information”. Shall I go and add all the sums together? Or are they counting the same citations in every tool? Should I boost my profile on the under-represented platform by filling in more of my personal data and metadata interests, or by following even more people? Shall I perhaps start a marketing campaign by spamming people with e-mail links to my articles on platform X? As long as I do not know how the figures are compiled, each system remains a biased black box whose output I can only take at face value. It may even use my figures for some purpose other than merely telling me how popular a scholar I am. Let’s not forget that the providers are in fact competitors in another world, so by reassuring me that I get more citations on their platform than on the others, they are doing themselves a favour. I myself only have the option of deciding which of the figures and platforms I trust and which ones I don’t.
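
To make the double-counting question concrete, here is a tiny sketch (the DOIs and per-platform sets are invented for illustration; in reality the platforms rarely expose the underlying citing works): simply adding the per-platform totals only makes sense if the citing papers are first deduplicated across platforms.

```python
# Invented citing-paper identifiers reported by each platform for one article.
google_scholar = {"doi:10.1/aaa", "doi:10.1/bbb", "doi:10.1/ccc"}
researchgate   = {"doi:10.1/aaa", "doi:10.1/ddd"}
academia_edu   = {"doi:10.1/bbb"}

naive_total  = len(google_scholar) + len(researchgate) + len(academia_edu)  # 6: counts overlaps twice
unique_total = len(google_scholar | researchgate | academia_edu)            # 4: deduplicated union

print(naive_total, unique_total)
```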


If you build them yourself, learning analytics tools can do what you expect them to. That is the idealised scenario in the learning analytics community: in order to get valuable insight and foresight from your learners’ data, you should start with a proper learning analytics design! This includes what data will be collected, how the data will be cleaned, what the relevant indicators and weightings are, and how the data will be processed using appropriate and tested algorithms.
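
As a minimal sketch of what such a design could declare up front (the indicators, normalisation, weights and scoring rule below are purely illustrative assumptions, not a validated model):

```python
from dataclasses import dataclass

# Illustrative design decisions: which data are collected, how they are
# cleaned/normalised, how the indicators are weighted, and how they are combined.
@dataclass
class ActivityRecord:
    logins_per_week: float
    assignments_submitted: int
    forum_posts: int

WEIGHTS = {"logins": 0.5, "assignments": 0.3, "forum": 0.2}  # assumed weightings

def clean(record: ActivityRecord) -> dict:
    """Normalise raw counts to a 0..1 range (a simple form of data cleaning)."""
    return {
        "logins": min(record.logins_per_week / 10, 1.0),
        "assignments": min(record.assignments_submitted / 8, 1.0),
        "forum": min(record.forum_posts / 20, 1.0),
    }

def engagement_score(record: ActivityRecord) -> float:
    """Combine the declared indicators with the declared weights."""
    features = clean(record)
    return sum(w * features[name] for name, w in WEIGHTS.items())

student = ActivityRecord(logins_per_week=3, assignments_submitted=2, forum_posts=1)
print(round(engagement_score(student), 2))  # a low score might trigger a tutor alert
```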


However, more often than not, learning analytics is conducted via third-party tools, such as VLE platforms, the Twitter, YouTube or Facebook APIs, or separately sold tools. These tools are opaque and sometimes subject to changes beyond the control, or even the visibility, of the user. Using the built-in analytics tools of third-party software requires caution in interpretation, for the algorithms may be biased toward some purpose other than a better understanding of learner behaviours.

Naturally, we cannot assume that every institution will build its own well-designed learning analytics environment, and even if they did, modern networked learning using cloud-based services will always limit its scope. I do think, however, that transparency of the underlying engines is important, and that, just as with terms of service, notification of changes to the algorithms would give a more transparent experience and thus higher validity for learning analytics.



I am always wary when it comes to hyping a new technology. As the recent LAK16 global conference has hinted, Learning Analytics may just have reached the peak of the Gartner hype cycle.

[Figure: the Gartner hype cycle]

Sure, Learning Analytics holds promise to create new insights into learning and a new basis for learner self-reflection or support services. But it is dangerous to expect it to produce the “truth about learning”! A forthcoming paper I recently reviewed covers the promising influence LA has on the Learning Sciences and rightly demands that more learning theories be put at the basis of LA, but, as Paul Kirschner expressed in his keynote presentation, there are many types of learning, and in LA research and development they are often simplified and generalised.

To ground our expectations in some sort of reality, we only need to look at areas where data analysis and predictions have long been used to “tell the truth” and to predict the future in order to take appropriate measures: politics, economics, and the weather forecast. Since it is free of human unpredictability, the weather forecast has become the most accurate of the data-heavy sciences; yet, even there, long-term predictions still carry a strong element of randomness and guesswork. Do we want to risk the future of students’ lives by basing them on 75% probabilities?

Even where there is higher accuracy, the question of algorithmic accountability may be raised. Who will be held responsible, and how can anyone make a claim against a failed prediction? This risk isn’t as present in the commercial world, where an inaccurate shopping suggestion through targeted advertising can simply be ignored, but in education careers are at stake. From a managerial perspective, while it is scientifically fabulous to have 75-80% accuracy in predictions of highly specific drop-out scenarios, there is a cost-benefit issue attached to this. To simply propose that system alerts should draw teachers’ attention to particular students, and that student support services then need to call up that particular student (which they may like about as much as a phone call from the bank selling new services), doesn’t cut it. As a cheaper alternative, I sarcastically suggested using a random algorithm to pick a student to receive special attention that week.
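
To put a rough number on that cost-benefit issue, here is a back-of-the-envelope sketch (the cohort size, drop-out rate and 80% figures are illustrative assumptions, not data from any study): when drop-out is relatively rare, even an 80%-accurate predictor generates far more false alarms than genuine at-risk cases, and every alert costs staff time.

```python
# Illustrative assumptions: 1000 students, 10% of whom will drop out,
# and a predictor with 80% sensitivity and 80% specificity.
cohort = 1000
dropout_rate = 0.10
sensitivity = 0.80   # share of at-risk students correctly flagged
specificity = 0.80   # share of not-at-risk students correctly left alone

at_risk = cohort * dropout_rate                            # 100 students
true_positives = at_risk * sensitivity                     # 80 useful alerts
false_positives = (cohort - at_risk) * (1 - specificity)   # 180 needless alerts

precision = true_positives / (true_positives + false_positives)
print(f"{true_positives + false_positives:.0f} alerts, precision {precision:.0%}")
# -> 260 alerts, only about 31% of which concern students who would actually drop out
```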

It is also worth contemplating how far predictions about the success of learners may become self-fulfilling prophecies. Learning Analytics predictions are typically based on a number of assumptions forming the “student model”. One big assumption is that of a stable teaching/learning environment. If everything runs linearly and “on rails”, then it is relatively easy to say that the learning train departing from station A will eventually reach station B. However, it is nowadays well recognised that learning is situated and that human teachers didactically and psychologically influence the adaptivity of the learning environment. It would, in my mind, require much higher levels of intelligence for algorithms to achieve the same support as human teachers, but if they did, what would then become of our teachers? What would be the role of human teachers if LA and AI take over decision making? What qualities would they need to possess, or would they simply become obsolete?

We cannot neglect the human social factor in other ways too: quantifying people inevitably installs a ranking system. While a leaderboard scheme based on LA data could on the one hand be a motivating tool for some students (as is the case in serious and other games), it could also lead to apathy in others when they realise they’ll never get to the top. The trouble is that people are being meta-tagged by analytics, and these labels are very difficult to change. Such labels may also exercise a reverse influence on the learner by becoming sticky parts of their personality or digital identity.

As so often with innovative approaches, hypes and new technologies, the benefit of Learning Analytics may not lie in what the analytics actually do or how accurate they are, but in a “side-effect” that is somewhat unexpected. I see part of the promise of learning analytics in starting a discussion on how we take decisions.


I am (positively) surprised at the level of critical self-reflection that’s happening at this year’s Learning Analytics conference (see the twitter stream #LAK16). Even the keynotes by Mireille Hildebrandt and Paul Kirschner questioned the validity and acceptability of using big data in education and highlighted potential dangers. The audience too shared these mixed feelings with questions like: “why should people (e.g. parents) sign up to this? What’s the promise?”

The two critical themes that emerged aren’t technical. They are about the ethical constraints placed on the use of personal data, and about the validity and usefulness of analytics for learning. Both these “soft” issues are present in our design framework for LA. The ethical and privacy concerns, and how we might deal with them, are discussed in our LAK16 presentation and full paper.

I see this as part of a maturing process of the community. Being enthusiastic about LA is one thing, being aware of the pitfalls and limitations is another. After all, should it turn out to be a dead horse, there is no point in flogging it. On the other hand, if there are benefits that outweigh the counter arguments, then, by all means, we need to have answers. Doing analytics just because we can isn’t a purpose or a justification.

For some time now, universities have been calling students “customers” and charging them ever-rising tuition fees. It seems this message has finally sunk in, turning the relationship between students and their institutions on its head: students are now beginning to see the payment of fees as a contract to obtain a qualification in exchange for money.
With the accelerating costs of study, students are no longer silently accepting whatever is given to them. The marketing machine of modern HE, promising excellent services and the highest quality of study, is being scrutinised, and it carries the danger for HEIs of being challenged by dissatisfied customers who don’t feel they are receiving value for money. The consequences of this change in attitude can be seen, for example, in the case where a Swedish University College is being sued by a US student whose course did not match the level of quality promised.
I have previously noticed that mature students in particular were very wary of how they were serviced on a course. There was a sincere dislike of peer tutoring, peer assessment, flipped classrooms and other innovative models of teaching. They saw their payment as an entitlement to being taught by an “expert” teacher, not by other novices! Up-front lectures were what they felt they paid for, and it was quite difficult to change such expectations and to open them up to modern teaching/learning practices.
Following this research report, the THES summarises that “Universities are misleading prospective students by deploying selective data, flattering comparisons and even outright falsehoods in their undergraduate prospectuses”. The Guardian adds “that the prospectus belongs to the “tourist brochure genre”, but that young people don’t always realise that”.
Another possible legal battleground may involve implementations of learning analytics. It is quite conceivable that students may before long sue their university for not acting on data and information the institution holds about them. Universities have a fiduciary duty towards students and their learning achievements. Improved learning information systems and data integration have the potential to ring alarm bells before a student drops out of a course. At least that is the (sometimes exaggerated) expectation some learning analytics proponents hold. Failing customers will now perhaps claim that the institution knew about their potential failure but did not act on it.
