Knowledge & Content


Research activities have come a long way. Publication of scientific output has been dramatically upscaled over the past decade through internal and external HE policies. So much so that it has become a bubble that threatens to burst.

Everybody publishes – lots! When I say everybody, I mean that entire institutions that previously only had a teaching mission (FE colleges, teacher training or arts colleges, polytechnics, etc.) are now loaded with publication duties. In the cumulative system, PhD students too have to produce multiple quality articles instead of the previously required (single) doctoral thesis. This adds tens of thousands of people publishing large quantities of academic papers across Europe. Furthermore, established universities also squeeze the last bit out of their researchers so that they remain intellectually visible in a flooded market. Papers have become the currency by which researchers and the intellectual capital of institutions are quantified and benchmarked. Hence, the more papers the better!

Just as the production of academic papers has increased exponentially, so has the publishing market. There are now oodles of journals that offer themselves as outlets for academic products (ranging from the unknown start-up to the dodgy scammer). And every week there will be more. In some ways, Open Access, which has now finally taken off in a bigger way, has complicated things. Authors have very little understanding of where they could and should publish their stuff in order to satisfy four key demands: (1) being visible to the relevant research community, (2) not being charged for publishing, (3) providing open access to their works for others, and (4) being quality assured in a transparent and accepted way.

The entire academic publishing field has been blown out of proportion and over-scaled to the extent that parts of the complex system are no longer functional and therefore prone to system failure. One of these components is the quality assurance provided by peer reviews. I used to get requests for reviews every so often from established conferences and journals. I enjoyed it a lot since it forced me to read and reflect, it allowed me to help and support peers in their quest for knowledge, and it kept me up to date. All in all, a good system. However, this type of volunteerism and honorary work is not scalable to the extent that the present situation would demand. Reviews are unpaid extra work and have lately developed a tendency towards follow-up procedures, where reviewed articles boomerang back to the reviewer, often several times. At the same time, my own resources should be spent on actively publishing, not on reviewing (say my bosses!). It simply isn’t my job to be a full-time proofreader and editor, and I am sure I am not alone in this.

This situation of supply outstripping both (a) demand and (b) resources leads, in my opinion, to a spiral of reduced quality. Part of this has to do with a lack of thoroughness on the part of peer reviewers and slackness in the quality selection processes. But the relative inexperience with which some authors and reviewers approach academic publication is also amplified by the lack of time for thorough feedback and support. Hence it becomes a systemic failure of the quality processes that are meant to control how much noise is produced and reviewed.


Fascinating how fast industry jumps on the MOOC bandwagon! However, it can hardly get more farcical than this example:

SIPX is yet another spin-off company out of Stanford “created to manage copyrights and deliver digital documents for the higher-education marketplace”. The problem SIPX identify is that content owners are faced with piracy that affects distribution and fair compensation. Equally, stretched library budgets for acquiring licences hinder professors from prescribing digital materials and students from studying them. All well-known issues. However, despite their claim that they’re based on research from 2005, I see SIPX merely as parasitic squatters trying to occupy the middle ground between university libraries and students/lecturers (and charging for it). Here is why:

According to their description, a professor creates a link to a SIPX-registered resource and posts it. “A student, who clicks on such a link, is authenticated for applicable discounts, pays any necessary royalties, and then accesses the digital content for electronic reading, printing or both, all in a single, seamless user experience.” [link]

In the typical universities that I know, the university library already pays for the usage licence and receives a discount from publishers if available, especially if it buys large numbers of a book or buys more stuff from the same publisher. So why, I ask, would the student now pay for an access licence again, plus presumably an additional charge to SIPX?

Nor is SIPX proposing to protect the copyright of authors or publishers. Despite their problem statement above, it is unlikely they’d go to court to defend the IPR of a content provider. This is typically the duty of publishers, not of the brokers in the middle.

Even bolder is their statement on MOOC readiness [link]. With SIPX, “MOOC providers can enrich the student experience with a variety of readings that are otherwise difficult to clear for copyright”. What part of Massive OPEN Online Course (MOOC) did they not understand?

To me, companies like SIPX are part of the problem. The solution is quite simple: Open Access. Professors create their materials as OERs and OAI articles, and should only promote open access materials to their students. Even more so in a MOOC.


The BBC has put tremendous effort into digitising its vast archives, and is now in the process of making content available as linked open data (LOD). This will allow third parties to use the BBC datasets to build applications and other useful services on top of this information.

Although this is very much under consideration, making shows and broadcasts themselves available is an undertaking that is hindered by legal restrictions and also by competition laws. This is territory where even a big public organisation like the Beeb has to tread carefully. Things are slightly more positive, though, with information about programmes (metadata). Therefore, the Corporation has initiated the Genome project, which digitises the programme listings of the Radio Times from its inception in 1923 onwards, covering radio and later television broadcasts. From this information, they are constructing a LOD database of some 5 million records that can be queried by outsiders. So, if you want to know all the programmes that Sir David Attenborough presented, you can!


The BBC data team developed a query interface and SPARQL endpoints to interrogate the compiled data. They also make the resources available in different formats like XML, RDF, or JSON. Before going public, the project team has lately been busy cleaning and correcting the datasets wherever inconsistencies or gaps are identified.
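To give a feel for what interrogating such an endpoint might look like, here is a minimal Python sketch that only builds a query and a request URL; it sends nothing over the network. The endpoint address, the `ex:` vocabulary (`ex:presenter`, `ex:title`, `ex:broadcastDate`, `ex:name`) and the `format` parameter are all assumptions made up for illustration, not the BBC's actual schema or API. Only the `query` parameter follows the standard SPARQL protocol.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: the real address is not given in this post.
ENDPOINT = "https://example.org/genome/sparql"


def build_query(presenter: str, limit: int = 10) -> str:
    """Return a SPARQL query listing programmes for one presenter.

    All ex: terms are invented placeholders for whatever vocabulary
    the real dataset uses.
    """
    # Escape backslashes and double quotes so the name stays a valid literal.
    safe = presenter.replace("\\", "\\\\").replace('"', '\\"')
    return (
        "PREFIX ex: <https://example.org/genome/terms#>\n"
        "SELECT ?programme ?title ?date WHERE {\n"
        "  ?programme ex:presenter ?person ;\n"
        "             ex:title ?title ;\n"
        "             ex:broadcastDate ?date .\n"
        f'  ?person ex:name "{safe}" .\n'
        "}\n"
        "ORDER BY ?date\n"
        f"LIMIT {int(limit)}"
    )


def build_request_url(query: str, fmt: str = "json") -> str:
    """Build a GET URL for the query.

    'query' is the standard SPARQL protocol parameter; 'format' is an
    assumed, endpoint-specific way of asking for XML, RDF or JSON.
    """
    return ENDPOINT + "?" + urlencode({"query": query, "format": fmt})


if __name__ == "__main__":
    q = build_query("David Attenborough", limit=5)
    print(q)
    print(build_request_url(q))
```

Against a real endpoint, the prefix and property names would have to be replaced with those of the published ontology.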

Developers may be interested in this service: http://www.bbc.co.uk/programmes/developers, where the project team offers their programmes ontology in FOAF format and leaning on the Dewey Decimal system.

I talked to one BBC representative at a recent meeting of the LinkedUp project, a project that deals with linking datasets for education, and asked him about records covering advertising breaks. This is currently not in scope for the Genome team, but I am pretty sure that sufficient records must exist within the Corporation (perhaps in the finance department where advertisers pay for showings) to build this information source. In the same way as it is interesting to know how many programmes were about “elephants”, I believe it would be interesting to know what the most advertised consumer product was in 1968 on prime-time radio and TV, or how advertising activities between brands (like Persil or Dash) compared.


Here’s a useful tip if you are like me and collect weblinks and other stuff in your Gmail mailbox.

I use my Gmail account not only as a communication tool, but also as an online storage device. When I come across a useful blog post or article that I would like to use later, I send the link or file to myself. The trouble is that it very quickly gets washed away by incoming mails and disappears into the depths of the mail account. Google search for Gmail is great and even lets you narrow a search to mails with attachments by using the has:attachment operator. Still, that’s not enough if the mail doesn’t contain reasonable information to search for (e.g. when it contains only a bit.ly link).

Gmail does provide tags (called labels), but they are more for organising mails into folders and don’t lend themselves to tagging content. Since I don’t want to drown in labels, I need a better way to do this.

Fortunately, there is a good way to do this: create a label for the items you want to store, such as “notes” or “bookmarks”, etc. You can then mail yourself the items using the yourname+notes@gmail.com address (a Gmail filter on that address can apply the label automatically). Just enter a number of useful tags into the text of the message and voilà, it’s organised. Now you can easily search for a tag or browse everything via the “notes” label, which works like a folder.
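If you would rather script this tip than type the address by hand, the idea can be sketched in a few lines of Python. This is only an illustration: the helper names `plus_address`, `compose_note` and `search_query` are my own inventions, nothing is actually sent, and the sketch assumes the label is applied to the plus-address on arrival.

```python
from email.message import EmailMessage


def plus_address(address: str, label: str) -> str:
    """Turn yourname@gmail.com into yourname+label@gmail.com."""
    local, _, domain = address.partition("@")
    if not local or not domain:
        raise ValueError(f"not an email address: {address!r}")
    return f"{local}+{label}@{domain}"


def compose_note(address: str, label: str, link: str, tags: list[str]) -> EmailMessage:
    """Build (but do not send) a note-to-self with the tags in the body."""
    msg = EmailMessage()
    msg["From"] = address
    msg["To"] = plus_address(address, label)
    msg["Subject"] = link
    # The tags go into the message text so that a plain search finds them.
    msg.set_content(link + "\n\ntags: " + " ".join(tags))
    return msg


def search_query(label: str, tags: list[str]) -> str:
    """Gmail search string: restrict to the label, then match the tags."""
    return " ".join([f"label:{label}", *tags])


if __name__ == "__main__":
    note = compose_note("yourname@gmail.com", "notes",
                        "http://bit.ly/example", ["oer", "mooc"])
    print(note["To"])
    print(search_query("notes", ["oer", "mooc"]))
```

The message object could be handed to any mail library for sending; the search string is what you would type into the Gmail search box later.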



Have you ever wondered why nowadays there are half a dozen or more authors on a scientific paper? Has it happened to you that you co-authored a paper with someone and suddenly another author appeared on the list without having contributed?

There is substantial squatting going on in the field of authoring. In my view, this constitutes one of the worst sorts of academic plagiarism: claiming something to be yours that is the work of others. In this, academic publishing has become comparable to collaborative school essays: some do the work while the lazy guys latch on and get the marks. But there is no teacher to assess the individual contribution, so it is nearly impossible to tell who contributed and how.

The motives are clear: career and personal professional performance are measured in quantities, so having your name appear more often, or having a longer publication list, directly benefits your reputation and career. Because author lists are sequential, the quantitative method leads to a dilemma, as there can be no equals. Additional value measures need to be introduced and, therefore, the authors’ list has become a battleground! First authors are deemed to be more valued than second, third or fourth authors. Additionally, there is a battle for last place, which can be called the “mentor” spot. The person appearing last on an author list gets the benefit of being associated with the mentorship of the entire work, i.e. the authority in the background. Some research departments even operate a policy that the Head appears on all outgoing publications. The same applies to PhD supervisors.

People obviously ignore the possibility of acknowledgements, where the mentoring support of supervisors etc. can be explicitly mentioned. Instead, a distorted picture of authorship emerges that leaves the reader in the dark about who is responsible for the content and who exercised influence over it. Furthermore, it is, strictly speaking, against conventional IPR and copyright, because only the manifestation itself, not the idea or revision, qualifies for protection. Holding an opinion or idea but not recording it does not fall under IPR. Yet many PhD supervisors claim that when a student writes “their” ideas in a paper, they have a claim to authorship. But in this case we reduce the PhD researcher to a ghost writer, which is far from the reality.

For me, this battle for authorship has become one more nail in the coffin of traditional academic publishing. But there is little one can do, since junior researchers are in a weak position to protest against exploitation, squatting and position battles. And more senior academics are unlikely to call for a change to something that benefits them without their lifting a finger.


LinkedIn has recently introduced the endorsement feature for user profiles. This is an interesting innovation in terms of social metadata and profiling information.

Up till now, people’s profile pages underwent a self-publishing process that only had the value of “believe it or not” to others. LinkedIn had already recognised this and introduced a function to recommend and write references for your connections. This, however, is a rather tedious process and therefore not very scalable. The new method of endorsing the expertise of others is a much more painless exercise and scales pretty well given a large user base.

Social metadata is information gathered from the interactions of people with other entities on a platform. These entities can be items like learning objects or hotels (in a hotel portal), or they can be other users (‘connections’ or ‘friends’). Interactions are intentional acts of contributing to the social value or currency of an item. They are defined as intentional merely to distinguish them from tracking analytics. Social metadata consist of ratings, shares, reviews, bookmarks, tags, etc.

Social metadata express a collective opinion about a content object or person. For example, the average rating of a hotel is 4.2, as rated by 319 people. This gives the object a relative currency of 4.2 on a 5-star Likert scale, putting it into a relative value position compared with other similar objects.
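The arithmetic behind such a figure is simple enough to sketch in Python. The function name `rating_summary` and the five sample votes are invented for illustration (they are not the hotel's 319 real ratings), and a real portal may well weight or filter votes differently.

```python
def rating_summary(ratings: list[int], scale_max: int = 5) -> tuple[float, int]:
    """Return (average rating rounded to one decimal, number of raters)."""
    if not ratings:
        raise ValueError("no ratings yet")
    for r in ratings:
        # Star ratings are whole numbers from 1 up to the top of the scale.
        if not 1 <= r <= scale_max:
            raise ValueError(f"rating {r} is outside 1..{scale_max}")
    return round(sum(ratings) / len(ratings), 1), len(ratings)


if __name__ == "__main__":
    # Invented sample votes, chosen so that the mean comes out at 4.2.
    votes = [5, 4, 4, 4, 4]
    average, raters = rating_summary(votes)
    print(f"{average} out of 5, as rated by {raters} people")
```

Keeping the number of raters next to the average matters: a 4.2 from five people carries far less weight than a 4.2 from 319.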

The endorsement feature now applies this crowdsourcing of opinions to self-published profile data in a clever and meaningful way. It works in the positive only, that is just ‘likes’, no ‘dislikes’, and through collective effort provides a simple verification system for what users claim to be. Similar crowdsourced verification systems can be envisaged in many different contexts and social platforms. They might even have the potential to replace the need to administratively verify authenticity of claims via certificates and ID cards.


I have complained before about what I perceive as a decline in the quality of research submissions, which makes the act of blind peer-reviewing a roulette where you get lucky with a paper worth reading about every 36th time.

Now this may be exaggerated, but it sums up the feeling of wasted time I increasingly get when reviewing submissions for conferences or journals. So can we put a finger on what’s going wrong?

Firstly, I think that the pressures and expectations to publish anything and everything, and especially lots of it, have created an environment where young researchers in particular just give it a try. Lack of good supervision and proper induction to the world of research (in terms of its ethics, style, language and established practice) seems to leave novice researchers out in the cold. In turn, though, they themselves become peer reviewers and apply the same attitude to others. This results in a trend toward lay wisdom and platitudes.

The fast-spinning environment of modern knowledge creation is ill-suited to the traditional ways of research publishing and quality control mechanisms, so we need to re-think these processes. I have long argued that the way in which research papers are published is out of date (and I don’t only mean the commercial academic publishers). This also includes the way pieces of information or data are accessed, evaluated, cited and referenced. Still, structural criteria are a requirement to distinguish methodical research from gibberish.

Secondly, I am not of the opinion that our knowledge changes as much as is often said. Instead, some of the knowledge that’s produced is of such low value that a long life is not anticipated anyway. Knowledge is designed as disposable from the outset, a commodity that gains a researcher immediate benefits and does not involve long-term evolution of either knowledge or expertise. For peer reviews this means that the quantities go up, but the necessary quality checks become unscalable, unfeasible, and most of all unexciting (like sifting through a pile of rubbish).

And then again, I may just apply my old-fashioned attitude of long-term vision to a world that is full of short term approaches.


This is a welcome step for others to follow. As this article describes, the UK government is taking measures to simplify its web design for public use. A first glimpse is available at GOV.UK.

Many ordinary citizens are dazzled by the wealth of information and the confusing design in which it is presented. It is not quite clear to me where this style of web design originated and why. It seems that the motives were both aesthetic and commercial in nature. Overloading sites with unnecessary information and advertisements may have been an attempt to mimic the layout of some yellow press newspapers. This was further enhanced by the urge to place moving widgets and banners onto every page. I always wondered why a site that sells tyres shows a weather widget. It never made me buy any new tyres even when it showed sleet and snow. While loading your site with stuff may make it look more lively, active and content-rich, it is neither inclusive nor efficient. Government sites too followed this trend, so it is a relief to see a change coming.

Unlike commercial sites, public information services are an essential right of citizens. They allow people to actively participate in and benefit from the public services they pay for with their taxes. As we are migrating more and more toward e-citizenship, an inclusive design that is accessible to all parts of the nation’s population becomes increasingly important. Therefore, the design principles promoted at GOV.UK should become essential reading for web designers of all trades. Out of the 10, here are my personal favourites:

  • Start with needs
  • Design with data
  • Build for inclusion
  • Understand context
  • Build digital services, not websites
  • Be consistent, not uniform

What a frustrating experience it is to find language-specific resources on two of the most content-rich sites, YouTube and Slideshare!

In a blog post from 2007, Slideshare proudly announced the language filter for content. This has since disappeared from the interface. A lengthy and annoying search of the help forum retrieved the information that the search language can be set as a default in the user profile. This means you have to be logged in, which is a pain. But not only do I not always want to search for results in one specific language, the whole thing did not work. After I had laboriously set the personal default language to German, a search for “Strategie” or “Management” did not show a single German result, although it dutifully translated the search term into English and found hundreds of “strategy” results. Yet users are required to state the language of the presentation they upload, so I assume it is added to the searchable metadata. Even more confusingly, there is a language drop-down menu (French, German, English only) at the bottom of the page, but this only translates the interface.

And then the eureka moment happened: there is a language filter under some of the “browse” menu items, like ‘downloaded’, ‘featured’, etc. But what good is this when you search for a specific topic?

YouTube operates on a similarly ignorant level, see the filter option in the above picture. What has gotten into them? I would argue that language is the first and foremost criterion under the “relevance” filter that makes a real difference to users – at least until we have on-the-fly live translation services that work with different content types and in acceptable quality. Even more importantly, national primary and secondary education, and to some extent adult education too, rely increasingly on OER content in their respective language. Why would I upload to Slideshare if my content cannot be found by my linguistic community?

 


This is disturbing news. Scientists in Italy were convicted of manslaughter for not accurately predicting the extent of the 2009 earthquake that tragically killed some 300 people. Inevitably, this verdict has implications for how research serves the general public.

Sure enough, the scientists convicted were a kind of special force dedicated to the prediction and prevention of natural disasters. However, the incompleteness and imperfection of human knowledge, as well as the unpredictability of nature are factors that need to be considered.

Prosecution of scientific research in this way may set an example for other future court cases. There would be an infinite field of such activity in e.g. medical or pharmaceutical research. Clearly, this is highly undesirable as cases against the tobacco industry have already shown. Instead of peer collaboration this could potentially lead to peer prosecutions based on scientific counterpositions and political scapegoating.

