Wed 26 Jan 2011
Ok, it seems like a good time to reflect on semantic technologies a bit, since this week’s topic in the Learning and Knowledge Analytics MOOC (LAK11) is the Semantic Web.
What are the differences between semantic web technologies and language technologies? Following the keynote presentation by Dragan Gasevic, I’d say the key difference is that the Semantic Web is about technologies that let computers exchange and interpret data in an enhanced way, whereas language technologies are about human language. Semantic web technologies such as RDF and OWL, and the ontologies built with them, describe relationships between entities. Language technologies are about analysing and understanding natural human language; natural language processing (NLP) is the catch-all term for this.
What semantic web technologies can do is relatively simple to show. Take this example: you know that your friend John has a brother living in South America, but you can’t remember his name. Typing “brother of John” into a traditional search engine won’t work: it will only return documents that contain the words ‘brother’ and ‘John’ or the exact phrase ‘brother of John’. The Semantic Web “knows” about relations, so it could return the answer ‘brother of John’ = ‘Kendon’. It works in exactly the same way for ‘capital of France’ = ‘Paris’, or ‘other words for red’ = ‘crimson’, ‘ruby’, etc. Semantic search engines can do this because they are based on a vocabulary of relations, which stores not only the words themselves but also the way in which they relate to each other, e.g. that ‘goose’ is a subclass of ‘bird’.
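To make the idea of a “vocabulary of relations” more concrete, here is a minimal sketch of a toy in-memory store of (subject, predicate, object) facts, loosely in the spirit of RDF triples. It is not an RDF library: the class `TripleStore` and the predicate names (`hasBrother`, `hasCapital`, `hasSynonym`, `subClassOf`) are hypothetical, and the extra fact that ‘bird’ is a subclass of ‘animal’ is added only to show that subclass links can be followed transitively.

```python
class TripleStore:
    """A toy in-memory store of (subject, predicate, object) facts."""

    def __init__(self):
        self.triples = set()

    def add(self, subject, predicate, obj):
        self.triples.add((subject, predicate, obj))

    def objects(self, subject, predicate):
        """Answer questions like 'brother of John?' by matching subject and predicate."""
        return sorted(o for s, p, o in self.triples if s == subject and p == predicate)

    def superclasses(self, cls):
        """Follow 'subClassOf' links transitively, e.g. goose -> bird -> animal."""
        found, frontier = [], [cls]
        while frontier:
            current = frontier.pop()
            for parent in self.objects(current, "subClassOf"):
                if parent not in found:
                    found.append(parent)
                    frontier.append(parent)
        return found


# Facts from the examples in the text (plus a hypothetical 'bird subClassOf animal').
store = TripleStore()
store.add("John", "hasBrother", "Kendon")
store.add("France", "hasCapital", "Paris")
store.add("red", "hasSynonym", "crimson")
store.add("red", "hasSynonym", "ruby")
store.add("goose", "subClassOf", "bird")
store.add("bird", "subClassOf", "animal")

print(store.objects("John", "hasBrother"))  # ['Kendon']
print(store.objects("red", "hasSynonym"))   # ['crimson', 'ruby']
print(store.superclasses("goose"))          # ['bird', 'animal']
```

The point of the sketch is that the answer comes from stored relations rather than from matching the words of the query against document text.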
Natural language processing, by contrast, tries to understand human language. Here is an example: when you politely ask someone “May I ask you how old you are?”, the answer “I’m 42” is a perfectly acceptable response. Not to a computer! A computer that takes the question literally does not understand the politeness convention and treats it as a yes/no question, answerable only with ‘yes’ or ‘no’.
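The politeness problem can be illustrated with a deliberately naive sketch. Both functions are hypothetical: `literal_reading` applies a single grammar rule (a question that opens with a modal or auxiliary verb is a yes/no question), while `pragmatic_reading` adds one hand-written pattern for the polite “May I ask (you) …” frame and extracts the embedded question. Real NLP systems handle such indirect questions with far richer models; this only shows why the literal parse goes wrong.

```python
import re

# A single literal grammar rule: questions opening with an auxiliary or modal verb are yes/no questions.
AUXILIARIES = {"may", "can", "could", "will", "would", "do", "does", "is", "are"}

# One hand-written pattern for the polite frame "May/Can/Could I ask (you)(,) <question>?".
POLITE_FRAME = re.compile(r"^(?:may|can|could) i ask(?: you)?,?\s+(.*?)\??$", re.IGNORECASE)


def literal_reading(question):
    first_word = question.strip().lower().split()[0]
    return "yes/no question" if first_word in AUXILIARIES else "open question"


def pragmatic_reading(question):
    match = POLITE_FRAME.match(question.strip())
    if match:
        # Peel off the politeness formula; the embedded question is what is really being asked.
        return "indirect question: " + match.group(1)
    return literal_reading(question)


polite = "May I ask you how old you are?"
print(literal_reading(polite))    # yes/no question
print(pragmatic_reading(polite))  # indirect question: how old you are
```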
How can we use natural language processing in learning and knowledge analytics? I’ll give two examples of possible uses, after a brief look at what language technologies add.
Language technologies also use vocabularies and ontologies, but in addition they draw on grammars and corpora. This gives them the ability to identify synonyms. With techniques such as Latent Semantic Analysis (LSA), the distances between words or terms can be mapped out according to the contexts in which they occur. By comparing an artefact with a large body of documents, one can measure how closely a new piece of text relates to those documents and to the domain they cover. The same techniques identify related concepts, and disambiguation can tell apart homonyms with different meanings, e.g. Java (the island) and Java (the programming language).
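A minimal sketch of Latent Semantic Analysis, assuming NumPy is available: build a term-document matrix, keep the k strongest dimensions of its singular value decomposition, fold a new text into that latent space and compare it with the documents by cosine similarity. The four-document corpus is hypothetical and chosen so that ‘java’ occurs in both the island documents and the programming documents; real LSA work uses large corpora, term weighting (e.g. log-entropy or tf-idf) and a few hundred dimensions rather than k = 2.

```python
import numpy as np

# Hypothetical toy corpus: two documents about the island, two about the programming language.
docs = [
    "java island indonesia volcano",
    "java island jakarta indonesia",
    "java programming language class",
    "java programming compiler class",
]

vocab = sorted({word for doc in docs for word in doc.split()})
index = {word: i for i, word in enumerate(vocab)}

# Term-document matrix of raw counts: one row per term, one column per document.
A = np.zeros((len(vocab), len(docs)))
for j, doc in enumerate(docs):
    for word in doc.split():
        A[index[word], j] += 1

# Truncated SVD: keep only the k strongest latent dimensions.
k = 2
U, s, Vt = np.linalg.svd(A, full_matrices=False)
U_k, s_k = U[:, :k], s[:k]
doc_vectors = Vt[:k].T  # row j = document j in the latent space


def fold_in(text):
    """Project a new text into the same latent space: q_hat = S_k^-1 U_k^T q."""
    q = np.zeros(len(vocab))
    for word in text.split():
        if word in index:  # words outside the corpus vocabulary are ignored
            q[index[word]] += 1
    return (U_k.T @ q) / s_k


def cosine(a, b):
    norm = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / norm) if norm else 0.0


def most_similar(text):
    """Return the index of the closest document and all similarity scores."""
    q_hat = fold_in(text)
    scores = [cosine(q_hat, d) for d in doc_vectors]
    return int(np.argmax(scores)), scores
```

With this corpus, `most_similar("java island")` lands on one of the island documents and `most_similar("java programming")` on one of the programming documents: the surrounding words, not the shared word ‘java’, decide where a text sits in the latent space. This is only a crude illustration of disambiguation, not a full word sense disambiguation system.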
The first example is tracking conceptual development. Assuming that learners progressively adopt more and more subject-specific expressions and terminology, their conceptual coverage can be identified in the textual artefacts they produce (e.g. entries in their learning diary or blog) and, longitudinally, so can their conceptual development. The hypothesis is simply: “if it quacks like a duck…”. By analysing concept coverage in learner texts, a tutor (or support system) can also identify omissions and intervene where needed.
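The simplest possible version of concept-coverage tracking can be sketched with plain keyword matching. The concept list `DOMAIN_CONCEPTS`, the function names and the diary entries are all hypothetical; a real system would use a domain ontology and something like LSA to also credit synonyms and related terms, and it would need to handle multi-word terms, which this sketch ignores.

```python
import re

# Hypothetical list of key concepts for a course unit on the Semantic Web.
DOMAIN_CONCEPTS = {"ontology", "rdf", "owl", "triple", "reasoning", "inference"}


def concepts_in(text):
    """Single-word domain concepts that appear in a piece of learner text."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return DOMAIN_CONCEPTS & words


def coverage_over_time(entries):
    """Cumulative share of domain concepts used after each entry, plus the concepts never used."""
    seen = set()
    history = []
    for entry in entries:
        seen |= concepts_in(entry)
        history.append(len(seen) / len(DOMAIN_CONCEPTS))
    return history, sorted(DOMAIN_CONCEPTS - seen)


# Hypothetical learning-diary entries, oldest first.
diary = [
    "Today we talked about the semantic web.",
    "RDF stores each fact as a triple of subject, predicate and object.",
    "OWL lets you build an ontology on top of RDF.",
]
history, omissions = coverage_over_time(diary)
print(history)    # cumulative coverage per entry
print(omissions)  # ['inference', 'reasoning']
```

Here coverage grows from 0 to 2/6 to 4/6 across the three entries, and the concepts never used, ‘inference’ and ‘reasoning’, are where a tutor might step in.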
The second example is chat and forum analysis. Chat rooms and forums are widespread in education. However, using them with a large group of learners is challenging and carries a high cognitive load for the participants as well as the tutor. This is because the text produced is messy in several ways. Firstly, the discussion threads are often wildly intertwined and hard to follow (forums often show them as tree-structured indents). Secondly, some people are slow typists or make mistakes, so a computer system analysing such text must cope with typos. Thirdly, participants use abbreviations (btw, omg, lol), emoticons and grammatically incomplete sentences.
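A minimal sketch of the normalisation step for chat text, using only the Python standard library. The lookup tables are hypothetical and tiny, and typo correction here is just nearest-match lookup with `difflib.get_close_matches` against a known vocabulary, a simple stand-in for a proper spelling corrector. It does not address the first problem, untangling interleaved threads.

```python
import difflib

# Hypothetical, deliberately tiny lookup tables; real systems need large curated lists.
ABBREVIATIONS = {"btw": "by the way", "omg": "oh my god", "lol": "laughing out loud"}
EMOTICONS = {":)": "[smile]", ":(": "[frown]", ";)": "[wink]"}
VOCABULARY = ["the", "is", "an", "about", "this", "ontology", "semantic",
              "language", "analysis", "question", "answer"]


def normalise(message, cutoff=0.8):
    """Expand abbreviations, tag emoticons and map likely typos to known words."""
    tokens = []
    for token in message.split():
        lower = token.lower()
        if token in EMOTICONS:
            tokens.append(EMOTICONS[token])
        elif lower in ABBREVIATIONS:
            tokens.append(ABBREVIATIONS[lower])
        elif lower in VOCABULARY:
            tokens.append(lower)
        else:
            # Pick the most similar known word, if any is at least `cutoff` similar.
            close = difflib.get_close_matches(lower, VOCABULARY, n=1, cutoff=cutoff)
            tokens.append(close[0] if close else lower)
    return " ".join(tokens)


print(normalise("btw the ontolgy is semantc :)"))
# by the way the ontology is semantic [smile]
```

Unknown words with no close match (such as a name) are passed through unchanged rather than forced onto the vocabulary.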
Techniques such as anaphora resolution (working out what a pronoun like ‘it’ refers to) and the classification of utterances into categories (question, answer, agreement, etc.) can go a long way towards analysing dialogues. In a learning context, this can support discussion moderators, but also the participants themselves, in identifying strengths and weaknesses in textual conversations. The system can, for example, show who posed the most questions and who provided answers. It can also find out who disengaged and talked about something else (using concept coverage, as above), or who did not connect to others in the discussion. This kind of analysis may help a tutor to support learners in staying on task.
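Two of the measures mentioned above, who asks and who answers, and who is not connected to anyone, can be approximated in a few lines. This is a hedged sketch with crude heuristics standing in for real utterance classification: a post counts as a question if it ends with “?”, and as an answer if it replies to someone else’s question. The `Post` structure, with its hypothetical `reply_to` field, assumes the forum records which post each message replies to.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class Post:
    post_id: int
    author: str
    text: str
    reply_to: Optional[int] = None  # id of the post this one replies to (hypothetical field)


def is_question(text):
    # Crude stand-in for utterance classification: questions end with "?".
    return text.rstrip().endswith("?")


def analyse(posts):
    """Count questions and answers per author and list authors who never interacted with anyone."""
    by_id = {p.post_id: p for p in posts}
    questions = Counter(p.author for p in posts if is_question(p.text))
    answers = Counter()
    connected = set()
    for p in posts:
        parent = by_id.get(p.reply_to)
        if parent is None or parent.author == p.author:
            continue  # not a reply, or a reply to oneself
        connected.update({p.author, parent.author})
        if is_question(parent.text):
            answers[p.author] += 1
    unconnected = sorted({p.author for p in posts} - connected)
    return questions, answers, unconnected


# Hypothetical forum thread.
thread = [
    Post(1, "Anna", "What is the difference between RDF and OWL?"),
    Post(2, "Ben", "OWL adds richer semantics on top of RDF.", reply_to=1),
    Post(3, "Anna", "Thanks! Does OWL need a reasoner?", reply_to=2),
    Post(4, "Carla", "Anyone watching the match tonight?"),
]
questions, answers, unconnected = analyse(thread)
print(questions.most_common())  # [('Anna', 2), ('Carla', 1)]
print(answers.most_common())    # [('Ben', 1)]
print(unconnected)              # ['Carla']
```

In this made-up thread Anna asked two questions, Ben answered one, and Carla, whose post is off-topic and got no reply, is flagged as not connected to the discussion.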
NLP and the Semantic Web are complementary technologies. The latter is very important for finding, sharing and exchanging resources; the former is more geared towards human interaction. NLP still has a long way to go, but the signs are promising.
PS: more info at www.ltfll-project.org