Saturday, April 14, 2007

Wikipedia, Semantic Web

After I build the web-deliverable interface that lets users expand these semantic frames into all possible related entities, I refactor some SRL parsers to turn natural language into this semantic data and point the system at Wikipedia... weeks later, what does the resulting enormous OWL file (possibly stored numerically) have to do with the Semantic Web?
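
To make that pipeline concrete, here's a minimal sketch of the parsing step, assuming an SRL parser that returns predicate-argument frames. The srl_parse stub, the frame layout, and the example.org namespace are hypothetical placeholders; rdflib is just one convenient way to serialize the resulting triples as RDF/XML, the usual OWL carrier.

```python
# Hypothetical sketch: turn SRL frames into RDF triples with rdflib.
# srl_parse() and the frame layout stand in for whatever parser is actually used.
from rdflib import Graph, Literal, Namespace, RDF

EX = Namespace("http://example.org/wikiontology#")  # hypothetical ontology namespace

def srl_parse(sentence):
    """Stand-in for a real SRL parser: returns (predicate, {role: filler}) frames."""
    return [("discover", {"Agent": "Marie_Curie", "Theme": "radium"})]

def frames_to_graph(sentences):
    g = Graph()
    g.bind("ex", EX)
    for sent in sentences:
        for i, (predicate, roles) in enumerate(srl_parse(sent)):
            event = EX[f"event_{abs(hash(sent))}_{i}"]  # one node per frame instance
            g.add((event, RDF.type, EX[predicate.capitalize()]))
            for role, filler in roles.items():
                g.add((event, EX[role.lower()], EX[filler]))
            g.add((event, EX.sourceText, Literal(sent)))
    return g

if __name__ == "__main__":
    g = frames_to_graph(["Marie Curie discovered radium."])
    print(g.serialize(format="xml"))  # RDF/XML output
```

Run over a Wikipedia dump, a loop like this is where that enormous OWL file would come from.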

A quote from Cycorp:

The success of the Semantic Web hinges on solving two key problems: (1) enabling novice users to create semantic markup easily, and (2) developing tools that can harvest the semantically rich but ontologically inconsistent web that will result. To solve the first problem, it is important that any novice be able to author a web page effortlessly, with full semantic markup, using any ontology he understands. The Semantic Web must allow novices to construct their own individual or specialized-local ontologies, without imposing the need for them to learn about or integrate with an overarching, globally consistent, master ontology.

Allowing users to type in natural language is the easiest way to generate semantic markup. Because users prefer natural language, any ontology that software can round-trip natural language through will likely become an overarching (prevalent) one. A possible problem with my approach is that the ontology behind the dataset used to train SRL-based parsers would not be handcrafted by an expert; it would be a collaborative effort of the people, hopefully experts, visiting a site. A wikiontology is a relatively new idea.
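
To spell out what I mean by round-tripping, here is a toy sketch in which the parse stub and the generation templates are made-up stand-ins rather than any real tool: whichever ontology supplies the role names is the vocabulary both directions have to share.

```python
# Hypothetical round trip: natural language -> frame -> natural language.
# The ontology's role names ("Agent", "Theme") are the shared vocabulary.

def parse(sentence):
    """Stand-in SRL parse: returns a frame keyed by ontology role names."""
    return {"predicate": "discover", "Agent": "Marie Curie", "Theme": "radium"}

TEMPLATES = {
    # hypothetical per-predicate generation templates keyed by the same ontology
    "discover": "{Agent} discovered {Theme}.",
}

def generate(frame):
    """Stand-in NLG step: realize a frame back into a sentence."""
    return TEMPLATES[frame["predicate"]].format(**frame)

original = "Marie Curie discovered radium."
print(generate(parse(original)))  # matches the original if the round trip holds
```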

Only after I have the dataset will I be able to say whether the ontology from the planned website is advantageous for machine reasoning tasks. It shouldn't be terribly difficult to build a benchmark for the consistency of the wiki-generated ontology, possibly using natural language itself once the parser is complete. Paraphrase corpora and similar resources, for example, could be of use in both generating and refining it.
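
One crude shape such a benchmark might take, assuming the parser exists and returns role triples (the parse stub and the sample pair below are made up): parse both sides of each paraphrase pair and count how often the two parses agree.

```python
# Hypothetical consistency benchmark: paraphrase pairs should map to the same triples.

def parse(sentence):
    """Stand-in for the SRL-based parser; returns a set of (predicate, role, filler) triples."""
    # A real parser goes here; this toy version only covers the sample pair below.
    if "radium" in sentence.lower():
        return frozenset({("discover", "Agent", "Curie"), ("discover", "Theme", "radium")})
    return frozenset()

def consistency_score(paraphrase_pairs, parse_fn=parse):
    """Fraction of paraphrase pairs whose parses yield identical triple sets."""
    agree = sum(1 for a, b in paraphrase_pairs if parse_fn(a) == parse_fn(b))
    return agree / len(paraphrase_pairs)

pairs = [("Curie discovered radium.", "Radium was discovered by Curie.")]
print(consistency_score(pairs))  # 1.0 when both sides parse to the same triples
```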

Wikipedia is a proof of concept that people can come together to generate collective knowledge resources, so — if we get the post-NLP/pre-NLG ontology right (prevalent as argued above) — the Semantic Web may resemble a distributed wiki-knowledgebase. The gigabytes of Wikipedia data would be a launching point.
