Friday, April 13, 2007

Ontology and File Compression

Thinking about this post-SRL/pre-NLG ontology made me realize there is no easy way to compare different models.

If we view rule systems, ontologies and taxonomies as working together to store knowledge efficiently, then there might be a metric. That is, if system A compresses the same knowledge set better than system B and is more computationally efficient (in decompression/utilization), then we can say that system A is superior to system B (on that set) without resorting to aesthetics or philosophy. We have SUMO, the Cyc upper ontology, ISO 15926, and others designed around real-world data, and it's difficult to rank them.
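To make the idea concrete, here is a minimal sketch of such a comparison in Python. Everything in it is an assumption made for illustration: the toy bird facts, the two models (`MODEL_A`, a taxonomy with one inheritance rule, and `MODEL_B`, a flat fact list), and the choice of compact JSON byte length as the "compressed size". It is not how SUMO, Cyc or ISO 15926 are actually stored or measured.

```python
import json
import time

# Hypothetical toy model A: a taxonomy plus one inheritance rule
# ("every member of a class gets the facts attached to that class").
MODEL_A = {
    "members": {"bird": ["sparrow", "robin", "eagle"]},
    "class_facts": {"bird": [["has", "wings"], ["has", "feathers"], ["lays", "eggs"]]},
}


def expand_a(model):
    """'Decompress' model A into explicit (subject, relation, object) facts."""
    facts = set()
    for cls, members in model["members"].items():
        for member in members:
            facts.add((member, "is_a", cls))
            for relation, obj in model["class_facts"].get(cls, []):
                facts.add((member, relation, obj))
    return facts


# Hypothetical toy model B: the same knowledge written out as a flat fact list.
MODEL_B = sorted(list(fact) for fact in expand_a(MODEL_A))


def expand_b(model):
    """'Decompress' model B, which is already explicit."""
    return {tuple(fact) for fact in model}


def stored_size(model):
    """Storage cost, assumed here to be the byte length of compact JSON."""
    return len(json.dumps(model, separators=(",", ":")).encode("utf-8"))


def score(model, expand, repeats=1000):
    """Return (stored bytes, average seconds per expansion, expanded facts)."""
    start = time.perf_counter()
    for _ in range(repeats):
        facts = expand(model)
    seconds = (time.perf_counter() - start) / repeats
    return stored_size(model), seconds, facts


if __name__ == "__main__":
    size_a, time_a, facts_a = score(MODEL_A, expand_a)
    size_b, time_b, facts_b = score(MODEL_B, expand_b)
    # The sizes are only comparable if both models encode the same knowledge.
    assert facts_a == facts_b
    print(f"A: {size_a} bytes, {time_a:.2e} s per expansion")
    print(f"B: {size_b} bytes, {time_b:.2e} s per expansion")
```

On this toy set the taxonomy is smaller but has to do more work to expand, which is exactly the trade-off the metric would have to weigh. How to combine size and speed into a single ranking is left open here.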

Applying the metaphor of file compression to knowledgebases might allow competition between differing methods. As systems are envisioned that mechanically generate rules, ontological structure or taxonomy (optimizing generators that create a system for a given knowledgebase), these metrics may be of use in comparing the resulting generated systems. Personally, I think it would be interesting to have algorithms that compress knowledgebases the way gzip, zip and 7-Zip compress files. Unfortunately, this approach is based on storage and speed and ignores interface considerations, for example sets of things that are categorized for navigation.

Here's a link to a paper describing a relationship between AI (my field of research) and file compression. Apparently, there's also a prize (the Hutter Prize) for compressing Wikipedia.
