Sunday, April 29, 2007

Algorithm Design

Once WikiCorpus is up and running, I have a few hypotheses to test on the dataset. The first is the use of finite state automata in the sequential processing hypothesis. The FSM might require one or more stacks or cursors. It would take a preprocessed sentence (via NL tools) as input and output a sequence of treebuilding instructions, with some substrings mapped to predicate candidates.
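
As a rough sketch of what such a transducer might look like (the instruction names, the single-stack discipline, and the tagging scheme below are placeholder assumptions on my part, not a settled design):

```python
# A pushdown-style transducer: an FSM with one stack that reads a
# preprocessed sentence -- here just (word, part-of-speech) pairs --
# and emits treebuilding instructions. Instruction names are illustrative.

def sentence_to_instructions(tagged_words):
    """Map a tagged sentence to a sequence of treebuilding instructions."""
    instructions = []
    stack = []  # working memory: holds modifiers until their head arrives
    for word, tag in tagged_words:
        if tag in ("DET", "ADJ"):
            stack.append(word)
            instructions.append(("SHIFT", word))
        elif tag == "NOUN":
            instructions.append(("BUILD_NP", [*stack, word]))
            stack.clear()
        elif tag == "VERB":
            instructions.append(("PREDICATE_CANDIDATE", word))
        else:
            instructions.append(("SHIFT", word))
    return instructions

# Example: "the dog chased a cat", already tokenized and tagged by NL tools.
tagged = [("the", "DET"), ("dog", "NOUN"), ("chased", "VERB"),
          ("a", "DET"), ("cat", "NOUN")]
for instr in sentence_to_instructions(tagged):
    print(instr)
```

The single stack here stands in for the "stacks or cursors" mentioned above; a real design might need several, or a cursor that can back up over the input.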

The first theory, then, is to transform a sentence into a sequence over an alphabet of treebuilding instructions, an alphabet that may have its own grammar (universal?). What interests me is the set of permutations on the predicates, at the logic-based or knowledge-based level, that lets noun-order paraphrases be transitioned between, and how those permutations relate both to the treebuilding alphabet and to paraphrases of the sentence itself.

Rephrased: each sentence is a sequence of words that can be transformed (FSM, HMM, ?) into one or more sequences of treebuilding instructions. The resulting sequence of predicates (or set of candidate sequences) can then be permuted for noun-order paraphrases, using the fact that a predicate can map to another predicate with its arguments inverted. It's possible that permutations on the tree can be mapped to transformations on the treebuilding instruction sequence, and that these can map back to sentence(s). This level of natural language understanding could also be called a paraphrase generator (a less than exciting name for some rather complicated AI).
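
For the noun-order permutations, a minimal sketch (the predicate names and the inverse table are invented purely for illustration) might look like:

```python
# Each predicate may have an inverse whose arguments are swapped, so a
# sequence of predicates can be permuted without changing meaning.
# The predicate names and the inverse table are illustrative assumptions.

INVERSES = {
    "chase": "flee_from",   # chase(dog, cat) <-> flee_from(cat, dog)
    "own": "belong_to",     # own(john, car)  <-> belong_to(car, john)
}
INVERSES.update({v: k for k, v in INVERSES.items()})  # make the mapping symmetric

def invert(predicate):
    """Return the argument-swapped paraphrase of a predicate, if one exists."""
    name, arg1, arg2 = predicate
    if name in INVERSES:
        return (INVERSES[name], arg2, arg1)
    return predicate

def noun_order_paraphrases(predicates):
    """Yield the predicate sequences reachable by inverting one predicate."""
    yield list(predicates)
    for i, p in enumerate(predicates):
        flipped = invert(p)
        if flipped != p:
            yield predicates[:i] + [flipped] + predicates[i + 1:]

sentence_predicates = [("chase", "dog", "cat")]
for variant in noun_order_paraphrases(sentence_predicates):
    print(variant)   # [('chase', 'dog', 'cat')] and [('flee_from', 'cat', 'dog')]
```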

Thus, it's theoretically possible that the same system that turns sentences into sequences of predicates can be of use in natural language generation.
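
To make that concrete in the same rough style, a toy generator could map a predicate (or its inverted paraphrase) back to a surface string. The templates below are placeholder assumptions; a real generator would replay treebuilding instructions rather than fill in strings.

```python
# Toy generation direction: predicate -> sentence via a template.
# Templates are illustrative stand-ins for reversing the treebuilding steps.

TEMPLATES = {
    "chase": "{0} chased {1}",
    "flee_from": "{0} fled from {1}",
}

def predicate_to_sentence(predicate):
    name, arg1, arg2 = predicate
    return TEMPLATES[name].format("the " + arg1, "the " + arg2) + "."

print(predicate_to_sentence(("chase", "dog", "cat")))       # the dog chased the cat.
print(predicate_to_sentence(("flee_from", "cat", "dog")))   # the cat fled from the dog.
```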
