Wednesday, May 2, 2007

Machine Translation

The argument is that the sequential set, which builds a semantic parse tree from the nouns in the order in which they occur, serves in every language as an intermediate representation between a set of phrasings and a semantic parse tree. Changes to the semantic parse tree can be reflected in the sequential set, yielding a different set of phrasings (paraphrase generation). The patterns illustrated in the English examples are an existence proof of a heuristic, not an argument that such patterns are identical across all languages. The use of integers to represent semantic parse trees or sets of predicates, however, is argued to be language independent.

Interestingly, natural language is a temporal process. The intermediate representation is both temporal and static, and the sequence of predicates is a static knowledge representation that allows semantics-preserving permutations.
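
To make the idea of semantics-preserving permutations concrete, here is a minimal sketch. The tuple layout, the predicate names, and the helper `same_semantics` are assumptions made for illustration only; the post does not fix a concrete format. Integers stand in for the nouns in the order they first occur.

```python
from itertools import permutations

# Hypothetical encoding: each predicate is (name, integer arguments...).
# The integers number the nouns in order of first occurrence:
# 1 = "the dog", 2 = "the ball".
predicates = [
    ("dog", 1),
    ("ball", 2),
    ("chase", 1, 2),
]

def same_semantics(a, b):
    """Treat two sequences as equivalent when they hold the same predicates."""
    return frozenset(a) == frozenset(b)

# Every reordering of the sequence carries the same set of predicates,
# so under this (assumed) equivalence each permutation preserves semantics.
orderings = list(permutations(predicates))
assert all(same_semantics(predicates, p) for p in orderings)
print(len(orderings))  # 6 orderings of 3 predicates
```

Comparing sequences as sets is one simple way to read "static": the order in which predicates are listed is then free to vary without changing what is known.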

So, the argument is:

Machine Reading
Sentence(s) phrasing(s)
[patterns suggest heuristic to:]
Intermediate representation
[heuristic apparent to:]
Sequence of predicates

Machine Writing
Sequence of predicates
[heuristic apparent to:]
Intermediate representation
[patterns suggest heuristic to:]
Sentence(s) phrasing(s)

Paraphrase Generation
Sentence(s) phrasing(s)
Intermediate representation
Sequence of predicates
Permutations
Sequence of predicates
Intermediate representation
Sentence(s) phrasing(s)
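
The paraphrase generation loop above can be sketched as a round trip through reading, permutation, and writing. Everything here is a toy stand-in: `read`, `write`, and `paraphrases` are hypothetical names, the lookup table and the two templates are hand-written for a single English sentence, and the intermediate representation step is collapsed into the two functions rather than modeled separately.

```python
from itertools import permutations

def read(sentence):
    """Machine reading: phrasing -> sequence of predicates (toy lookup)."""
    table = {
        "The dog chased the ball.": [("dog", 1), ("ball", 2), ("chase", 1, 2)],
    }
    return table[sentence]

def write(predicates):
    """Machine writing: sequence of predicates -> phrasing (toy templates)."""
    # Unary predicates name the nouns; their order decides which is mentioned first.
    names = {p[1]: p[0] for p in predicates if len(p) == 2}
    order = [p[1] for p in predicates if len(p) == 2]
    # The single binary predicate gives (agent, patient); the verb is hard-coded.
    agent, patient = next(p[1:] for p in predicates if len(p) == 3)
    if order[0] == agent:
        return f"The {names[agent]} chased the {names[patient]}."
    return f"The {names[patient]} was chased by the {names[agent]}."

def paraphrases(sentence):
    """Read, permute the predicate sequence, and write each ordering back out."""
    return sorted({write(list(p)) for p in permutations(read(sentence))})

print(paraphrases("The dog chased the ball."))
# ['The ball was chased by the dog.', 'The dog chased the ball.']
```

In this sketch the six permutations collapse to two phrasings, because only the relative order of the two noun predicates affects the template chosen.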

The paraphrase generation aspect is theoretically important for moving semantics between languages during machine translation. A single natural-sounding sentence in one language may, for example, require multiple sentences to express the same semantics naturally in another; the sentence structures within a paragraph may differ according to each language's usage; and so forth. The sequence of predicates is argued to represent the knowledge shared across all articulations.
