Thursday, April 19, 2007

Knowledge Representation, Paraphrases

In machine reading, one goal is for all paraphrases of a sentence to be processed into the same set of predicates, or into some other common semantic representation. One method I found for achieving this is to represent a sentence as a block matrix of the pairwise binary predicates between its nouns. This relies on the fact that n-ary predicates can be decomposed into sets of binary predicates.
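To make the decomposition concrete, here is a minimal Python sketch of one common way to do it: give the event its own identifier and attach each argument to it through a binary role predicate. The names `Binary`, `decompose`, the event id `e1`, and the role labels are all hypothetical, chosen only for illustration; other decompositions are possible.

```python
from typing import NamedTuple


class Binary(NamedTuple):
    """A binary predicate: predicate(subject, obj)."""
    predicate: str
    subject: str
    obj: str


def decompose(event_id, predicate, roles):
    """Split an n-ary predicate into binary ones by naming the event.

    `roles` maps a role label (an assumption, e.g. "agent") to its filler.
    """
    facts = [Binary("type", event_id, predicate)]
    for role, filler in roles.items():
        facts.append(Binary(role, event_id, filler))
    return facts


# give(Tommy, Anna, wrench) as a ternary predicate becomes four binary facts.
facts = decompose(
    "e1", "give",
    {"agent": "Tommy", "recipient": "Anna", "theme": "wrench"},
)
for fact in facts:
    print(fact)
```

Each resulting fact relates exactly two things, which is what allows the facts to be laid out as entries of a matrix.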

The rows and columns follow the order in which the nouns occur in the sentence, and the entry in the i-th row and j-th column holds the predicate(s) relating the i-th and j-th nouns; the nouns themselves sit on the main diagonal. A permutation applied to the rows and columns together then changes the noun ordering (one variety of paraphrase) while preserving the semantics.

Example:
1) Tommy went to the store to get a crescent wrench. <Tommy,store,wrench>
2) To get a crescent wrench, Tommy went to the store. <wrench,Tommy,store>
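The two orderings above can be sketched in Python as follows. This is only an illustration under assumptions: the predicate labels `went_to` and `intends_to_get` are invented for the example, and the matrix is stored as a nested list of sets rather than any particular matrix type.

```python
def build_matrix(nouns, relations):
    """Build a len(nouns) x len(nouns) matrix of predicate sets.

    Diagonal entries hold the noun itself; entry [i][j] holds the
    predicates relating the i-th noun to the j-th noun.
    """
    index = {noun: i for i, noun in enumerate(nouns)}
    n = len(nouns)
    matrix = [[frozenset() for _ in range(n)] for _ in range(n)]
    for i, noun in enumerate(nouns):
        matrix[i][i] = frozenset([noun])
    for (a, b), predicates in relations.items():
        matrix[index[a]][index[b]] = frozenset(predicates)
    return matrix


def permute(matrix, order):
    """Reorder rows and columns together: new[i][j] = old[order[i]][order[j]]."""
    return [[matrix[r][c] for c in order] for r in order]


def semantics(matrix):
    """Order-independent view: the set of (noun_a, predicate, noun_b) facts."""
    size = len(matrix)
    nouns = [next(iter(matrix[i][i])) for i in range(size)]
    return {
        (nouns[i], predicate, nouns[j])
        for i in range(size)
        for j in range(size)
        if i != j
        for predicate in matrix[i][j]
    }


# Hypothetical predicates for "Tommy went to the store to get a crescent wrench."
relations = {
    ("Tommy", "store"): {"went_to"},
    ("Tommy", "wrench"): {"intends_to_get"},
}

m1 = build_matrix(["Tommy", "store", "wrench"], relations)  # sentence 1
m2 = permute(m1, [2, 0, 1])                                 # <wrench,Tommy,store>

print(semantics(m1) == semantics(m2))
```

Permuting rows and columns with the same ordering moves the entries around but leaves the set of facts untouched, which is the sense in which both sentences share one representation.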

From this or another noun-order-invariant representation, the goal of NLG is then to compose grammatically correct sentences that carry the semantics, with the nouns in the order given by the underlying matrix. This flexibility matters because noun ordering is often context-dependent: sentences are composed into paragraphs and documents, and both noun ordering and sentence aggregation [1] are overarching processes that are part of “good writing style”.

Using the phases of NLG listed in the Wikipedia article [1], the process with this knowledge representation may resemble:

Content determination: Knowledge is obtained from a knowledge base or the web.
Discourse planning: Knowledge is moved into a matrix format.
Sentence aggregation: Block diagonalization and utilization of other patterns discerned from a corpus of well-written articles.
Lexicalisation: Putting words to the concepts.
Referring expression generation: Linking words in the sentences by introducing pronouns and other means of reference.
Syntactic and morphological realisation: Permutations are applied as per patterns discerned from well-written articles; each sentence is realized with the nouns in the best order.
Orthographic realisation: Matters like casing, punctuation, and formatting are resolved.
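As a rough illustration of the sentence aggregation step, block diagonalization can be read as grouping nouns that are connected by predicates, so that each diagonal block becomes a candidate sentence. The sketch below is one possible interpretation, not a definitive procedure; the function name `connected_blocks` and the sample nouns and pairs are hypothetical.

```python
def connected_blocks(nouns, related_pairs):
    """Group nouns into connected components, keeping first-mention order.

    Listing the nouns block by block gives an ordering under which the
    predicate matrix is block diagonal.
    """
    neighbours = {noun: set() for noun in nouns}
    for a, b in related_pairs:
        neighbours[a].add(b)
        neighbours[b].add(a)

    seen = set()
    blocks = []
    for start in nouns:
        if start in seen:
            continue
        seen.add(start)
        stack = [start]
        component = set()
        while stack:  # depth-first walk over related nouns
            noun = stack.pop()
            component.add(noun)
            for other in neighbours[noun]:
                if other not in seen:
                    seen.add(other)
                    stack.append(other)
        blocks.append([n for n in nouns if n in component])
    return blocks


# Hypothetical knowledge: two unrelated groups of nouns, interleaved.
nouns = ["Tommy", "Anna", "store", "bus", "wrench"]
pairs = [("Tommy", "store"), ("Tommy", "wrench"), ("Anna", "bus")]

blocks = connected_blocks(nouns, pairs)
block_order = [noun for block in blocks for noun in block]
print(blocks)
print(block_order)
```

Each block could then be handed to the later phases as one sentence, with the permutation inside the block chosen during realisation.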


[1] Natural Language Generation, Wikipedia article
