Thursday, May 3, 2007

Wikipedia Example

Nations can have literatures, as can corporations, philosophical schools or historical periods. Popular belief commonly holds that the literature of a nation, for example, comprises the collection of texts which make it a whole nation. The Hebrew Bible, Persian Shahnama, the Indian Mahabharata, Ramayana and Thirukural, the Iliad and the Odyssey, Beowulf, and the Constitution of the United States, all fall within this definition of a kind of literature. [1]


As written, this sentence does not immediately parse under the working hypothesis. The problem occurs around the phrase "collection of texts which make it a whole nation."

[literature of a nation][comprises][collection of texts] which [make][it][a whole nation].

Labeling the chunks A-F, we can see
B(A,C)
D(A,...)

However, by adding the word “into” (or “become”) to the text:

[literature of a nation][comprises][collection of texts] which [make][it][into][a whole nation].

and labeling the chunks A-G, we can see
B(A,C)
D(A,F(E,G))

So there may be occasions where the machine reading algorithm has to add text to a document to get it to parse correctly, or use pattern-based rules, for example involving the predicate make(X,Y). These can be reflected in natural language generation (NLG) as well, allowing sentence candidates to be formed where, as in this example, the “into” or “become” is implicit.
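
A pattern-based rule of this kind might look like the following minimal sketch. It assumes the sentence has already been split into chunks. The function name insert_implicit_into, the LINKING_WORDS set and the trigger condition are all hypothetical choices made for illustration, not the method actually used for this post.

```python
# Hypothetical rule: when "make" is followed by two argument chunks with no
# linking word between them, supply an implicit "into".
LINKING_WORDS = {"into", "become"}  # assumed set, taken from the example


def insert_implicit_into(chunks):
    """Return a copy of chunks with "into" added after make + object."""
    result = []
    i = 0
    while i < len(chunks):
        result.append(chunks[i])
        is_make = chunks[i] == "make"
        has_two_args = i + 2 < len(chunks)
        if is_make and has_two_args and chunks[i + 2] not in LINKING_WORDS:
            result.append(chunks[i + 1])  # the object, e.g. "it"
            result.append("into")         # the implicit linking word
            i += 2
        else:
            i += 1
    return result


chunks = ["literature of a nation", "comprises", "collection of texts",
          "make", "it", "a whole nation"]
print(insert_implicit_into(chunks))
```

Running the rule a second time on its own output leaves the chunks unchanged, because the linking word is then already present.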

So with that in mind,

Nations can have literatures, as can corporations, philosophical schools or historical periods. Popular belief commonly holds that the literature of a nation, for example, comprises the collection of texts which make it [into] a whole nation. The Hebrew Bible, Persian Shahnama, the Indian Mahabharata, Ramayana and Thirukural, the Iliad and the Odyssey, Beowulf, and the Constitution of the United States, all fall within this definition of a kind of literature. [1]


[Nations][can have][literatures], as [can][corporations], [philosophical schools] or [historical periods]. [Popular belief][commonly holds] that [the literature][of][a nation] ... [comprises] the [collection of texts] which [make][it][into][a whole nation]. [The Hebrew Bible], [Persian Shahnama], [the Indian Mahabharata], [Ramayana and Thirukural], [the Iliad and the Odyssey], [Beowulf], and [the Constitution of the United States], all [fall within] this [definition][of][a kind of literature].
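
The chunk labels used from here on run A to Z and then continue with AA, AB and so on. A rough sketch of how such labels could be assigned is shown here. The helper names column_label and label_chunks are made up for illustration, and treating the "..." elision as a chunk that takes its own label (M) is an assumption inferred from the labels used below.

```python
import re


def column_label(index):
    """Spreadsheet-style label: 0 -> A, 25 -> Z, 26 -> AA, 27 -> AB."""
    label = ""
    index += 1
    while index > 0:
        index, remainder = divmod(index - 1, 26)
        label = chr(ord("A") + remainder) + label
    return label


def label_chunks(text):
    """Map labels to bracketed chunks; "..." is assumed to take a label too."""
    matches = re.findall(r"\[[^\]]+\]|\.\.\.", text)
    return {column_label(i): m.strip("[]") for i, m in enumerate(matches)}


text = (
    "[Nations][can have][literatures], as [can][corporations], "
    "[philosophical schools] or [historical periods]. "
    "[Popular belief][commonly holds] that [the literature][of][a nation] ... "
    "[comprises] the [collection of texts] which [make][it][into]"
    "[a whole nation]. [The Hebrew Bible], [Persian Shahnama], "
    "[the Indian Mahabharata], [Ramayana and Thirukural], "
    "[the Iliad and the Odyssey], [Beowulf], and "
    "[the Constitution of the United States], all [fall within] this "
    "[definition][of][a kind of literature]."
)
labels = label_chunks(text)
for name in ("A", "H", "N", "AA", "AD"):
    print(name, "=", labels[name])
```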

Machine Reading
B(A,C)
B(E&F&G,C)
I(H,N(K(J,L),O))
P(O,R(Q,S))
AA(T&U&V&W&X&Y&Z,AC(AB,AD))

The ampersands indicate the processing of commas.

Start a new predicate with A as the left argument.
Name that predicate B.
Place C in the right argument of that predicate.
Start a new predicate with the label B.
Place C in the right argument of that predicate.
Place E in the left argument of that predicate.
In the context of a comma list,
Place F in the left argument of that predicate.
In the context of a comma list,
Place G in the left argument of that predicate.
Start a new predicate with H as the left argument.
Name that predicate I.
In the right argument of that predicate,
Start a new predicate with J as the left argument.
Name that predicate K.
Place L in the right argument of that predicate.
...
Wrap that predicate in a new predicate with the current content in the left argument.
Name that predicate N.
Place O in the right argument of that predicate.
Start a new predicate with O as the left argument.
Name that predicate P.
In the right argument of that predicate,
Start a new predicate with Q as the left argument.
Name that predicate R.
Place S in the right argument of that predicate.
Start a new predicate with T as the left argument.
In the context of a comma list,
Place U in the left argument of that predicate.
In the context of a comma list,
Place V in the left argument of that predicate.
In the context of a comma list,
Place W in the left argument of that predicate.
In the context of a comma list,
Place X in the left argument of that predicate.
In the context of a comma list,
Place Y in the left argument of that predicate.
In the context of a comma list,
Place Z in the left argument of that predicate.
Name that predicate AA.
In the right argument of that predicate,
Start a new predicate with AB as the left argument.
Name that predicate AC.
Place AD in the right argument of that predicate.
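
The instructions above can be read as operations on a small data structure. The following is only a sketch under that reading: the Predicate class and its method names are invented here to mirror the wording of the steps, and they are not the program that produced the output in this post.

```python
class Predicate:
    """A binary predicate; `left` is a list so comma lists can accumulate."""

    def __init__(self, left=None):
        # "Start a new predicate with X as the left argument."
        self.name = None
        self.left = [left] if left is not None else []
        self.right = None

    def named(self, name):
        # "Name that predicate N."
        self.name = name
        return self

    def place_left(self, item):
        # "Place X in the left argument" (repeated for a comma list).
        self.left.append(item)
        return self

    def place_right(self, item):
        # "Place X in the right argument of that predicate."
        self.right = item
        return self

    def wrap(self):
        # "Wrap that predicate in a new predicate with the current
        # content in the left argument."
        return Predicate(self)

    def __str__(self):
        left = "&".join(str(item) for item in self.left)
        return f"{self.name}({left},{self.right})"


first = Predicate("A").named("B").place_right("C")
second = Predicate().named("B").place_right("C")
second.place_left("E").place_left("F").place_left("G")
inner = Predicate("J").named("K").place_right("L").wrap().named("N").place_right("O")
third = Predicate("H").named("I").place_right(inner)

print(first)   # B(A,C)
print(second)  # B(E&F&G,C)
print(third)   # I(H,N(K(J,L),O))
```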

We can use “symbolic logic” operations to turn:
B(E&F&G,C)
into
B(E,C)
B(F,C)
B(G,C)

resulting in

B(A,C)
B(E,C)
B(F,C)
B(G,C)
I(H,N(K(J,L),O))
P(O,R(Q,S))
AA(T,AC(AB,AD))
AA(U,AC(AB,AD))
AA(V,AC(AB,AD))
AA(W,AC(AB,AD))
AA(X,AC(AB,AD))
AA(Y,AC(AB,AD))
AA(Z,AC(AB,AD))
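
The expansion step can be sketched as a small string rewrite. This is an illustrative sketch only: the function name distribute_left is hypothetical, and it assumes the left argument is a flat list of labels joined by ampersands, as in the examples above.

```python
def distribute_left(expr):
    """Expand NAME(X&Y&Z,REST) into [NAME(X,REST), NAME(Y,REST), NAME(Z,REST)]."""
    name, _, body = expr.partition("(")
    body = body[:-1]                      # drop the closing parenthesis
    left, _, right = body.partition(",")  # assumes the left argument is flat
    return [f"{name}({item},{right})" for item in left.split("&")]


print(distribute_left("B(E&F&G,C)"))
print(distribute_left("AA(T&U,AC(AB,AD))"))
```

A predicate with a single left argument, such as B(A,C), passes through unchanged as a one-element list.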

We can also observe that W ("Ramayana and Thirukural") and X ("the Iliad and the Odyssey") each contain an "and", so they can become W1&W2 and X1&X2.

B(A,C)
B(E,C)
B(F,C)
B(G,C)
I(H,N(K(J,L),O))
P(O,R(Q,S))
AA(T,AC(AB,AD))
AA(U,AC(AB,AD))
AA(V,AC(AB,AD))
AA(W1,AC(AB,AD))
AA(W2,AC(AB,AD))
AA(X1,AC(AB,AD))
AA(X2,AC(AB,AD))
AA(Y,AC(AB,AD))
AA(Z,AC(AB,AD))

which in quad form is

A B C 1
E B C 2
F B C 3
G B C 4
J K L 5
5 N O 6
H I 6 7
Q R S 8
O P 8 9
AB AC AD 10
T AA 10 11
U AA 10 12
V AA 10 13
W1 AA 10 14
W2 AA 10 15
X1 AA 10 16
X2 AA 10 17
Y AA 10 18
Z AA 10 19
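
One way to read the quad form is that each predicate becomes a row of left argument, predicate, right argument and row number, with nested predicates emitted first and then referred to by their number. The sketch below follows that reading. The tuple representation (name, left, right) and the function name to_quads are assumptions made for illustration, as is the choice to reuse the number of a repeated inner predicate such as AC(AB,AD).

```python
def to_quads(predicates):
    """Flatten nested (name, left, right) tuples into numbered quads."""
    quads = []
    seen = {}  # predicate tuple -> quad number, so repeats are shared

    def visit(node):
        if isinstance(node, str):
            return node          # a plain chunk label such as "A"
        if node in seen:
            return seen[node]    # already emitted, reuse its number
        name, left, right = node
        left_ref = visit(left)   # inner predicates are emitted first
        right_ref = visit(right)
        quad_id = len(quads) + 1
        quads.append((left_ref, name, right_ref, quad_id))
        seen[node] = quad_id
        return quad_id

    for predicate in predicates:
        visit(predicate)
    return quads


predicates = [
    ("B", "A", "C"),
    ("B", "E", "C"),
    ("B", "F", "C"),
    ("B", "G", "C"),
    ("I", "H", ("N", ("K", "J", "L"), "O")),
    ("P", "O", ("R", "Q", "S")),
    ("AA", "T", ("AC", "AB", "AD")),
    ("AA", "U", ("AC", "AB", "AD")),
]
quads = to_quads(predicates)
for quad in quads:
    print(*quad)
```

For these eight predicates the sketch prints twelve rows, numbered 1 to 12, with AC(AB,AD) emitted once as row 10 and referenced by both AA rows.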

[1] Literature, Wikipedia

1 comment:

Nagy Attila István said...

Impressing! I am a newbie into NLP, and I'd be interested in what program or method you used to generate this output.