Tuesday, May 29, 2007

Calculus of Language

I'm tentatively calling the new syntax and related logic a “calculus of language”, as it has the properties of a predicate calculus and appears to represent natural language well. Syntactically, it's a superset of predicate calculus. I am writing up a draft covering the formal logic, the grammars, and the heuristics for the sequential parsing of natural language, and I hope to publish it soon. I'll comment on developments as they occur; the technique appears to work on arbitrary natural language sentences. However, the output is strings, so arriving at the exact semantics (URIs/integers) appears to require a second-step algorithm.
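
As a rough sketch of that second step (the function names and toy data below are placeholders, not the draft's actual algorithm), the string output can be interned into integer identifiers standing in for URIs:

# Sketch of a second pass: intern the strings from the string-level
# output into integer identifiers (stand-ins for URIs).  Names and
# data here are illustrative assumptions.
def make_interner():
    table = {}
    def intern(symbol):
        if symbol not in table:
            table[symbol] = len(table) + 1  # assign the next integer id
        return table[symbol]
    return intern, table

intern, table = make_interner()
# Likes(bobby, Draw(_, pictures)) as strings becomes a numeric structure.
print(intern("Likes"), intern("bobby"), intern("Draw"), intern("pictures"))
print(table)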

Sunday, May 27, 2007

New Predicate Syntax

While working on the parser, I developed a new syntax that appears to be more readily parsed into. That is, the new syntax appears easier to process into (from natural language), while the syntax described previously appears easier to generate from (to natural language). There's a simple heuristic for converting between the two; the new syntax and grammar are a superset of the old syntax and grammar.

In the new syntax there are notational varieties of predicates. These notational conventions appear to reduce the instruction set for building the predicates sequentially and to reduce the context information required during the processing of a sentence, making the heuristic simpler.

Tuesday, May 22, 2007

PCFG and Adaptive FSM

I'm having some success with a first approach to machine reading. The algorithm converts a PCFG parse into a sequence and maps that sequence to a sequence of predicate-assembly instructions using an adaptive FSM. The adaptive FSM algorithm maps sequences from one domain to another and utilizes patterns in the mappings of subsequences to optimize memory usage; this optimization relates to generalization, in which patterns can be induced from training examples. I'll post further results as they occur. Basically, the algorithm is an online parser (word-by-word) that converts sentences to predicates, though the training data is structural (sentence-by-sentence).
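
As a rough sketch of the sequence-to-sequence mapping idea (this is not the adaptive FSM itself; the rule storage, tags, and instruction strings below are illustrative assumptions), stored subsequences of parse tags can be reused to emit predicate-assembly instructions:

# Sketch of a sequence-to-sequence mapper in the spirit described above:
# training pairs map subsequences of parse tags to subsequences of
# predicate-assembly instructions; translation reuses the longest
# stored prefix.  This is an illustration, not the adaptive FSM itself.
def train(rules, in_seq, out_seq):
    # Store the mapping keyed by the input subsequence.
    rules[tuple(in_seq)] = list(out_seq)

def translate(rules, in_seq):
    out, i = [], 0
    while i < len(in_seq):
        # Greedily match the longest stored input subsequence.
        for j in range(len(in_seq), i, -1):
            chunk = tuple(in_seq[i:j])
            if chunk in rules:
                out.extend(rules[chunk])
                i = j
                break
        else:
            out.append(("UNKNOWN", in_seq[i]))
            i += 1
    return out

rules = {}
train(rules, ["NP", "VBZ", "NP"],
      ["new_pred(left=NP)", "name_pred(VBZ)", "set_right(NP)"])
print(translate(rules, ["NP", "VBZ", "NP"]))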

Sunday, May 13, 2007

Preprocessing and Predicate Patterns

I found an example sentence that shows a promising relationship between the preprocessing and patterns in the predicates. That is, a correct preprocessing can indicate patterns that an incorrect chunking does not.

For example,

{the scientific method}[Seeks+][To+][Explain]{the complexities}[Of]{nature}[In]{a replicable way}<comma><and>[To+][Use]{these explanations}[To+][Make]{useful predictions}

H(B+C+D(A,F(E,G)),I)
O+P(L+M(A,N),Q)

Notice that B could move across the <comma><and>:

H(B+C+D(A,F(E,G)),I)
O+P(B+L+M(A,N),Q)

However, the preprocessing:

{the scientific method}[Seeks][To+][Explain]{the complexities}[Of]{nature}[In]{a replicable way}<comma><and>[To+][Use]{these explanations}[To][Make]{useful predictions}

B(A,H(C+D(_,F(E,G)),I)&O(L+M(_,N),P(_,Q)))

or

B(A,H(C+D(_,F(E,G)),I))
B(A,O(L+M(_,N),P(_,Q)))

This preprocessing is more illustrative of the patterns indicative of sentences with <comma><and>, namely P(x, y1&y2...). The ampersand is a notational convention that is useful during sequential processing.

The pattern evidenced in the second predicate output, related to the <comma><and>, indicates that the second chunking was more correct than the first. This suggests that the chunking and choice of predicates are not arbitrary, and that a set may emerge from patterns dependent upon the <> symbols in certain sentences.
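
As a small sketch of testing for that pattern (the string handling and function names are my own, naive illustration), a predicate string can be checked for the P(x, y1&y2) shape at its top level and competing chunkings scored accordingly:

# Sketch: detect the P(x, y1&y2...) pattern at the top level of a
# predicate string, as a score in favour of a chunking hypothesis.
# Parsing is deliberately naive; the format follows the examples above.
def split_top(s, sep):
    # Split s on sep, but only at parenthesis depth zero.
    parts, depth, cur = [], 0, ""
    for ch in s:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == sep and depth == 0:
            parts.append(cur)
            cur = ""
        else:
            cur += ch
    parts.append(cur)
    return parts

def has_comma_and_pattern(pred):
    # P(x, y1&y2...)  ->  the right argument splits on '&' at depth zero.
    inner = pred[pred.index("(") + 1 : pred.rindex(")")]
    args = split_top(inner, ",")
    return len(args) == 2 and len(split_top(args[1], "&")) > 1

print(has_comma_and_pattern("B(A,H(C+D(_,F(E,G)),I)&O(L+M(_,N),P(_,Q)))"))  # True
print(has_comma_and_pattern("H(B+C+D(A,F(E,G)),I)"))                        # False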

I will find more example sentences showing that patterns in the output can reinforce chunking hypotheses. One area of potential complexity in English is the word 'to'.

Bobby went to the store.
{bobby}[Went][+To]{the store}.
B+C(A,D)
Here the 'to' is in the sense of 'towards'.

Bobby likes to draw.
{bobby}[Likes][To+][Draw].
B(A,C(_,_))
Here the 'to' is in the sense of an infinitive.

Bobby bought crayons to draw pictures.
{bobby}[Bought]{crayons}[To][Draw]{pictures}.
D(B(A,C),E(_,F))
Here the 'to' is in the sense of 'in order to'.

Predicates and chunking appear to be discernible from patterns involving the <> symbols in certain sentences. For example, the argument above is that SeeksToExplain(x,y) is not a predicate and that Seeks(x,y) and Explain(x,y) are.

Tuesday, May 8, 2007

Artificial Intelligence

After WikiCorpus and some NL algorithms for machine reading/writing, I'm thinking of moving towards natural language understanding (NLU) post-processing for sentences like:

“Once the characteristic numbers of most notions are determined, the human race will have a new kind of tool, a tool that will increase the power of the mind much more than optical lenses helped our eyes, a tool that will be as far superior to microscopes or telescopes as reason is to vision.”

— Leibniz


This sentence compares a described tool to microscopes and telescopes, reason to vision, and compares those two comparisons. So, after an initial semantic parsing, a rule system may permute the structured knowledge into a format more conducive to machine reasoning and NLU.

However, this is moving from NL towards artificial intelligence or, at least, machines that can demonstrate intelligent behavior.

user> Birds are to air as fish are to what?
machine> Water.
user> Why?
machine> Birds move through the air and fish move through water.

Or,

user> Telescopes are to vision as what is to reasoning?
machine> A new kind of tool that the human race will have after the characteristic numbers of most notions are determined, according to Leibniz.

This sort of NL interaction appears to require machine reading, kb search, paraphrase generation, kb search and machine writing. A goal of mine after processing text into a knowledge representation is an NL interface (requiring machine reading/writing) to demonstrate NLU. The algorithm for the above examples is tentatively all-paths discovery, path permuting and parallel search of the kb.

Pseudocode:

// n1 is to n2 as n3 is to what?
// n1 is to n2 as what is to n4?
Nodes[] IsToAsIsTo(n1,n2,n3,n4)
{
 Paths[] P = FindAllPaths(n1,n2);
 Paths[] PP = Rephrases(P);
 Nodes[] N = null;
 if( n4 == null ) N = Search(n3,PP);
 else if( n3 == null ) N = Search(PP,n4);
 return N;
}
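
A toy, runnable version of this pseudocode (the kb edge list, the pass-through Rephrases, and the one-hop paths are all simplifying assumptions on my part):

# Toy sketch of the pseudocode above.  The kb is a set of labeled edges;
# Rephrases is a pass-through; everything here is an illustrative assumption.
EDGES = {
    ("birds", "moves_through", "air"),
    ("fish", "moves_through", "water"),
}

def find_all_paths(n1, n2, edges=EDGES):
    # One-hop paths only, as a stand-in for all-paths discovery.
    return [[label] for (a, label, b) in edges if a == n1 and b == n2]

def rephrases(paths):
    # Placeholder for paraphrase/permutation of paths.
    return paths

def search_from(n3, paths, edges=EDGES):
    # Follow each path forward from n3 and collect the end nodes.
    return [b for path in paths for (a, label, b) in edges
            if a == n3 and [label] == path]

def is_to_as_is_to(n1, n2, n3=None, n4=None):
    # n1 is to n2 as n3 is to what?  (the n4 question is symmetric)
    paths = rephrases(find_all_paths(n1, n2))
    if n4 is None:
        return search_from(n3, paths)
    return [a for path in paths for (a, label, b) in EDGES
            if b == n4 and [label] == path]

print(is_to_as_is_to("birds", "air", n3="fish"))  # ['water']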

Sunday, May 6, 2007

WikiCorpus

So far so good on those hypotheses. The WikiCorpus alpha source code will be available here in a few weeks. I'm building it presently and testing it on Wikipedia articles. It will provide a user interface for generating corpora for pronoun and reference discernment and for semantic parsing algorithm design.

Friday, May 4, 2007

Hypotheses

The hypotheses that I'm presently testing are

1) Representation Hypothesis

Letting P(X,Y) = ~P(Y,X)

A natural language sentence can be represented:

S → S1 | S1 S2 | S1 S2 S3 ...
Si → P(A,A) | ~P(A,A)
A → P(A,A) | ~P(A,A) | N | _
N → n1 | n2 | n3 ...
P → p1 | p2 | p3 ...

2) Noun Order Hypothesis

We can view S as a sequence of trees. If the noun leaf nodes of those trees are read as a sequence, then a noun can appear in that sequence only if the noun that precedes it in the natural language sentence already appears earlier in the sequence.

3) Hypothesis of Heuristic

It is heuristically possible to convert between this representation and natural language, in both directions.


Optionally, for sentence structures resembling “X makes Y Z” and “X finds Y Z”, and should it help the heuristic, we can amend the representation hypothesis:

Si → P(A,A) | ~P(A,A) | P(A,P'(A,A)) | ~P(P'(A,A),A)
A → P(A,A) | ~P(A,A) | P(A,P'(A,A)) | ~P(P'(A,A),A) | N | _

Where P' can be determined from P.
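
As a small sketch of hypotheses 1 and 2 (the tuple encoding and names are illustrative assumptions), predicates can be nested (name, left, right) tuples with '_' as None, and the noun order condition can be checked against the sentence's noun sequence:

# Sketch of the representation (hypothesis 1) and the noun order check
# (hypothesis 2).  Predicates are (name, left, right) tuples, '_' is None,
# nouns are strings; all names here are illustrative.
def leaf_nouns(tree, out=None):
    # Collect noun leaves left to right.
    if out is None:
        out = []
    if tree is None:
        return out
    if isinstance(tree, str):
        out.append(tree)
        return out
    _, left, right = tree
    leaf_nouns(left, out)
    leaf_nouns(right, out)
    return out

def respects_noun_order(trees, sentence_nouns):
    # Each noun may appear only if its predecessor in the sentence
    # has already appeared in the leaf sequence.
    seq = []
    for t in trees:
        seq.extend(leaf_nouns(t))
    seen = []
    for noun in seq:
        i = sentence_nouns.index(noun)
        if i > 0 and sentence_nouns[i - 1] not in seen:
            return False
        seen.append(noun)
    return True

# Likes(bobby, With(Draw(_, pictures), crayons))
tree = ("Likes", "bobby", ("With", ("Draw", None, "pictures"), "crayons"))
print(respects_noun_order([tree], ["bobby", "pictures", "crayons"]))  # True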

Machine Reading, Paraphrases

1) {Bobby} [likes][drawing] {pictures} [with] {crayons}.
Likes(bobby,With(Draw(_,pictures),crayons))

2) {Bobby} [likes][using] {crayons} [to draw] {pictures}.
Likes(bobby,Using(crayons,Draw(_,pictures)))
Likes(bobby,~With(crayons,Draw(_,pictures)))

So, the nesting of predicates can be discerned by noun order paraphrases. This is theoretical at this point, but so far so good. I'm using the premise of semantics-preserving noun order paraphrases as an axiom in theorizing machine reading algorithms.

Representation Notes

This sentence suggests that gerunds should be in noun brackets.
{Bobby} [likes] {drawing}. B(A,C)

However,
{Bobby} [likes] %drawing% B(A,C) or B(A,C(_,_))

{Bobby} [likes drawing] {pictures}. B(A,C)
{Bobby} [likes][drawing] {pictures}. B(A,C(_,D))
{Bobby} [likes][to draw] {pictures}. B(A,C(_,D))

So the gerund can be P(_,_) and the infinitive can be the same or, if a direct object is specified, be P(_,Y).

{Bobby} [drew]. B(A,_)

An argument for structural reuse is in the sentence:
As Bobby likes to draw pictures with crayons, he went to the store and he purchased a box of them.
<As> [[1]], {he}[went to]{the store} and {he}[purchased] {a box}[of]{them}.
As([[1]],[[2]])

The <> brackets represent elements or operators that relate predicates or sequences of predicates, logic operations and other punctuation relevant to processing.

Wikipedia Example 3

The term Artificial Intelligence (AI) was first used by John McCarthy who used it to mean "the science and engineering of making intelligent machines". It can also refer to intelligence as exhibited by an artificial (man-made, non-natural, manufactured) entity. The terms strong and weak AI can be used to narrow the definition for classifying such systems. AI is studied in overlapping fields of computer science, psychology, philosophy, neuroscience and engineering, dealing with intelligent behavior, learning and adaptation and usually developed using customized machines or computers.

Research in AI is concerned with producing machines to automate tasks requiring intelligent behavior. Examples include control, planning and scheduling, the ability to answer diagnostic and consumer questions, handwriting, natural language, speech and facial recognition. As such, the study of AI has also become an engineering discipline, focused on providing solutions to real life problems, knowledge mining, software applications, strategy games like computer chess and other video games. One of the biggest difficulties with AI is that of comprehension. Many devices have been created that can do amazing things, but critics of AI claim that no actual comprehension by the AI machine has taken place. [1]


{The term Artificial Intelligence|A} [was first used by|B] {John McCarthy|C} <who|D> [used|E] {it|F} [to mean|G] <"|H> {the science and engineering|I} [of|J] {making intelligent machines|K} <"|L>. {It|M} [can also refer to|N] {intelligence|O} as [exhibited by|P] {an artificial (man-made, non-natural, manufactured) entity|Q}. {The terms strong and weak AI|R} [can be used to narrow|S] {the definition|T} [for|U] [classifying|V] {such systems|W}. {AI|X} is [studied|Y] [in|Z] {overlapping fields|AA} [of|AB] {computer science|AC}, {psychology|AD}, {philosophy|AE}, {neuroscience and engineering|AF}, [dealing with|AG] {intelligent behavior|AH}, {learning and adaptation|AI} and [usually developed using|AJ] {customized machines|AK} [or|AL] {computers|AM}.

{Research in AI|AN} [is concerned with|AO] [producing|AP] {machines|AQ} [to automate|AR] {tasks|AS} [requiring|AT] {intelligent behavior|AU}. {Examples|AV} [include|AW] {control|AX}, [planning and scheduling|AY], [the ability to answer|AZ] {diagnostic and consumer questions|BA}, {handwriting|BB}, {natural language|BC}, {speech and facial recognition|BD}. <As such|BE>, {the study|BF} [of|BG] {AI|BH} [has also become|BI] {an engineering discipline|BJ}, [focused on providing|BK] {solutions|BL} [to|BM] {real life problems|BN}, {knowledge mining|BO}, {software applications|BP}, {strategy games|BQ} [like|BR] {computer chess and other video games|BS}. {One|BT} [of|BU] {the biggest difficulties|BV} [with|BW] {AI|BX} [is|BY] <that|BZ> <of|CA> {comprehension|CB}. {Many devices|CC} [have been created|CD] that [can do|CE] {amazing things|CF}, <but|CG> {critics|CH} [of|CI] {AI|CJ} [claim|CK] that <no|CL> {actual comprehension|CM} [by|CN] {the AI machine|CO} [has|CP] {taken place|CQ}.

B(A,C)
G(E(C,F),J(I,K))
N(M,P(O,Q))
S(R,U(T,V(_,W)))
Y+Z(X,AB(AA,AC&AD&AE&AF&AG&AH&AI))
AJ(X,AL(AK,AM))
AO(AN,AR(AP(_,AQ),AT(AS,AU)))
BE(AW(AV,AX&AY(_,_)&AZ(_,BA)&BB&BC&BD),BI(BG(BF,BH),BJ))
BE(AW(AV,AX&AY(_,_)&AZ(_,BA)&BB&BC&BD),BK(BJ,BM(BL,BN&BO&BP&BR(BQ,BS))))
BY(BW(BU(BT,BV),BX),CB)
CG(CE(CD(CC,_),CF),CK(CI(CH,CJ),!CP(CN(CM,CO),CQ)))

I'm looking into the linguistic reification of occurrence in the representation format (... CP CQ). A logic-based rule system may aid in this area.

Also interesting in this text is the passive voice, “Many devices have been created that can do amazing things.” The underscore indicates that the creator is unknown to the machine reading algorithm: CD(CC,_), as CD correlates to ~Created. I'm using the notation P(_,_) to represent gerunds.

One strategy in developing machine reading algorithms is to look at children's books and texts. These are already categorized by reading level, so any algorithm that appears to somehow build on itself as the reading level is increased would, philosophically, have additional merit, in my opinion.

[1] Artificial Intelligence, Wikipedia

Wikipedia Example 2

Knowledge representation is an issue that arises in both cognitive science and artificial intelligence. In cognitive science it is concerned with how people store and process information. In artificial intelligence (AI) the primary aim is to store knowledge so that programs can process it and achieve the verisimilitude of human intelligence. AI researchers have borrowed representation theories from cognitive science. Thus there are representation techniques such as frames, rules and semantic networks which have originated from theories of human information processing. Since knowledge is used to achieve intelligent behavior, the fundamental goal of knowledge representation is to represent knowledge in a manner as to facilitate inferencing i.e. drawing conclusions from knowledge. [1]


{Knowledge representation|A} [is|B] {an issue|C} that [arises in|D] both {cognitive science and artificial intelligence|E}. [In|F] {cognitive science|G}, {it|H} [is concerned with|I] [how|J] {people|K} [store and process|L] {information|M}. [In|N] {artificial intelligence|O} {the primary aim|P} [is|Q] [to store|R] {knowledge|S} [so|T] that {programs|U} [can process|V] {it|W} and [achieve|X] {the verisimilitude|Y} [of|Z] {human intelligence|AA}. {AI researchers|AB} [have borrowed|AC] {representation theories|AD} [from|AE] {cognitive science|AF}. <Thus|AG> <there are|*> {representation techniques|AH} [such as|AI] {frames|AJ}, {rules|AK} and {semantic networks|AL} which [have originated from|AM] {theories|AN} [of|AO] {human information processing|AP}. <Since|AQ> {knowledge|AR} [is used to achieve|AS] {intelligent behavior|AT}, {the fundamental goal|AU} [of|AV] {knowledge representation|AW} [is|AX] [to represent|AY] {knowledge|AZ} [in|BA] {a manner|BB} as [to facilitate|BC] {inferencing|BD} <i.e.|BE> {drawing conclusions|BF} [from|BG] {knowledge|BH}.

B(A,C)
D(C,E1)
D(C,E2)
I(F(H,G),J(K,L1(K,M)))
I(F(H,G),J(K,L2(K,M)))
Q(N(P,O),T(R(_,S),V(U,W)&X(U,Z(Y,AA))))
AG(AC(AB,AE(AD,AF)),AM(AI(AH,AJ),AO(AN,AP)))
AG(AC(AB,AE(AD,AF)),AM(AI(AH,AK),AO(AN,AP)))
AG(AC(AB,AE(AD,AF)),AM(AI(AH,AL),AO(AN,AP)))
AQ(AS(AR,AT),AX(AV(AU,AW),BC(BA(AY(_,AZ),BB),BD)))
BE(BD,BG(BF,BH))

[1] Knowledge Representation, Wikipedia

Natural Language Understanding

Nations can have literatures, as can corporations, philosophical schools or historical periods. Popular belief commonly holds that the literature of a nation, for example, comprises the collection of texts which make it [into/become] a whole nation. The Hebrew Bible, Persian Shahnama, the Indian Mahabharata, Ramayana and Thirukural, the Iliad and the Odyssey, Beowulf, and the Constitution of the United States, all fall within this definition of a kind of literature.


The difference between machine reading and natural language understanding, in my opinion, is that machine reading is towards natural language processing and natural language understanding is towards machine reasoning and artificial intelligence. Looking at the underlined portions of the quote, we can equate or otherwise relate them, but I would say this is a task of natural language understanding as defined and not machine reading.

This is a sort of pronoun or reference resolution; binding strings to the same underlying unique representations may be slightly more advanced than reference resolution, as in:

Nations can have literatures, as can corporations, philosophical schools or historical periods. Popular belief commonly holds that the literature of a nation, for example, comprises the collection of texts which make it [into/become] a whole nation. The Hebrew Bible, Persian Shahnama, the Indian Mahabharata, Ramayana and Thirukural, the Iliad and the Odyssey, Beowulf, and the Constitution of the United States, all fall within this definition of a kind of literature.


Revisiting the “X makes Y Z” sentence, the [into/become] predicate is unique to the makes predicate so as to allow paraphrases, and is not necessarily the same predicate as other text occurrences of the words “into” or “become”. While predicates and nouns are resolved from text to integers after machine reading structures the sentence, predicates generated during machine reading are algorithmically determined.

Algorithmically, “X P Y Z.” → “X P Y P' Z.”, where in NLG the empty string is a candidate for P' in P'(X,Y), allows permutation-based paraphrasing and the generation of sentences that occur naturally, as in this document from Wikipedia. The context entered with P should notice, when Z (a noun chunk) is about to be placed in a predicate position, that P' is needed and that Z is its right argument; a small sketch of these steps follows the list below.

X P Y Z.

Start a predicate with X as the left argument.
Label that predicate P.
Place Y into the right argument of that predicate.
Z...
Wrap that argument (Y) into a predicate with the previous content in the left argument.
Label that predicate P'.
Place Z into the right argument of that predicate.
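
A sketch of those steps (predicates as small dicts; the operation names and the supplied P' are assumptions for illustration, since determining P' from P is not shown here):

# Sketch of the wrapping step for "X P Y Z" sentences, following the
# instruction list above.  Predicates are small dicts; names are mine.
def new_pred(left=None, name=None, right=None):
    return {"name": name, "left": left, "right": right}

def read_x_p_y_z(x, p, y, z, p_prime):
    pred = new_pred(left=x, name=p)          # start predicate, label P
    pred["right"] = y                        # place Y in the right argument
    # Z arrives while the right argument is already occupied by a noun,
    # so wrap Y into a new predicate P' and make Z its right argument.
    pred["right"] = new_pred(left=y, name=p_prime, right=z)
    return pred

print(read_x_p_y_z("the collection of texts", "make", "it", "a whole nation", "into"))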

So, the “X makes Y Z” required some thinking. If you find any other sentences that are good examples to strengthen algorithm design, comment them here or email me and I'll post on them.

Thursday, May 3, 2007

Wikipedia Example

Nations can have literatures, as can corporations, philosophical schools or historical periods. Popular belief commonly holds that the literature of a nation, for example, comprises the collection of texts which make it a whole nation. The Hebrew Bible, Persian Shahnama, the Indian Mahabharata, Ramayana and Thirukural, the Iliad and the Odyssey, Beowulf, and the Constitution of the United States, all fall within this definition of a kind of literature. [1]


As written, this sentence does not immediately parse as per the working hypothesis. The problem occurs around the "collection of texts which make it a whole nation."

[literature of a nation][comprises][collection of texts] which [make][it][a whole nation].

Labeling the chunks A - F, we can see
B(A,C)
D(A,...)

However, by inserting the word “into” or “become” into the text:

[literature of a nation][comprises][collection of texts] which [make][it][into][a whole nation].

and labeling the chunks A-G, we can see
B(A,C)
D(A,F(E,G))

So there may be occasions where the machine reading algorithm has to add text to a document in order to parse correctly, or use pattern-based rules, for example involving the predicate make(X,Y). These can be reflected in NLG as well, to allow sentence candidates to be formed where, as in this example, the “into” or “become” is implicit.
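
As a sketch of one such pattern-based rule (the chunk encoding and the rule itself are only an illustration), [into] can be inserted when [make] is followed by two noun chunks:

# Sketch of a pattern-based rule: when the chunk sequence has [make]
# followed by two noun chunks, insert [into] between them so that the
# sequence parses as make(X, into(Y, Z)).  Illustrative only.
def insert_implicit_into(chunks):
    out = []
    i = 0
    while i < len(chunks):
        kind, text = chunks[i]
        out.append(chunks[i])
        if (kind == "pred" and text == "make"
                and i + 2 < len(chunks)
                and chunks[i + 1][0] == "noun"
                and chunks[i + 2][0] == "noun"):
            out.append(chunks[i + 1])
            out.append(("pred", "into"))
            out.append(chunks[i + 2])
            i += 3
            continue
        i += 1
    return out

chunks = [("pred", "make"), ("noun", "it"), ("noun", "a whole nation")]
print(insert_implicit_into(chunks))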

So with that in mind,

Nations can have literatures, as can corporations, philosophical schools or historical periods. Popular belief commonly holds that the literature of a nation, for example, comprises the collection of texts which make it [into] a whole nation. The Hebrew Bible, Persian Shahnama, the Indian Mahabharata, Ramayana and Thirukural, the Iliad and the Odyssey, Beowulf, and the Constitution of the United States, all fall within this definition of a kind of literature. [1]


[Nations][can have][literatures], as [can][corporations], [philosophical schools] or [historical periods]. [Popular belief][commonly holds] that [the literature][of][a nation] ... [comprises] the [collection of texts] which [make][it][into][a whole nation]. [The Hebrew Bible], [Persian Shahnama], [the Indian Mahabharata], [Ramayana and Thirukural], [the Iliad and the Odyssey], [Beowulf], and [the Constitution of the United States], all [fall within] this [definition][of][a kind of literature].

Machine Reading
B(A,C)
B(E&F&G,C)
I(H,N(K(J,L),O))
P(O,R(Q,S))
AA(T&U&V&W&X&Y&Z,AC(AB,AD))

The ampersands indicate the processing of commas.

Start a new predicate with A as the left argument.
Name that predicate B.
Place C in the right argument of that predicate.
Start a new predicate with the label B.
Place C in the right argument of that predicate.
Place E in the left argument of that predicate.
In the context of a comma list,
Place F in the left argument of that predicate.
In the context of a comma list,
Place G in the left argument of that predicate.
Start a new predicate with H as the left argument.
Name that predicate I.
In the right argument of that predicate,
Start a new predicate with J as the left argument.
Name that predicate K.
Place L in the right argument of that predicate.
...
Wrap that predicate in a new predicate with the current content in the left argument.
Name that predicate N.
Place O in the right argument of that predicate.
Start a new predicate with O as the left argument.
Name that predicate P.
In the right argument of that predicate,
Start a new predicate with Q as the left argument.
Name that predicate R.
Place S in the right argument of that predicate.
Start a new predicate with T as the left argument.
In the context of a comma list,
Place U in the left argument of that predicate.
In the context of a comma list,
Place V in the left argument of that predicate.
In the context of a comma list,
Place W in the left argument of that predicate.
In the context of a comma list,
Place X in the left argument of that predicate.
In the context of a comma list,
Place Y in the left argument of that predicate.
In the context of a comma list,
Place Z in the left argument of that predicate.
Name that predicate AA.
In the right argument of that predicate,
Start a new predicate with AB as the left argument.
Name that predicate AC.
Place AD in the right argument of that predicate.

We can use “symbolic logic” operations to turn:
B(E&F&G,C)
into
B(E,C)
B(F,C)
B(G,C)

resulting in

B(A,C)
B(E,C)
B(F,C)
B(G,C)
I(H,N(K(J,L),O))
P(O,R(Q,S))
AA(T,AC(AB,AD))
AA(U,AC(AB,AD))
AA(V,AC(AB,AD))
AA(W,AC(AB,AD))
AA(X,AC(AB,AD))
AA(Y,AC(AB,AD))
AA(Z,AC(AB,AD))
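
A sketch of that expansion step (the string handling is naive and only meant to mirror the example): a predicate whose left argument is an '&' list is distributed into one predicate per element:

# Sketch of the symbolic-logic expansion used above: a predicate whose
# left argument is an '&' list is distributed into one predicate per
# element.  String handling is naive and only mirrors the example.
def split_amp(arg):
    # Split on '&' at parenthesis depth zero.
    parts, depth, cur = [], 0, ""
    for ch in arg:
        depth += ch == "("
        depth -= ch == ")"
        if ch == "&" and depth == 0:
            parts.append(cur)
            cur = ""
        else:
            cur += ch
    parts.append(cur)
    return parts

def distribute(pred):
    name = pred[: pred.index("(")]
    left, right = pred[pred.index("(") + 1 : -1].split(",", 1)
    return ["%s(%s,%s)" % (name, part, right) for part in split_amp(left)]

print(distribute("B(E&F&G,C)"))  # ['B(E,C)', 'B(F,C)', 'B(G,C)']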

We can also observe that V and W can become V1&V2, and W1&W2.

B(A,C)
B(E,C)
B(F,C)
B(G,C)
I(H,N(K(J,L),O))
P(O,R(Q,S))
AA(T,AC(AB,AD))
AA(U,AC(AB,AD))
AA(V1,AC(AB,AD))
AA(V2,AC(AB,AD))
AA(W1,AC(AB,AD))
AA(W2,AC(AB,AD))
AA(X,AC(AB,AD))
AA(Y,AC(AB,AD))
AA(Z,AC(AB,AD))

which in quad form is

A B C 1
E B C 2
F B C 3
G B C 4
J K L 5
5 N O 6
H I 6 7
Q R S 8
O P 8 9
AB AC AD 10
T AA 10 11
U AA 10 12
V1 AA 10 13
V2 AA 10 14
W1 AA 10 15
W2 AA 10 16
X AA 10 17
Y AA 10 18
Z AA 10 19
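
A sketch of the quad conversion (the parsing is naive and the numbering simply follows the listing's scheme of assigning ids bottom-up): each nested predicate becomes a row, and a nested argument is replaced by the id of the quad it produced:

# Sketch of the quad conversion: each P(left,right) becomes a row
# "left P right id", and a nested predicate argument is replaced by the
# id of the quad it produced.  Parsing here is naive and illustrative.
def split_args(inner):
    # Split the two arguments at the comma at depth zero.
    depth = 0
    for i, ch in enumerate(inner):
        depth += ch == "("
        depth -= ch == ")"
        if ch == "," and depth == 0:
            return inner[:i], inner[i + 1:]
    raise ValueError(inner)

def to_quads(expr, quads, counter):
    # Return the term standing for expr: either the noun itself or the
    # integer id of the quad that the nested predicate produced.
    if "(" not in expr:
        return expr
    name = expr[: expr.index("(")]
    left, right = split_args(expr[expr.index("(") + 1 : -1])
    l = to_quads(left, quads, counter)
    r = to_quads(right, quads, counter)
    counter[0] += 1
    quads.append((l, name, r, counter[0]))
    return counter[0]

quads, counter = [], [0]
for expr in ["B(A,C)", "I(H,N(K(J,L),O))", "P(O,R(Q,S))"]:
    to_quads(expr, quads, counter)
for q in quads:
    print(*q)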

[1] Literature, Wikipedia

Wednesday, May 2, 2007

Machine Translation

The argument is that the sequential set that builds a semantic parse tree, using the nouns in the order in which they occur, is an intermediate representation format in all languages, sitting between a set of phrasings and a semantic parse tree. Changes in the semantic parse tree can be reflected in the sequential set, resulting in a different set of phrasings (paraphrase generation). The illustration of patterns in the English examples is an existence proof of a heuristic, not an argument that such patterns are identical across all languages. However, the use of integers when representing semantic parse trees or sets of predicates is argued to be language independent.

Interestingly, natural language is a temporal process. The intermediate representation is both temporal and static, and the sequence of predicates is a static knowledge representation that allows semantics-preserving permutations.

So, the argument is:

Machine Reading
Sentence(s) phrasing(s)
[patterns suggest heuristic to:]
Intermediate representation
[heuristic apparent to:]
Sequence of predicates

Machine Writing
Sequence of predicates
[heuristic apparent to:]
Intermediate representation
[patterns suggest heuristic to:]
Sentence(s) phrasing(s)

Paraphrase Generation
Sentence(s) phrasing(s)
Intermediate representation
Sequence of predicates
Permutations
Sequence of predicates
Intermediate representation
Sentence(s) phrasing(s)

The paraphrase generation aspect is theoretically important for moving semantics between languages during machine translation. One natural-sounding sentence in one language may, for example, require multiple sentences to express the same semantics naturally in another; the sentence structures of a paragraph may differ across languages in natural usage, and so forth. The sequence of predicates is argued to represent the knowledge shared across all of these articulations.

Tuesday, May 1, 2007

Foundations of Machine Reading/Writing

[Leibniz][took up][the question][in][his baccalaureate thesis], and [argued][in][the true scholastic style][for][a principle of individuation] which [would preserve][the independence of universals] [with respect to][ephemeral sensations], and [yet][embodied][universal ideas][in][the eternal natures of individuals].

Let us label each bracketed chunk with a letter, A through S.

With the [yet] (O), we would have:

D(B(A,C),E)
{

G(F+I(A,J),H)
K(J,M(L,N))

}
O
{

P(A,R(Q,S))

}

Theoretically resembling:

D(B(A,C),E)
G(F+I(A,J),H)
K(J,M(L,N))
P(A,R(Q,S))
O(G(F+I(A,J),H),P(A,R(Q,S)))
O(K(J,M(L,N)),P(A,R(Q,S)))

but let us look at the following four and call the processing of O a metaoperation.

D(B(A,C),E)
G(F+I(A,J),H)
K(J,M(L,N))
P(A,R(Q,S))

So, in this paradigm of machine reading, the goal is to turn a sequence of chunks into a structure, with the necessary internal state to manage the noun chunks during the process. Transitions may occur via chunked and non-chunked tokens.

How would we describe the process of building these four predicates?

Start a new predicate with A as the left argument.
Name that predicate B.
Place C in the right argument of that predicate.
Move that predicate into the left position of a new predicate.
Name that predicate D.
Place E in the right argument of that predicate.
Start a new predicate with A as the left argument.
Name that predicate F.
Move that predicate into the left position of a new predicate.
Name that predicate G.
Place H in the right argument of that predicate.
Now back to the last predicate that we just moved into the left position of this predicate.
Add I to that predicate's label.
Place J in the right argument of that predicate.
Start a new predicate with J as the left argument.
Name that predicate K.
The next statement is set in the right argument of that predicate.
Start a new predicate with L as the left argument.
Name that predicate M.
Place N in the right argument of that predicate.
Start a new predicate with A as the left argument.
Name that predicate P.
The next statement is set in the right argument of that predicate.
Start a new predicate with Q as the left argument.
Name that predicate R.
Place S in the right argument of that predicate.
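
A sketch of an interpreter for a subset of these instructions (the class and operation names are my condensation of the steps above), building the first predicate D(B(A,C),E):

# Sketch of executing the first six instructions above to build
# D(B(A,C),E).  The operation names are my condensation of the steps.
class Pred:
    def __init__(self, left=None, name=None, right=None):
        self.left, self.name, self.right = left, name, right
    def __repr__(self):
        return "%s(%s,%s)" % (self.name, self.left, self.right)

class Builder:
    def __init__(self):
        self.current = None
    def start(self, left):       # Start a new predicate with <left> as the left argument.
        self.current = Pred(left=left)
    def name(self, label):       # Name that predicate <label>.
        self.current.name = label
    def place_right(self, arg):  # Place <arg> in the right argument of that predicate.
        self.current.right = arg
    def move_into_left(self):    # Move that predicate into the left position of a new predicate.
        self.current = Pred(left=self.current)

b = Builder()
b.start("A"); b.name("B"); b.place_right("C")
b.move_into_left(); b.name("D"); b.place_right("E")
print(b.current)  # D(B(A,C),E)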

Patterns:

(×4)
Start a new predicate with <X> as the left argument.
Name that predicate <X+1>.

(×3)
Start a new predicate with <X> as the left argument.
Name that predicate <X+1>.
Place <X+2> in the right argument of that predicate.

(×2)
Name that predicate <X>.
The next statement is set in the right argument of that predicate.
Start a new predicate with <X+1> as the left argument.
Name that predicate <X+2>.
Place <X+3> in the right argument of that predicate.

“A” is the subject of the sentence; "Start a new predicate with A as the left argument" occurs three times (×3).

On two of the occasions that "Start a new predicate with A as the left argument" occurs (Start a new predicate with <SUBJ> as the left argument), it is preceded by "Place <X> in the right argument of that predicate." As the sentence ends with that instruction, it is possible that in a sequence of sentences all three occurrences would be.
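
As a small sketch of looking for such patterns automatically (instructions abstracted to operation types; purely illustrative), repeated contiguous subsequences can be counted:

# Sketch: count repeated contiguous subsequences of instructions, the
# kind of pattern listed above.  Instructions are abstracted to their
# operation type so that "Name that predicate B" and "Name that
# predicate D" match.  Illustrative only.
from collections import Counter

def repeated_patterns(ops, min_len=2, min_count=2):
    counts = Counter()
    for n in range(min_len, len(ops) + 1):
        for i in range(len(ops) - n + 1):
            counts[tuple(ops[i:i + n])] += 1
    return {p: c for p, c in counts.items() if c >= min_count}

ops = ["start_left", "name", "place_right",
       "start_left", "name", "place_right",
       "start_left", "name", "next_in_right"]
for pattern, count in sorted(repeated_patterns(ops).items(), key=lambda x: -x[1]):
    print(count, pattern)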

I theorize that in a processed document, a continuous sequence of these instructions (across sentence boundaries) would have robust and complex patterns indicative of natural writing style. Furthermore, I theorize that this methodology will be able to explain why people read the active voice faster than the passive, and how the two are processed differently in this paradigm.