Tuesday, July 3, 2007

Dynamo

Been working on a new module that generates algorithms for mapping sequences to sequences. I'm coding it with C# generics for reusability. Basically, the module generates memory-resident assemblies which map sequences from one domain to sequences from another. Kind of a 'blackbox generator' (it's a superset of FSM generation). The use I'm planning is mapping parse trees to predicate-building sequences and/or quadruple-set-building sequences. The code generation was fairly easy, but I'm working on the optimization algos now. Ideally, the generated assemblies will be fast enough for production environments.
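
To give a rough idea of the shape of the thing, the generated assemblies would satisfy a contract along these lines. This is a minimal sketch under my own assumptions; ISequenceMapper, Train, and Map are illustrative names, not the module's actual API.

using System.Collections.Generic;

// Maps sequences over an input domain TIn to sequences over an output
// domain TOut (illustrative contract, not the module's actual API).
public interface ISequenceMapper<TIn, TOut>
{
    // Learn the mapping from a paired example sequence.
    void Train(IList<TIn> input, IList<TOut> output);

    // Apply the learned mapping to a new input sequence.
    IList<TOut> Map(IList<TIn> input);
}

The planned use would then instantiate something like ISequenceMapper<ParseNode, PredicateInstruction>, with ParseNode and PredicateInstruction also hypothetical names.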

The WikiCorpus project is also nearing completion; the adaptive AI algos have been bringing more coding joy, so I'll probably post a draft of that first.

Tuesday, May 29, 2007

Calculus of Language

I'm tentatively calling the new syntax and related logic a “calculus of language,” as it has the properties of a predicate calculus and appears to represent natural language well. Syntactically, it's a superset of predicate calculus. I am writing up a draft covering the formal logic, the grammars, and heuristics for the sequential parsing of natural language, and I hope to publish it soon. I'll comment on developments as they occur; the technique appears to work on arbitrary natural language sentences. However, the output is strings, meaning that getting to the exact semantics (URIs/integers) appears to be a second-step algorithm.
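
To illustrate what I mean by a second-step algorithm, here is a minimal sketch under my own assumptions (SymbolTable and ResolveSymbol are hypothetical names): the string arguments of the output predicates get interned into stable integer identifiers, which could in turn map to URIs.

using System.Collections.Generic;

// Hypothetical second step: intern predicate/argument strings into
// stable integer ids (which a KB could then associate with URIs).
public class SymbolTable
{
    private readonly Dictionary<string, int> ids = new Dictionary<string, int>();

    // Returns the existing id for a string, or assigns the next free one.
    public int ResolveSymbol(string s)
    {
        int id;
        if (!ids.TryGetValue(s, out id))
        {
            id = ids.Count;
            ids[s] = id;
        }
        return id;
    }
}

So a string predicate like Likes(bobby, draw) would resolve to ids 0, 1, 2 on first encounter; the interesting work, of course, is resolving distinct strings to the same sense.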

Sunday, May 27, 2007

New Predicate Syntax

While working on the parser, I developed a new syntax. It appears to be more readily parsed into (from natural language), while the syntax described previously appears to be more readily generated from (to natural language). There's a simple heuristic for converting between the two; the new syntax and grammar are a superset of the old syntax and grammar.

In the new syntax there are notational varieties of predicates. These notational conventions appear to reduce the instruction set for building predicates sequentially and to reduce the context information required while processing a sentence, making the heuristic simpler.

Tuesday, May 22, 2007

PCFG and Adaptive FSM

I'm having some success with a first approach to machine reading. The algorithm converts a PCFG parse into a sequence and maps it to a sequence of predicate-assembly instructions using an adaptive FSM. The adaptive FSM algorithm basically maps sequences from one domain to another and utilizes patterns in the mappings of subsequences to optimize memory usage. This optimization relates to generalization, in which patterns can be induced from training examples. I'll post further results as they occur; in short, the algorithm is an online parser (word-by-word) that converts sentences to predicates, though the training data is structural (sentence-by-sentence).
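
In rough outline, the machine looks like a finite-state transducer. The sketch below is my own illustration under assumed names (FsmTransducer, AddTransition, Map), not the actual implementation; the memory optimization would come from reusing target states for recurring subsequence patterns.

using System.Collections.Generic;

// Illustrative finite-state transducer: consumes input symbols and
// emits output symbols per transition (names are assumptions).
public class FsmTransducer<TIn, TOut>
{
    // (state, input symbol) -> (next state, emitted output symbols)
    private readonly Dictionary<int, Dictionary<TIn, KeyValuePair<int, TOut[]>>>
        transitions = new Dictionary<int, Dictionary<TIn, KeyValuePair<int, TOut[]>>>();

    // Record a transition learned from a training pair; directing
    // recurring subsequences to a shared state is where the memory
    // optimization (and hence generalization) would come in.
    public void AddTransition(int from, TIn symbol, int to, TOut[] emit)
    {
        if (!transitions.ContainsKey(from))
            transitions[from] = new Dictionary<TIn, KeyValuePair<int, TOut[]>>();
        transitions[from][symbol] = new KeyValuePair<int, TOut[]>(to, emit);
    }

    // Run the machine over an input sequence, collecting emitted output.
    public IList<TOut> Map(IEnumerable<TIn> input)
    {
        int state = 0;
        List<TOut> output = new List<TOut>();
        foreach (TIn symbol in input)
        {
            KeyValuePair<int, TOut[]> next = transitions[state][symbol];
            state = next.Key;
            output.AddRange(next.Value);
        }
        return output;
    }
}

Here TIn would be the linearized PCFG parse symbols and TOut the predicate-assembly instructions.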

Sunday, May 13, 2007

Preprocessing and Predicate Patterns

I found an example sentence that shows a promising relationship between the preprocessing and patterns in the predicates. That is, a correct preprocessing (chunking) can reveal patterns that an incorrect chunking does not.

For example,

{the scientific method}[Seeks+][To+][Explain]{the complexities}[Of]{nature}[In]{a replicable way}<comma><and>[To+][Use]{these explanations}[To+][Make]{useful predictions}

H(B+C+D(A,F(E,G)),I)
O+P(L+M(A,N),Q)

Notice that B could move across the <comma><and>:

H(B+C+D(A,F(E,G)),I)
O+P(B+L+M(A,N),Q)

However, the preprocessing:

{the scientific method}[Seeks][To+][Explain]{the complexities}[Of]{nature}[In]{a replicable way}<comma><and>[To+][Use]{these explanations}[To][Make]{useful predictions}

B(A,H(C+D(_,F(E,G)),I)&O(L+M(_,N),P(_,Q)))

or

B(A,H(C+D(_,F(E,G)),I))
B(A,O(L+M(_,N),P(_,Q)))

is more illustrative of the patterns indicative of sentences with <comma><and>, namely P(x, y1&y2...). The ampersand is a notational convention that is useful during sequential processing.

The pattern evidenced in the second predicate output, related to the <comma><and>, indicates that the second chunking was more correct than the first. This suggests that the chunking and choice of predicates are not arbitrary, and that a set of predicates may emerge from patterns dependent upon the <> symbols in certain sentences.

I will find more example sentences indicating that patterns in the output can reinforce chunking hypotheses. One area of potential complexity in English is the word 'to'.

Bobby went to the store.
{bobby}[Went][+To]{the store}.
B+C(A,D)
Here the 'to' is in the sense of 'towards'.

Bobby likes to draw.
{bobby}[Likes][To+][Draw].
B(A,C(_,_))
Here the 'to' is in the sense of an infinitive.

Bobby bought crayons to draw pictures.
{bobby}[Bought]{crayons}[To][Draw]{pictures}.
D(B(A,C),E(_,F))
Here the 'to' is in the sense of 'in order to'.

Predicates and chunking appear to be discernible from patterns involving the <> symbols in certain sentences. For example, the argument above is that SeeksToExplain(x,y) is not a predicate and that Seeks(x,y) and Explain(x,y) are.
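
A note on the notation itself: as I'm using it above, {…} marks argument chunks, […] marks predicate words, and <…> marks function-word symbols like <comma> and <and>. A minimal tokenizer for that form might look like the following; this is my own illustrative sketch, not the project's parser.

using System.Collections.Generic;

// Illustrative tokenizer for the chunk notation: returns (bracket, text)
// pairs, e.g. ('{', "bobby"), ('[', "Likes"), ('<', "and").
public static class ChunkTokenizer
{
    public static IList<KeyValuePair<char, string>> Tokenize(string s)
    {
        List<KeyValuePair<char, string>> tokens = new List<KeyValuePair<char, string>>();
        int i = 0;
        while (i < s.Length)
        {
            char open = s[i];
            char close = open == '{' ? '}' : open == '[' ? ']' : open == '<' ? '>' : '\0';
            if (close == '\0') { i++; continue; } // skip stray characters like '.'
            int end = s.IndexOf(close, i + 1);
            tokens.Add(new KeyValuePair<char, string>(open, s.Substring(i + 1, end - i - 1)));
            i = end + 1;
        }
        return tokens;
    }
}

Tokenize("{bobby}[Likes][To+][Draw].") yields ('{',"bobby"), ('[',"Likes"), ('[',"To+"), ('[',"Draw").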

Tuesday, May 8, 2007

Artificial Intelligence

After WikiCorpus and some NL algorithms for machine reading/writing, I'm thinking of moving towards natural language understanding (NLU) post-processing for sentences like:

“Once the characteristic numbers of most notions are determined, the human race will have a new kind of tool, a tool that will increase the power of the mind much more than optical lenses helped our eyes, a tool that will be as far superior to microscopes or telescopes as reason is to vision.”

— Leibniz


This sentence compares a described tool to microscopes and telescopes, compares reason to vision, and then compares those two comparisons to each other. So, after an initial semantic parsing, a rule system may permute the structured knowledge into a format more conducive to machine reasoning and NLU.

However, this is moving from NL towards artificial intelligence or, at least, machines that can demonstrate intelligent behavior.

user> Birds are to air as fish are to what?
machine> Water.
user> Why?
machine> Birds move through the air and fish move through water.

Or,

user> Telescopes are to vision as what is to reasoning?
machine> A new kind of tool that the human race will have after the characteristic numbers of most notions are determined, according to Leibniz.

This sort of NL interaction appears to require machine reading, KB search, paraphrase generation, a second KB search, and machine writing. A goal of mine, after processing text into a knowledge representation, is an NL interface (requiring machine reading/writing) to demonstrate NLU. The algorithm for the above examples is tentatively all-paths discovery, path permuting, and parallel search of the KB.

Pseudocode:

// n1 is to n2 as n3 is to what?  (pass n4 = null)
// n1 is to n2 as what is to n4?  (pass n3 = null)
Node[] IsToAsIsTo(Node n1, Node n2, Node n3, Node n4)
{
    // Discover every path linking the known pair in the KB.
    Path[] paths = FindAllPaths(n1, n2);
    // Permute/paraphrase those paths to widen the search.
    Path[] rephrased = Rephrases(paths);
    Node[] result = null;
    if (n4 == null) result = Search(n3, rephrased);      // solve for the fourth term
    else if (n3 == null) result = Search(rephrased, n4); // solve for the third term
    return result;
}
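
For the birds/fish exchange above, the call would look something like this (Lookup being a hypothetical helper that fetches a KB node by name):

Node birds = Lookup("birds"), air = Lookup("air"), fish = Lookup("fish");
Node[] answers = IsToAsIsTo(birds, air, fish, null); // expect { water }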

Sunday, May 6, 2007

WikiCorpus

So far so good on those hypotheses. The WikiCorpus alpha source code will be available (in a few weeks) here. I'm building it presently and testing it on Wikipedia articles. It will provide a user interface for generating corpora for pronoun and reference discernment and for semantic parsing algorithm design.