Friday, April 13, 2007

Head-Driven Phrase Structure Grammar

I'm thinking that PropBank-style parsing (Redwoods treebank) converts more readily to structured knowledge than part-of-speech tagging does. However, looking over the initial results from some software, it appears that some form of recursion would be useful for capturing all of a sentence's structure and substructure. Some arguments to the main predicate still have discernible internal structure; if an argument to a predicate could itself be a predicate, the parse would capture as much structure as possible. I'm also noticing that this style of parser fails to capture parallel predicates in some sentences that use logical connectives. Different levels of the recursion should be able to reuse arguments from elsewhere in the sentence. I may have to code up a prototype to obtain as much semantic structure as possible, possibly outputting a set of these parses that together capture it in parallel.
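To make the idea of recursion concrete, here is a minimal sketch of a predicate structure whose arguments can themselves be predicates. The class names (`Entity`, `Predicate`), the role labels, and the use of an "and" node for parallel predicates are all my own assumptions for illustration, not the output format of any existing parser.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Dict, Union


@dataclass
class Entity:
    """A leaf argument, e.g. a noun phrase (hypothetical type)."""
    text: str


@dataclass
class Predicate:
    """A predicate whose arguments may be entities or other predicates."""
    verb: str
    args: Dict[str, Union[Entity, Predicate]] = field(default_factory=dict)

    def depth(self) -> int:
        """Number of predicate levels, counting this one."""
        nested = [a.depth() for a in self.args.values() if isinstance(a, Predicate)]
        return 1 + max(nested, default=0)

    def render(self) -> str:
        """Flatten the structure into a nested functional notation."""
        parts = []
        for role, arg in self.args.items():
            value = arg.render() if isinstance(arg, Predicate) else arg.text
            parts.append(f"{role}={value}")
        return f"{self.verb}({', '.join(parts)})"


# "John said that Mary left and called Bob."
# The same Entity object is shared by both embedded predicates,
# which is one way to reuse an argument across levels of the recursion.
mary = Entity("Mary")
left = Predicate("leave", {"ARG0": mary})
called = Predicate("call", {"ARG0": mary, "ARG1": Entity("Bob")})
both = Predicate("and", {"C1": left, "C2": called})  # parallel predicates
said = Predicate("say", {"ARG0": Entity("John"), "ARG1": both})

print(said.render())
# say(ARG0=John, ARG1=and(C1=leave(ARG0=Mary), C2=call(ARG0=Mary, ARG1=Bob)))
```

Treating the connective as just another predicate keeps the structure uniform, but that is only one possible design; a prototype might instead emit the two parses side by side.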

I'm also looking into initializing this style of parser with POS-style output, to identify the main verb first and then the remaining constituents in order. Parsers seem to perform more accurately once entity recognition has been run and multiword nouns have been concatenated into single strings. I'm going to look at the parse trees for verb hierarchy and SBAR information to construct recursive predicate structures, use the NP information to bootstrap entity recognition, and post here which code appears “easiest” to build from.
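The multiword concatenation step could look something like the sketch below. It assumes the entity recognizer has already produced a list of multiword names as token sequences; the function name `merge_multiword` and the underscore joiner are my own choices for illustration.

```python
from typing import List, Sequence


def merge_multiword(
    tokens: Sequence[str],
    entities: Sequence[Sequence[str]],
    joiner: str = "_",
) -> List[str]:
    """Replace each known multiword entity with a single joined token."""
    # Try longer entities first so a long name wins over its own prefix.
    ordered = sorted((tuple(e) for e in entities), key=len, reverse=True)
    out: List[str] = []
    i = 0
    while i < len(tokens):
        for ent in ordered:
            n = len(ent)
            if n > 1 and tuple(tokens[i:i + n]) == ent:
                out.append(joiner.join(ent))
                i += n
                break
        else:
            # No entity starts at this position; keep the token unchanged.
            out.append(tokens[i])
            i += 1
    return out


sentence = "The New York Stock Exchange opened in New York".split()
known = [["New", "York"], ["New", "York", "Stock", "Exchange"]]
print(merge_multiword(sentence, known))
# ['The', 'New_York_Stock_Exchange', 'opened', 'in', 'New_York']
```

The merged tokens can then be handed to the parser as ordinary nouns, which is the preprocessing I have in mind above.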

[1] Miyao, Y. and Tsujii, J. 2004. Deep linguistic analysis for the accurate identification of predicate-argument relations. In Proceedings of the 20th International Conference on Computational Linguistics (Geneva, Switzerland, August 23-27, 2004). Association for Computational Linguistics, Morristown, NJ, 1392.

[2] Towards Parsing Unrestricted Text into PropBank Predicate-Argument Structures
