Next: Tectogrammatical Parsing Up: Topological Parsing Previous: Synchronising Phenogrammar and Tectogrammar Contents

Phenogrammatical Parsing

Phenogrammatical parsing (or pheno-parsing for short) is performed using a bottom-up right-to-left chart parsing algorithm. As mentioned in section 9.2, parsing the input string right-to-left and interpreting the rules left-to-right eliminates the need for active edges.

The parser begins by looking up the words in the input stream in its lexicon and finding the grammatical category/categories that the word belongs to. Then using the matches constraints, it finds what pheno-category/categories each one of the tecto-categories matches. Using this information, it can then seed the chart with both tecto- and pheno-edges as appropriate. For example, if the third word in the input string is listed as both a pronoun and a marker in the lexicon, and we have the following matches rules in the grammar:

pron   matches (mf; vf; objf)
marker matches cf

then the following pheno-edges will be added to the chart:

pheno_edge(2,3,mf,  ID1)
pheno_edge(2,3,vf,  ID2)
pheno_edge(2,3,objf,ID3)
pheno_edge(2,3,cf,  ID4)

where IDn stands for a unique ID for each edge. The predicate responsible for the addition of pheno-edges to the chart is add_pheno_edge/5, which also performs a transitive closure on the accessibility relation between the newly added edge and all its descendents unless one of those descendents has resulted in the prediction of tecto-category. We will talk about tecto-edges in the next section.

Each addition of an edge to the chart triggers a search in the topo rules trying to find a corresponding parse subtree. The search is performed in a failure driven loop using Prolog's call stack as its search space. For instance, if the grammar includes the rule:

clause topo [vf,cf,mf,vc,nf]

the addition of the edge pheno_edge(L,R,vf,ID1) will trigger the following topo rule:

topo(L, M, vf, ID1) :-
  pheno_edge(M, N, cf, ID2),
  pheno_edge(N, O, mf, ID3),
  pheno_edge(O, P, vc, ID4),
  pheno_edge(P, R, nf, ID5),
  R>L,
  add_pheno_edge(clause, L, R, 
    [ID1-vf,ID2-cf,ID3-mf,ID4-vc,ID5-nf], _).

This predicate succeeds if a sequence of correct edges have been found and if at least one of those edges consumes input--i.e., has not been added to the chart because it has been declared optional somewhere in the grammar. The list of ID-F pairs is used by add_pheno_edge/5 for topological accessibility.

In addition, if the grammar contains the following relevant linking and matched-by rules as well:

clause <<-- nf
clause matched_by s

then the topo predicate mentioned above will have some additional clauses accordingly, as follows:

topo(L, M, vf, ID1) :-
  pheno_edge(M, N, cf, ID2),
  pheno_edge(N, O, mf, ID3),
  pheno_edge(O, P, vc, ID4),
  pheno_edge(P, R, nf, ID5),
  R>L,
  add_pheno_edge(clause, L, R, 
    [ID1-vf,ID2-cf,ID3-mf,ID4-vc,ID5-nf], ID6),
  bv(L, R, BV),
  add_tecto_aedge(s, ID6, [], BV, 0),
  add_pheno_edge(nf, L, R, [ID6-link], _).

The matched-by clause in the grammar says that the yield of all clause regions is matched by the yield of some S. This causes the parser to predict an S whenever it finds a clause. In this topo predicate, the clause bv(L, R, BV) converts the interval spanned by the clause region into a bit vector BV. We record the yield of tecto-categories in bit vectors because tecto-categories may not always be contiguous. Then it adds an active tecto-edge for an S to the chart. The ID of the sponsoring clause is also passed to the active edge. The empty list that is passed as the third argument of add_tecto_aedge/5 represents the keys that that edge carries (see below). The fourth argument stands for the CanBV of the active edge, which is the parts of the input string that it is allowed to use. The fifth argument stands for the OptBV of the edge, which is the parts of the input string that it is allowed not to use. In this case the OptBV is 0 because the S must consume everything that the clause has otherwise the matched-by constraint would not hold. The add_pheno_edge has been added because of the linking rule. It adds a new pheno-edge to the chart.

Next: Tectogrammatical Parsing Up: Topological Parsing Previous: Synchronising Phenogrammar and Tectogrammar Contents

TRALE Reference Manual