Phenogrammatical parsing (or pheno-parsing for short) is performed using a bottom-up right-to-left chart parsing algorithm. As mentioned in section 9.2, parsing the input string right-to-left and interpreting the rules left-to-right eliminates the need for active edges.
The parser begins by looking up the words in the input stream in its lexicon and finding the grammatical category/categories that the word belongs to. Then using the matches constraints, it finds what pheno-category/categories each one of the tecto-categories matches. Using this information, it can then seed the chart with both tecto- and pheno-edges as appropriate. For example, if the third word in the input string is listed as both a pronoun and a marker in the lexicon, and we have the following matches rules in the grammar:
pron matches (mf; vf; objf) marker matches cf
then the following pheno-edges will be added to the chart:
pheno_edge(2,3,mf, ID1) pheno_edge(2,3,vf, ID2) pheno_edge(2,3,objf,ID3) pheno_edge(2,3,cf, ID4)
where IDn stands for a unique ID for each edge. The predicate
responsible for the addition of pheno-edges to the chart is
add_pheno_edge/5
, which also performs a transitive closure on
the accessibility relation between the newly added edge and all its
descendents unless one of those descendents has resulted in the
prediction of tecto-category. We will talk about tecto-edges in the
next section.
Each addition of an edge to the chart triggers a search in the topo rules trying to find a corresponding parse subtree. The search is performed in a failure driven loop using Prolog's call stack as its search space. For instance, if the grammar includes the rule:
clause topo [vf,cf,mf,vc,nf]
the addition of the edge pheno_edge(L,R,vf,ID1)
will trigger
the following topo rule:
topo(L, M, vf, ID1) :- pheno_edge(M, N, cf, ID2), pheno_edge(N, O, mf, ID3), pheno_edge(O, P, vc, ID4), pheno_edge(P, R, nf, ID5), R>L, add_pheno_edge(clause, L, R, [ID1-vf,ID2-cf,ID3-mf,ID4-vc,ID5-nf], _).
This predicate succeeds if a sequence of correct edges have been found
and if at least one of those edges consumes input--i.e., has not been
added to the chart because it has been declared optional somewhere in
the grammar. The list of ID-F pairs is used by
add_pheno_edge/5
for topological accessibility.
In addition, if the grammar contains the following relevant linking and matched-by rules as well:
clause <<-- nf clause matched_by s
then the topo predicate mentioned above will have some additional clauses accordingly, as follows:
topo(L, M, vf, ID1) :- pheno_edge(M, N, cf, ID2), pheno_edge(N, O, mf, ID3), pheno_edge(O, P, vc, ID4), pheno_edge(P, R, nf, ID5), R>L, add_pheno_edge(clause, L, R, [ID1-vf,ID2-cf,ID3-mf,ID4-vc,ID5-nf], ID6), bv(L, R, BV), add_tecto_aedge(s, ID6, [], BV, 0), add_pheno_edge(nf, L, R, [ID6-link], _).
The matched-by clause in the grammar says that the yield of all clause regions is matched by the yield of some S. This causes the
parser to predict an S whenever it finds a clause. In this topo predicate, the clause bv(L, R, BV) converts the interval
spanned by the clause region into a bit vector BV. We
record the yield of tecto-categories in bit vectors because
tecto-categories may not always be contiguous. Then it adds an active
tecto-edge for an S to the chart. The ID of the sponsoring
clause is also passed to the active edge. The empty list that is
passed as the third argument of add_tecto_aedge/5
represents
the keys that that edge carries (see below). The fourth argument
stands for the CanBV of the active edge, which is the parts of the
input string that it is allowed to use. The fifth argument stands for
the OptBV of the edge, which is the parts of the input string that it
is allowed not to use. In this case the OptBV is 0 because the S must
consume everything that the clause has otherwise the matched-by
constraint would not hold. The add_pheno_edge
has been added
because of the linking rule. It adds a new pheno-edge to the chart.