[User's Manual]
[Code]
EFD stands for Empty First Daughter. The EFD-closure algorithm
assumes that a grammar is ``EFD-closed'' meaning that the first
daughter of all the grammatical rules in the grammar are non-empty.
The definition of an EFD-closed grammar is provided below:
At compile time, ALE transforms any phrase-structure grammar to an EFD-closed grammar. The EFD-closure process is explained in section 9.2 below. In this section, we step through the parsing of a sentence based on the simple grammar presented below. For ease of exposition, atomic categories are used instead of feature structures.
s ===> [np,vp]. vp ===> [vbar]. vp ===> [vbar,pp]. vbar ===> [vt,np]. np ===> [det,nbar]. nbar ===> [n]. nbar ===> [n,pp]. pp ===> [p,np]. lex(john,np). lex(nudged,vt). lex(a,det). lex(the,det). lex(man,n). lex(cane,n). lex(with,p).
The sentence that we are parsing is ``John nudged the man with a cane.'' The top-level predicate of the parser is rec/2 which takes a list of words in its first argument and returns a feature structure corresponding to that sentence in its second argument. Therefore, to parse the above sentence ALE calls rec/2 as follows:
rec([john,nudged,the,man,with,a,cane],FS)
This algorithm proceeds breadth-first, right-to-left through the input
string at each step applying the grammar rules depth-first, matching
daughter categories left-to-right. The first step is then to reverse
the input string, and compute its length (performed by
reverse_count/5
) and initialize the chart:9.1
rec(Ws,FS):- retractall(edge(_,_,_)), reverse_count(Ws,[],WsRev,0,Length), CLength is Length - 1, functor(Chart,chart,CLength), build(WsRev,Length,Chart), edge(0,Length,FS).
retractall(edge(_._._))
resets the chart by removing all
asserted edges. reverse_count(Ws,[],WsRev,0,Length)
reverses
the input list and computes its length. In the case of our example, it
will return this:
reverse_count([john,nudged,the,man,with,a,cane],[], [cane,a,with,man,the,nudged,john],0,7)
Two copies of the chart are used in this presentation. One is represented by a term chart(E1,...,EL), where the argument holds the list of edges whose left node is . Edges at the beginning of the chart (left node 0) do not need to be stored in this copy, nor do edges beginning at the end of the chart (especially, empty categories with left node and right node Length). This is called the term copy of the chart. The other copy is kept in a dynamic predicate, edge/3, as a textbook Prolog chart parser would. This is called the asserted copy of the chart.
Neither copy of the chart stores empty categories. These are assumed
to be available in a separate predicate, empty_cat/1
. Since
the grammar is EFD-closed, no grammar rule can produce a new empty
category. As you may have noticed already, lexical items are assumed
to be available in the predicate lex/2.
In our example, CLength evaluates to 6, which means the
corresponding term copy of the chart gets initiated as
chart(_,_,_,_,_,_)
. The predicate, build/3, then
actually builds the chart. The definition of build/3 is shown
below:
build([W|Ws],R,Chart):- RMinus1 is R-1, (lex(W,FS), add_edge(RMinus1,R,FS,Chart) ; ( RMinus1 =:= 0 -> true ; rebuild_edges(RMinus1,Edges), arg(RMinus1,Chart,Edges), build(Ws,RMinus1,Chart) ) ). build([],_,_).
The precondition upon each call to build(Ws,R,Chart) is that
Chart contains the complete term copy of the non-loop edges
of the parsing chart from node R to the end, while
Ws contains the (reversed) input string from node R
to the beginning. Each pass through the first clause of
build/3 then decrements Right, and seeds the chart
with every category for the lexical item that spans from R-1
to R. The predicate add_edge/4
actually adds the
lexical edge to the asserted copy of the chart, and then closes the
chart depth-first under rule applications in a failure-driven loop.
add_edge/4
initiates a loop because it calls rule/4
,
which in turn calls match_rest/5
, which then calls
add_edge/4
again. It is also a failure-driven loop because
the number of rules with any given leftmost daughter is finite causing
rule/4
to fail after all the relevant rules have been
processed. When it has finished, if Ws is not empty
(RMinus1 is not 0), then build/3 retracts all of the
new edges from the asserted copy of the chart (with
rebuild_edges/2
, described below) and adds them to the
R-1st argument of the term copy before continuing to the next
word.
add_edge/4
matches non-leftmost daughter descriptions from
either the term copy of the chart, thus eliminating the need for
additional copying of non-empty edges, or from empty_cat/1
.
Whenever it adds an edge, however, it adds it to the asserted
copy of the chart. This is necessary because add_edge/4
works in a failure-driven loop, and any edges added to the term copy
of the chart would be removed during backtracking:
add_edge(Left,Right,FS,Chart):- assert(edge(Left,Right,FS)), rule(FS,Left,Right,Chart). rule(FS,L,R,Chart):- (Mother ===> [FS|DtrsRest]), % PS rule match_rest(DtrsRest,R,Chart,Mother,L). match_rest([],R,Chart,Mother,L):- add_edge(L,R,Mother,Chart). match_rest([Dtr|DtrsRest],R,Chart,Mother,L):- arg(R,Chart,Edges), member(edge(NewR,Dtr),Edges), match_rest(DtrsRest,NewR,Chart,Mother,L) ; empty_cat(Dtr), match_rest(DtrsRest,R,Chart,Mother,L).
Note that we never need to be concerned with updating the term copy of
the chart during the operation of add_edge/4
because
EFD-closure guarantees that all non-leftmost daughters must have left
nodes strictly greater than the Left passed as the first
argument to add_edge/4
.
Moving new edges from the asserted copy to the term copy is
straightforwardly achieved by rebuild_edges/2
:
rebuild_edges(Left,Edges):- retract(edge(Left,R,FS)) -> Edges = [edge(R,FS)|EdgesRest], rebuild_edges(Left,EdgesRest) ; Edges = [].
The two copies required by this algorithm are thus: 1) copying a new edge to the asserted copy of the chart by add_edge/4, and 2) copying new edges from the asserted copy of the chart to the term copy of the chart by rebuild_edges/2. The asserted copy is only being used to protect the term copy from being unwound by backtracking.
The first pass of build/3
on our example sentence will thus
update the term copy of the chart as follows:
chart(_,_,_,_,_,[edge(7,n),edge(7,nbar)])
This means that the parser has found an n and an nbar that span from the edge 6 (known from the argument number in the chart) to the edge 7 (recorded by edge/2). By the time the parser reaches the first word in the input string, Chart will contain the following data:
chart([edge(2,vt),edge(4,vbar),edge(4,vp), edge(7,vp),edge(7,vbar),edge(7,vp)], [edge(3,det),edge(4,np),edge(7,np)], [edge(4,n),edge(4,nbar),edge(7,nbar)], [edge(5,p),edge(7,pp)], [edge(6,det),edge(7,np)], [edge(7,n),edge(7,nbar)])
At the end of the parsing operation, the asserted copy of the chart
only has a record of the edges whose left corner is 0. This is because
all the other edges have been retracted by rebuild_edges/4
while updating the term copy of the chart, and because there is no
need to copy the asserted copy of the edges with a left corner 0 onto
the term copy. rec/2 uses this to retrieve the edge that spans
the whole input string.