Next: EFD-closure Up: Bottom-up Parsing Previous: Bottom-up Parsing Contents

The Algorithm

[User's Manual]
[Code]
EFD stands for Empty First Daughter. The EFD-closure algorithm assumes that a grammar is ``EFD-closed'' meaning that the first daughter of all the grammatical rules in the grammar are non-empty. The definition of an EFD-closed grammar is provided below:

Definition 18 An (extended) context-free grammar,

, is empty-first-daughter-closed (EFD-closed) iff, for every derivation $N \Rightarrow^{*} w$ , there is a derivation of $N \Rightarrow^{*} w$ in which no left-corner of any non-preterminal subtree is $\epsilon$ .

At compile time, ALE transforms any phrase-structure grammar to an EFD-closed grammar. The EFD-closure process is explained in section 9.2 below. In this section, we step through the parsing of a sentence based on the simple grammar presented below. For ease of exposition, atomic categories are used instead of feature structures.

s    ===> [np,vp].
vp   ===> [vbar].
vp   ===> [vbar,pp].
vbar ===> [vt,np].
np   ===> [det,nbar].
nbar ===> [n].
nbar ===> [n,pp].
pp   ===> [p,np].

lex(john,np).
lex(nudged,vt).
lex(a,det).
lex(the,det).
lex(man,n).
lex(cane,n).
lex(with,p).

The sentence that we are parsing is ``John nudged the man with a cane.'' The top-level predicate of the parser is rec/2 which takes a list of words in its first argument and returns a feature structure corresponding to that sentence in its second argument. Therefore, to parse the above sentence ALE calls rec/2 as follows:

rec([john,nudged,the,man,with,a,cane],FS)

This algorithm proceeds breadth-first, right-to-left through the input string at each step applying the grammar rules depth-first, matching daughter categories left-to-right. The first step is then to reverse the input string, and compute its length (performed by reverse_count/5) and initialize the chart:^9.1

rec(Ws,FS):-
  retractall(edge(_,_,_)),
  reverse_count(Ws,[],WsRev,0,Length),
  CLength is Length - 1,
  functor(Chart,chart,CLength),
  build(WsRev,Length,Chart),
  edge(0,Length,FS).

retractall(edge(_._._)) resets the chart by removing all asserted edges. reverse_count(Ws,[],WsRev,0,Length) reverses the input list and computes its length. In the case of our example, it will return this:

reverse_count([john,nudged,the,man,with,a,cane],[],
              [cane,a,with,man,the,nudged,john],0,7)

Two copies of the chart are used in this presentation. One is represented by a term chart(E1,...,EL), where the $i^{th}$ argument holds the list of edges whose left node is . Edges at the beginning of the chart (left node 0) do not need to be stored in this copy, nor do edges beginning at the end of the chart (especially, empty categories with left node and right node Length). This is called the term copy of the chart. The other copy is kept in a dynamic predicate, edge/3, as a textbook Prolog chart parser would. This is called the asserted copy of the chart.

Neither copy of the chart stores empty categories. These are assumed to be available in a separate predicate, empty_cat/1. Since the grammar is EFD-closed, no grammar rule can produce a new empty category. As you may have noticed already, lexical items are assumed to be available in the predicate lex/2.

In our example, CLength evaluates to 6, which means the corresponding term copy of the chart gets initiated as chart(_,_,_,_,_,_). The predicate, build/3, then actually builds the chart. The definition of build/3 is shown below:

build([W|Ws],R,Chart):-
  RMinus1 is R-1,
  (lex(W,FS),
   add_edge(RMinus1,R,FS,Chart)
  ; ( RMinus1 =:= 0 -> true
    ; rebuild_edges(RMinus1,Edges),
      arg(RMinus1,Chart,Edges),
      build(Ws,RMinus1,Chart)
    )
  ).

build([],_,_).

The precondition upon each call to build(Ws,R,Chart) is that Chart contains the complete term copy of the non-loop edges of the parsing chart from node R to the end, while Ws contains the (reversed) input string from node R to the beginning. Each pass through the first clause of build/3 then decrements Right, and seeds the chart with every category for the lexical item that spans from R-1 to R. The predicate add_edge/4 actually adds the lexical edge to the asserted copy of the chart, and then closes the chart depth-first under rule applications in a failure-driven loop. add_edge/4 initiates a loop because it calls rule/4, which in turn calls match_rest/5, which then calls add_edge/4 again. It is also a failure-driven loop because the number of rules with any given leftmost daughter is finite causing rule/4 to fail after all the relevant rules have been processed. When it has finished, if Ws is not empty (RMinus1 is not 0), then build/3 retracts all of the new edges from the asserted copy of the chart (with rebuild_edges/2, described below) and adds them to the R-1st argument of the term copy before continuing to the next word.

add_edge/4 matches non-leftmost daughter descriptions from either the term copy of the chart, thus eliminating the need for additional copying of non-empty edges, or from empty_cat/1. Whenever it adds an edge, however, it adds it to the asserted copy of the chart. This is necessary because add_edge/4 works in a failure-driven loop, and any edges added to the term copy of the chart would be removed during backtracking:

add_edge(Left,Right,FS,Chart):-
  assert(edge(Left,Right,FS)),
  rule(FS,Left,Right,Chart).

rule(FS,L,R,Chart):-
  (Mother ===> [FS|DtrsRest]), % PS rule
  match_rest(DtrsRest,R,Chart,Mother,L).

match_rest([],R,Chart,Mother,L):-
  add_edge(L,R,Mother,Chart).

match_rest([Dtr|DtrsRest],R,Chart,Mother,L):-
  arg(R,Chart,Edges),
  member(edge(NewR,Dtr),Edges),
  match_rest(DtrsRest,NewR,Chart,Mother,L)
  ; empty_cat(Dtr),
    match_rest(DtrsRest,R,Chart,Mother,L).

Note that we never need to be concerned with updating the term copy of the chart during the operation of add_edge/4 because EFD-closure guarantees that all non-leftmost daughters must have left nodes strictly greater than the Left passed as the first argument to add_edge/4.

Moving new edges from the asserted copy to the term copy is straightforwardly achieved by rebuild_edges/2:

rebuild_edges(Left,Edges):-
  retract(edge(Left,R,FS))
  -> Edges = [edge(R,FS)|EdgesRest],
        rebuild_edges(Left,EdgesRest)
  ;     Edges = [].

The two copies required by this algorithm are thus: 1) copying a new edge to the asserted copy of the chart by add_edge/4, and 2) copying new edges from the asserted copy of the chart to the term copy of the chart by rebuild_edges/2. The asserted copy is only being used to protect the term copy from being unwound by backtracking.

The first pass of build/3 on our example sentence will thus update the term copy of the chart as follows:

chart(_,_,_,_,_,[edge(7,n),edge(7,nbar)])

This means that the parser has found an n and an nbar that span from the edge 6 (known from the argument number in the chart) to the edge 7 (recorded by edge/2). By the time the parser reaches the first word in the input string, Chart will contain the following data:

chart([edge(2,vt),edge(4,vbar),edge(4,vp),
       edge(7,vp),edge(7,vbar),edge(7,vp)],
      [edge(3,det),edge(4,np),edge(7,np)],
      [edge(4,n),edge(4,nbar),edge(7,nbar)],
      [edge(5,p),edge(7,pp)],
      [edge(6,det),edge(7,np)],
      [edge(7,n),edge(7,nbar)])

At the end of the parsing operation, the asserted copy of the chart only has a record of the edges whose left corner is 0. This is because all the other edges have been retracted by rebuild_edges/4 while updating the term copy of the chart, and because there is no need to copy the asserted copy of the edges with a left corner 0 onto the term copy. rec/2 uses this to retrieve the edge that spans the whole input string.

Next: EFD-closure Up: Bottom-up Parsing Previous: Bottom-up Parsing Contents

TRALE Reference Manual