Next: Executing Grammars: Generation Up: Running and Debugging ALE Previous: Displaying Grammars Contents

Executing Grammars: Parsing

In this section, we consider the execution of ALE phrase structure grammars compiled for parsing. The examples shown in this section have been produced while running with the mini-interpreter off. The mini-interpreter will be discussed in the next section.

The primary predicate for parsing is illustrated as follows: [Code]

| ?- rec [john,hits,every,toy].

STRING: 
0 john 1 hits 2 every 3 toy 4

CATEGORY: 
cat
QSTORE e_list
SYNSEM basic
       SEM every
           RESTR toy
                 ARG1 [0] individual
           SCOPE hit
                 HITTEE [0] 
                 HITTER j
           VAR [0] 
       SYN s

ANOTHER?  y.

CATEGORY: 
cat
QSTORE ne_list_quant
       HD every
          RESTR toy
                ARG1 [0] individual
          SCOPE proposition
          VAR [0] 
       TL e_list
SYNSEM basic
       SEM hit
           HITTEE [0] 
           HITTER j
       SYN s

ANOTHER?  y.

no

The first thing to note here is that the input string must be entered as a Prolog list of atoms. In particular, it must have an opening and closing bracket, with words separated by commas. No variables should occur in the query, nor anything other than atoms. The first part of the output repeats the input string, separated by numbers (nodes in the chart) which indicate positions in the string for later use in inspecting the chart directly. ALE asserts one lexical item for every unit interval, with empty categories being stored as loops from every single node to itself. The second part of the output is a category which is derived for the input string. If there are multiple solutions, these can be iterated through by providing positive answers to the query. The final no response above indicates that the category displayed is the only one that was found. If there are no parses for a string, an answer of no is returned, as with:

| ?- rec([runs,john]).

STRING: 
0 runs 1 john 2

no

Notice that there is no notion of ``distinguished start symbol'' in parsing. Rather, the recognizer generates all categories that it can find for the input string. This allows sentence fragments and phrases to be analyzed, as in:

  | ?- rec [big,kid].
  
  STRING: 
  0 big 1 kid 2
  
  CATEGORY: 
  cat
  QSTORE ne_list_quant
         HD some
            RESTR and
                  CONJ1 kid
                        ARG1 [0] individual
                  CONJ2 big
                        ARG1 [0] 
            SCOPE proposition
            VAR [0] 
         TL e_list
  SYNSEM basic
         SEM [0] 
         SYN np
  
  ANOTHER?  n.

There is also a two-place version of rec that displays only those parses that satisfy a given description: [Code]

  | ?- rec([big,kid],s).

  STRING: 
  0 big 1 kid 2

  no

This call to rec/2 failed because there were no parses of big kid of type, s. Internally, the parser still generates all of the edges that it normally does -- the extra description is only applied at the end as a filter.

Once parsing has taken place for a sentence using rec/1, it is possible to look at categories that were generated internally. In general, the parser will find every possible analysis of every substring of the input string, and these will be available for later inspection. For instance, suppose the last call to rec/1 executed was rec [john,hits,every,toy], the results of which are given above. Then the following query can be made: [Code]

  | ?- edge(2,4).
  
  COMPLETED CATEGORIES SPANNING: every toy 
  
  cat
  QSTORE ne_list_quant
         HD every
            RESTR toy
                  ARG1 [0] individual
            SCOPE proposition
            VAR [0] 
         TL e_list
  SYNSEM basic
         SEM [0] 
         SYN np

  Edge created for category above: 
       index: 20
        from: 2 to: 4
      string: every toy
        rule:  np_det_nbar
   # of dtrs: 2
  Action(retract,dtr-#,continue,abort)? 
  |: continue.
 
no
| ?-

The possible replies in the action-line will be discussed in the next section. This tells us that from positions 2 to 4, which covers the string every toy in the input, the indicated category was found. Even though an active chart parser is used, it is not possible to inspect active edges. This is because ALE represents active edges as dynamic structures that are not available after they have been evaluated.

Using edge/2 it is possible to debug grammars by seeing how far analyses progressed and by inspecting analyses of substrings.

There is also a predicate, rec/4 that binds the answer to variables instead of displaying it:^B.1 [Code]

  | ?- rec([kim,walks],Ref,SVs,Iqs).
 
  Iqs = [ineq(_A,index(_B-third,_C-plur,_D-masc),_E,index(_B-third,_C-plur,
  _D-masc),done)],
  SVs =  phrase(_F-synsem(_G-loc(_H-cat(_I-verb(_J-minus,_K-minus,_L-none,_M-
  bool,_N-fin),_O-unmarked,_P-e_list),...)))?

Ref is an unbound variable that represents the root node of the resulting feature structure. SVs is a term whose functor is the type of the feature structure, whose arity is the number of appropriate features for that type, and whose arguments are the values at those appropriate features in alphabetical order. Each value, in turn, is of the form, Ref-SVs, another pair of root variable and type-functored term. Iqs is a list of the inequations that apply to the feature structure and its substructures. Each member of the list represents a disjunction of inequations, i.e., one must be satisfied, with the list itself being interpreted conjunctively, i.e., every member must be satisfied. Each member is represented by a chain of ineq/5 structures:

ineq(Ref1,SVs1,Ref2,SVs2,ineq(......,done)...)

The first four arguments represent the Ref-SVs pairs of the two inequated feature structures of the first disjunct. The fifth argument contains another ineq/5 structure, or done/0. These three structures are suitable for passing to gen/3 or gen/4. These representations are not grounded; so if you want to assert them into a database, be sure to assert them all in one predicate to preserve variable sharing. For more details on ALE's internal representation, the reader is referred to Carpenter and Penn (1996).

There is also a rec/5, which works just like rec/4, but filters its output through the description in its last argument, just like rec/2:

  | ?- rec([kim,walks],Ref,SVs,Iqs,phrase).
 
  Iqs = [ineq(_A,index(_B-third,_C-plur,_D-masc),_E,index(_B-third,_C-plur,
  _D-masc),done)],
  SVs =  phrase(_F-synsem(_G-loc(_H-cat(_I-verb(_J-minus,_K-minus,_L-none,_M-
  bool,_N-fin),_O-unmarked,_P-e_list),...)))?

The call succeeds here because the given answer is of type, phrase.

rec_list/2 iteratively displays all of the solutions for each string in a list of list of words that satisfy a description: [Code]

  | ?- rec_list([[kim,sees,sandy],[sandy,sees,kim]],phrase).

  STRING: 
  0 kim 1 sees 2 sandy 3
  CATEGORY: 
  
  phrase
  QRETR e_list
  QSTORE e_set
  SYNSEM synsem...

  ANOTHER? y.

  STRING: 
  0 sandy 1 sees 2 kim 3
  CATEGORY: 

  phrase
  QRETR e_list
  QSTORE e_set
  SYNSEM synsem

  ANOTHER?  y.

  no

If no filtering through a description is desired, the description, bot, which is trivially satisfied, can be used. When rec_list/2 runs out of solutions for a string, it moves on to the next string.

rec_best/2 iteratively displays all of the solutions satisfying a given description for the first string in a list of list of words that has a solution satisfying that description. [Code]

  | ?- rec_best([[kim,sees,sandy],[sandy,sees,kim]],phrase).

  STRING: 
  0 kim 1 sees 2 sandy 3
  CATEGORY: 
  
  phrase
  QRETR e_list
  QSTORE e_set
  SYNSEM synsem...

  ANOTHER? y.

  no

When rec_best/2 runs out of solutions for a string that had at least one solution, it fails. It only tries the strings in its first argument until it finds one that has solutions.

There is also a three-place rec_list/3 that collects the solutions to rec_list/2 in a list of terms of the form fs(Tag-SVs,Iqs).

Next: Executing Grammars: Generation Up: Running and Debugging ALE Previous: Displaying Grammars Contents

TRALE User's Manual