Phenogrammatical rules (pheno-rules) are represented in a grammar file with the infix operator topo/2. The left-hand side of the operator names a region, and the right-hand side is a list of topological field descriptors for the fields inside that region, in order from left to right. The syntax of topo/2 is as follows:
<topo-rule> ::= <region> topo <field-descriptor-list>
<region> is the name of the region and must be a Prolog atom. <field-descriptor-list> is a list of field descriptors as formulated in Penn and Haji-Abdolhosseini (2002). Field names also have to be Prolog atoms.
The following example shows two rules for the clause region in German. As in Prolog, these are interpreted disjunctively: together, the two rules mean that every clause region in German has either the first topology or the second.
clause topo [vf, cf, mf*, {vc}, {nf}].
clause topo [    cf, mf+,  vc,  {nf}].
Recall that `*' describes zero or more fields, `+' at least one field and `{}' optional fields. A bare field name represents a single obligatory field. According to the first rule above, then, a clause region consists of a single vf, one cf, zero or more mf's, an optional vc and an optional nf, and these fields are arranged in the same order as in the list. Note that as in Prolog, every predicate ends in a period.
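One way to read the descriptor notation is as a regular expression over field names. The following Python sketch only illustrates that reading and is not part of TGC: the names descriptor_to_regex and topology_accepts are hypothetical, descriptors are passed as plain strings, and the condition that a region must span at least one word is not modelled.

```python
import re

def descriptor_to_regex(descriptor):
    """Translate one field descriptor into a regex over space-terminated field names."""
    d = descriptor.strip()
    if d.startswith("{") and d.endswith("}"):
        return "(?:%s )?" % re.escape(d[1:-1].strip())   # {f}: optional field
    if d.endswith("*"):
        return "(?:%s )*" % re.escape(d[:-1])            # f*: zero or more fields
    if d.endswith("+"):
        return "(?:%s )+" % re.escape(d[:-1])            # f+: at least one field
    return "(?:%s )" % re.escape(d)                      # f: one obligatory field

def topology_accepts(descriptors, fields):
    """True if the left-to-right sequence of field names fits the descriptor list."""
    pattern = "".join(descriptor_to_regex(d) for d in descriptors)
    text = "".join(name + " " for name in fields)
    return re.fullmatch(pattern, text) is not None

# The two German clause topologies from the text, written as string lists.
CLAUSE_1 = ["vf", "cf", "mf*", "{vc}", "{nf}"]
CLAUSE_2 = ["cf", "mf+", "vc", "{nf}"]

def clause_accepts(fields):
    """Disjunctive reading: a clause fits if either topology accepts it."""
    return topology_accepts(CLAUSE_1, fields) or topology_accepts(CLAUSE_2, fields)
```

Under this reading, the sequence vf cf mf mf vc fits the first rule, and cf mf vc fits only the second.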
Two other pheno-rules have been defined in the file german.pl:
npr topo [sprf, adjf*, nof, postf*].
ppr topo [pf, objf].
The first describes the topology of a noun-phrase region and the second that of a prepositional-phrase region.
Note that all pheno-rules must be at least binary branching. If this is not the case, an exception is raised at compile time. In addition, the same field name cannot be used in more than one region; ambiguous field names also result in an exception being raised.
A condition implicit in all topo rules is that a region must span at least one word. Thus, in the following rule:
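The two compile-time checks just described can be sketched in Python as follows. This is an illustration under assumptions, not TGC's own code: check_topo_rules and field_name are hypothetical names, "at least binary branching" is read here as "at least two field descriptors on the right-hand side", and problems are returned as strings rather than raised as exceptions.

```python
def field_name(descriptor):
    """Strip the *, + and {} decorations from a descriptor such as 'mf*' or '{vc}'."""
    return descriptor.strip().strip("{}*+").strip()

def check_topo_rules(rules):
    """rules: list of (region, [descriptor strings]). Returns a list of problems found."""
    errors = []
    owner = {}  # field name -> the region that first used it
    for region, descriptors in rules:
        # Assumed reading of "binary branching": two or more descriptors.
        if len(descriptors) < 2:
            errors.append(f"{region}: fewer than two field descriptors")
        for descriptor in descriptors:
            name = field_name(descriptor)
            previous = owner.setdefault(name, region)
            if previous != region:
                errors.append(f"{name}: used in both {previous} and {region}")
    return errors

GERMAN = [
    ("clause", ["vf", "cf", "mf*", "{vc}", "{nf}"]),
    ("clause", ["cf", "mf+", "vc", "{nf}"]),
    ("npr", ["sprf", "adjf*", "nof", "postf*"]),
    ("ppr", ["pf", "objf"]),
]
```

For the German rules above the check reports nothing, since each field name belongs to exactly one region.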
r1 topo [{f1},{f2},{f3}].
although all the daughter fields of r1 are optional, we will not have any zero-length r1's in the chart during parsing. An r1 is introduced in the chart only if it spans at least one word.
The relationships among various regions are described with linkage constraints. These license links in a pheno-tree with field mothers and region daughters. The syntax for these constraints is as follows:
<linking-rule> ::= <region> <<-- <field>
                 | <region> <<-- (<fields>)
                 | <field> -->> <region>
                 | <field> -->> (<regions>)
<field>        ::= <prolog-atom> | matrix
<region>       ::= <prolog-atom>
<fields>       ::= <field> | <fields> ; <fields>
<regions>      ::= <region> | <regions> ; <regions>
These constraints are universally quantified on the left-hand side and existentially quantified on the right-hand side. Therefore, only one such constraint can be declared for each field or region. If more than one linkage constraint is given for a field/region, a warning message is displayed and only the last one is considered.
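The "one constraint per left-hand side, last one wins" behaviour can be pictured with the following Python sketch. It is a hypothetical model only: collect_unique is not a TGC predicate, and the warning text is invented for illustration.

```python
import warnings

def collect_unique(declarations):
    """declarations: list of (lhs, alternatives) in file order.

    Keeps only the last declaration for each left-hand side and warns
    whenever an earlier one is overridden.
    """
    table = {}
    for lhs, alternatives in declarations:
        if lhs in table:
            warnings.warn(f"more than one constraint for {lhs}; keeping the last one")
        table[lhs] = tuple(alternatives)
    return table
```

With two declarations for npr in the same file, only the right-hand side of the second one survives in the resulting table.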
In a grammar file, we use the two infix operators <<--/2 and -->>/2 for linkage rules with regions or fields on the left-hand side respectively. Disjunction on the right-hand side is shown by semicolons (;). Therefore, to show that an npr may fit into vf, mf, objf or nf, we can use the following declaration. Note that the parentheses around the disjuncts are required. The special field matrix is discussed in section T8.3.6.
npr <<-- (vf; mf; objf; nf).
If both <<--/2 and -->>/2 are needed in a single grammar file, care must be exercised that the two declarations not contradict each other. For example, the following two linkage declarations are inconsistent:
r1 <<-- (f1; f2; f3).
f4 -->> (r1; r2; r3).
The first constraint states that all r1 regions can only be linked to either an f1, f2 or f3 and nothing else. The second constraint, on the other hand, states that all f4 fields can only be linked to either r1, r2 or r3 and nothing else. According to the first constraint, however, r1 cannot be linked to f4. To resolve this inconsistency, we should either add f4 to the first constraint, or delete r1 from the second as appropriate. Therefore, both of the following sets of constraints are logically sound:
r1 <<-- (f1; f2; f3; f4).
f4 -->> (r1; r2; r3).
or
r1 <<-- (f1; f2; f3).
f4 -->> (r2; r3).
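The consistency requirement on linkage declarations can be checked mechanically. The Python sketch below is one possible reading, not TGC's own check: find_link_conflicts is a hypothetical name, the declarations are given as dictionaries of sets, and the symmetric case (a region naming a field whose own declaration excludes that region) is treated as a conflict by assumption.

```python
def find_link_conflicts(region_links, field_links):
    """Return (region, field) pairs on which the two kinds of declaration disagree.

    region_links: {region: set of fields}   from  region <<-- (...)
    field_links:  {field: set of regions}   from  field -->> (...)
    """
    conflicts = []
    # A field names a region whose own declaration does not admit that field.
    for field, regions in sorted(field_links.items()):
        for region in sorted(regions):
            allowed = region_links.get(region)
            if allowed is not None and field not in allowed:
                conflicts.append((region, field))
    # Assumed symmetric case: a region names a field that does not admit it.
    for region, fields in sorted(region_links.items()):
        for field in sorted(fields):
            allowed = field_links.get(field)
            if allowed is not None and region not in allowed:
                conflicts.append((region, field))
    return conflicts
```

Applied to the inconsistent pair of declarations, the function reports the pair (r1, f4); for either of the two repaired sets it reports nothing.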
The first set of constraints that connect tectogrammar with phenogrammar is the set of matches constraints. These constraints say which tectogrammatical categories match in their yields with which phenogrammatical categories. The parser uses these constraints to find which field or region in the phenogrammar each lexical category matches. Therefore, these constraints are crucial to the proper operation of the system. The matches constraints have the following syntax:
<matches-rule>      ::= <phi> matches <field-or-region>
                      | <phi> matches (<fields-or-regions>)
                      | (<phi> matches <field-or-region>) :- <prolog-goals>
<phi>               ::= <prolog-atom> | <prolog-variable>
<field-or-region>   ::= <field> | <region>
<fields-or-regions> ::= <field-or-region> | <fields-or-regions> ; <fields-or-regions>
<phi> stands for any tectogrammatical category. In this version, TGC assumes tectogrammatical categories to be either atomic types (e.g. adv, s or aux) or descriptions (e.g. (n,gender:G, case:C) or (np,person:P, number:N, gender:G, case:C)). The following are, therefore, well-formed matches declarations:
adv matches mf.
marker matches cf.
Assuming that we consider type n to have two appropriate features NUMBER and GENDER, then we may have the following matches declaration to mean that a first person, neuter noun matches a `noun field' (nof).
(n,number:first,gender:neuter) matches nof.
As noted above, the right-hand side of these rules may contain a disjunction of fields or regions; therefore, to state that a proper noun (pn) matches an mf, vf or objf, the following declaration is used:
pn matches (mf; vf; objf).
Again, note that the parentheses are required around the disjuncts.
Sometimes, it is necessary to specify a condition for a matches constraint. For example, in Croatian and Serbian, the third person singular present tense auxiliary verb (je) has its own field, jef, and any other auxiliary verb matches an auxf. We represent these constraints as follows:
((aux,person:third,number:sg,tense:pres) matches jef) :- !.
aux matches auxf.
What we have done here is place a matches constraint inside the head of a Prolog conditional clause. The body of the rule is normal Prolog code. In the above example, the cut (!) prevents je from matching auxf as well as jef.
This strategy of placing a grammatical constraint inside the head of a Prolog conditional only works for matches/2 and lexical entries (-->/2 to be discussed in section T8.4 below). The conditional matches rules are placed before the unconditional ones in the compiled file. It should be kept in mind that disjunction is not allowed in conditional matches constraints.
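The effect of the cut in a conditional matches rule can be pictured as an ordered search that stops at the first committed rule. The Python sketch below is a simplified model and not how TGC evaluates these rules: matching_fields, is_je and is_aux are hypothetical names, categories are plain dictionaries rather than descriptions, and unification is replaced by simple equality tests.

```python
def matching_fields(category, rules):
    """rules: ordered list of (test, field, has_cut); conditional rules come first.

    Returns every field the category matches, stopping after a rule
    whose body contains a cut.
    """
    fields = []
    for test, field, has_cut in rules:
        if test(category):
            fields.append(field)
            if has_cut:
                break  # the cut commits to this rule; later rules are not tried
    return fields

def is_je(cat):
    """Third person singular present tense auxiliary."""
    return (cat.get("type") == "aux" and cat.get("person") == "third"
            and cat.get("number") == "sg" and cat.get("tense") == "pres")

def is_aux(cat):
    return cat.get("type") == "aux"

RULES = [(is_je, "jef", True), (is_aux, "auxf", False)]
RULES_WITHOUT_CUT = [(is_je, "jef", False), (is_aux, "auxf", False)]
```

In this model je matches only jef, whereas with the cut removed it would match both jef and auxf.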
Universal matches rules for non-lexical categories enforce the requirement that all tectogrammatical categories that can unify with <phi> should always match a topologically accessible <field-or-region>.
The syntax of matched_by constraints is as follows:
<matched-by-rule> ::= <field-or-region> matched_by <phi>
                    | <field-or-region> matched_by (<phis>)
<phis>            ::= <phi> | <phis> ; <phis>
These constraints are responsible for predicting tectogrammatical categories (top-down) once the parser has found certain regions or fields. Because they block topological accessibility to higher nodes in the tecto-tree, matched_by constraints create inseparable structures that make parsing more efficient. It is, therefore, prudent to try to introduce as many different types of such constraints as possible in addition to clause-level categories. In German, for instance, one can have both of the following constraints at the same time in a single grammar:
clause matched_by (s; rp; cp).
ppr matched_by pp.
Once a clause region is found during pheno-parse, the first constraint results in a top-down tecto-parse in search of an S, RP or CP. The second constraint has the same effect, only at the level of PP. The advantage of having the second constraint is that the internal structure of PP's remains inaccessible to higher nodes. Therefore, no active edge higher up in the tree will try to steal the NP or NP's inside PP's, which results in a more efficient parse.
As shown in the previous example, disjunction of tecto-categories on the right-hand side of matched_by constraints is also acceptable. As usual, the parentheses around the disjuncts are required.
Because matched_by constraints are universally quantified on their left-hand side, only one such constraint is allowed in each grammar for a given field or region. If more than one matched_by constraint is specified for a field/region, a warning message is issued and only the last one is considered.
A shorthand for
phi matches f, f matched_by phi
is phi <==> f. Note that no disjunction is allowed on either side of <==>/2. The syntax of <==>/2 is as follows:
<bidirectional-matching> ::= <phi> <==> <field-or-region>
Global covers declarations have the following syntax:
<covers-rule> ::= <phi> covers <field-or-region>
                | <phi> covers (<fields-or-regions>)
These constraints state that the yield of all <phi> must include all of <field-or-region>. As usual, disjunction is represented with semicolons (;) and parentheses are required around disjuncts.
The syntax of covered_by/2 (presented below) is almost identical to that of matched_by/2. It also has similar semantics. The only difference is that covered_by constraints require all <field-or-region> (declared on the left-hand side) be consumed by some <phi> (declared on the right-hand side), but <phi> may be larger in its yield than <field-or-region>.
<covered-by-rule> ::= <field-or-region> covered_by <phi>
                    | <field-or-region> covered_by (<phis>)
<field-or-region> ::= <field> | <region>
<phis>            ::= <phi> | <phis> ; <phis>
It should be noted that covered_by constraints are the least efficient of constraints and grammar writers should try to avoid them by using other more efficient means such as covers constraints, structure-sharing, precedence or immediate precedence constraints etc.
Just like matched_by constraints, covered_by constraints are universally quantified on the left-hand side. Therefore, if more than one such constraint is declared for a given field or region, a warning message is issued and only the last one is considered.
The operator <--> in covering is the counterpart of <==> in matching. Therefore,
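The difference between matching and covering comes down to how two yields compare. The following Python sketch states it as set relations, assuming (for illustration only) that the yield of a category and the span of a field or region are both given as sets of word positions; matches and covers here are hypothetical helper functions, not the TGC operators.

```python
def matches(phi_yield, span):
    """Matching: the category's yield and the field/region span coincide exactly."""
    return set(phi_yield) == set(span)

def covers(phi_yield, span):
    """Covering: the category's yield includes all of the span and may be larger."""
    return set(phi_yield) >= set(span)
```

Every case of matching is therefore also a case of covering, but not the other way round: a yield of words 0 to 3 covers a region spanning words 1 and 2 without matching it.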
phi <--> f.
is equivalent to
phi covers f.
f covered_by phi.
Note that no disjunction is allowed on either side of <-->/2. The syntax of <-->/2 is provided below:
<bidirectional-covering> ::= <phi> <--> <field-or-region>
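The two bidirectional operators are pure shorthand, which the following Python sketch makes explicit by expanding them into the pairs of declarations they abbreviate. The function expand_bidirectional is hypothetical and works on declaration text only; rejecting any semicolon is a crude stand-in, assumed here, for the rule that disjunction is not allowed on either side.

```python
def expand_bidirectional(operator, phi, target):
    """Expand 'phi <==> target' or 'phi <--> target' into its two declarations."""
    pairs = {"<==>": ("matches", "matched_by"), "<-->": ("covers", "covered_by")}
    if ";" in phi or ";" in target:
        raise ValueError("disjunction is not allowed with " + operator)
    forward, backward = pairs[operator]
    return [f"{phi} {forward} {target}.", f"{target} {backward} {phi}."]
```

For instance, expanding pp <==> ppr yields the declarations pp matches ppr and ppr matched_by pp.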
Compaction applies to tectogrammatical categories and states that the elements in the list that is its argument be contiguous strings. As a global constraint, compaction applies this condition to all categories that are consistent with any of the elements in the argument list of the constraint.
<compaction-rule> ::= compacts(<phi-list>)
In a grammar file, the predicate compacts/1 is used to show compaction. The argument of compacts/1 is a list of tectogrammatical categories. For example, in order to show that all NP's in a language are contiguous, we can use the following global constraint:
compacts([np]).
There can only be one global compacts/1 declaration in the grammar. Should there be more than one, a warning message is issued and the last one gets considered. Also note that the argument of compacts/1 must be a list. A non-list argument generates an error.
A special type of field that is recognised by TGC is matrix, which must always be maximal in a grammar (i.e. not linked to any higher fields or regions) and must span the whole input string. By using matrix, a grammar writer may delay the prediction of sentences until the whole input sentence has been pheno-parsed. An example of using matrix follows:
clause topo [vf, cf, mf*, {vc}, {nf}].
clause topo [    cf, mf+,  vc,  {nf}].
clause <<-- (nf; matrix).
matrix matched_by us.
According to the mini-phenogrammar above, the clause region is linked to nf and matrix. Because only matrix is in charge of predicting an unembedded sentence (US), tecto-parsing is delayed until the whole input sentence has been pheno-parsed: no matrix is introduced into the chart until the clause that links to it consumes the whole input string.
The tectogrammatical rules that TGC uses resemble phrase-structure rules in earlier generative grammars except that the rules do not assume order or contiguity in any way. The operator for tecto-rules is *-->/2. The syntax of tecto-rules is defined below:
<tecto-rule> ::= <phi> *--> [<phi-conj>]
               | <phi> *--> [<phi-conj>, {<conditions>}]
<phi>        ::= <description> | <prolog-variable>
<phi-conj>   ::= <phi> | <phi-conj>, <phi-conj>
<conditions> ::= <condition>
               | (<conditions>, <conditions>)
               | (<conditions>; <conditions>)
<condition>  ::= <local-covers> | <local-matches> | <precedence>
               | <immediate-precedence> | <local-compaction> | <inequation>
The category on the left-hand side of a tecto-rule represents the mother node, and the daughters are provided (in no particular order) on the right-hand side. The daughters should be separated by commas. It should be noted that TGC assumes no empty categories.
As is suggested in the BNF above, one may provide some conditions for the applicability of a tecto-rule much in the same manner as DCG's. The conditions are separated from the rule by {}, which appear at the very end of the rule. The acceptable conditions are local covering, local matching, precedence, immediate precedence, local compaction and inequations or other Prolog goals. The following subsections discuss various types of conditions that are allowed in tecto-rules.
As well as global covering constraints, one can state covers/2 constraints locally inside tecto-rules. The scope of local constraints does not surpass the tecto-rule in which they are used. The syntax of local covering is as follows:
<local-covers> ::= <non-zero-index> covers <field-or-region>
Disjunction is not allowed on the right-hand side of local covers constraints. The <non-zero-index> stands for an integer that ranges from 1 to the number of daughters in the applicable tecto-rule. Therefore, the following hypothetical rule is syntactically acceptable:
pp *--> [(p,case:C), (nbar,case:C), {2 covers npr}].
What the above rule says is that a PP consists of a P and an NBar that shares its CASE feature with P and that the second daughter in the rule, namely, NBar covers an npr. In this context the field or region specified will be one that is topologically accessible to the sponsor of the mother node. Therefore, the npr in the above example has to be topologically accessible to the pheno-edge that resulted in the prediction of that PP.
Local matching is a special case of local covering. The only difference is that in matching the daughter covering the region cannot be larger than it. The syntax of local matching is provided below:
<local-matches> ::= <non-zero-index> matches <field-or-region>
An example of a local matches constraint in German grammar is:
(nbar, person:P, number:N, gender:G, case:C) *--> [(nbar, person:P, number:N, gender:G, case:C), rp, { 2 matches nf ; 2 matches postf }].
This rule states that an NBar consists of another NBar with the same PERSON, NUMBER, GENDER and CASE features as well as an RP, which matches either an nf or postf that is accessible to the sponsor of the mother NBar.
Precedence constraints have this syntax:
<precedence> ::= <non-zero-index> < <non-zero-index>
What they mean is that the daughter whose index is mentioned on the left-hand side must be entirely located before the daughter whose index is on the right-hand side. Note that this constraint does not make any assumptions about the contiguity of the daughters. As an example, one can provide the following hypothetical rule that states an NP consists of a determiner that precedes but might not be adjacent to an NBar with the same agreement features.
(np, person:P, number:N, gender:G, case:C) *--> [(det, number:N, gender:G, case:C), (nbar, person:P, number:N, gender:G, case:C), {1 < 2}].
A special kind of precedence is immediate precedence, which means that the two daughters mentioned in the constraint have to be adjacent to one another. This only applies to the rightmost word of the first daughter specified in the constraint and the leftmost word of the second one. This constraint does not assume contiguity of daughters either. The syntax of an immediate precedence constraint is as follows:
<immediate-precedence> ::= <non-zero-index> << <non-zero-index>
The following serves as an example for this and means that an NP consists of a determiner which is immediately followed by an NBar.
(np, person:P, number:N, gender:G, case:C) *--> [(det, number:N, gender:G, case:C), (nbar, person:P, number:N, gender:G, case:C), {1 << 2}].
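The two ordering conditions can be stated compactly if each daughter's yield is viewed as a set of word positions. The Python sketch below assumes that representation (with positions counted from 0) purely for illustration; precedes and immediately_precedes are hypothetical names and not part of TGC.

```python
def precedes(left, right):
    """i < j: every word of the left daughter lies before every word of the right one."""
    return max(left) < min(right)

def immediately_precedes(left, right):
    """i << j: the rightmost word of the left daughter is adjacent to
    the leftmost word of the right daughter."""
    return max(left) + 1 == min(right)

# A determiner at position 0 and a discontinuous NBar at positions 1 and 3.
DET = {0}
NBAR = {1, 3}
```

Here both conditions hold for DET and NBAR even though the NBar is not contiguous, which reflects the point that neither constraint assumes contiguity of the daughters.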
As is clear from the name of the constraint, a local compaction constraint states that a daughter or the mother node in a rule is a contiguous string. Obviously, you do not need to specify that for lexical categories that are by definition one word long and therefore contiguous. The syntax of local compaction constraints is given below:
<local-compaction> ::= compacts(<index>)
The variable <index> stands for any integer from 0 to the number of the daughters in the tecto-rule in which the constraint appears. If <index> is set to zero, it means that the mother node should be a contiguous string. For example:
(np, person:P, number:N, gender:G, case:C) *--> [(det, number:N, gender:G, case:C), (nbar, person:P, number:N, gender:G, case:C), {compacts(0)}].
means that an NP consists of a determiner and an NBar and that the mother NP is contiguous.
Note that the argument of a local compaction constraint is not a list but a single index.
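The index convention of local compaction can be sketched in Python as follows. This is a hypothetical model, not TGC's implementation: yields are assumed to be sets of word positions, and the mother's yield is assumed to be the union of her daughters' yields, which seems reasonable given that TGC has no empty categories.

```python
def is_contiguous(positions):
    """True if the positions form one unbroken stretch of words."""
    positions = set(positions)
    return bool(positions) and max(positions) - min(positions) + 1 == len(positions)

def compacts(index, daughter_yields):
    """Index 0 refers to the mother; 1..n refer to the daughters in rule order."""
    if not 0 <= index <= len(daughter_yields):
        raise IndexError("index must be between 0 and the number of daughters")
    if index == 0:
        # Assumption: the mother's yield is the union of the daughters' yields.
        target = set().union(*daughter_yields)
    else:
        target = daughter_yields[index - 1]
    return is_contiguous(target)
```

With a determiner at position 0 and an NBar at positions 1 and 3, compacts(0, ...) fails in this model because position 2 is missing from the mother's yield.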
Sometimes it is necessary to make sure that certain features do not have certain values and sometimes one may need to manipulate symbols or make sure that some other constraints apply. In such cases, it is possible to embed some Prolog code inside the tecto-rules as in DCG's. For example, if you want to make sure that a tenseless verb is not combined with an NP to make a sentence, you can use the following tecto-rule:
s *--> [(np, person:P, number:N, case:nom), (vp, person:P, number:N, form:F), { get_type(F,FType), FType\==inf, FType\==pastp, FType\==presp }].
This rule states that a sentence contains an NP and a VP that share the same person and number features and that the form of the verb (F) is not infinitival, past-participial or present-participial.