This file contains a set of heuristic rules to perform dependency parsing.
The file consists of three sections, <GRPAR>, <GRLAB>, and <VCLASS>, respectively closed by the tags </GRPAR>, </GRLAB>, and </VCLASS>.
The <GRPAR> section contains rules to complete the partial parse provided by the chart parser. The tree is completed by combining chunk pairs as stated by the rules. Rules are applied from highest priority (lowest values) to lowest priority (highest values), and from left to right. That is, the pair of adjacent chunks matching the highest-priority rule is found, and the rule is applied, joining both chunks into one. The process is repeated until only one chunk is left.
Each line contains a rule, with the format:
ancestor-label descendant-label label operation priority
where:
ancestor-label and descendant-label are the syntactic labels (either assigned by the chunk parser, or a new label created by some other completion rule) of two consecutive nodes in the tree.
label has two meanings, depending on the value of the operation field.
For top_left and top_right operations, it states the label with which the root node of the resulting tree must be relabelled ("-" means no relabelling).
For last_left and last_right operations, it states the label that the node to be considered "last" must have to get the subtree as a new child. If no node with this label is found, the subtree is attached as a new child to the root node.
operation is the way in which the ancestor-label and descendant-label nodes are to be combined.
priority is the priority value of the rule (low values mean high priority and vice versa).
For instance, the rule:
np pp - top_left 20
states that if two subtrees labelled np and pp are found contiguous in the partial tree, the latter is added as a new child of the former.
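The five-field rule format described above can be read with a few lines of code. The following is a minimal sketch, not the parser's actual loader: the names CompletionRule and parse_completion_rule are hypothetical, and it assumes that fields are separated by whitespace and that the priority is an integer.

```python
from typing import NamedTuple

# The four operation names mentioned in the text.
OPERATIONS = {"top_left", "last_left", "top_right", "last_right"}


class CompletionRule(NamedTuple):
    """One completion rule line (hypothetical container, for illustration)."""
    ancestor: str
    descendant: str
    label: str       # "-" means no relabelling for top_* operations
    operation: str
    priority: int    # lower value = higher priority


def parse_completion_rule(line: str) -> CompletionRule:
    """Split a rule line into its five whitespace-separated fields."""
    fields = line.split()
    if len(fields) != 5:
        raise ValueError(f"expected 5 fields, got {len(fields)}: {line!r}")
    ancestor, descendant, label, operation, priority = fields
    if operation not in OPERATIONS:
        raise ValueError(f"unknown operation: {operation!r}")
    return CompletionRule(ancestor, descendant, label, operation, int(priority))


rule = parse_completion_rule("np pp - top_left 20")
print(rule.operation, rule.priority)  # top_left 20
```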
The supported tree-building operations are the following:
top_left: The right subtree is added as a daughter of the left subtree. The root of the new tree is the root of the left subtree. If a label value other than "-" is specified, the root is relabelled with that string.
last_left: The right subtree is added as a daughter of the last node inside the left subtree matching the label value (or of the root if none is found). The root of the new tree is the root of the left subtree.
top_right: The left subtree is added as a new daughter of the right subtree. The root of the new tree is the root of the right subtree. If a label value other than "-" is specified, the root is relabelled with that string.
last_right: The left subtree is added as a daughter of the last node inside the right subtree matching the label value (or of the root if none is found). The root of the new tree is the root of the right subtree.
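The completion procedure and the four operations can be sketched as follows. This is an illustration under stated assumptions, not the parser's implementation. It assumes that for top_right and last_right the ancestor-label matches the right chunk (since the right subtree becomes the parent), that the "last" node is the last one found in a pre-order traversal, and that the loop simply stops if no rule matches. Node, find_last, combine and complete are hypothetical names, and every rule in the example other than the np/pp one is invented for the demonstration.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)


# (ancestor-label, descendant-label, label, operation, priority)
Rule = Tuple[str, str, str, str, int]


def find_last(node: Node, label: str) -> Optional[Node]:
    """Last node carrying `label` in pre-order (assumed meaning of "last")."""
    found = node if node.label == label else None
    for child in node.children:
        hit = find_last(child, label)
        if hit is not None:
            found = hit
    return found


def combine(left: Node, right: Node, label: str, operation: str) -> Node:
    """Apply one tree-building operation and return the new root."""
    head_is_left = operation.endswith("_left")
    head, sub = (left, right) if head_is_left else (right, left)
    target = head
    if operation.startswith("top_"):
        if label != "-":
            head.label = label          # relabel the root
    else:                               # last_left / last_right
        hit = find_last(head, label)
        if hit is not None:
            target = hit                # otherwise fall back to the root
    if head_is_left:
        target.children.append(sub)     # new daughter goes on the right
    else:
        target.children.insert(0, sub)  # new daughter goes on the left
    return head


def complete(chunks: List[Node], rules: List[Rule]) -> List[Node]:
    """Join adjacent chunks, best priority first, leftmost pair on ties."""
    chunks = list(chunks)
    while len(chunks) > 1:
        best = None                     # (priority, position, label, operation)
        for i in range(len(chunks) - 1):
            pair = (chunks[i].label, chunks[i + 1].label)
            for ancestor, descendant, label, op, prio in rules:
                # Assumption: the ancestor is the chunk that becomes the parent.
                wanted = (ancestor, descendant) if op.endswith("_left") \
                    else (descendant, ancestor)
                if pair == wanted and (best is None or prio < best[0]):
                    best = (prio, i, label, op)
        if best is None:
            break                       # no rule applies (behaviour assumed)
        _, i, label, op = best
        chunks[i:i + 2] = [combine(chunks[i], chunks[i + 1], label, op)]
    return chunks


rules = [
    ("np", "pp", "-", "top_left", 20),   # from the text
    ("vp", "np", "-", "top_left", 30),   # hypothetical
    ("vp", "np", "s", "top_right", 40),  # hypothetical
]
result = complete([Node("np"), Node("vp"), Node("np"), Node("pp")], rules)
print(result[0].label)                         # s
print([c.label for c in result[0].children])   # ['np', 'np']
```

In this run the np/pp pair is joined first (priority 20), then the verb takes the resulting np as a right daughter (priority 30), and finally the subject np is attached to the left and the root is relabelled s (priority 40).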
The <GRLAB> section contains rules to label the dependences extracted from the full parse tree built with the rules in the previous section.
Each line contains a rule, with the format:
ancestor-label dependence-label condition1 condition2 ...
where:
ancestor-label is the label of the node which is head of the dependence.
dependence-label is the label to be assigned to the dependence.
condition is a list of conditions that the dependence has to match to satisfy the rule.
Each condition has one of the forms:
attribute = value
attribute != value
The supported attributes are the following, although this list may be enlarged at will by the client application:
d.label: label of the daughter node.
d.side: position (left or right) of the daughter node with respect to the head (parent node).
d.lemma: lemma of the daughter node.
p.class: word class (see below) of the lemma of the parent node.
p.lemma: lemma of the parent node.
For instance, the rule:
verb-phr subj d.label=np* d.side=left
states that if a verb-phr node has a daughter to its left, with a label starting with np, this dependence is to be labelled as subj.
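The way such a rule is checked against one head-daughter pair can be sketched as follows. This is only an illustration: parse_label_rule, value_matches and rule_applies are hypothetical names, conditions are assumed to be written without inner spaces (as in the example rule), only a trailing * is treated as a wildcard, and a missing attribute is treated as an empty string. How the parser chooses among several matching rules is not covered here.

```python
import re
from typing import Dict, List, Tuple

Condition = Tuple[str, str, str]              # (attribute, operator, value)
LabelRule = Tuple[str, str, List[Condition]]  # (ancestor, dependence, conditions)


def parse_label_rule(line: str) -> LabelRule:
    """Read 'ancestor-label dependence-label cond1 cond2 ...'."""
    fields = line.split()
    if len(fields) < 2:
        raise ValueError(f"malformed labelling rule: {line!r}")
    conditions: List[Condition] = []
    for text in fields[2:]:
        match = re.fullmatch(r"([\w.]+)(!=|=)(.+)", text)
        if match is None:
            raise ValueError(f"malformed condition: {text!r}")
        conditions.append((match.group(1), match.group(2), match.group(3)))
    return fields[0], fields[1], conditions


def value_matches(actual: str, pattern: str) -> bool:
    """Exact match, or prefix match when the pattern ends in '*' (assumed)."""
    if pattern.endswith("*"):
        return actual.startswith(pattern[:-1])
    return actual == pattern


def rule_applies(rule: LabelRule, head_label: str, attrs: Dict[str, str]) -> bool:
    """True if the head label and every condition are satisfied."""
    ancestor, _, conditions = rule
    if head_label != ancestor:
        return False
    for attribute, operator, value in conditions:
        hit = value_matches(attrs.get(attribute, ""), value)
        if hit != (operator == "="):   # '=' needs a match, '!=' needs none
            return False
    return True


rule = parse_label_rule("verb-phr subj d.label=np* d.side=left")
left_np = {"d.label": "np", "d.side": "left"}
right_np = {"d.label": "np", "d.side": "right"}
print(rule_applies(rule, "verb-phr", left_np))   # True
print(rule_applies(rule, "verb-phr", right_np))  # False
```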
The <CLASS> section contains class definitions which may be used as attributes in the dependency labelling rules.
Each line contains a class assignment for a lemma, with the format:
class-name lemma comments
For instance, the following lines assign the four listed verbs to the class mov. Everything to the right of the second field is considered a comment and ignored.
mov go prep= to,towards
mov come prep= from
mov walk prep= through
mov run prep= to,towards D.O.
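Reading class lines of this kind amounts to keeping the first two whitespace-separated fields and discarding the rest. A minimal sketch, in which parse_classes is a hypothetical name and blank or one-field lines are simply skipped (an assumption):

```python
from typing import Dict, Iterable, Set


def parse_classes(lines: Iterable[str]) -> Dict[str, Set[str]]:
    """Map each class name to its lemmas; text after field 2 is a comment."""
    classes: Dict[str, Set[str]] = {}
    for line in lines:
        fields = line.split(None, 2)   # class-name, lemma, (comment)
        if len(fields) < 2:
            continue                   # skip blank or incomplete lines
        classes.setdefault(fields[0], set()).add(fields[1])
    return classes


classes = parse_classes([
    "mov go prep= to,towards",
    "mov come prep= from",
    "mov walk prep= through",
    "mov run prep= to,towards D.O.",
])
print(sorted(classes["mov"]))  # ['come', 'go', 'run', 'walk']
```

A lookup such as `"go" in classes["mov"]` is then the kind of test a p.class condition would need for the lemma of the parent node.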