Chart parser CFG file
This file contains a CFG grammar for the chart parser, and some
directives to control which chart edges are selected to build the
final tree.
Comments may be introduced in the file. A comment starts with ``%'' and
finishes at the end of the line.
Grammar rules have the form: x ==> y, A, B.
That is, the head of the rule is a non-terminal specified at the
left hand side of the arrow symbol. The body of the rule is a
sequence of terminals and non-terminals separated with commas and
ended with a dot.
Empty rules are not allowed, since they dramatically slow chart
parsers. Nevertheless, any grammar may be written without empty
rules (assuming you are not going to accept empty sentences).
Rules with the same head may be or'ed using the bar symbol,
as in: x ==> A, y | B, C.
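
The rule syntax described above can be illustrated with a small reader. This is only a sketch for experimenting with the file format, not the parser's own loading code: the function name parse_rules and the symbols in the sample grammar are invented for the example, and it assumes that no terminal contains ``.'', ``|'', ``,'' or ``%'' inside a lemma or form.

```python
def parse_rules(text):
    """Read 'head ==> a, b | c, d.' rules into {head: [[symbols], ...]}."""
    # Drop comments: everything from '%' to the end of each line.
    stripped = "\n".join(line.split("%", 1)[0] for line in text.splitlines())
    rules = {}
    # Every statement ends with a dot (assumption: no dots inside terminals).
    for statement in stripped.split("."):
        statement = statement.strip()
        if not statement or statement.startswith("@"):
            continue  # skip blank chunks and directives
        head, body = (part.strip() for part in statement.split("==>", 1))
        # The bar symbol separates alternative bodies for the same head.
        for alternative in body.split("|"):
            symbols = [s.strip() for s in alternative.split(",") if s.strip()]
            if not symbols:
                raise ValueError(f"empty rule for {head!r} is not allowed")
            rules.setdefault(head, []).append(symbols)
    return rules


# Hypothetical toy grammar; the symbol names are made up for illustration.
SAMPLE = """
% toy grammar
sn ==> DA*, NC* | NP*.
sv ==> VMI*, sn.
"""

rules = parse_rules(SAMPLE)
```

Here each alternative of an or'ed rule is stored as a separate body under the same head, which is equivalent to writing the rules one by one.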
The grammar is case-sensitive, so make sure to write your terminals
(PoS tags) exactly as they are output by the tagger. Also, make
sure that you capitalize your non-terminals in the same way
everywhere they appear.
Terminals are PoS tags, but some variations are allowed for
flexibility:
- Plain tag: A terminal may be a plain complete PoS tag,
e.g. VMIP3S0
- Wildcarding: A terminal may be a PoS tag prefix,
right-wildcarded, e.g. VMI*, VMIP*.
- Specifying lemma: A terminal may be a PoS tag (or a
wildcarded prefix) with a lemma enclosed in angle brackets,
e.g. VMIP3S0<comer>, VMI*<comer> will match only
words with those tag/prefix and lemma.
- Specifying form: A terminal may be a PoS tag (or a
wildcarded prefix) with a form enclosed in parentheses,
e.g. VMIP3S0(comió), VMI*(comió) will match only
words with those tag/prefix and form.
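
The four terminal variants can be summarized as a single matching test. The following is a minimal sketch of that test, not the parser's implementation: the name terminal_matches and the regular expression are assumptions made for the example, and lemma and form are compared exactly (case-sensitively), which may differ from what the parser actually does.

```python
import re

# tag or tag prefix, optional '*', then optionally <lemma> or (form)
TERMINAL = re.compile(
    r"^(?P<tag>[^*<(]+)(?P<wild>\*)?"
    r"(?:<(?P<lemma>[^>]*)>|\((?P<form>[^)]*)\))?$"
)


def terminal_matches(terminal, tag, lemma, form):
    """Return True if a word with this tag, lemma and form fits the terminal."""
    m = TERMINAL.match(terminal)
    if m is None:
        raise ValueError(f"malformed terminal: {terminal!r}")
    if m["wild"]:
        tag_ok = tag.startswith(m["tag"])  # right-wildcarded prefix
    else:
        tag_ok = tag == m["tag"]           # plain complete tag
    lemma_ok = m["lemma"] is None or lemma == m["lemma"]
    form_ok = m["form"] is None or form == m["form"]
    return tag_ok and lemma_ok and form_ok
```

For instance, terminal_matches("VMI*<comer>", "VMIS3S0", "comer", "comió") holds, while the same terminal rejects a word whose lemma is "beber" even if its tag starts with VMI.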
The grammar file may also contain some directives to help
the parser decide which chart edges must be selected to build the
tree.
Directive commands start with the directive name (always prefixed
with ``@''), followed by one or more non-terminal symbols,
separated with spaces. The list must end with a dot.
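
As an illustration of the directive syntax, the sketch below collects directive lines from a grammar text. It is an assumption-laden example rather than the parser's loader: parse_directives is a hypothetical name, the non-terminals in the sample are invented, and the directive names used are the ones documented in this section.

```python
def parse_directives(text):
    """Collect '@NAME sym1 sym2.' lines into {name: [symbols]}."""
    # Drop comments: everything from '%' to the end of each line.
    stripped = "\n".join(line.split("%", 1)[0] for line in text.splitlines())
    directives = {}
    # Every statement ends with a dot (assumption: no dots inside symbols).
    for statement in stripped.split("."):
        tokens = statement.split()
        if not tokens or not tokens[0].startswith("@"):
            continue  # grammar rules and blank chunks are ignored here
        name, symbols = tokens[0], tokens[1:]
        if not symbols:
            raise ValueError(f"{name} needs at least one non-terminal")
        directives.setdefault(name, []).extend(symbols)
    # @START takes exactly one non-terminal.
    start = directives.get("@START")
    if start is not None and len(start) != 1:
        raise ValueError("@START must name exactly one non-terminal")
    return directives


# Hypothetical directive block; the non-terminal names are made up.
SAMPLE_DIRECTIVES = """
@START S.
@NOTOP sn sv.   % these may not be tree roots
@FLAT sn.
"""

directives = parse_directives(SAMPLE_DIRECTIVES)
```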
- @NOTOP Non-terminal symbols listed under this
directive will not be considered as valid tree roots, even if
they cover the complete sentence.
- @START Specify which is the start symbol of the
grammar. Exactly one non-terminal must be specified under this
directive.
The parser will attempt to build a tree with this symbol as a
root. If the result of the parsing is not a complete tree, or
no valid root nodes are found, a fictitious root node is
created with this label.
- @FLAT Subtrees for "flat" non-terminal symbols are flattened when
the symbol is recursive. Only the highest occurrence appears
in the final parse tree.
- @HIDEN Non-terminal symbols specified under this
directive will not appear in the final parse tree (their
descendant nodes will be attached to their parent).
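
The effect of the last two directives on a finished tree can be sketched as a post-processing step. This is one possible reading of the descriptions above, not the parser's code: trees are represented here as (label, children) tuples with plain strings as leaves, the function names are invented, and "recursive" is taken to mean a node whose direct parent carries the same label. The root node itself is never removed.

```python
def splice(tree, drop):
    """Rebuild the tree, replacing each dropped child by its own children."""
    label, children = tree
    out = []
    for child in children:
        if isinstance(child, str):
            out.append(child)  # leaf: a word
            continue
        child = splice(child, drop)  # process the subtree first
        if drop(label, child[0]):
            out.extend(child[1])  # attach the grandchildren to this node
        else:
            out.append(child)
    return (label, out)


def hide(tree, hidden):
    """Remove nodes whose label is listed as hidden."""
    return splice(tree, lambda parent, child: child in hidden)


def flatten(tree, flat):
    """Keep only the highest node of a chain of nested 'flat' labels."""
    return splice(tree, lambda parent, child: child == parent and child in flat)


# Hypothetical parse tree; labels and words are made up for illustration.
tree = ("S", [("np", [("np", ["the", "cat"]),
                      ("pp", ["on", ("np", ["the", "mat"])])]),
              ("vp", ["sleeps"])])
```

With this tree, flatten(tree, {"np"}) merges the inner "np" into the outer one but leaves the "np" under "pp" untouched, since its parent has a different label. hide(tree, {"pp"}) removes the "pp" node and attaches its children to the outer "np".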
2006-04-26