The syntax of the file is based on that of Constraint Grammars [KVHA95], simplified in many respects and modified to include weighted constraints.
An initial file based on statistical constraints may be generated from a tagged corpus using the src/utilities/train-relax.perl script provided with FreeLing. Hand-written constraints can later be added to the file to improve the tagger's behaviour.
The file consists of a series of context constraints, each of the form: weight label context;
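Where a lemma is written in angle brackets, optionally preceded by a tag or a tag prefix: <comer>, VMIP3S0<comer>, and VMI*<comer> will match any word analysis with that lemma and, when given, that tag or tag prefix.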
Conditions may be negated using the token not, i.e. (not pos terms)
Where, in the terms of a condition, <comer>, VMIP3S0<comer>, and VMI*<comer> will match any word analysis with that lemma and, when given, that tag or tag prefix.
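The matching rule for such terms can be sketched in Python as follows. This is only an illustration of the behaviour described above, not FreeLing's implementation: the names Analysis and term_matches are hypothetical, and only the term forms shown here (a tag, a right-wildcarded tag prefix, a lemma in angle brackets, or a tag/prefix followed by a lemma) are handled.

```python
import re
from typing import NamedTuple


class Analysis(NamedTuple):
    """One candidate analysis of a word (hypothetical helper type)."""
    tag: str
    lemma: str


# Optional tag or tag prefix (ending in '*'), then an optional <lemma>.
TERM_RE = re.compile(r"^(?P<tag>[A-Za-z0-9]+\*?)?(?:<(?P<lemma>[^<>]+)>)?$")


def term_matches(term: str, analysis: Analysis) -> bool:
    """Return True if the analysis satisfies the tag/prefix and lemma in term."""
    m = TERM_RE.match(term)
    if m is None or (m.group("tag") is None and m.group("lemma") is None):
        raise ValueError(f"unsupported term: {term!r}")
    tag, lemma = m.group("tag"), m.group("lemma")
    if tag is not None:
        if tag.endswith("*"):
            # Wildcarded tag: compare only the prefix before the '*'.
            if not analysis.tag.startswith(tag[:-1]):
                return False
        elif analysis.tag != tag:
            return False
    if lemma is not None and analysis.lemma != lemma:
        return False
    return True
```

For instance, with come = Analysis("VMIP3S0", "comer"), the terms <comer>, VMIP3S0<comer> and VMI*<comer> all match, while VMI*<beber> and NC* do not.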
Examples:
The next constraint states a high incompatibility for a word
being a definite determiner (DA*) if the next word is a personal form
of a verb (VMI*):
-8.143 DA* (1 VMI*);
The next constraint states a very high compatibility for the
word mucho (much) being an indefinite determiner (DI*),
and thus not being a pronoun, an adverb, or any
other analysis it may have, if the following word is a noun (NC*):
60.0 DI* (mucho) (1 NC*);
The next constraint states a positive compatibility value for
a word being a noun (NC*) if somewhere to its left
there is a determiner or an adjective (DA* or AQ*), and
there is no other noun between them:
5.0 NC* (-1* DA* or AQ* barrier NC*);
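The leftward scan with a barrier used in this condition can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the tagger's actual algorithm: each word is assumed to carry a single tag (the real tagger works with several weighted analyses per word), and the names tag_matches and left_scan are hypothetical.

```python
from typing import Sequence


def tag_matches(pattern: str, tag: str) -> bool:
    """Match a plain tag, or a tag prefix when the pattern ends in '*'."""
    if pattern.endswith("*"):
        return tag.startswith(pattern[:-1])
    return tag == pattern


def left_scan(tags: Sequence[str], i: int,
              targets: Sequence[str], barriers: Sequence[str]) -> bool:
    """Scan leftwards from position i-1, as in (-1* targets barrier barriers).

    Returns True if a word matching one of the targets is found before
    any word matching one of the barriers. Assumes one tag per word.
    """
    for j in range(i - 1, -1, -1):
        if any(tag_matches(p, tags[j]) for p in targets):
            return True
        if any(tag_matches(p, tags[j]) for p in barriers):
            return False
    return False
```

With tags ["DA0FS0", "AQ0FS0", "NCFS000"] and i = 2, the condition holds, since an adjective is found immediately to the left. With ["DA0MS0", "NCMS000", "VMIP3S0", "NCFS000"] and i = 3, it fails, because the noun at position 1 blocks the scan before the determiner is reached.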
The next constraint adds some positive compatibility to a
3rd person personal pronoun being of undefined gender and
number (PP3CNA00) if it has the possibility of being
masculine singular (PP3MSA00), the next word may have
lemma estar (to be), and the second word to the right
is not a gerund (VMG*). This rule is intended to solve the
different behaviour of the Spanish word lo in sentences
such as si, lo estoy or lo estoy viendo:
0.5 PP3CNA00 (0 PP3MSA00) (1 <estar>) (not 2 VMG*);
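To make the overall shape of a constraint (weight label context;) concrete, the following Python sketch splits a constraint into its three parts. It is an illustration only, not the parser used by FreeLing: the names Constraint and parse_constraint are hypothetical, conditions are kept as raw text rather than interpreted, and nested parentheses inside a condition are not handled.

```python
import re
from typing import List, NamedTuple


class Constraint(NamedTuple):
    """A constraint split into its parts (hypothetical helper type)."""
    weight: float
    label: str
    conditions: List[str]  # raw text of each parenthesised condition


def parse_constraint(text: str) -> Constraint:
    """Split 'weight label (cond) (cond) ...;' into weight, label, conditions."""
    text = " ".join(text.split())  # collapse line breaks and repeated spaces
    if not text.endswith(";"):
        raise ValueError("a constraint must end with a semicolon")
    parts = text[:-1].strip().split(None, 2)
    if len(parts) < 2:
        raise ValueError("expected at least a weight and a label")
    weight = float(parts[0])
    label = parts[1]
    context = parts[2] if len(parts) > 2 else ""
    # Each condition is the text between one pair of parentheses.
    conditions = [c.strip() for c in re.findall(r"\(([^()]*)\)", context)]
    return Constraint(weight, label, conditions)
```

Applied to the last example above, parse_constraint returns weight 0.5, label PP3CNA00, and the three conditions "0 PP3MSA00", "1 <estar>" and "not 2 VMG*".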