

Named entity classification data files

The Named Entity Classification module requires three configuration files that share the same path and base name and differ only in their suffixes: .rgf, .lex, and .abm. Only the base name needs to be given as a configuration option; the suffixes are added automatically.
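The suffix convention can be illustrated with a small Python sketch that expands one base name into the three expected paths and reports which ones are missing. The helper names `nec_file_paths` and `missing_nec_files` and the example path `data/nec/mymodel` are hypothetical; FreeLing does this expansion internally, and the sketch only mirrors the rule stated above.

```python
from pathlib import Path

# Suffixes of the three NEC configuration files, as listed in the manual.
NEC_SUFFIXES = (".rgf", ".lex", ".abm")


def nec_file_paths(basename):
    """Return the three file paths derived from a single NEC base name.

    Hypothetical helper for illustration only. The suffix is appended to the
    string rather than set with Path.with_suffix(), so a base name that
    already contains a dot (e.g. "model.v2") is not truncated.
    """
    return [Path(str(basename) + suffix) for suffix in NEC_SUFFIXES]


def missing_nec_files(basename):
    """List which of the three expected files do not exist on disk."""
    return [p for p in nec_file_paths(basename) if not p.exists()]


if __name__ == "__main__":
    # Hypothetical base name; the real location depends on your installation.
    for path in nec_file_paths("data/nec/mymodel"):
        print(path)
```

Appending instead of replacing the suffix keeps the behaviour predictable when the base name itself contains dots, which is common for versioned model files.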

The .abm file contains an AdaBoost model based on shallow decision trees (see [CMP03] for details). You do not need to understand this file unless you want to work on the code of the AdaBoost classifier itself.
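To give an intuition of what such a model computes, here is a generic Python sketch of an AdaBoost decision function built from depth-1 trees (decision stumps) over binary features, with one ensemble per class and the highest-scoring class chosen. This is an assumption-laden illustration: the stump representation, the one-vs-all scheme, and the class labels are hypothetical, and the sketch neither reads the .abm format nor reproduces the classifier described in [CMP03].

```python
from dataclasses import dataclass


@dataclass
class Stump:
    """A depth-1 decision tree testing one binary feature (hypothetical format)."""
    feature: int        # numeric feature id, as assigned by a feature dictionary
    if_present: float   # weak-learner output when the feature is active
    if_absent: float    # weak-learner output when it is not

    def predict(self, active):
        return self.if_present if self.feature in active else self.if_absent


def ensemble_score(stumps, weights, active):
    """AdaBoost decision function: weighted sum of the weak learners' outputs."""
    return sum(w * s.predict(active) for s, w in zip(stumps, weights))


def classify(models, active):
    """One-vs-all decision: return the class whose ensemble scores highest."""
    return max(models, key=lambda c: ensemble_score(*models[c], active))


if __name__ == "__main__":
    # Toy models with made-up weights: feature 1 suggests PER, feature 2 LOC.
    models = {
        "PER": ([Stump(1, 1.0, -1.0)], [0.8]),
        "LOC": ([Stump(2, 1.0, -1.0)], [0.5]),
    }
    print(classify(models, {1}))
```

Because each weak learner only inspects whether a feature id is active, an input example reduces to a set of integers, which is why the model needs a dictionary from symbolic features to numbers.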

The .lex file is a dictionary that assigns a number to each symbolic feature used in the AdaBoost model. You do not need to understand this either, unless you are a machine learning hacker.
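The role of such a dictionary can be sketched in a few lines of Python: features are assigned consecutive integer ids during training, and at classification time unseen features are simply dropped. The class `FeatureLexicon` and the feature strings in the test are hypothetical; the sketch does not read or write the actual .lex file format.

```python
class FeatureLexicon:
    """Maps symbolic feature strings to consecutive integer ids.

    Illustrative only: this does not read or write FreeLing's .lex format.
    """

    def __init__(self):
        self._ids = {}

    def add(self, feature):
        """Return the id for a feature, assigning the next free id if it is new."""
        if feature not in self._ids:
            self._ids[feature] = len(self._ids)
        return self._ids[feature]

    def lookup(self, feature):
        """Return the id of a known feature, or None if it was never seen."""
        return self._ids.get(feature)

    def encode(self, features):
        """Convert symbolic features into the set of known numeric ids."""
        ids = (self.lookup(f) for f in features)
        return {i for i in ids if i is not None}
```

Ignoring unknown features at encoding time is a common design choice: a feature never seen during training has no learned weight, so it cannot contribute to the model's score anyway.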

Both the .abm and .lex files can be generated from an annotated corpus using the programs in src/utilities/nec.

The most important file in the set is the .rgf file. It defines the context features that must be extracted for each named entity. The feature extraction language is that of [RCSY04], with some useful extensions.
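As a rough intuition for what "context features" means, the following Python sketch extracts symbolic features from a fixed window of words around an entity span, plus a few properties of the entity itself. The function `context_features` and the feature names (`left1=`, `right1=`, `inside=`, `length=`) are hypothetical and are not written in the .rgf language; the actual feature set is whatever the .rgf file specifies.

```python
def context_features(tokens, start, end, window=2):
    """Extract symbolic features for the entity tokens[start:end].

    Hypothetical feature set for illustration; it is not the .rgf language.
    """
    feats = []
    # Lowercased words within `window` positions to the left and right.
    for offset in range(1, window + 1):
        if start - offset >= 0:
            feats.append(f"left{offset}={tokens[start - offset].lower()}")
        if end - 1 + offset < len(tokens):
            feats.append(f"right{offset}={tokens[end - 1 + offset].lower()}")
    # Properties of the entity itself: its length and the words it contains.
    feats.append(f"length={end - start}")
    for tok in tokens[start:end]:
        feats.append(f"inside={tok.lower()}")
    return feats


if __name__ == "__main__":
    sentence = ["Mr.", "John", "Smith", "lives", "in", "Paris"]
    print(context_features(sentence, 1, 3))
```

Features near the sentence boundary are simply omitted rather than padded, which keeps the output a plain list of strings ready to be mapped to numeric ids.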

If you need to know more about this (e.g. to develop a named entity classifier for your language), please contact the FreeLing authors.


2006-04-26