gate.creole.gazetteer
Class DefaultGazetteer
java.lang.Object
|
+--gate.util.AbstractFeatureBearer
|
+--gate.creole.AbstractResource
|
+--gate.creole.AbstractProcessingResource
|
+--gate.creole.AbstractLanguageAnalyser
|
+--gate.creole.gazetteer.AbstractGazetteer
|
+--gate.creole.gazetteer.DefaultGazetteer
- All Implemented Interfaces:
- ANNIEConstants, Executable, FeatureBearer, Gazetteer, LanguageAnalyser, NameBearer, ProcessingResource, Resource, Serializable
- public class DefaultGazetteer
- extends AbstractGazetteer
This component is responsible for list lookup. The implementation is
based on finite state machines.
The phrases to be recognised should be listed in a set of files, one for
each type of occurrence.
The gazetteer is built from a file that contains the set
of lists (which are files as well) and the associated type for each list.
The file defining the set of lists should have the following syntax:
each list definition should be written on its own line and should contain:
- the file name (required)
- the major type (required)
- the minor type (optional)
- the language(s) (optional)
The elements of each definition are separated by ":".
The following is an example of a valid definition:
personmale.lst:person:male:english
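The definition syntax above can be illustrated with a small parser. This is a minimal sketch and not GATE's own parsing code: the class name ListDefinitionSketch and its field names are hypothetical, and treating an empty optional field as absent is an assumption.

```java
// Hypothetical helper, not part of GATE: parses one line of a lists
// definition file of the form fileName:majorType[:minorType[:languages]].
final class ListDefinitionSketch {
    final String fileName;   // required
    final String majorType;  // required
    final String minorType;  // null when omitted
    final String languages;  // null when omitted

    private ListDefinitionSketch(String fileName, String majorType,
                                 String minorType, String languages) {
        this.fileName = fileName;
        this.majorType = majorType;
        this.minorType = minorType;
        this.languages = languages;
    }

    static ListDefinitionSketch parse(String line) {
        // A limit of -1 keeps trailing empty fields, so "a.lst:person:" has 3 parts.
        String[] parts = line.trim().split(":", -1);
        if (parts.length < 2 || parts[0].isEmpty() || parts[1].isEmpty()) {
            throw new IllegalArgumentException(
                "file name and major type are required: " + line);
        }
        // Assumption: an empty optional field is treated the same as a missing one.
        String minor = parts.length > 2 && !parts[2].isEmpty() ? parts[2] : null;
        String langs = parts.length > 3 && !parts[3].isEmpty() ? parts[3] : null;
        return new ListDefinitionSketch(parts[0], parts[1], minor, langs);
    }

    public static void main(String[] args) {
        ListDefinitionSketch d = parse("personmale.lst:person:male:english");
        System.out.println(d.fileName + " / " + d.majorType + " / "
            + d.minorType + " / " + d.languages);
    }
}
```

A definition that gives only the two required elements, such as the made-up line city.lst:location, parses with both optional fields left as null.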
Each list file named in the lists definition file is just a list containing
one entry per line.
When this gazetteer is run over some input text (a GATE document), it
generates annotations of type Lookup having the attributes specified in
the definition file.
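The idea of recognising listed phrases with a finite state machine can be sketched as a character trie whose final states carry a type. This is an illustrative toy and not the DefaultGazetteer implementation: the class PhraseMatcherSketch, the "start-end:type" result strings, and the longest-match, no-word-boundary behaviour are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy matcher, not GATE code: a trie used as a simple FSM.
final class PhraseMatcherSketch {
    private static final class State {
        final Map<Character, State> next = new HashMap<>();
        String majorType; // non-null only in final states
    }

    private final State start = new State();

    // Adds one phrase; the last state reached becomes final.
    void addPhrase(String phrase, String majorType) {
        State s = start;
        for (char c : phrase.toCharArray()) {
            s = s.next.computeIfAbsent(c, k -> new State());
        }
        s.majorType = majorType;
    }

    // Scans the text left to right and reports the longest phrase starting
    // at each position as "start-end:type" (end offset exclusive).
    // Assumption: matches are not required to align with word boundaries.
    List<String> match(String text) {
        List<String> hits = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            State s = start;
            int bestEnd = -1;
            String bestType = null;
            for (int j = i; j < text.length(); j++) {
                s = s.next.get(text.charAt(j));
                if (s == null) {
                    break; // no transition: stop following this path
                }
                if (s.majorType != null) {
                    bestEnd = j + 1;
                    bestType = s.majorType;
                }
            }
            if (bestEnd > 0) {
                hits.add(i + "-" + bestEnd + ":" + bestType);
                i = bestEnd; // continue after the matched phrase
            } else {
                i++;
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        PhraseMatcherSketch m = new PhraseMatcherSketch();
        m.addPhrase("John", "person");
        m.addPhrase("New York", "location");
        System.out.println(m.match("John lives in New York"));
    }
}
```

In the real component the matches become Lookup annotations on the document rather than strings; the sketch only shows how phrases map to paths through the machine.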
- See Also:
- Serialized Form
Fields inherited from interface gate.creole.ANNIEConstants |
ANNOTATION_COREF_FEATURE_NAME, DATE_ANNOTATION_TYPE, DOCUMENT_COREF_FEATURE_NAME, LOCATION_ANNOTATION_TYPE, LOOKUP_ANNOTATION_TYPE, LOOKUP_CLASS_FEATURE_NAME, LOOKUP_MAJOR_TYPE_FEATURE_NAME, LOOKUP_MINOR_TYPE_FEATURE_NAME, LOOKUP_ONTOLOGY_FEATURE_NAME, MONEY_ANNOTATION_TYPE, ORGANIZATION_ANNOTATION_TYPE, PERSON_ANNOTATION_TYPE, PERSON_GENDER_FEATURE_NAME, PR_NAMES, SENTENCE_ANNOTATION_TYPE, SPACE_TOKEN_ANNOTATION_TYPE, TOKEN_ANNOTATION_TYPE, TOKEN_CATEGORY_FEATURE_NAME, TOKEN_KIND_FEATURE_NAME, TOKEN_LENGTH_FEATURE_NAME, TOKEN_ORTH_FEATURE_NAME, TOKEN_STRING_FEATURE_NAME |
Constructor Summary |
DefaultGazetteer()
Builds a gazetteer using the default lists from the GATE resources.
See init(). |
Method Summary |
boolean |
add(String singleItem,
Lookup lookup)
Adds a new string to the gazetteer |
void |
addLookup(String text,
Lookup lookup)
Adds one phrase to the list of phrases recognised by this gazetteer |
void |
execute()
This method runs the gazetteer. |
String |
getFSMgml()
Returns a string representation of the deterministic FSM graph using
GML. |
Resource |
init()
Does the actual loading and parsing of the lists. |
Set |
lookup(String singleItem)
Looks up a single string in the gazetteer
|
boolean |
remove(String singleItem)
Removes a string from the gazetteer |
void |
removeLookup(String text,
Lookup lookup)
Removes one phrase from the list of phrases recognised by this gazetteer |
Methods inherited from class gate.creole.gazetteer.AbstractGazetteer |
addGazetteerListener, fireGazetteerEvent, getAnnotationSetName, getCaseSensitive, getEncoding, getFeatures, getLinearDefinition, getListsURL, getMappingDefinition, reInit, setAnnotationSetName, setCaseSensitive, setEncoding, setFeatures, setListsURL, setMappingDefinition |
Methods inherited from class gate.creole.AbstractResource |
checkParameterValues, getName, getParameterValue, getParameterValue, removeResourceListeners, setName, setParameterValue, setParameterValue, setParameterValues, setParameterValues, setResourceListeners |
DEF_GAZ_DOCUMENT_PARAMETER_NAME
public static final String DEF_GAZ_DOCUMENT_PARAMETER_NAME
DEF_GAZ_ANNOT_SET_PARAMETER_NAME
public static final String DEF_GAZ_ANNOT_SET_PARAMETER_NAME
DEF_GAZ_LISTS_URL_PARAMETER_NAME
public static final String DEF_GAZ_LISTS_URL_PARAMETER_NAME
DEF_GAZ_ENCODING_PARAMETER_NAME
public static final String DEF_GAZ_ENCODING_PARAMETER_NAME
DEF_GAZ_CASE_SENSITIVE_PARAMETER_NAME
public static final String DEF_GAZ_CASE_SENSITIVE_PARAMETER_NAME
DefaultGazetteer
public DefaultGazetteer()
- Builds a gazetteer using the default lists from the GATE resources.
See init().
init
public Resource init()
throws ResourceInstantiationException
- Does the actual loading and parsing of the lists. This method must be
called before the gazetteer can be used.
- Overrides:
init
in class AbstractProcessingResource
addLookup
public void addLookup(String text,
Lookup lookup)
- Adds one phrase to the list of phrases recognised by this gazetteer
- Parameters:
text
- the phrase to be added
lookup
- the description of the annotation to be added when this
phrase is recognised
removeLookup
public void removeLookup(String text,
Lookup lookup)
- Removes one phrase from the list of phrases recognised by this gazetteer
- Parameters:
text
- the phrase to be removed
lookup
- the description of the annotation associated with this phrase
getFSMgml
public String getFSMgml()
- Returns a string representation of the deterministic FSM graph using
GML.
execute
public void execute()
throws ExecutionException
- This method runs the gazetteer. It assumes that all the needed parameters
are set. If they are not, an exception is thrown.
- Overrides:
execute
in class AbstractProcessingResource
lookup
public Set lookup(String singleItem)
- Looks up a single string in the gazetteer.
- Parameters:
singleItem
- a single string to be looked up by the gazetteer
- Returns:
- set of the Lookups associated with the parameter
remove
public boolean remove(String singleItem)
- Description copied from interface:
Gazetteer
- Removes a string from the gazetteer
- Following copied from interface:
gate.creole.gazetteer.Gazetteer
- Parameters:
singleItem
- the string to be removed
- Returns:
- true if the operation was successful
add
public boolean add(String singleItem,
Lookup lookup)
- Description copied from interface:
Gazetteer
- Adds a new string to the gazetteer
- Following copied from interface:
gate.creole.gazetteer.Gazetteer
- Parameters:
singleItem
- the string to be added
lookup
- the lookup to be associated with the new string
- Returns:
- true if the operation was successful
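The add, lookup and remove methods described above follow a simple contract that can be mimicked with an in-memory map. This is a stand-in written for illustration, not the DefaultGazetteer code: the class GazetteerContractSketch is hypothetical, a plain String replaces the Lookup class, and returning an empty set for an unknown string is an assumption (the real method may behave differently).

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for the add/lookup/remove contract, not GATE code.
final class GazetteerContractSketch {
    // Each string maps to the set of lookup descriptions attached to it.
    private final Map<String, Set<String>> entries = new HashMap<>();

    // Returns true if the lookup was newly associated with the string.
    boolean add(String singleItem, String lookup) {
        return entries.computeIfAbsent(singleItem, k -> new HashSet<>()).add(lookup);
    }

    // Returns the lookups for the string; empty when unknown (an assumption).
    Set<String> lookup(String singleItem) {
        Set<String> found = entries.get(singleItem);
        return found == null
            ? Collections.<String>emptySet()
            : Collections.unmodifiableSet(found);
    }

    // Returns true if the string was present and has been removed.
    boolean remove(String singleItem) {
        return entries.remove(singleItem) != null;
    }

    public static void main(String[] args) {
        GazetteerContractSketch g = new GazetteerContractSketch();
        g.add("John", "person/male");
        System.out.println(g.lookup("John"));
        System.out.println(g.remove("John"));
    }
}
```

Keeping a set per string lets one entry carry several lookups, which matches the Set return type of lookup(String).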