gate.creole.gazetteer
Class DefaultGazetteer
java.lang.Object
|
+--gate.util.AbstractFeatureBearer
|
+--gate.creole.AbstractResource
|
+--gate.creole.AbstractProcessingResource
|
+--gate.creole.AbstractLanguageAnalyser
|
+--gate.creole.gazetteer.DefaultGazetteer
- All Implemented Interfaces:
- Executable, FeatureBearer, LanguageAnalyser, NameBearer, ProcessingResource, Resource, Serializable
- public class DefaultGazetteer
- extends AbstractLanguageAnalyser
- implements ProcessingResource
This component is responsible for doing list lookups. The implementation is
based on finite state machines.
The phrases to be recognised should be listed in a set of files, one for
each type of occurrence.
The gazetteer is built with the information from a file that contains the set
of lists (which are files as well) and the associated type for each list.
The file defining the set of lists should have the following syntax:
each list definition should be written on its own line and should contain:
- the file name (required)
- the major type (required)
- the minor type (optional)
- the language(s) (optional)
The elements of each definition are separated by ":".
The following is an example of a valid definition:
personmale.lst:person:male:english
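The definition syntax above can be illustrated with a minimal sketch that splits one line on ":" into its four elements. The class name ListDefinition and its fields are hypothetical and are not part of the GATE API. How several languages are written inside the last element is not specified here, so the sketch keeps that element as a raw string.

```java
// Hypothetical holder for one parsed line of the lists definition file.
// Not part of GATE; it only illustrates the "file:major[:minor[:languages]]" syntax.
final class ListDefinition {
    final String fileName;   // required
    final String majorType;  // required
    final String minorType;  // optional, null when absent
    final String languages;  // optional, null when absent (kept unparsed)

    private ListDefinition(String fileName, String majorType,
                           String minorType, String languages) {
        this.fileName = fileName;
        this.majorType = majorType;
        this.minorType = minorType;
        this.languages = languages;
    }

    // Splits one definition line on ':' and checks the required elements.
    static ListDefinition parse(String line) {
        // The -1 limit keeps trailing empty elements, e.g. "a.lst:person:".
        String[] parts = line.trim().split(":", -1);
        if (parts.length < 2 || parts.length > 4
                || parts[0].isEmpty() || parts[1].isEmpty()) {
            throw new IllegalArgumentException("Invalid list definition: " + line);
        }
        String minor = parts.length > 2 && !parts[2].isEmpty() ? parts[2] : null;
        String langs = parts.length > 3 && !parts[3].isEmpty() ? parts[3] : null;
        return new ListDefinition(parts[0], parts[1], minor, langs);
    }
}
```

With this sketch, parsing the example line yields the file name personmale.lst, major type person, minor type male and language english, while a line such as city.lst:location leaves both optional elements null.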
Each list file named in the lists definition file is just a list containing
one entry per line.
When this gazetteer is run over some input text (a GATE document) it
will generate annotations of type Lookup having the attributes specified in
the definition file.
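The idea of a finite state machine backing the lookup can be sketched as a character-level automaton with one transition per character and a type stored on each final state. This is a simplified illustration, not GATE's implementation: the class PhraseMatcher, its methods and the "start-end:type" result format are all hypothetical, and the sketch does no word-boundary checking, so a phrase can match inside a longer word.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical character-level FSM for phrase lookup (not the GATE classes).
final class PhraseMatcher {
    private static final class State {
        final Map<Character, State> next = new HashMap<>();
        String type; // non-null marks a final state
    }

    private final State initial = new State();
    private final boolean caseSensitive;

    PhraseMatcher(boolean caseSensitive) {
        this.caseSensitive = caseSensitive;
    }

    // Folds case when the matcher is case-insensitive.
    private char norm(char c) {
        return caseSensitive ? c : Character.toLowerCase(c);
    }

    // Adds one phrase by creating the missing states along its characters.
    void add(String phrase, String type) {
        State s = initial;
        for (int i = 0; i < phrase.length(); i++) {
            s = s.next.computeIfAbsent(norm(phrase.charAt(i)), k -> new State());
        }
        s.type = type;
    }

    // Scans the text and reports the longest match at each start position
    // as "start-end:type", with end exclusive.
    List<String> find(String text) {
        List<String> out = new ArrayList<>();
        int start = 0;
        while (start < text.length()) {
            State s = initial;
            int bestEnd = -1;
            String bestType = null;
            for (int i = start; i < text.length(); i++) {
                s = s.next.get(norm(text.charAt(i)));
                if (s == null) {
                    break; // no transition: stop walking from this start
                }
                if (s.type != null) {
                    bestEnd = i + 1;
                    bestType = s.type;
                }
            }
            if (bestEnd > 0) {
                out.add(start + "-" + bestEnd + ":" + bestType);
                start = bestEnd; // resume after the matched phrase
            } else {
                start++;
            }
        }
        return out;
    }
}
```

In this sketch, a matcher holding both "New" and "New York" reports only the longer phrase for the text "In New York today", because the walk continues past the first final state until no transition is left.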
- See Also:
- Serialized Form
Field Summary |
protected String |
annotationSetName
Used to store the annotation set currently being used for the newly
generated annotations |
private Boolean |
caseSensitive
Should this gazetteer be case sensitive. |
private static boolean |
DEBUG
Debug flag |
private String |
encoding
|
protected FeatureMap |
features
|
(package private) Set |
fsmStates
A set containing all the states of the FSM backing the gazetteer |
(package private) FSMState |
initialState
The initial state of the FSM that backs this gazetteer |
private URL |
listsURL
The value of this property is the URL that will be used for reading the
lists that define this Gazetteer |
Constructor Summary |
DefaultGazetteer()
Build a gazetteer using the default lists from the GATE resources
{@see init()} |
Methods inherited from class gate.creole.AbstractProcessingResource |
addProgressListener, addStatusListener, cleanup, fireProcessFinished, fireProgressChanged, fireStatusChanged, interrupt, isInterrupted, reInit, removeProgressListener, removeStatusListener |
Methods inherited from class gate.creole.AbstractResource |
checkParameterValues, getName, getParameterValue, getParameterValue, removeResourceListeners, setName, setParameterValue, setParameterValue, setParameterValues, setParameterValues, setResourceListeners |
Methods inherited from class java.lang.Object |
clone, equals, finalize, getClass, hashCode, notify, notifyAll, registerNatives, toString, wait, wait, wait |
DEBUG
private static final boolean DEBUG
- Debug flag
initialState
FSMState initialState
- The initial state of the FSM that backs this gazetteer
fsmStates
Set fsmStates
- A set containing all the states of the FSM backing the gazetteer
features
protected FeatureMap features
annotationSetName
protected String annotationSetName
- Used to store the annotation set currently being used for the newly
generated annotations
encoding
private String encoding
listsURL
private URL listsURL
- The value of this property is the URL that will be used for reading the
lists that define this Gazetteer
caseSensitive
private Boolean caseSensitive
- Should this gazetteer be case sensitive. The default value is true.
DefaultGazetteer
public DefaultGazetteer()
- Build a gazetteer using the default lists from the GATE resources
{@see init()}
init
public Resource init()
throws ResourceInstantiationException
- Does the actual loading and parsing of the lists. This method must be
called before the gazetteer can be used
- Specified by:
init
in interface Resource
- Overrides:
init
in class AbstractProcessingResource
readList
void readList(String listDesc,
boolean add)
throws FileNotFoundException,
IOException,
GazetteerException
- Reads one list (one file) of phrases
- Parameters:
listDesc
- the line from the definition file
add
-
addLookup
public void addLookup(String text,
Lookup lookup)
- Adds one phrase to the list of phrases recognised by this gazetteer
- Parameters:
text
- the phrase to be added
lookup
- the description of the annotation to be added when this
phrase is recognised
removeLookup
public void removeLookup(String text,
Lookup lookup)
- Removes one phrase from the list of phrases recognised by this gazetteer
- Parameters:
text
- the phrase to be removed
lookup
- the description of the annotation associated with this phrase
getFSMgml
public String getFSMgml()
- Returns a string representation of the deterministic FSM graph using
GML.
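For readers unfamiliar with GML (Graph Modelling Language), the following sketch shows what a GML description of a small directed graph looks like. The exact layout produced by getFSMgml() is not documented here, so the class GmlSketch, its Edge type and the chosen attributes (id, label, source, target) are assumptions for illustration only. Labels are written without escaping, which is only safe for labels that contain no double quote.

```java
import java.util.List;

// Hypothetical GML writer for a small state graph (not the GATE implementation).
final class GmlSketch {
    // One labelled transition between two numbered states.
    static final class Edge {
        final int source;
        final int target;
        final char label;

        Edge(int source, int target, char label) {
            this.source = source;
            this.target = target;
            this.label = label;
        }
    }

    // Writes nodes 0..nodeCount-1 and the given edges as a directed GML graph.
    static String toGml(int nodeCount, List<Edge> edges) {
        StringBuilder sb = new StringBuilder("graph [\n  directed 1\n");
        for (int id = 0; id < nodeCount; id++) {
            sb.append("  node [ id ").append(id)
              .append(" label \"").append(id).append("\" ]\n");
        }
        for (Edge e : edges) {
            sb.append("  edge [ source ").append(e.source)
              .append(" target ").append(e.target)
              .append(" label \"").append(e.label).append("\" ]\n");
        }
        sb.append("]\n");
        return sb.toString();
    }
}
```

Calling toGml with two nodes and a single edge labelled a from state 0 to state 1 produces a graph block with two node entries followed by one edge entry.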
getFeatures
public FeatureMap getFeatures()
- Description copied from interface:
FeatureBearer
- Get the feature set
- Specified by:
getFeatures
in interface FeatureBearer
- Overrides:
getFeatures
in class AbstractFeatureBearer
setFeatures
public void setFeatures(FeatureMap features)
- Description copied from interface:
FeatureBearer
- Set the feature set
- Specified by:
setFeatures
in interface FeatureBearer
- Overrides:
setFeatures
in class AbstractFeatureBearer
execute
public void execute()
throws ExecutionException
- This method runs the gazetteer. It assumes that all the needed parameters
are set. If they are not, an exception will be thrown.
- Specified by:
execute
in interface Executable
- Overrides:
execute
in class AbstractProcessingResource
setAnnotationSetName
public void setAnnotationSetName(String newAnnotationSetName)
- Sets the AnnotationSet that will be used at the next run for the newly
produced annotations.
setEncoding
public void setEncoding(String newEncoding)
getEncoding
public String getEncoding()
setListsURL
public void setListsURL(URL newListsURL)
getListsURL
public URL getListsURL()
setCaseSensitive
public void setCaseSensitive(Boolean newCaseSensitive)
getCaseSensitive
public Boolean getCaseSensitive()
getAnnotationSetName
public String getAnnotationSetName()