gate.creole.gazetteer
Class DefaultGazetteer

java.lang.Object
  |
  +--gate.util.AbstractFeatureBearer
        |
        +--gate.creole.AbstractResource
              |
              +--gate.creole.AbstractProcessingResource
                    |
                    +--gate.creole.gazetteer.DefaultGazetteer
All Implemented Interfaces:
FeatureBearer, ProcessingResource, Resource, Runnable, Serializable

public class DefaultGazetteer
extends AbstractProcessingResource
implements ProcessingResource

This component is responsible for list lookup. The implementation is based on finite state machines. The phrases to be recognised should be listed in a set of files, one for each type of occurrence. The gazetteer is built from a definition file that names the set of lists (which are files as well) and the associated type for each list. Each list definition in that file should be written on its own line and should contain:

  1. the file name (required)
  2. the major type (required)
  3. the minor type (optional)
  4. the language(s) (optional)
The elements of each definition are separated by ":". The following is an example of a valid definition:
personmale.lst:person:male:english
Each list file named in the lists definition file is just a list containing one entry per line. When this gazetteer is run over some input text (a GATE document), it generates annotations of type Lookup having the attributes specified in the definition file.
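
The definition syntax described above can be illustrated with a small parser. This is only a sketch under stated assumptions: ListDefinition and its fields are hypothetical names that are not part of the GATE API, and the language element is kept as the raw text after the third ":" because the separator between multiple languages is not specified here.

```java
// Hypothetical helper, not part of the GATE API: holds one parsed line
// of a lists definition file (fileName:majorType[:minorType[:languages]]).
final class ListDefinition {
    final String fileName;   // required
    final String majorType;  // required
    final String minorType;  // optional, null when absent
    final String languages;  // optional, raw text as written, null when absent

    private ListDefinition(String fileName, String majorType,
                           String minorType, String languages) {
        this.fileName = fileName;
        this.majorType = majorType;
        this.minorType = minorType;
        this.languages = languages;
    }

    // Splits one definition line on ":" and checks the two required elements.
    static ListDefinition parse(String line) {
        // A limit of -1 keeps trailing empty elements, e.g. "a.lst:person:".
        String[] parts = line.trim().split(":", -1);
        if (parts.length < 2 || parts[0].isEmpty() || parts[1].isEmpty()) {
            throw new IllegalArgumentException(
                "Expected at least fileName:majorType but got: " + line);
        }
        String minor = parts.length > 2 && !parts[2].isEmpty() ? parts[2] : null;
        String langs = parts.length > 3 && !parts[3].isEmpty() ? parts[3] : null;
        return new ListDefinition(parts[0], parts[1], minor, langs);
    }
}
```

Calling ListDefinition.parse("personmale.lst:person:male:english") would give a file name of personmale.lst, a major type of person, a minor type of male and a language of english; a line with only a file name is rejected because the major type is required.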

See Also:
Serialized Form

Field Summary
protected  String annotationSetName
          Used to store the annotation set currently being used for the newly generated annotations
private  Boolean caseSensitive
          Should this gazetteer be case sensitive.
private static boolean DEBUG
          Debug flag
protected  Document document
          Used to store the document currently being parsed
private  String encoding
           
protected  FeatureMap features
           
(package private)  Set fsmStates
          A set containing all the states of the FSM backing the gazetteer
(package private)  FSMState initialState
          The initial state of the FSM that backs this gazetteer
private  URL listsURL
          The value of this property is the URL that will be used for reading the lists that define this Gazetteer
private  Vector progressListeners
           
private  Vector statusListeners
           
 
Fields inherited from class gate.creole.AbstractProcessingResource
executionException
 
Fields inherited from class gate.creole.AbstractResource
serialVersionUID
 
Constructor Summary
DefaultGazetteer()
          Builds a gazetteer using the default lists from the GATE resources. See init().
 
Method Summary
 void addLookup(String text, Lookup lookup)
          Adds one phrase to the list of phrases recognised by this gazetteer
 void addProgressListener(ProgressListener l)
           
 void addStatusListener(StatusListener l)
           
protected  void fireProcessFinished()
           
protected  void fireProgressChanged(int e)
           
protected  void fireStatusChanged(String e)
           
 Boolean getCaseSensitive()
           
 String getEncoding()
           
 FeatureMap getFeatures()
          Get the feature set
 String getFSMgml()
          Returns a string representation of the deterministic FSM graph using GML.
 URL getListsURL()
           
 Resource init()
          Does the actual loading and parsing of the lists.
(package private)  void readList(String listDesc, boolean add)
          Reads one list (one file) of phrases
 void removeLookup(String text, Lookup lookup)
          Removes one phrase from the list of phrases recognised by this gazetteer
 void removeProgressListener(ProgressListener l)
           
 void removeStatusListener(StatusListener l)
           
 void reset()
          Resets this resource preparing it for a new run
 void run()
          This method runs the gazetteer.
 void setAnnotationSetName(String newAnnotationSetName)
          Sets the AnnotationSet that will be used at the next run for the newly produced annotations.
 void setCaseSensitive(Boolean newCaseSensitive)
           
 void setDocument(Document newDocument)
          Sets the document to be processed by the next run
 void setEncoding(String newEncoding)
           
 void setFeatures(FeatureMap features)
          Set the feature set
 void setListsURL(URL newListsURL)
           
 
Methods inherited from class gate.creole.AbstractProcessingResource
check, reInit
 
Methods inherited from class gate.creole.AbstractResource
getName, setName
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, registerNatives, toString, wait, wait, wait
 
Methods inherited from interface gate.ProcessingResource
check, reInit
 
Methods inherited from interface gate.util.FeatureBearer
getName, setName
 

Field Detail

DEBUG

private static final boolean DEBUG
Debug flag

initialState

FSMState initialState
The initial state of the FSM that backs this gazetteer

fsmStates

Set fsmStates
A set containing all the states of the FSM backing the gazetteer

features

protected FeatureMap features

document

protected Document document
Used to store the document currently being parsed

annotationSetName

protected String annotationSetName
Used to store the annotation set currently being used for the newly generated annotations

progressListeners

private transient Vector progressListeners

statusListeners

private transient Vector statusListeners

encoding

private String encoding

listsURL

private URL listsURL
The value of this property is the URL that will be used for reading the lists that define this Gazetteer

caseSensitive

private Boolean caseSensitive
Should this gazetteer be case sensitive. The default value is true.
Constructor Detail

DefaultGazetteer

public DefaultGazetteer()
Builds a gazetteer using the default lists from the GATE resources. See init().
Method Detail

init

public Resource init()
              throws ResourceInstantiationException
Does the actual loading and parsing of the lists. This method must be called before the gazetteer can be used
Specified by:
init in interface Resource
Overrides:
init in class AbstractProcessingResource

reset

public void reset()
Resets this resource preparing it for a new run

readList

void readList(String listDesc,
              boolean add)
        throws FileNotFoundException,
               IOException,
               GazetteerException
Reads one list (one file) of phrases
Parameters:
listDesc - the line from the definition file
add -  
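
As an illustration of the "one entry per line" list format and of the role of the encoding property, the following sketch reads the entries of a list from a stream. It is not the code behind readList: ListFileReader and readEntries are hypothetical names, and trimming whitespace and skipping blank lines are assumptions made for the example.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the GATE API.
final class ListFileReader {
    // Reads one phrase per line from the stream, decoding it with the
    // given encoding name (for example "UTF-8").
    static List<String> readEntries(InputStream in, String encoding) {
        List<String> entries = new ArrayList<>();
        try (BufferedReader reader =
                 new BufferedReader(new InputStreamReader(in, encoding))) {
            String line;
            while ((line = reader.readLine()) != null) {
                // Assumption: surrounding whitespace is not significant
                // and blank lines carry no entry.
                String entry = line.trim();
                if (!entry.isEmpty()) {
                    entries.add(entry);
                }
            }
        } catch (IOException e) {
            // Covers read failures and unsupported encoding names.
            throw new UncheckedIOException(e);
        }
        return entries;
    }
}
```

In this sketch the stream could come from a URL such as the one held in listsURL, opened with URL.openStream().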

addLookup

public void addLookup(String text,
                      Lookup lookup)
Adds one phrase to the list of phrases recognised by this gazetteer
Parameters:
text - the phrase to be added
lookup - the description of the annotation to be added when this phrase is recognised

removeLookup

public void removeLookup(String text,
                         Lookup lookup)
Removes one phrase from the list of phrases recognised by this gazetteer
Parameters:
text - the phrase to be removed
lookup - the description of the annotation associated to this phrase
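
To make the finite state machine idea behind addLookup and removeLookup concrete, here is a minimal sketch of a character trie used as a deterministic FSM. It is an assumption-laden illustration and not the actual FSMState implementation: PhraseFsm, State and match are hypothetical names, a plain String stands in for the Lookup description, matching is always case sensitive, and word boundaries are ignored.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch, not the GATE implementation.
final class PhraseFsm {
    // One FSM state: outgoing transitions plus the lookup types attached to it.
    // A state with at least one lookup type is a final (accepting) state.
    private static final class State {
        final Map<Character, State> next = new HashMap<>();
        final Set<String> lookups = new HashSet<>();
    }

    private final State initial = new State();

    // Walks the phrase from the initial state, creating states as needed,
    // and attaches the type to the state reached by the last character.
    void addLookup(String text, String type) {
        State s = initial;
        for (char c : text.toCharArray()) {
            State t = s.next.get(c);
            if (t == null) {
                t = new State();
                s.next.put(c, t);
            }
            s = t;
        }
        s.lookups.add(type);
    }

    // Detaches the type from the state reached by the phrase, if it exists.
    // States are left in place; a state without lookups never matches.
    void removeLookup(String text, String type) {
        State s = initial;
        for (char c : text.toCharArray()) {
            s = s.next.get(c);
            if (s == null) {
                return;
            }
        }
        s.lookups.remove(type);
    }

    // Scans the input left to right and reports the longest phrase found
    // at each position as "start-end:type" (end is exclusive).
    List<String> match(String input) {
        List<String> found = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            State s = initial;
            State lastFinal = null;
            int lastEnd = -1;
            for (int j = i; j < input.length(); j++) {
                s = s.next.get(input.charAt(j));
                if (s == null) {
                    break;
                }
                if (!s.lookups.isEmpty()) {
                    lastFinal = s;
                    lastEnd = j + 1;
                }
            }
            if (lastFinal != null) {
                for (String type : lastFinal.lookups) {
                    found.add(i + "-" + lastEnd + ":" + type);
                }
                i = lastEnd;  // continue after the matched phrase
            } else {
                i++;
            }
        }
        return found;
    }
}
```

With "New" and "New York" both added, the sketch prefers the longer phrase when scanning "in New York"; after "New York" is removed, the shorter phrase "New" is reported instead.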

getFSMgml

public String getFSMgml()
Returns a string representation of the deterministic FSM graph using GML.

getFeatures

public FeatureMap getFeatures()
Description copied from interface: FeatureBearer
Get the feature set
Specified by:
getFeatures in interface FeatureBearer
Overrides:
getFeatures in class AbstractFeatureBearer

setFeatures

public void setFeatures(FeatureMap features)
Description copied from interface: FeatureBearer
Set the feature set
Specified by:
setFeatures in interface FeatureBearer
Overrides:
setFeatures in class AbstractFeatureBearer

run

public void run()
This method runs the gazetteer. It assumes that all the needed parameters are set. If they are not, an exception will be fired.
Specified by:
run in interface Runnable
Overrides:
run in class AbstractProcessingResource

setDocument

public void setDocument(Document newDocument)
Sets the document to be processed by the next run

setAnnotationSetName

public void setAnnotationSetName(String newAnnotationSetName)
Sets the AnnotationSet that will be used at the next run for the newly produced annotations.

removeProgressListener

public void removeProgressListener(ProgressListener l)

addProgressListener

public void addProgressListener(ProgressListener l)

fireProgressChanged

protected void fireProgressChanged(int e)

fireProcessFinished

protected void fireProcessFinished()

removeStatusListener

public void removeStatusListener(StatusListener l)

addStatusListener

public void addStatusListener(StatusListener l)

fireStatusChanged

protected void fireStatusChanged(String e)

setEncoding

public void setEncoding(String newEncoding)

getEncoding

public String getEncoding()

setListsURL

public void setListsURL(URL newListsURL)

getListsURL

public URL getListsURL()

setCaseSensitive

public void setCaseSensitive(Boolean newCaseSensitive)

getCaseSensitive

public Boolean getCaseSensitive()