gate.corpora
Class CorpusImpl
java.lang.Object
|
+--gate.util.AbstractFeatureBearer
|
+--gate.creole.AbstractResource
|
+--gate.creole.AbstractLanguageResource
|
+--gate.corpora.CorpusImpl
- All Implemented Interfaces:
- Collection, Corpus, CreoleListener, EventListener, FeatureBearer, LanguageResource, List, NameBearer, Resource, Serializable
- Direct Known Subclasses:
- DatabaseCorpusImpl
- public class CorpusImpl
- extends AbstractLanguageResource
- implements Corpus, CreoleListener
Corpora are sets of Documents. They are ordered by lexicographic collation
on URL.
- See Also:
- Serialized Form
Inner Class Summary |
protected class |
CorpusImpl.VerboseList
A proxy list that stores the actual data in an internal list and forwards
all operations to it, firing the appropriate corpus events
when necessary. |
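The proxy-list idea described for CorpusImpl.VerboseList can be sketched as follows. This is a minimal illustration, not the GATE implementation: NotifyingList and SketchListListener are hypothetical names, and the listener callbacks only stand in for whatever corpus events the real class fires.

```java
import java.util.AbstractList;
import java.util.ArrayList;
import java.util.List;

// Hypothetical listener type (assumption); not the real GATE CorpusListener.
interface SketchListListener {
  void elementAdded(int index, Object element);
  void elementRemoved(int index, Object element);
}

// A proxy list: the data lives in an internal list, every operation is
// forwarded to it, and a notification is fired after each mutation.
class NotifyingList extends AbstractList<Object> {
  private final List<Object> data = new ArrayList<>();
  private final List<SketchListListener> listeners = new ArrayList<>();

  void addListener(SketchListListener l) {
    listeners.add(l);
  }

  @Override
  public Object get(int index) {
    return data.get(index);
  }

  @Override
  public int size() {
    return data.size();
  }

  // AbstractList.add(Object) delegates to this method, so both add variants
  // end up firing the notification.
  @Override
  public void add(int index, Object element) {
    data.add(index, element);
    for (SketchListListener l : listeners) {
      l.elementAdded(index, element);
    }
  }

  @Override
  public Object remove(int index) {
    Object removed = data.remove(index);
    for (SketchListListener l : listeners) {
      l.elementRemoved(index, removed);
    }
    return removed;
  }
}
```

Extending AbstractList keeps the sketch short, because the inherited bulk operations are built on the two overridden mutators and so raise notifications as well.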
Method Summary |
void |
add(int index,
Object element)
|
boolean |
add(Object o)
|
boolean |
addAll(Collection c)
|
boolean |
addAll(int index,
Collection c)
|
void |
addCorpusListener(CorpusListener l)
Registers a new CorpusListener with this corpus. |
void |
cleanup()
Construction |
void |
clear()
|
protected void |
clearDocList()
|
boolean |
contains(Object o)
|
boolean |
containsAll(Collection c)
|
void |
datastoreClosed(CreoleEvent e)
Called when a DataStore has been closed |
void |
datastoreCreated(CreoleEvent e)
Called when a DataStore has been created |
void |
datastoreOpened(CreoleEvent e)
Called when a DataStore has been opened |
boolean |
equals(Object o)
|
protected void |
fireDocumentAdded(CorpusEvent e)
|
protected void |
fireDocumentRemoved(CorpusEvent e)
|
Object |
get(int index)
|
String |
getDocumentName(int index)
Gets the name of a document in this corpus. |
List |
getDocumentNames()
Gets the names of the documents in this corpus. |
List |
getDocumentsList()
|
int |
hashCode()
|
int |
indexOf(Object o)
|
Resource |
init()
Initialise this resource, and return it. |
boolean |
isDocumentLoaded(int index)
This method returns true when the document is already loaded in memory |
boolean |
isEmpty()
|
Iterator |
iterator()
|
int |
lastIndexOf(Object o)
|
ListIterator |
listIterator()
|
ListIterator |
listIterator(int index)
|
static void |
populate(Corpus corpus,
URL directory,
FileFilter filter,
String encoding,
boolean recurseDirectories)
Fills the provided corpus with documents created on the fly from selected
files in a directory. |
void |
populate(URL directory,
FileFilter filter,
String encoding,
boolean recurseDirectories)
Fills this corpus with documents created from files in a directory. |
Object |
remove(int index)
|
boolean |
remove(Object o)
|
boolean |
removeAll(Collection c)
|
void |
removeCorpusListener(CorpusListener l)
Removes one of the listeners registered with this corpus. |
void |
resourceLoaded(CreoleEvent e)
Called when a new Resource has been loaded into the system |
void |
resourceUnloaded(CreoleEvent e)
Called when a Resource has been removed from the system |
boolean |
retainAll(Collection c)
|
Object |
set(int index,
Object element)
|
void |
setDocumentsList(List documentsList)
|
int |
size()
|
List |
subList(int fromIndex,
int toIndex)
|
Object[] |
toArray()
|
Object[] |
toArray(Object[] a)
|
void |
unloadDocument(Document doc)
This method does not make sense for transient corpora, so it does
nothing. |
Methods inherited from class gate.creole.AbstractResource |
checkParameterValues, getName, getParameterValue, getParameterValue, removeResourceListeners, setName, setParameterValue, setParameterValue, setParameterValues, setParameterValues, setResourceListeners |
Methods inherited from class java.lang.Object |
clone, finalize, getClass, notify, notifyAll, registerNatives, toString, wait, wait, wait |
DEBUG
private static final boolean DEBUG
- Debug flag
supportList
protected List supportList
- The underlying list that holds the documents in this corpus.
serialVersionUID
static final long serialVersionUID
- Freeze the serialization UID.
corpusListeners
private transient Vector corpusListeners
documentsList
protected transient List documentsList
CorpusImpl
public CorpusImpl()
getDocumentNames
public List getDocumentNames()
- Gets the names of the documents in this corpus.
- Specified by:
getDocumentNames
in interface Corpus
- Returns:
- a CorpusImpl.VerboseList of Strings representing the names of the documents
in this corpus.
getDocumentName
public String getDocumentName(int index)
- Gets the name of a document in this corpus.
- Specified by:
getDocumentName
in interface Corpus
- Parameters:
index
- the index of the document
- Returns:
- a String value representing the name of the document at
index in this corpus.
unloadDocument
public void unloadDocument(Document doc)
- This method does not make sense for transient corpora, so it does
nothing.
- Specified by:
unloadDocument
in interface Corpus
- Following copied from interface:
gate.Corpus
- Parameters:
Document
- to be unloaded from memory.
- Returns:
- void.
isDocumentLoaded
public boolean isDocumentLoaded(int index)
- This method returns true when the document is already loaded in memory
- Specified by:
isDocumentLoaded
in interface Corpus
clearDocList
protected void clearDocList()
size
public int size()
- Specified by:
size
in interface List
isEmpty
public boolean isEmpty()
- Specified by:
isEmpty
in interface List
contains
public boolean contains(Object o)
- Specified by:
contains
in interface List
iterator
public Iterator iterator()
- Specified by:
iterator
in interface List
toArray
public Object[] toArray()
- Specified by:
toArray
in interface List
toArray
public Object[] toArray(Object[] a)
- Specified by:
toArray
in interface List
add
public boolean add(Object o)
- Specified by:
add
in interface List
remove
public boolean remove(Object o)
- Specified by:
remove
in interface List
containsAll
public boolean containsAll(Collection c)
- Specified by:
containsAll
in interface List
addAll
public boolean addAll(Collection c)
- Specified by:
addAll
in interface List
addAll
public boolean addAll(int index,
Collection c)
- Specified by:
addAll
in interface List
removeAll
public boolean removeAll(Collection c)
- Specified by:
removeAll
in interface List
retainAll
public boolean retainAll(Collection c)
- Specified by:
retainAll
in interface List
clear
public void clear()
- Specified by:
clear
in interface List
equals
public boolean equals(Object o)
- Specified by:
equals
in interface List
- Overrides:
equals
in class Object
hashCode
public int hashCode()
- Specified by:
hashCode
in interface List
- Overrides:
hashCode
in class Object
get
public Object get(int index)
- Specified by:
get
in interface List
set
public Object set(int index,
Object element)
- Specified by:
set
in interface List
add
public void add(int index,
Object element)
- Specified by:
add
in interface List
remove
public Object remove(int index)
- Specified by:
remove
in interface List
indexOf
public int indexOf(Object o)
- Specified by:
indexOf
in interface List
lastIndexOf
public int lastIndexOf(Object o)
- Specified by:
lastIndexOf
in interface List
listIterator
public ListIterator listIterator()
- Specified by:
listIterator
in interface List
listIterator
public ListIterator listIterator(int index)
- Specified by:
listIterator
in interface List
subList
public List subList(int fromIndex,
int toIndex)
- Specified by:
subList
in interface List
cleanup
public void cleanup()
- Construction
- Specified by:
cleanup
in interface Resource
- Overrides:
cleanup
in class AbstractLanguageResource
init
public Resource init()
- Initialise this resource, and return it.
- Specified by:
init
in interface Resource
- Overrides:
init
in class AbstractResource
populate
public static void populate(Corpus corpus,
URL directory,
FileFilter filter,
String encoding,
boolean recurseDirectories)
throws IOException,
ResourceInstantiationException
- Fills the provided corpus with documents created on the fly from selected
files in a directory. Uses a FileFilter to select which files will
be used and which will be ignored.
A simple file filter based on extensions is provided in the GATE
distribution (gate.util.ExtensionFileFilter).
- Parameters:
corpus
- the corpus to be populated
directory
- the directory from which the files will be picked. This
parameter is a URL for uniformity. It needs to be a URL of type file,
otherwise an InvalidArgumentException will be thrown.
filter
- the file filter used to select files from the target
directory. If the filter is null, all the files will be accepted.
encoding
- the encoding to be used for reading the documents
recurseDirectories
- whether the directory should be parsed recursively. If
true, all the files from the provided directory and all its
child directories (on as many levels as necessary) will be picked if
accepted by the filter; otherwise the child directories will be ignored.
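The file selection rules described by the filter and recurseDirectories parameters can be sketched with plain java.io, as below. This is an illustration under assumptions, not the GATE code: DirectoryWalkSketch and collect are hypothetical names, the sketch applies the filter to files only, and it stops at collecting files rather than creating documents from them.

```java
import java.io.File;
import java.io.FileFilter;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (assumption); it only mirrors the documented selection rules.
class DirectoryWalkSketch {

  // Collects the files under 'dir' that 'filter' accepts. A null filter
  // accepts every file. Sub-directories are visited only when 'recurse'
  // is true; otherwise they are ignored.
  static List<File> collect(File dir, FileFilter filter, boolean recurse) {
    List<File> accepted = new ArrayList<>();
    File[] children = dir.listFiles();
    if (children == null) {
      return accepted; // 'dir' is not a directory or cannot be read
    }
    Arrays.sort(children); // deterministic order for the sketch
    for (File child : children) {
      if (child.isDirectory()) {
        if (recurse) {
          accepted.addAll(collect(child, filter, true));
        }
      } else if (filter == null || filter.accept(child)) {
        accepted.add(child);
      }
    }
    return accepted;
  }
}
```

A caller would then read each collected file with the requested encoding and add the resulting document to the corpus; that step depends on GATE classes and is left out here.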
populate
public void populate(URL directory,
FileFilter filter,
String encoding,
boolean recurseDirectories)
throws IOException,
ResourceInstantiationException
- Fills this corpus with documents created from files in a directory.
- Specified by:
populate
in interface Corpus
- Parameters:
filter
- the file filter used to select files from the target
directory. If the filter is null, all the files will be accepted.
directory
- the directory from which the files will be picked. This
parameter is a URL for uniformity. It needs to be a URL of type file,
otherwise an InvalidArgumentException will be thrown.
An implementation for this method is provided as a static method at
gate.corpora.CorpusImpl#populate(Corpus,URL,FileFilter,String,boolean).
encoding
- the encoding to be used for reading the documents
recurseDirectories
- whether the directory should be parsed recursively. If
true, all the files from the provided directory and all its
child directories (on as many levels as necessary) will be picked if
accepted by the filter; otherwise the child directories will be ignored.
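The requirement that directory be a URL of type file can be illustrated with a small check like the one below. It is a sketch only: FileUrlSketch and toDirectory are hypothetical names, and the standard IllegalArgumentException is used as a stand-in for the exception named in the parameter description.

```java
import java.io.File;
import java.net.URL;

// Hypothetical helper (assumption); not part of the GATE API.
class FileUrlSketch {

  // Returns the local directory that a "file:" URL points to and rejects
  // URLs that use any other protocol.
  static File toDirectory(URL directory) {
    if (!"file".equalsIgnoreCase(directory.getProtocol())) {
      throw new IllegalArgumentException("Not a URL of type file: " + directory);
    }
    // Simplification: percent-encoded characters in the path are not decoded.
    return new File(directory.getPath());
  }
}
```

Checking the protocol before touching the file system lets the caller fail early on, for example, an http URL.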
removeCorpusListener
public void removeCorpusListener(CorpusListener l)
- Description copied from interface:
Corpus
- Removes one of the listeners registered with this corpus.
- Specified by:
removeCorpusListener
in interface Corpus
- Following copied from interface:
gate.Corpus
- Parameters:
l
- the listener to be removed.
addCorpusListener
public void addCorpusListener(CorpusListener l)
- Description copied from interface:
Corpus
- Registers a new CorpusListener with this corpus.
- Specified by:
addCorpusListener
in interface Corpus
- Following copied from interface:
gate.Corpus
- Parameters:
l
- the listener to be added.
fireDocumentAdded
protected void fireDocumentAdded(CorpusEvent e)
fireDocumentRemoved
protected void fireDocumentRemoved(CorpusEvent e)
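The register, remove and fire pattern behind addCorpusListener, removeCorpusListener and the two fire methods can be sketched as below. All names are hypothetical stand-ins: DocListenerSketch replaces CorpusListener, a document name replaces CorpusEvent, and ignoring duplicate registrations is an assumption of the sketch rather than documented behaviour.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a corpus listener (assumption).
interface DocListenerSketch {
  void documentAdded(String documentName);
  void documentRemoved(String documentName);
}

// Keeps a list of registered listeners and notifies each of them.
class ListenerRegistrySketch {
  private final List<DocListenerSketch> listeners = new ArrayList<>();

  // Registers a listener; null and duplicate registrations are ignored
  // (a choice made for this sketch).
  synchronized void addListener(DocListenerSketch l) {
    if (l != null && !listeners.contains(l)) {
      listeners.add(l);
    }
  }

  synchronized void removeListener(DocListenerSketch l) {
    listeners.remove(l);
  }

  void fireDocumentAdded(String documentName) {
    for (DocListenerSketch l : snapshot()) {
      l.documentAdded(documentName);
    }
  }

  void fireDocumentRemoved(String documentName) {
    for (DocListenerSketch l : snapshot()) {
      l.documentRemoved(documentName);
    }
  }

  // Iterating over a copy lets a listener unregister itself from inside a
  // callback without disturbing the loop.
  private synchronized List<DocListenerSketch> snapshot() {
    return new ArrayList<>(listeners);
  }
}
```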
setDocumentsList
public void setDocumentsList(List documentsList)
getDocumentsList
public List getDocumentsList()
resourceLoaded
public void resourceLoaded(CreoleEvent e)
- Description copied from interface:
CreoleListener
- Called when a new Resource has been loaded into the system
- Specified by:
resourceLoaded
in interface CreoleListener
resourceUnloaded
public void resourceUnloaded(CreoleEvent e)
- Description copied from interface:
CreoleListener
- Called when a Resource has been removed from the system
- Specified by:
resourceUnloaded
in interface CreoleListener
datastoreOpened
public void datastoreOpened(CreoleEvent e)
- Description copied from interface:
CreoleListener
- Called when a DataStore has been opened
- Specified by:
datastoreOpened
in interface CreoleListener
datastoreCreated
public void datastoreCreated(CreoleEvent e)
- Description copied from interface:
CreoleListener
- Called when a DataStore has been created
- Specified by:
datastoreCreated
in interface CreoleListener
datastoreClosed
public void datastoreClosed(CreoleEvent e)
- Description copied from interface:
CreoleListener
- Called when a DataStore has been closed
- Specified by:
datastoreClosed
in interface CreoleListener