gate
Interface Corpus
- All Superinterfaces:
- Collection, FeatureBearer, LanguageResource, List, NameBearer, Resource, Serializable
- All Known Implementing Classes:
- SerialCorpusImpl, CorpusImpl
- public interface Corpus
- extends LanguageResource, List, NameBearer
Corpora are lists of Document. TIPSTER equivalent: Collection.
Methods inherited from interface java.util.List
add, add, addAll, addAll, clear, contains, containsAll, equals, get, hashCode, indexOf, isEmpty, iterator, lastIndexOf, listIterator, listIterator, remove, remove, removeAll, retainAll, set, size, subList, toArray, toArray
getDocumentNames
public List getDocumentNames()
- Gets the names of the documents in this corpus.
- Returns:
- a List of Strings representing the names of the documents in this corpus.
getDocumentName
public String getDocumentName(int index)
- Gets the name of a document in this corpus.
- Parameters:
index - the index of the document.
- Returns:
- a String value representing the name of the document at index in this corpus.
unloadDocument
public void unloadDocument(Document doc)
- Unloads the document from memory. This is only needed when memory use
is a concern, and it is only supported for a Corpus that is stored in a
Datastore. To get the document back in memory, call get() on the Corpus
or, if you have its persistent ID, request it from the Factory.
Transient Corpus objects do nothing, because there would be no way to
get the document back again afterwards.
- Parameters:
doc - the Document to be unloaded from memory.
populate
public void populate(URL directory,
FileFilter filter,
String encoding,
boolean recurseDirectories)
throws IOException,
ResourceInstantiationException
- Fills this corpus with documents created on the fly from selected files in
a directory. Uses a FileFilter to select which files will be used
and which will be ignored.
A simple file filter based on extensions is provided in the GATE
distribution (gate.util.ExtensionFileFilter).
- Parameters:
directory - the directory from which the files will be picked. This
parameter is a URL for uniformity. It needs to be a URL of type file,
otherwise an InvalidArgumentException will be thrown.
An implementation for this method is provided as a static method at
gate.corpora.CorpusImpl#populate(Corpus,URL,FileFilter,boolean).
filter - the file filter used to select files from the target
directory. If the filter is null, all the files will be accepted.
encoding - the encoding to be used for reading the documents.
recurseDirectories - whether the directory should be traversed recursively. If
true, all the files from the provided directory and all its
child directories (on as many levels as necessary) will be picked if
accepted by the filter; otherwise the child directories will be ignored.
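The selection rule described for filter and recurseDirectories can be sketched with the JDK alone. This is an illustration of the documented behaviour, not the GATE implementation: the class FileSelectionSketch and its methods byExtension and selectFiles are hypothetical names, the extension filter is a stand-in rather than gate.util.ExtensionFileFilter, and the sketch assumes the filter is applied to files only, which the text above does not specify.

```java
import java.io.File;
import java.io.FileFilter;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of GATE: shows which files a populate-style
// call would pick under the rules described for filter and recurseDirectories.
final class FileSelectionSketch {

    // Stand-in extension filter (assumption: case-insensitive match on ".ext").
    static FileFilter byExtension(String... extensions) {
        List<String> wanted = new ArrayList<>();
        for (String ext : extensions) {
            wanted.add("." + ext.toLowerCase(Locale.ROOT));
        }
        return file -> {
            String name = file.getName().toLowerCase(Locale.ROOT);
            return wanted.stream().anyMatch(name::endsWith);
        };
    }

    // Collects files from directory; a null filter accepts every file.
    // Subdirectories are visited only when recurseDirectories is true.
    static List<File> selectFiles(File directory, FileFilter filter,
                                  boolean recurseDirectories) {
        List<File> selected = new ArrayList<>();
        File[] entries = directory.listFiles();
        if (entries == null) {
            return selected; // not a directory, or not readable
        }
        Arrays.sort(entries); // stable order for repeatable results
        for (File entry : entries) {
            if (entry.isDirectory()) {
                if (recurseDirectories) {
                    selected.addAll(selectFiles(entry, filter, true));
                }
            } else if (filter == null || filter.accept(entry)) {
                selected.add(entry);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        File dir = new File(args.length > 0 ? args[0] : ".");
        for (File f : selectFiles(dir, byExtension("txt", "xml"), true)) {
            System.out.println(f.getPath());
        }
    }
}
```

In a real call the resulting choice of files is made by populate itself; the sketch only makes the roles of the filter, the null-filter case, and the recursion flag concrete.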
isDocumentLoaded
public boolean isDocumentLoaded(int index)
- Returns true when the document is already loaded in memory.
Transient corpora always return true, as they can only contain
documents that are present in memory.
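The contract shared by unloadDocument and isDocumentLoaded can be illustrated with a small stand-in. Everything in this sketch is hypothetical: UnloadSketch and SketchCorpus are not GATE classes, documents are reduced to strings, a Map plays the role of the Datastore, and unloadDocument takes an index here for brevity where the real method takes a Document.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in, not GATE code: models the documented difference
// between a datastore-backed corpus and a transient one.
final class UnloadSketch {

    static final class SketchCorpus {
        private final Map<String, String> store;               // null means transient
        private final List<String> names = new ArrayList<>();
        private final List<String> inMemory = new ArrayList<>(); // null entry = unloaded

        SketchCorpus(Map<String, String> store) {
            this.store = store;
        }

        void add(String name, String content) {
            names.add(name);
            inMemory.add(content);
            if (store != null) {
                store.put(name, content); // persist a copy
            }
        }

        boolean isDocumentLoaded(int index) {
            return inMemory.get(index) != null;
        }

        void unloadDocument(int index) {
            if (store == null) {
                return; // transient: the document could never be reloaded
            }
            inMemory.set(index, null);
        }

        // get() reloads from the store when the document was unloaded.
        String get(int index) {
            if (inMemory.get(index) == null) {
                inMemory.set(index, store.get(names.get(index)));
            }
            return inMemory.get(index);
        }
    }

    public static void main(String[] args) {
        SketchCorpus persistent = new SketchCorpus(new HashMap<>());
        persistent.add("doc-0", "some text");
        persistent.unloadDocument(0);
        System.out.println(persistent.isDocumentLoaded(0));
        persistent.get(0);
        System.out.println(persistent.isDocumentLoaded(0));
    }
}
```

With a store, unloading drops the in-memory copy and get() brings it back; without one, unloading is a no-op and isDocumentLoaded stays true.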
removeCorpusListener
public void removeCorpusListener(CorpusListener l)
- Removes one of the listeners registered with this corpus.
- Parameters:
l
- the listener to be removed.
addCorpusListener
public void addCorpusListener(CorpusListener l)
- Registers a new CorpusListener with this corpus.
- Parameters:
l
- the listener to be added.