java.lang.Object
  +--gate.util.AbstractFeatureBearer
        +--gate.creole.AbstractResource
              +--gate.creole.AbstractLanguageResource
                    +--gate.DocumentFormat
                          +--gate.corpora.TextualDocumentFormat
The format of Documents. Subclasses of DocumentFormat know about particular MIME types and how to unpack the information in any markup or formatting they contain into GATE annotations. Each MIME type has its own subclass of DocumentFormat, e.g. XmlDocumentFormat, RtfDocumentFormat, MpegDocumentFormat. These classes register themselves with a static index in DocumentFormat when they are constructed. The static getDocumentFormat methods can then be used to get the appropriate format class for a particular document.
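The self-registration scheme described above can be pictured with a small, self-contained Java sketch. It is not GATE code: FormatSketch, PlainTextSketch, XmlSketch and forMimeType are hypothetical names, and a plain HashMap stands in for the static maps DocumentFormat keeps (the inherited fields include, for example, mimeString2ClassHandlerMap).

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a static MIME-type index; NOT the GATE API.
abstract class FormatSketch {
    // Static index from MIME type string to the format registered for it.
    private static final Map<String, FormatSketch> MIME_INDEX = new HashMap<>();

    // Each subclass registers itself for its MIME type when it is constructed.
    protected FormatSketch(String mimeType) {
        MIME_INDEX.put(mimeType, this);
    }

    // Static lookup, similar in spirit to DocumentFormat's getDocumentFormat methods.
    static FormatSketch forMimeType(String mimeType) {
        return MIME_INDEX.get(mimeType);
    }

    abstract String describe();
}

// Hypothetical format for plain text.
final class PlainTextSketch extends FormatSketch {
    PlainTextSketch() { super("text/plain"); }
    @Override String describe() { return "plain text"; }
}

// Hypothetical format for XML.
final class XmlSketch extends FormatSketch {
    XmlSketch() { super("text/xml"); }
    @Override String describe() { return "XML"; }
}
```

Registering from the constructor publishes `this` before subclass construction finishes; that is harmless here because the subclasses hold no state, but a real implementation might prefer an explicit registration step.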
Field Summary
private static boolean DEBUG
    Debug flag.
Fields inherited from class gate.DocumentFormat
element2StringMap, isGateXmlDocument, magic2mimeTypeMap, markupElementsMap, mimeString2ClassHandlerMap, mimeString2mimeTypeMap, suffixes2mimeTypeMap
Fields inherited from class gate.creole.AbstractLanguageResource
dataStore, lrPersistentId
Fields inherited from class gate.creole.AbstractResource
name
Constructor Summary
TextualDocumentFormat()
    Default constructor.
Method Summary
void annotateParagraphs(Document aDoc, int startOffset, int endOffset, String annotSetName)
    This method annotates paragraphs in a GATE document.
DataStore getDataStore()
    Get the data store that this LR lives in.
Resource init()
    Initialise this resource, and return it.
private void removeExtraNewLine(Document doc)
    Delete the '\r' from each CRLF or LFCR combination in the document content.
protected void setNewLineProperty(Document doc)
    Check the new line sequence and set the document property accordingly.
void unpackMarkup(Document doc)
    Unpack the markup in the document.
void unpackMarkup(Document doc, RepositioningInfo repInfo, RepositioningInfo ampCodingInfo)
Methods inherited from class gate.creole.AbstractLanguageResource
cleanup, getLRPersistenceId, getParent, isModified, setDataStore, setLRPersistenceId, setParent, sync
Methods inherited from class gate.creole.AbstractResource
checkParameterValues, getName, getParameterValue, getParameterValue, removeResourceListeners, setName, setParameterValue, setParameterValue, setParameterValues, setParameterValues, setResourceListeners
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface gate.LanguageResource
getLRPersistenceId, getParent, isModified, setDataStore, setLRPersistenceId, setParent, sync
Methods inherited from interface gate.Resource
cleanup, getParameterValue, setParameterValue, setParameterValues
Methods inherited from interface gate.util.NameBearer
getName, setName
Field Detail
private static final boolean DEBUG
Constructor Detail
public TextualDocumentFormat()
Method Detail
public Resource init() throws ResourceInstantiationException
Specified by:
    init in interface Resource
Overrides:
    init in class AbstractResource
Throws:
    ResourceInstantiationException
public void unpackMarkup(Document doc) throws DocumentFormatException
Specified by:
    unpackMarkup in class DocumentFormat
Throws:
    DocumentFormatException
public void unpackMarkup(Document doc, RepositioningInfo repInfo, RepositioningInfo ampCodingInfo) throws DocumentFormatException
Specified by:
    unpackMarkup in class DocumentFormat
Throws:
    DocumentFormatException
protected void setNewLineProperty(Document doc)
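The summary describes setNewLineProperty only as checking the new line sequence and setting a document property. One possible reading is sketched here on a plain String instead of a gate.Document. The class NewLineSketch, the feature key "newLineType" and the labels "CRLF", "LFCR", "LF" and "CR" are assumptions, not GATE's actual names, and the check looks only at the first line break it finds.

```java
import java.util.Map;

// Hypothetical sketch; NOT the GATE implementation.
final class NewLineSketch {
    // Hypothetical feature name; the real property key is not shown on this page.
    static final String NEW_LINE_FEATURE = "newLineType";

    // Classifies the first line break found: "CRLF", "LFCR", "LF", "CR", or "" if none.
    static String detect(String content) {
        for (int i = 0; i < content.length(); i++) {
            char c = content.charAt(i);
            boolean hasNext = i + 1 < content.length();
            if (c == '\r') {
                return (hasNext && content.charAt(i + 1) == '\n') ? "CRLF" : "CR";
            }
            if (c == '\n') {
                return (hasNext && content.charAt(i + 1) == '\r') ? "LFCR" : "LF";
            }
        }
        return "";
    }

    // Stores the detected sequence in a map standing in for the document's features.
    static void setNewLineProperty(String content, Map<String, Object> features) {
        features.put(NEW_LINE_FEATURE, detect(content));
    }
}
```

Because only the first break is inspected, a document mixing line-ending styles is classified by whichever style appears first.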
private void removeExtraNewLine(Document doc)
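removeExtraNewLine is private and its body is not documented here; the summary only says it deletes the '\r' in CRLF or LFCR combinations. A minimal sketch of that rule on a plain String might look like this. CrStripSketch is a hypothetical name, and the real method operates on a gate.Document rather than a String.

```java
// Hypothetical sketch of the documented rule; NOT the GATE implementation.
final class CrStripSketch {
    // Removes the '\r' from every CRLF or LFCR pair; lone '\r' and '\n' are kept.
    static String removeExtraNewLine(String content) {
        StringBuilder out = new StringBuilder(content.length());
        int i = 0;
        while (i < content.length()) {
            char c = content.charAt(i);
            boolean hasNext = i + 1 < content.length();
            if (c == '\r' && hasNext && content.charAt(i + 1) == '\n') {
                out.append('\n'); // CRLF becomes LF
                i += 2;
            } else if (c == '\n' && hasNext && content.charAt(i + 1) == '\r') {
                out.append('\n'); // LFCR becomes LF
                i += 2;
            } else {
                out.append(c);
                i++;
            }
        }
        return out.toString();
    }
}
```

A lone '\r' is left alone, matching the summary's wording, which mentions only the two-character combinations.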
public void annotateParagraphs(Document aDoc, int startOffset, int endOffset, String annotSetName) throws DocumentFormatException
Parameters:
aDoc - the GATE document on which paragraph detection is performed. If it is null, or its content is null, the method simply returns without doing anything.
startOffset - the offset in the document content at which paragraph detection starts.
endOffset - the offset at which paragraph detection ends.
annotSetName - the name of the annotation set in which paragraph annotations are created. The annotation type created is "paragraph".
Throws:
DocumentFormatException
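The documented contract of annotateParagraphs (work only between startOffset and endOffset, do nothing on null content) can be sketched as follows. This is a hypothetical illustration, not GATE's paragraph detector: it assumes newlines have already been normalised to '\n', treats a run of two or more '\n' characters as a paragraph break, and returns [start, end) offset pairs instead of creating "paragraph" annotations in annotSetName. ParagraphSketch and findParagraphs are invented names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; NOT the GATE implementation.
final class ParagraphSketch {
    // Returns [start, end) offsets of paragraphs inside content[startOffset, endOffset).
    // Assumption: a paragraph break is a run of two or more '\n' characters.
    static List<int[]> findParagraphs(String content, int startOffset, int endOffset) {
        List<int[]> spans = new ArrayList<>();
        if (content == null) {
            return spans; // mirrors the documented "null content: do nothing" behaviour
        }
        int end = Math.min(endOffset, content.length());
        int paraStart = startOffset;
        int i = startOffset;
        while (i < end) {
            if (content.charAt(i) == '\n') {
                int runEnd = i;
                while (runEnd < end && content.charAt(runEnd) == '\n') {
                    runEnd++;
                }
                if (runEnd - i >= 2) { // blank line(s): close the current paragraph
                    if (i > paraStart) {
                        spans.add(new int[] {paraStart, i});
                    }
                    paraStart = runEnd;
                }
                i = runEnd;
            } else {
                i++;
            }
        }
        if (end > paraStart) {
            spans.add(new int[] {paraStart, end}); // trailing paragraph without a break
        }
        return spans;
    }
}
```

For "one\ntwo\n\nthree" over offsets 0 to 14 this yields the spans [0, 7) and [9, 14), i.e. "one\ntwo" and "three". Turning each span into a "paragraph" annotation is left out because it depends on the GATE annotation API.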
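public DataStore getDataStore()
Description copied from interface: LanguageResource
    Get the data store that this LR lives in.
Specified by:
    getDataStore in interface LanguageResource
Overrides:
    getDataStore in class AbstractLanguageResource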