gate.corpora
Class TextualDocumentFormat

java.lang.Object
  |
  +--gate.util.AbstractFeatureBearer
        |
        +--gate.creole.AbstractResource
              |
              +--gate.creole.AbstractLanguageResource
                    |
                    +--gate.DocumentFormat
                          |
                          +--gate.corpora.TextualDocumentFormat
All Implemented Interfaces:
FeatureBearer, LanguageResource, Resource, Serializable
Direct Known Subclasses:
EmailDocumentFormat, HtmlDocumentFormat, RtfDocumentFormat, SgmlDocumentFormat, XmlDocumentFormat

public class TextualDocumentFormat
extends DocumentFormat

The format of Documents. Subclasses of DocumentFormat know about particular MIME types and how to unpack the information in any markup or formatting they contain into GATE annotations. Each MIME type has its own subclass of DocumentFormat, e.g. XmlDocumentFormat, RtfDocumentFormat, MpegDocumentFormat. These classes register themselves with a static index held in DocumentFormat when they are constructed. The static getDocumentFormat methods can then be used to get the appropriate format class for a particular document.
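
The register-on-construction pattern described above can be illustrated with a minimal, self-contained sketch. It does not use GATE and is not GATE's implementation: the names FormatRegistrySketch, Format, XmlFormat, RtfFormat and forMimeType are hypothetical and exist only for this example, and the MIME type strings are assumptions.

```java
import java.util.HashMap;
import java.util.Map;

public class FormatRegistrySketch {

    /** Hypothetical stand-in for a document format; not a GATE class. */
    static abstract class Format {
        // Static index shared by all formats, keyed by MIME type string.
        private static final Map<String, Format> INDEX = new HashMap<>();

        /** Each subclass passes its MIME type here and is registered when constructed. */
        protected Format(String mimeType) {
            INDEX.put(mimeType, this);
        }

        /** Static lookup: returns the registered format, or null if none is registered. */
        static Format forMimeType(String mimeType) {
            return INDEX.get(mimeType);
        }
    }

    /** Hypothetical format for XML documents (MIME type assumed). */
    static class XmlFormat extends Format {
        XmlFormat() { super("text/xml"); }
    }

    /** Hypothetical format for RTF documents (MIME type assumed). */
    static class RtfFormat extends Format {
        RtfFormat() { super("text/rtf"); }
    }

    public static void main(String[] args) {
        // Constructing an instance is what registers it in the static index.
        new XmlFormat();
        new RtfFormat();

        Format xml = Format.forMimeType("text/xml");
        System.out.println(xml.getClass().getSimpleName());

        // Nothing has been registered for this MIME type, so the lookup yields null.
        System.out.println(Format.forMimeType("image/png"));
    }
}
```

In this sketch a lookup for an unregistered MIME type simply returns null; how the real getDocumentFormat methods treat that case is not stated on this page.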

See Also:
Serialized Form

Field Summary
private static boolean DEBUG
          Debug flag
 
Fields inherited from class gate.DocumentFormat
element2StringMap, features, isGateXmlDocument, magic2mimeTypeMap, markupElementsMap, mimeString2ClassHandlerMap, mimeString2mimeTypeMap, mimeType, myStatusListeners, statusListeners, suffixes2mimeTypeMap
 
Fields inherited from class gate.creole.AbstractLanguageResource
dataStore, serialVersionUID
 
Constructor Summary
TextualDocumentFormat()
          Default construction
 
Method Summary
 DataStore getDataStore()
          Get the data store that this LR lives in.
 Resource init()
          Initialise this resource, and return it.
 void unpackMarkup(Document doc)
          Unpack the markup in the document.
 void unpackMarkup(Document doc, String originalContentFeatureType)
          Unpack the markup in the document.
 
Methods inherited from class gate.DocumentFormat
addStatusListener, areEqual, decideBetweenThreeMimeTypes, decideBetweenTwoMimeTypes, fireStatusChanged, getDocumentFormat, getDocumentFormat, getDocumentFormat, getElement2StringMap, getFeatures, getFileSufix, getMarkupElementsMap, getMimeType, getMimeType, getMimeType, guessTypeUsingMagicNumbers, removeStatusListener, runMagicNumbers, setElement2StringMap, setFeatures, setMarkupElementsMap, setMimeType
 
Methods inherited from class gate.creole.AbstractLanguageResource
setDataStore, sync
 
Methods inherited from class gate.creole.AbstractResource
getName, setName
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, registerNatives, toString, wait, wait, wait
 
Methods inherited from interface gate.LanguageResource
setDataStore, sync
 
Methods inherited from interface gate.util.FeatureBearer
getName, setName
 

Field Detail

DEBUG

private static final boolean DEBUG
Debug flag
Constructor Detail

TextualDocumentFormat

public TextualDocumentFormat()
Default construction
Method Detail

init

public Resource init()
              throws ResourceInstantiationException
Initialise this resource, and return it.
Overrides:
init in class AbstractResource

unpackMarkup

public void unpackMarkup(Document doc)
                  throws DocumentFormatException
Unpack the markup in the document. This converts markup from the native format (e.g. XML, RTF) into annotations in GATE format. Uses the markupElementsMap to determine which elements to convert, and what annotation type names to use.
Overrides:
unpackMarkup in class DocumentFormat
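
As a rough illustration of how a markup elements map can drive the conversion, the sketch below maps element names to annotation type names and emits one annotation per mapped element. It is a toy model, not GATE code: UnpackMarkupSketch, Span and unpack are hypothetical names, and the rule that elements missing from the map are skipped is an assumption of this sketch rather than documented behaviour.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class UnpackMarkupSketch {

    /** Hypothetical span: a type name covering offsets [start, end) of the text. */
    static final class Span {
        final String type;
        final int start;
        final int end;

        Span(String type, int start, int end) {
            this.type = type;
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return type + "[" + start + "," + end + ")";
        }
    }

    /**
     * Converts markup elements into annotations. The map decides which
     * elements are converted and which annotation type name each one gets.
     * Assumption of this sketch: elements absent from the map are skipped.
     */
    static List<Span> unpack(List<Span> elements, Map<String, String> markupElementsMap) {
        List<Span> annotations = new ArrayList<>();
        for (Span element : elements) {
            String annotationType = markupElementsMap.get(element.type);
            if (annotationType != null) {
                annotations.add(new Span(annotationType, element.start, element.end));
            }
        }
        return annotations;
    }

    public static void main(String[] args) {
        // Elements as they might be found in "<p>Hello <b>world</b></p>",
        // with offsets into the plain text "Hello world".
        List<Span> elements = new ArrayList<>();
        elements.add(new Span("p", 0, 11));
        elements.add(new Span("b", 6, 11));
        elements.add(new Span("script", 0, 0));

        Map<String, String> markupElementsMap = new HashMap<>();
        markupElementsMap.put("p", "Paragraph");
        markupElementsMap.put("b", "Bold");

        // "script" has no entry in the map, so it produces no annotation.
        for (Span annotation : unpack(elements, markupElementsMap)) {
            System.out.println(annotation);
        }
    }
}
```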

unpackMarkup

public void unpackMarkup(Document doc,
                         String originalContentFeatureType)
                  throws DocumentFormatException
Description copied from class: DocumentFormat
Unpack the markup in the document. This converts markup from the native format (e.g. XML, RTF) into annotations in GATE format. Uses the markupElementsMap to determine which elements to convert, and what annotation type names to use.
Overrides:
unpackMarkup in class DocumentFormat

getDataStore

public DataStore getDataStore()
Description copied from interface: LanguageResource
Get the data store that this LR lives in. Null for transient LRs.
Overrides:
getDataStore in class AbstractLanguageResource