gate
Class DocumentFormat

java.lang.Object
  |
  +--gate.util.AbstractFeatureBearer
        |
        +--gate.creole.AbstractResource
              |
              +--gate.creole.AbstractLanguageResource
                    |
                    +--gate.DocumentFormat
All Implemented Interfaces:
FeatureBearer, LanguageResource, NameBearer, Resource, Serializable
Direct Known Subclasses:
TextualDocumentFormat

public abstract class DocumentFormat
extends AbstractLanguageResource
implements LanguageResource

The format of Documents. Subclasses of DocumentFormat know about particular MIME types and how to unpack the information in any markup or formatting they contain into GATE annotations. Each MIME type has its own subclass of DocumentFormat, e.g. XmlDocumentFormat, RtfDocumentFormat, MpegDocumentFormat. These classes register themselves with a static index residing here when they are constructed. Static getDocumentFormat methods can then be used to get the appropriate format class for a particular document.
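
The self-registration scheme described above can be illustrated with a minimal, standalone sketch. It does not use the GATE classes: SketchFormat, SketchXmlFormat and forMimeType are hypothetical names invented for illustration, and MIME types are modelled as plain "type/subtype" strings rather than org.w3c.www.mime.MimeType objects.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for DocumentFormat; not the GATE class itself.
abstract class SketchFormat {
    // Static index from "type/subtype" strings to format instances.
    private static final Map<String, SketchFormat> INDEX = new HashMap<>();

    private final String mimeType;

    protected SketchFormat(String mimeType) {
        this.mimeType = mimeType;
        // Each format registers itself with the static index on construction.
        INDEX.put(mimeType, this);
    }

    String getMimeType() {
        return mimeType;
    }

    // Static lookup; returns null when no format is registered for the type.
    static SketchFormat forMimeType(String mimeType) {
        return INDEX.get(mimeType);
    }
}

// Hypothetical subclass that handles one MIME type.
class SketchXmlFormat extends SketchFormat {
    SketchXmlFormat() {
        super("text/xml");
    }
}
```

In this sketch a lookup succeeds only after an instance of the matching subclass has been constructed, which mirrors the registration-on-construction behaviour the description mentions.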

See Also:
Serialized Form

Constructor Summary
DocumentFormat()
          Default constructor.
 
Method Summary
 void addStatusListener(StatusListener l)
           
static DocumentFormat getDocumentFormat(Document aGateDocument, org.w3c.www.mime.MimeType mimeType)
          Find a DocumentFormat implementation that deals with a particular MIME type, given that type.
static DocumentFormat getDocumentFormat(Document aGateDocument, String fileSuffix)
          Find a DocumentFormat implementation that deals with a particular MIME type, given the file suffix (e.g. ".txt") that the document came from.
static DocumentFormat getDocumentFormat(Document aGateDocument, URL url)
          Find a DocumentFormat implementation that deals with a particular MIME type, given the URL of the Document.
 Map getElement2StringMap()
          Get the element 2 string map
 FeatureMap getFeatures()
          Get the feature set
 Map getMarkupElementsMap()
          Get the markup elements map
 org.w3c.www.mime.MimeType getMimeType()
          Get the MIME type
 Boolean getShouldCollectRepositioning()
           
 void removeStatusListener(StatusListener l)
           
 void setElement2StringMap(Map anElement2StringMap)
          Set the element 2 string map
 void setFeatures(FeatureMap features)
          Set the features map
 void setMarkupElementsMap(Map markupElementsMap)
          Set the markup elements map
 void setMimeType(org.w3c.www.mime.MimeType aMimeType)
          Set the MIME type
 void setShouldCollectRepositioning(Boolean b)
           
 Boolean supportsRepositioning()
          Returns true if this document format can collect repositioning information during the unpack phase.
abstract  void unpackMarkup(Document doc)
          Unpack the markup in the document.
abstract  void unpackMarkup(Document doc, RepositioningInfo repInfo, RepositioningInfo ampCodingInfo)
           
 void unpackMarkup(Document doc, String originalContentFeatureType)
          Unpack the markup in the document.
 
Methods inherited from class gate.creole.AbstractLanguageResource
cleanup, getDataStore, getLRPersistenceId, getParent, isModified, setDataStore, setLRPersistenceId, setParent, sync
 
Methods inherited from class gate.creole.AbstractResource
checkParameterValues, getName, getParameterValue, getParameterValue, init, removeResourceListeners, setName, setParameterValue, setParameterValue, setParameterValues, setParameterValues, setResourceListeners
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 
Methods inherited from interface gate.LanguageResource
getDataStore, getLRPersistenceId, getParent, isModified, setDataStore, setLRPersistenceId, setParent, sync
 
Methods inherited from interface gate.Resource
cleanup, getParameterValue, init, setParameterValue, setParameterValues
 
Methods inherited from interface gate.util.NameBearer
getName, setName
 

Constructor Detail

DocumentFormat

public DocumentFormat()
Default constructor.
Method Detail

supportsRepositioning

public Boolean supportsRepositioning()
Returns true if this document format can collect repositioning information during the unpack phase.
Subclasses that can collect repositioning information should override this method.

setShouldCollectRepositioning

public void setShouldCollectRepositioning(Boolean b)

getShouldCollectRepositioning

public Boolean getShouldCollectRepositioning()
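
The override pattern for the three repositioning methods can be sketched as below. This is a standalone illustration, not the GATE source: BaseFormatSketch and RepositioningFormatSketch are hypothetical classes, the default of false is inferred from the instruction to override, and the rule that the setter only honours the flag when the format supports repositioning is an assumption.

```java
// Hypothetical base class standing in for DocumentFormat.
class BaseFormatSketch {
    private boolean shouldCollectRepositioning = false;

    // Assumed default: this format cannot collect repositioning information.
    Boolean supportsRepositioning() {
        return Boolean.FALSE;
    }

    // Assumption: the request is ignored unless the format supports it.
    void setShouldCollectRepositioning(Boolean b) {
        shouldCollectRepositioning = supportsRepositioning() && b;
    }

    Boolean getShouldCollectRepositioning() {
        return shouldCollectRepositioning;
    }
}

// Hypothetical subclass that can collect repositioning information.
class RepositioningFormatSketch extends BaseFormatSketch {
    @Override
    Boolean supportsRepositioning() {
        return Boolean.TRUE;
    }
}
```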

unpackMarkup

public abstract void unpackMarkup(Document doc)
                           throws DocumentFormatException
Unpack the markup in the document. This converts markup from the native format (e.g. XML, RTF) into annotations in GATE format. Uses the markupElementsMap to determine which elements to convert, and what annotation type names to use.
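
As a rough illustration of the role of the markup elements map, the sketch below maps element names to annotation type names. MarkupMapSketch and annotationTypes are hypothetical names, and the rule that elements absent from the map are skipped is an assumption made for this example, not confirmed behaviour of any particular subclass.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper; not part of the GATE API.
class MarkupMapSketch {
    // Returns the annotation type name for each element the map covers,
    // in document order. Unmapped elements are skipped (an assumption).
    static List<String> annotationTypes(List<String> elements,
                                        Map<String, String> markupElementsMap) {
        List<String> types = new ArrayList<>();
        for (String element : elements) {
            String type = markupElementsMap.get(element);
            if (type != null) {
                types.add(type);
            }
        }
        return types;
    }
}
```

For instance, with a map from "p" to "paragraph" and "h1" to "heading", an element sequence p, b, h1 would yield the annotation types paragraph and heading under these assumptions.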

unpackMarkup

public abstract void unpackMarkup(Document doc,
                                  RepositioningInfo repInfo,
                                  RepositioningInfo ampCodingInfo)
                           throws DocumentFormatException

unpackMarkup

public void unpackMarkup(Document doc,
                         String originalContentFeatureType)
                  throws DocumentFormatException
Unpack the markup in the document. This method first saves the document's content as a feature attached to the document, then calls unpackMarkup on the GATE document. It is useful when the original content of the document being unpacked must be preserved. After the markup has been unpacked, the document's content is replaced with new content containing only the text between the markup.
Parameters:
doc - the document that will be unpacked
originalContentFeatureType - the name of the feature that will hold the document's content.
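
The save-then-unpack order described above can be sketched without GATE as follows. SketchDoc and UnpackSketch are hypothetical classes, the feature name passed in is up to the caller, and stripping tags with a regular expression is only a placeholder for real markup unpacking.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical document: mutable content plus a feature map.
class SketchDoc {
    String content;
    final Map<String, Object> features = new HashMap<>();

    SketchDoc(String content) {
        this.content = content;
    }
}

// Hypothetical stand-in for a concrete document format.
class UnpackSketch {
    // Placeholder unpacking: remove anything that looks like a tag,
    // leaving only the text between the markup.
    static void unpackMarkup(SketchDoc doc) {
        doc.content = doc.content.replaceAll("<[^>]*>", "");
    }

    // Save the original content as a feature first, then unpack.
    static void unpackMarkup(SketchDoc doc, String originalContentFeatureType) {
        doc.features.put(originalContentFeatureType, doc.content);
        unpackMarkup(doc);
    }
}
```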

getDocumentFormat

public static DocumentFormat getDocumentFormat(Document aGateDocument,
                                               org.w3c.www.mime.MimeType mimeType)
Find a DocumentFormat implementation that deals with a particular MIME type, given that type.
Parameters:
aGateDocument - the document that will receive the associated MIME type as a feature. The feature is named MimeType and its value has the form type/subtype
mimeType - the mime type that is given as input

getDocumentFormat

public static DocumentFormat getDocumentFormat(Document aGateDocument,
                                               String fileSuffix)
Find a DocumentFormat implementation that deals with a particular MIME type, given the file suffix (e.g. ".txt") that the document came from.
Parameters:
aGateDocument - the document that will receive the associated MIME type as a feature. The feature is named MimeType and its value has the form type/subtype
fileSuffix - the file suffix that is given as input

getDocumentFormat

public static DocumentFormat getDocumentFormat(Document aGateDocument,
                                               URL url)
Find a DocumentFormat implementation that deals with a particular MIME type, given the URL of the Document. If it is an HTTP URL, we can ask the web server. If it has a recognised file extension, we can use that. Otherwise we need to use a map of magic numbers to MIME types to guess the type, and then look up the format using the type.
Parameters:
aGateDocument - the document that will receive the associated MIME type as a feature. The feature is named MimeType and its value has the form type/subtype
url - the URL that is given as input
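
The fallback order described for URL-based lookup (server-reported type, then file extension, then magic numbers) might look roughly like the sketch below. It performs no network access: the server's content type and the first bytes of the document are passed in as arguments. MimeGuessSketch, its suffix table and the two magic prefixes are all hypothetical and chosen only for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper; not part of the GATE API.
class MimeGuessSketch {
    // Illustrative suffix table; the real index is populated by format classes.
    static final Map<String, String> SUFFIXES = Map.of(
            "txt", "text/plain",
            "xml", "text/xml",
            "html", "text/html");

    // Returns a "type/subtype" string, or null if nothing matches.
    static String guess(String serverContentType, String path, byte[] head) {
        // 1. Prefer the type reported by the web server, without parameters.
        if (serverContentType != null) {
            int semi = serverContentType.indexOf(';');
            String bare = semi < 0 ? serverContentType
                                   : serverContentType.substring(0, semi);
            return bare.trim();
        }
        // 2. Fall back to a recognised file extension.
        int dot = path.lastIndexOf('.');
        if (dot >= 0) {
            String suffix = path.substring(dot + 1).toLowerCase(Locale.ROOT);
            String known = SUFFIXES.get(suffix);
            if (known != null) {
                return known;
            }
        }
        // 3. Last resort: compare the leading bytes with known magic prefixes.
        String start = new String(head, StandardCharsets.ISO_8859_1);
        if (start.startsWith("<?xml")) {
            return "text/xml";
        }
        if (start.startsWith("{\\rtf")) {
            return "application/rtf";
        }
        return null;
    }
}
```

The resulting type string would then be used to look up the format, as in the MIME-type overload.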

getFeatures

public FeatureMap getFeatures()
Get the feature set
Specified by:
getFeatures in interface FeatureBearer
Overrides:
getFeatures in class AbstractFeatureBearer

getMarkupElementsMap

public Map getMarkupElementsMap()
Get the markup elements map

getElement2StringMap

public Map getElement2StringMap()
Get the element 2 string map

setMarkupElementsMap

public void setMarkupElementsMap(Map markupElementsMap)
Set the markup elements map

setElement2StringMap

public void setElement2StringMap(Map anElement2StringMap)
Set the element 2 string map

setFeatures

public void setFeatures(FeatureMap features)
Set the features map
Specified by:
setFeatures in interface FeatureBearer
Overrides:
setFeatures in class AbstractFeatureBearer

setMimeType

public void setMimeType(org.w3c.www.mime.MimeType aMimeType)
Set the MIME type

getMimeType

public org.w3c.www.mime.MimeType getMimeType()
Get the MIME type

removeStatusListener

public void removeStatusListener(StatusListener l)

addStatusListener

public void addStatusListener(StatusListener l)