java.lang.Object
  |
  +--gate.util.AbstractFeatureBearer
        |
        +--gate.creole.AbstractResource
              |
              +--gate.creole.AbstractLanguageResource
                    |
                    +--gate.DocumentFormat
The format of Documents. Subclasses of DocumentFormat know about particular MIME types and how to unpack the information in any markup or formatting they contain into GATE annotations. Each MIME type has its own subclass of DocumentFormat, e.g. XmlDocumentFormat, RtfDocumentFormat, MpegDocumentFormat. These classes register themselves with a static index residing here when they are constructed. Static getDocumentFormat methods can then be used to get the appropriate format class for a particular document.
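The self-registration scheme described above can be illustrated with a minimal sketch. It does not use the GATE classes: `FormatRegistrySketch`, `Format`, `XmlFormat` and `lookup` are hypothetical stand-ins, and the index is assumed to be keyed by a `type/subtype` string.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical stand-in for a self-registering format index; not the GATE implementation. */
class FormatRegistrySketch {

    /** Minimal format base class that owns the static index. */
    abstract static class Format {
        // Static index shared by all formats, keyed by "type/subtype" (an assumption).
        private static final Map<String, Format> INDEX = new ConcurrentHashMap<>();

        private final String mimeType;

        protected Format(String mimeType) {
            this.mimeType = mimeType;
            // Register on construction, as the class description says.
            // Note: this publishes the instance before the subclass constructor has finished.
            INDEX.put(mimeType, this);
        }

        /** Static lookup, playing the role of the getDocumentFormat methods. */
        static Optional<Format> lookup(String mimeType) {
            return Optional.ofNullable(INDEX.get(mimeType));
        }

        String mimeType() {
            return mimeType;
        }

        /** Placeholder for unpacking markup; here it just returns plain text. */
        abstract String unpack(String rawContent);
    }

    /** Toy XML format that registers itself under text/xml. */
    static final class XmlFormat extends Format {
        XmlFormat() {
            super("text/xml");
        }

        @Override
        String unpack(String rawContent) {
            // Toy "unpacking": strip anything that looks like a tag.
            return rawContent.replaceAll("<[^>]*>", "");
        }
    }

    public static void main(String[] args) {
        new XmlFormat(); // constructing the format registers it
        Format format = Format.lookup("text/xml").orElseThrow(IllegalStateException::new);
        System.out.println(format.unpack("<p>hello</p>"));
    }
}
```

Registering in the base constructor keeps subclasses trivial, at the cost of exposing a partially constructed object to the index.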
Field Summary | |
private static boolean |
DEBUG
Debug flag |
protected Map |
element2StringMap
This map is used inside the unpackMarkup() method... |
private FeatureMap |
features
The features of this resource |
protected static boolean |
isGateXmlDocument
This field indicates whether the document being processed is in a GATE XML custom format. |
protected static Map |
magic2mimeTypeMap
Map of Set of magic numbers to MimeType. |
protected Map |
markupElementsMap
Map of markup elements to annotation types. |
protected static Map |
mimeString2ClassHandlerMap
Map of MimeTypeString to ClassHandler class. |
protected static Map |
mimeString2mimeTypeMap
Map of MimeType to DocumentFormat Class. |
private org.w3c.www.mime.MimeType |
mimeType
The MIME type of this format. |
private Boolean |
shouldCollectRepositioning
Flag to enable or disable the collection of repositioning information |
private Vector |
statusListeners
Listeners for status reports |
protected static Map |
suffixes2mimeTypeMap
Map of Set of file suffixes to MimeType. |
Fields inherited from class gate.creole.AbstractLanguageResource |
dataStore, lrPersistentId, serialVersionUID |
Fields inherited from class gate.creole.AbstractResource |
name |
Constructor Summary | |
DocumentFormat()
Default construction |
Method Summary | |
void |
addStatusListener(StatusListener l)
|
protected static boolean |
areEqual(org.w3c.www.mime.MimeType aMimeType,
org.w3c.www.mime.MimeType anotherMimeType)
Tests if two MimeType objects are equal. |
protected static org.w3c.www.mime.MimeType |
decideBetweenThreeMimeTypes(org.w3c.www.mime.MimeType aMimeTypeFromWebServer,
org.w3c.www.mime.MimeType aMimeTypeFromFileSuffix,
org.w3c.www.mime.MimeType aMimeTypeFromMagicNumbers)
This method decides which MIME type is in the majority among the three candidates |
protected static org.w3c.www.mime.MimeType |
decideBetweenTwoMimeTypes(org.w3c.www.mime.MimeType aMimeType,
org.w3c.www.mime.MimeType anotherMimeType)
Decide between two mimeTypes. |
protected void |
fireStatusChanged(String e)
|
static DocumentFormat |
getDocumentFormat(Document aGateDocument,
org.w3c.www.mime.MimeType mimeType)
Find a DocumentFormat implementation that deals with a particular MIME type, given that type. |
static DocumentFormat |
getDocumentFormat(Document aGateDocument,
String fileSuffix)
Find a DocumentFormat implementation that deals with a particular MIME type, given the file suffix (e.g. |
static DocumentFormat |
getDocumentFormat(Document aGateDocument,
URL url)
Find a DocumentFormat implementation that deals with a particular MIME type, given the URL of the Document. |
Map |
getElement2StringMap()
Get the element 2 string map |
FeatureMap |
getFeatures()
Get the feature set |
private static String |
getFileSufix(URL url)
Returns the file suffix, or null if the URL doesn't have one. If the URL is null, the file suffix will also be null. |
Map |
getMarkupElementsMap()
Get the markup elements map |
org.w3c.www.mime.MimeType |
getMimeType()
Gets the mime Type |
private static org.w3c.www.mime.MimeType |
getMimeType(String fileSufix)
Returns the MimeType for a given file suffix. |
private static org.w3c.www.mime.MimeType |
getMimeType(URL url)
Returns the MimeType for a given URL object. |
Boolean |
getShouldCollectRepositioning()
|
protected static org.w3c.www.mime.MimeType |
guessTypeUsingMagicNumbers(InputStream aInputStream,
String anEncoding)
This method tries to guess the mime Type using some magic numbers. |
void |
removeStatusListener(StatusListener l)
|
protected static org.w3c.www.mime.MimeType |
runMagicNumbers(InputStreamReader aReader)
Runs the magic-number tests over a GATE document |
void |
setElement2StringMap(Map anElement2StringMap)
Set the element 2 string map |
void |
setFeatures(FeatureMap features)
Set the features map |
void |
setMarkupElementsMap(Map markupElementsMap)
Set the markup elements map |
void |
setMimeType(org.w3c.www.mime.MimeType aMimeType)
Set the mime type |
void |
setShouldCollectRepositioning(Boolean b)
|
Boolean |
supportsRepositioning()
Returns true if the document format can collect repositioning information during the unpack phase. |
abstract void |
unpackMarkup(Document doc)
Unpack the markup in the document. |
abstract void |
unpackMarkup(Document doc,
RepositioningInfo repInfo,
RepositioningInfo ampCodingInfo)
|
void |
unpackMarkup(Document doc,
String originalContentFeatureType)
Unpack the markup in the document. |
Methods inherited from class gate.creole.AbstractLanguageResource |
cleanup, getDataStore, getLRPersistenceId, getParent, isModified, setDataStore, setLRPersistenceId, setParent, sync |
Methods inherited from class gate.creole.AbstractResource |
checkParameterValues, getName, getParameterValue, getParameterValue, init, removeResourceListeners, setName, setParameterValue, setParameterValue, setParameterValues, setParameterValues, setResourceListeners |
Methods inherited from class java.lang.Object |
|
Methods inherited from interface gate.LanguageResource |
getDataStore, getLRPersistenceId, getParent, isModified, setDataStore, setLRPersistenceId, setParent, sync |
Methods inherited from interface gate.Resource |
cleanup, getParameterValue, init, setParameterValue, setParameterValues |
Methods inherited from interface gate.util.NameBearer |
getName, setName |
Field Detail |
private static final boolean DEBUG
protected static boolean isGateXmlDocument
private org.w3c.www.mime.MimeType mimeType
protected static Map mimeString2ClassHandlerMap
protected static Map mimeString2mimeTypeMap
protected static Map suffixes2mimeTypeMap
protected static Map magic2mimeTypeMap
protected Map markupElementsMap
protected Map element2StringMap
private FeatureMap features
private transient Vector statusListeners
private Boolean shouldCollectRepositioning
Constructor Detail |
public DocumentFormat()
Method Detail |
public Boolean supportsRepositioning()
public void setShouldCollectRepositioning(Boolean b)
public Boolean getShouldCollectRepositioning()
public abstract void unpackMarkup(Document doc) throws DocumentFormatException
public abstract void unpackMarkup(Document doc, RepositioningInfo repInfo, RepositioningInfo ampCodingInfo) throws DocumentFormatException
public void unpackMarkup(Document doc, String originalContentFeatureType) throws DocumentFormatException
doc - the document that will be unpacked
originalContentFeatureType - the name of the feature that will hold the document's content.
private static org.w3c.www.mime.MimeType getMimeType(String fileSufix)
fileSufix - the file suffix associated with a recognisable MIME type.
private static org.w3c.www.mime.MimeType getMimeType(URL url)
url - the URL object from which the MimeType will be extracted
protected static org.w3c.www.mime.MimeType decideBetweenThreeMimeTypes(org.w3c.www.mime.MimeType aMimeTypeFromWebServer, org.w3c.www.mime.MimeType aMimeTypeFromFileSuffix, org.w3c.www.mime.MimeType aMimeTypeFromMagicNumbers)
aMimeTypeFromWebServer - a MimeType
aMimeTypeFromFileSuffix - a MimeType
aMimeTypeFromMagicNumbers - a MimeType
protected static org.w3c.www.mime.MimeType decideBetweenTwoMimeTypes(org.w3c.www.mime.MimeType aMimeType, org.w3c.www.mime.MimeType anotherMimeType)
aMimeType - a MimeType object with "Prority" parameter set
anotherMimeType - a MimeType object with "Prority" parameter set
protected static boolean areEqual(org.w3c.www.mime.MimeType aMimeType, org.w3c.www.mime.MimeType anotherMimeType)
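The majority decision that decideBetweenThreeMimeTypes is described as making, together with an equality test in the spirit of areEqual, might look roughly like the sketch below. It works on plain `type/subtype` strings rather than `org.w3c.www.mime.MimeType`, and the names `MimeMajoritySketch`, `majority` and `agree` are hypothetical. What the real method does when no two sources agree is not stated on this page, so the sketch simply reports that no majority exists.

```java
import java.util.Optional;

/** Hypothetical two-out-of-three vote over MIME type strings; not the GATE implementation. */
class MimeMajoritySketch {

    /** Returns the candidate that at least two non-null sources agree on, if any. */
    static Optional<String> majority(String fromWebServer, String fromFileSuffix, String fromMagicNumbers) {
        if (agree(fromWebServer, fromFileSuffix) || agree(fromWebServer, fromMagicNumbers)) {
            return Optional.of(fromWebServer);
        }
        if (agree(fromFileSuffix, fromMagicNumbers)) {
            return Optional.of(fromFileSuffix);
        }
        // No two sources agree; tie-breaking is left to the caller (an assumption).
        return Optional.empty();
    }

    /** Two candidates agree when both are present and equal ignoring case (an assumption). */
    static boolean agree(String a, String b) {
        return a != null && b != null && a.equalsIgnoreCase(b);
    }

    public static void main(String[] args) {
        System.out.println(majority("text/html", "text/xml", "text/xml"));
        System.out.println(majority("text/html", null, "text/plain"));
    }
}
```

Returning an empty `Optional` keeps the vote separate from any priority-based fallback, such as the one decideBetweenTwoMimeTypes appears to provide.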
protected static org.w3c.www.mime.MimeType guessTypeUsingMagicNumbers(InputStream aInputStream, String anEncoding)
aInputStream - an InputStream which has to be transformed into an InputStreamReader
anEncoding - the encoding. If it is null or unknown, an InputStreamReader with the default encoding will be created.
protected static org.w3c.www.mime.MimeType runMagicNumbers(InputStreamReader aReader)
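The two steps described for guessTypeUsingMagicNumbers and runMagicNumbers (wrap the stream in a reader, falling back to the default encoding when the requested one is null or unknown, then look for magic numbers) can be sketched as follows. The magic strings, the 256-character prefix length, and the names `MagicNumberSketch`, `guess` and `runMagic` are all assumptions made for illustration; the actual contents of magic2mimeTypeMap are not shown on this page.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.UnsupportedEncodingException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

/** Hypothetical magic-number sniffer; not the GATE implementation. */
class MagicNumberSketch {

    // Hypothetical magic strings mapped to MIME types.
    private static final Map<String, String> MAGIC_TO_MIME = new LinkedHashMap<>();
    static {
        MAGIC_TO_MIME.put("<?xml", "text/xml");
        MAGIC_TO_MIME.put("<html", "text/html");
        MAGIC_TO_MIME.put("{\\rtf", "text/rtf");
    }

    // How many leading characters to inspect (an assumption).
    private static final int PREFIX_LENGTH = 256;

    /** Builds a reader for the stream, using the default encoding if the given one is null or unknown. */
    static Optional<String> guess(InputStream in, String encoding) {
        Reader reader;
        try {
            reader = (encoding == null) ? new InputStreamReader(in) : new InputStreamReader(in, encoding);
        } catch (UnsupportedEncodingException e) {
            reader = new InputStreamReader(in);
        }
        try {
            return runMagic(reader);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads a prefix of the content and returns the MIME type of the first magic string found. */
    static Optional<String> runMagic(Reader reader) throws IOException {
        char[] buffer = new char[PREFIX_LENGTH];
        int filled = 0;
        int n;
        while (filled < buffer.length
                && (n = reader.read(buffer, filled, buffer.length - filled)) != -1) {
            filled += n;
        }
        String prefix = new String(buffer, 0, filled).toLowerCase(Locale.ROOT);
        for (Map.Entry<String, String> entry : MAGIC_TO_MIME.entrySet()) {
            if (prefix.contains(entry.getKey())) {
                return Optional.of(entry.getValue());
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        byte[] bytes = "<?xml version=\"1.0\"?><doc/>".getBytes(StandardCharsets.UTF_8);
        System.out.println(guess(new ByteArrayInputStream(bytes), "UTF-8"));
    }
}
```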
private static String getFileSufix(URL url)
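The summary for getFileSufix says it returns the file suffix, or null when the URL is null or has no suffix. A minimal sketch of that contract is shown below. `FileSuffixSketch`, `fileSuffix` and `url` are hypothetical names, the example URLs are placeholders, and lower-casing the suffix is an assumption rather than documented behaviour.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;
import java.util.Locale;

/** Hypothetical suffix extraction; not the GATE implementation. */
class FileSuffixSketch {

    /** Returns the suffix of the last path segment, or null if the URL is null or has no suffix. */
    static String fileSuffix(URL url) {
        if (url == null) {
            return null;
        }
        String path = url.getPath(); // excludes the query string and fragment
        String fileName = path.substring(path.lastIndexOf('/') + 1);
        int lastDot = fileName.lastIndexOf('.');
        if (lastDot < 0 || lastDot == fileName.length() - 1) {
            return null; // no dot, or nothing after the dot
        }
        // Lower-casing is an assumption so that "XML" and "xml" map to the same entry.
        return fileName.substring(lastDot + 1).toLowerCase(Locale.ROOT);
    }

    /** Convenience helper that turns a malformed URL into an unchecked exception. */
    static URL url(String spec) {
        try {
            return URI.create(spec).toURL();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(fileSuffix(url("http://example.org/docs/report.XML?x=1")));
        System.out.println(fileSuffix(url("http://example.org/docs/")));
    }
}
```

Only the last path segment is examined, so a dot in a directory name such as `/v1.2/readme` does not count as a suffix.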
public static DocumentFormat getDocumentFormat(Document aGateDocument, org.w3c.www.mime.MimeType mimeType)
aGateDocument - this document will receive as a feature the associated MIME type. The name of the feature is MimeType and its value is in the format type/subtype
mimeType - the MIME type that is given as input
public static DocumentFormat getDocumentFormat(Document aGateDocument, String fileSuffix)
aGateDocument - this document will receive as a feature the associated MIME type. The name of the feature is MimeType and its value is in the format type/subtype
fileSuffix - the file suffix that is given as input
public static DocumentFormat getDocumentFormat(Document aGateDocument, URL url)
aGateDocument - this document will receive as a feature the associated MIME type. The name of the feature is MimeType and its value is in the format type/subtype
url - the URL that is given as input
public FeatureMap getFeatures()
Specified by: getFeatures in interface FeatureBearer
Overrides: getFeatures in class AbstractFeatureBearer
public Map getMarkupElementsMap()
public Map getElement2StringMap()
public void setMarkupElementsMap(Map markupElementsMap)
public void setElement2StringMap(Map anElement2StringMap)
public void setFeatures(FeatureMap features)
Specified by: setFeatures in interface FeatureBearer
Overrides: setFeatures in class AbstractFeatureBearer
public void setMimeType(org.w3c.www.mime.MimeType aMimeType)
public org.w3c.www.mime.MimeType getMimeType()
public void removeStatusListener(StatusListener l)
public void addStatusListener(StatusListener l)
protected void fireStatusChanged(String e)