gate.sgml
Class Sgml2Xml

java.lang.Object
  |
  +--gate.sgml.Sgml2Xml

public class Sgml2Xml
extends Object

Not so fast... This class is not a real SGML-to-XML converter. It takes an SGML document and tries to prepare it for an XML parser. A true conversion needs a Java SGML parser... If you know of one, let me know.... What it does:

What it doesn't do:


Field Summary
private  int attrEnd
           
private  int attrStart
           
private  int charPos
           
private  int closePos
           
private static boolean DEBUG
          Debug flag
private  List dubiousElements
           
private  String elemName
           
private  int elemNameEnd
           
private  int elemNameStart
           
private  char endPair
           
private  char m_currChar
           
private  int m_currState
           
private  int m_cursor
           
private  Document m_doc
           
private  StringBuffer m_modifier
           
private  Stack stack
           
 
Constructor Summary
Sgml2Xml(Document doc)
          Creates a converter for a GATE document.
Sgml2Xml(String SgmlDoc)
          Creates a converter for SGML content passed as a string and initialises some member fields.
 
Method Summary
 String convert()
          Performs the document conversion.
private  void doState1(char currChar)
          Analyses the char that was read in state 1. If it finds '<' it goes to state 2. Otherwise it stays in state 1 and keeps track of the text that is not white space.
private  void doState10(char currChar)
          '=' -> state 6. Any other char -> state 4. Stays here while it reads WS.
private  void doState11(char currChar)
          We are preparing to read the end tag of an element. Stays in this state while reading WS.
private  void doState12(char currChar)
          Here we read the element's name; this is an end tag. Stays here while it reads chars.
private  void doState13(char currChar)
          '>' -> state 1. Stays here while it reads WS.
private  void doState2(char currChar)
          We came from state 1 and just read '<'. If currChar == '/' -> state 11. If it is a non-white-space char -> state 3. Stays in state 2 while there is only white space.
private  void doState3(char currChar)
          We just read the first char of the element's name and now analyse the next char.
private  void doState4(char currChar)
          We have read the name of the element and prepare for '>' or attributes. '>' -> state 1. Any char != white space -> state 5.
private  void doState5(char currChar)
          '=' -> state 6. '>' -> state 4 (we didn't read an attribute but a value of the defaultAttr). WS (white space): we don't know yet whether we read an attribute or the value of the defaultAttr -> state 10. This state modifies the content of m_modifier ...
private  void doState6(char currChar)
          If we read ' or " we get ready to read everything until the next ' or ". If we read any other char -> state 8. Stays here while we read WS.
private  void doState7(char currChar)
          If we find the matching ' or " -> state 9. Otherwise reads everything and stays in state 7. If we read '>' in state 7, a " is automatically added at the end and we go to state 1.
private  void doState8(char currChar)
          '>' -> state 1. WS -> state 9. Otherwise stays in state 8 and reads the attribute's value.
private  void doState9(char currChar)
          Here we prepare to read another (attribute, value) pair (any char -> state 5). If '>', we just read a start tag -> state 1. Stays here while it reads WS.
private  boolean isWhiteSpace(char c)
          Tests whether c is a white space char.
private  void makeFinalModifications(CustomObject aCustomObject)
          This method is called after the entire SGML document has been read. It resolves the dubious elements this way: 1.
private  void performActionWithEndElem(String elemName)
          This is the action performed when an end tag is read.
private  void performFinalAction(String elemName, int pos)
          This is the action performed when we have finished reading an entire tag. The tag is put onto the stack and is considered empty by default.
private  char read()
          Reads a char and increments m_cursor.
private  boolean thereAreCharsToBeProcessed()
          Tests whether there are more chars to be read. Returns false when there are no more chars to be read.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, registerNatives, toString, wait, wait, wait
 

Field Detail

DEBUG

private static final boolean DEBUG
Debug flag

m_doc

private Document m_doc

m_modifier

private StringBuffer m_modifier

stack

private Stack stack

dubiousElements

private List dubiousElements

m_cursor

private int m_cursor

m_currState

private int m_currState

m_currChar

private char m_currChar

charPos

private int charPos

elemName

private String elemName

elemNameStart

private int elemNameStart

elemNameEnd

private int elemNameEnd

closePos

private int closePos

attrStart

private int attrStart

attrEnd

private int attrEnd

endPair

private char endPair

Constructor Detail

Sgml2Xml

public Sgml2Xml(String SgmlDoc)
Creates a converter for SGML content passed as a string and initialises some member fields.
Parameters:
SgmlDoc - the content of the SGML document that will be modified

Sgml2Xml

public Sgml2Xml(Document doc)
Creates a converter for a GATE document.
Parameters:
doc - the GATE document that will be transformed to XML

Method Detail

doState1

private void doState1(char currChar)
Analyses the char that was read in state 1. If it finds '<' it goes to state 2. Otherwise it stays in state 1 and keeps track of the text that is not white space.

doState2

private void doState2(char currChar)
We came from state 1 and just read '<'. If currChar == '/' -> state 11. If it is a non-white-space char -> state 3. Stays in state 2 while there is only white space.

doState3

private void doState3(char currChar)
We just read the first char of the element's name and now analyse the next char. If '>', the element name was a single char -> state 1. If it is white space -> state 4. Otherwise stays in state 3 and reads the element's name.
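
States 1 to 3 can be illustrated with a small stand-alone scanner. This is a minimal sketch, not the code of Sgml2Xml: the class name StartTagNameSketch and the method startTagNames are hypothetical, and states 4 and 11 are reduced to "skip until '>'" so that the example stays short.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of states 1-3; not the Sgml2Xml implementation.
final class StartTagNameSketch {

  /** Returns the names of all start tags found in the given text. */
  static List<String> startTagNames(String sgml) {
    List<String> names = new ArrayList<>();
    StringBuilder name = new StringBuilder();
    int state = 1;
    for (int i = 0; i < sgml.length(); i++) {
      char c = sgml.charAt(i);
      switch (state) {
        case 1: // plain text: wait for '<'
          if (c == '<') {
            state = 2;
          }
          break;
        case 2: // just read '<'
          if (c == '/') {
            state = 11; // end tag
          } else if (!Character.isWhitespace(c)) {
            name.setLength(0);
            name.append(c); // first char of the element name
            state = 3;
          }
          break;
        case 3: // reading the element name
          if (c == '>') {
            names.add(name.toString());
            state = 1;
          } else if (Character.isWhitespace(c)) {
            names.add(name.toString());
            state = 4;
          } else {
            name.append(c);
          }
          break;
        default: // states 4 and 11, simplified here: skip to the closing '>'
          if (c == '>') {
            state = 1;
          }
          break;
      }
    }
    return names;
  }
}
```

For the input `<p>Hi <b class=x>there</b></p>` the sketch returns the names p and b; the two end tags pass through the simplified state 11 and are not reported.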

doState4

private void doState4(char currChar)
We have read the name of the element and prepare for '>' or attributes. '>' -> state 1. Any char != white space -> state 5.

doState5

private void doState5(char currChar)
'=' -> state 6. '>' -> state 4 (we didn't read an attribute but a value of the defaultAttr). WS (white space): we don't know yet whether we read an attribute or the value of the defaultAttr -> state 10. This state modifies the content of m_modifier ... it adds text.

doState6

private void doState6(char currChar)
If we read ' or " we get ready to read everything until the next ' or ". If we read any other char -> state 8. Stays here while we read WS.

doState7

private void doState7(char currChar)
If we find the matching ' or " -> state 9. Otherwise reads everything and stays in state 7. If we read '>' in state 7, a " is automatically added at the end and we go to state 1.
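
The repair described for state 7 can be sketched as follows. This is only an illustration under stated assumptions: the class QuotedValueSketch and the method closeOpenQuote are hypothetical, and the sketch appends the same quote character that opened the value, whereas the description above only mentions adding a ".

```java
// Hypothetical sketch of the state 7 repair; not the Sgml2Xml implementation.
final class QuotedValueSketch {

  /** Adds the missing closing quote when '>' arrives before the matching quote. */
  static String closeOpenQuote(String tag) {
    StringBuilder out = new StringBuilder();
    char endPair = 0; // 0 means we are not inside a quoted value
    for (int i = 0; i < tag.length(); i++) {
      char c = tag.charAt(i);
      if (endPair == 0) {
        if (c == '\'' || c == '"') {
          endPair = c; // opening quote: start reading the value (state 7)
        }
      } else if (c == endPair) {
        endPair = 0; // matching quote found (state 9)
      } else if (c == '>') {
        out.append(endPair); // repair: close the value before the tag ends
        endPair = 0; // back to plain text (state 1)
      }
      out.append(c);
    }
    return out.toString();
  }
}
```

With this sketch, `<a href="x.html>` becomes `<a href="x.html">`, while `<a href='x'>` is returned unchanged. A consequence of the rule is that a legitimate '>' inside a quoted value is treated as the end of the tag.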

doState8

private void doState8(char currChar)
'>' -> state 1. WS -> state 9. Otherwise stays in state 8 and reads the attribute's value.

doState9

private void doState9(char currChar)
Here we prepare to read another (attribute, value) pair (any char -> state 5). If '>', we just read a start tag -> state 1. Stays here while it reads WS.

doState10

private void doState10(char currChar)
'=' -> state 6. Any other char -> state 4. Stays here while it reads WS.
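
States 5 and 10 exist because a token inside a tag is ambiguous until the next non-WS char is seen: followed by '=' it is an attribute name, otherwise it is a bare value. The sketch below only classifies tokens. It does not reproduce what the class writes into m_modifier, it handles unquoted values only, and the names BareTokenSketch and classify are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the name-or-bare-value decision; not the Sgml2Xml implementation.
final class BareTokenSketch {

  /** Labels each token between the element name and '>' as "name:" or "bare:". */
  static List<String> classify(String attrs) {
    List<String> out = new ArrayList<>();
    int i = 0;
    int n = attrs.length();
    while (i < n) {
      while (i < n && Character.isWhitespace(attrs.charAt(i))) {
        i++; // skip WS before a token (state 4)
      }
      if (i >= n) {
        break;
      }
      int start = i;
      while (i < n && attrs.charAt(i) != '=' && !Character.isWhitespace(attrs.charAt(i))) {
        i++; // read the token (state 5)
      }
      String token = attrs.substring(start, i);
      while (i < n && Character.isWhitespace(attrs.charAt(i))) {
        i++; // WS after the token: still undecided (state 10)
      }
      if (i < n && attrs.charAt(i) == '=') {
        out.add("name:" + token);
        i++; // consume '='
        while (i < n && Character.isWhitespace(attrs.charAt(i))) {
          i++; // WS before the value (state 6)
        }
        while (i < n && !Character.isWhitespace(attrs.charAt(i))) {
          i++; // unquoted value (state 8)
        }
      } else {
        out.add("bare:" + token); // no '=' followed, so it was a bare value
      }
    }
    return out;
  }
}
```

For the input `compact width = 10 nowrap` the sketch yields bare:compact, name:width, bare:nowrap.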

doState11

private void doState11(char currChar)
We are preparing to read the end tag of an element. Stays in this state while reading WS.

doState12

private void doState12(char currChar)
Here we read the element's name; this is an end tag. Stays here while it reads chars.

doState13

private void doState13(char currChar)
'>' -> state 1. Stays here while it reads WS.

convert

public String convert()
               throws IOException,
                      MalformedURLException
Performs the document conversion.

thereAreCharsToBeProcessed

private boolean thereAreCharsToBeProcessed()
Tests whether there are more chars to be read. Returns false when there are no more chars to be read.

read

private char read()
Reads a char and increments m_cursor.
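
The two cursor helpers, thereAreCharsToBeProcessed and read, suggest a read loop like the one below. This is a minimal sketch that assumes the input is held as a String and that the cursor is a plain index. The class CharCursorSketch and the method countTagOpeners are hypothetical; only the two helper names are borrowed from this page.

```java
// Hypothetical sketch of a cursor-based reader; not the Sgml2Xml implementation.
final class CharCursorSketch {
  private final String text;
  private int cursor = 0;

  CharCursorSketch(String text) {
    this.text = text;
  }

  /** Returns false once every char has been consumed. */
  boolean thereAreCharsToBeProcessed() {
    return cursor < text.length();
  }

  /** Returns the char under the cursor and advances the cursor by one. */
  char read() {
    return text.charAt(cursor++);
  }

  /** Example driver loop: counts how many '<' chars the input contains. */
  static int countTagOpeners(String sgml) {
    CharCursorSketch in = new CharCursorSketch(sgml);
    int count = 0;
    while (in.thereAreCharsToBeProcessed()) {
      if (in.read() == '<') {
        count++;
      }
    }
    return count;
  }
}
```

A real driver would presumably dispatch each char to the handler for the current state instead of counting; the loop shape is the part illustrated here.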

performFinalAction

private void performFinalAction(String elemName,
                                int pos)
This is the action performed when we have finished reading an entire tag. The tag is put onto the stack and is considered empty by default.

performActionWithEndElem

private void performActionWithEndElem(String elemName)
This is the action performed when an end tag is read. The action consists of collecting all the dubiousElements (elements without an end tag). They are considered dubious because we don't know whether they are empty or may be closed... Only the DTD can provide this information. We don't have a DTD, so we consider that every dubious element followed by text closes at the end of that text... A dubious element followed by another element is automatically considered an empty element.
Parameters:
elemName - the name of the end tag that was read
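
The rule for dubious elements can be sketched on a token list. This is an illustration under several assumptions, not the class's own algorithm: the input contains only plain tags and text (no comments or declarations), white-space-only text does not count as text, stray end tags are left untouched, and elements still open at the end of the input are resolved by the same rule. The class DubiousElementSketch and all its methods are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of the dubious-element rule; not the Sgml2Xml implementation.
final class DubiousElementSketch {
  // A token is either one tag ("<...>") or a run of text between tags.
  private static final Pattern TOKEN = Pattern.compile("<[^>]*>|[^<]+");

  static String closeDubious(String sgml) {
    List<String> tokens = new ArrayList<>();
    Matcher m = TOKEN.matcher(sgml);
    while (m.find()) {
      tokens.add(m.group());
    }
    Deque<Integer> open = new ArrayDeque<>(); // token indices of unclosed start tags
    for (int i = 0; i < tokens.size(); i++) {
      String t = tokens.get(i);
      if (t.startsWith("</")) {
        String name = t.substring(2, t.length() - 1).trim();
        if (!isOpen(tokens, open, name)) {
          continue; // stray end tag: left untouched in this sketch
        }
        // Everything opened after the matching start tag never got an end tag.
        while (!nameOf(tokens.get(open.peek())).equals(name)) {
          resolve(tokens, open.pop());
        }
        open.pop();
      } else if (t.startsWith("<")) {
        open.push(i);
      }
    }
    // Assumption: elements still open at the end follow the same rule.
    while (!open.isEmpty()) {
      resolve(tokens, open.pop());
    }
    return String.join("", tokens);
  }

  // Followed by text: close after that text. Otherwise: make it an empty element.
  private static void resolve(List<String> tokens, int idx) {
    String tag = tokens.get(idx);
    boolean textFollows = idx + 1 < tokens.size()
        && !tokens.get(idx + 1).startsWith("<")
        && !tokens.get(idx + 1).trim().isEmpty();
    if (textFollows) {
      tokens.set(idx + 1, tokens.get(idx + 1) + "</" + nameOf(tag) + ">");
    } else {
      tokens.set(idx, tag.substring(0, tag.length() - 1) + "/>");
    }
  }

  private static boolean isOpen(List<String> tokens, Deque<Integer> open, String name) {
    for (int idx : open) {
      if (nameOf(tokens.get(idx)).equals(name)) {
        return true;
      }
    }
    return false;
  }

  private static String nameOf(String startTag) {
    return startTag.substring(1, startTag.length() - 1).trim().split("\\s+")[0];
  }
}
```

With this sketch, `<ul><li>one<li>two</ul>` becomes `<ul><li>one</li><li>two</li></ul>`, and `<p>a<br><b>c</b></p>` becomes `<p>a<br/><b>c</b></p>`.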

makeFinalModifications

private void makeFinalModifications(CustomObject aCustomObject)
This method is called after the entire SGML document has been read. It resolves the dubious elements this way:

isWhiteSpace

private boolean isWhiteSpace(char c)
Tests whether c is a white space char.
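
This page does not say which chars count as white space. The sketch below assumes the set accepted by Character.isWhitespace; the real method may use a narrower set, such as space, tab, CR and LF. The class name WhiteSpaceSketch is hypothetical.

```java
// Hypothetical sketch; the set of white space chars is an assumption.
final class WhiteSpaceSketch {

  /** Returns true for chars that Character.isWhitespace treats as white space. */
  static boolean isWhiteSpace(char c) {
    return Character.isWhitespace(c);
  }
}
```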