java.lang.Object
  convert.xml.tok.Tokenizer
public class Tokenizer
It provides operations on char arrays that represent all or part of a parsed XML entity.
Several methods operate on char subarrays. The subarray is specified by a char array buf and two integers, off and end; off gives the index in buf of the first char of the subarray and end gives the index in buf of the char immediately after the last char.
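The (buf, off, end) convention can be illustrated with a minimal sketch. The class SubarrayDemo and its methods are hypothetical helpers written for this example; they are not part of convert.xml.tok. The point is only that end is exclusive, so the subarray holds end - off chars and is empty when off equals end.

```java
// Hypothetical helper, not part of convert.xml.tok: illustrates the
// (buf, off, end) convention in which end is exclusive.
final class SubarrayDemo {
    // Number of chars in the subarray buf[off..end).
    static int length(int off, int end) {
        return end - off;
    }

    // Copies the subarray into a String. The String constructor takes a
    // count rather than an end index, hence end - off.
    static String text(char[] buf, int off, int end) {
        return new String(buf, off, end - off);
    }

    public static void main(String[] args) {
        char[] buf = "<doc>text</doc>".toCharArray();
        // off = 5 is the index of the first 't'; end = 9 is one past the last 't'.
        System.out.println(text(buf, 5, 9));  // prints "text"
        System.out.println(length(5, 9));     // prints 4
    }
}
```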
The main operations provided by Tokenizer are tokenizeProlog, tokenizeContent and tokenizeCdataSection; these are used to divide up an XML entity into tokens.
tokenizeProlog is used for the prolog of an XML document as well as for the external subset and parameter entities (except when referenced in an EntityValue); it can also be used for parsing the Misc* that follows the document element.
tokenizeContent is used for the document element and for parsed general entities that are referenced in content except for CDATA sections.
tokenizeCdataSection is used for CDATA sections, following the <![CDATA[ up to and including the ]]>.
tokenizeAttributeValue and tokenizeEntityValue are used to further divide up tokens returned by tokenizeProlog and tokenizeContent; they are also used to divide up entities referenced in attribute values or entity values.
Field Summary | |
---|---|
static int | TOK_ATTRIBUTE_VALUE_S: Represents a white space character in an attribute value, excluding white space characters that are part of line boundaries. |
static int | TOK_CDATA_SECT_CLOSE: Represents the end of a CDATA section ]]>. |
static int | TOK_CDATA_SECT_OPEN: Represents the start of a CDATA section <![CDATA[. |
static int | TOK_CHAR_PAIR_REF: Represents a numeric character reference (decimal or hexadecimal), when the referenced character is greater than 0xFFFF and so is represented by a pair of chars. |
static int | TOK_CHAR_REF: Represents a numeric character reference (decimal or hexadecimal), when the referenced character is less than or equal to 0xFFFF and so is represented by a single char. |
static int | TOK_CLOSE_BRACKET: Represents ] in the prolog. |
static int | TOK_CLOSE_PAREN: Represents a ) in the prolog that is not followed immediately by any of *, + or ?. |
static int | TOK_CLOSE_PAREN_ASTERISK: Represents )* in the prolog. |
static int | TOK_CLOSE_PAREN_PLUS: Represents )+ in the prolog. |
static int | TOK_CLOSE_PAREN_QUESTION: Represents )? in the prolog. |
static int | TOK_COMMA: Represents , in the prolog. |
static int | TOK_COMMENT: Represents a comment <!-- comment -->. |
static int | TOK_COND_SECT_CLOSE: Represents ]]> in the prolog. |
static int | TOK_COND_SECT_OPEN: Represents <![ in the prolog. |
static int | TOK_DATA_CHARS: Represents one or more characters of data. |
static int | TOK_DATA_NEWLINE: Represents a newline (CR, LF or CR followed by LF) in data. |
static int | TOK_DECL_CLOSE: Represents > in the prolog. |
static int | TOK_DECL_OPEN: Represents <!NAME in the prolog. |
static int | TOK_EMPTY_ELEMENT_NO_ATTS: Represents an empty element tag <name/> that doesn't have any attribute specifications. |
static int | TOK_EMPTY_ELEMENT_WITH_ATTS: Represents an empty element tag <name att="val"/> that contains one or more attribute specifications. |
static int | TOK_END_TAG: Represents a complete end-tag </name>. |
static int | TOK_ENTITY_REF: Represents a general entity reference. |
static int | TOK_LITERAL: Represents a literal (EntityValue, AttValue, SystemLiteral or PubidLiteral). |
static int | TOK_MAGIC_ENTITY_REF: Represents a general entity reference to one of the 5 predefined entities amp, lt, gt, quot, apos. |
static int | TOK_NAME: Represents an unprefixed name in the prolog. |
static int | TOK_NAME_ASTERISK: Represents a name followed immediately by *. |
static int | TOK_NAME_PLUS: Represents a name followed immediately by +. |
static int | TOK_NAME_QUESTION: Represents a name followed immediately by ?. |
static int | TOK_NMTOKEN: Represents a name token in the prolog that is not a name. |
static int | TOK_OPEN_BRACKET: Represents [ in the prolog. |
static int | TOK_OPEN_PAREN: Represents a ( in the prolog. |
static int | TOK_OR: Represents \| in the prolog. |
static int | TOK_PARAM_ENTITY_REF: Represents a parameter entity reference in the prolog. |
static int | TOK_PERCENT: Represents a % in the prolog that does not start a parameter entity reference. |
static int | TOK_PI: Represents a processing instruction. |
static int | TOK_POUND_NAME: Represents #NAME in the prolog. |
static int | TOK_PREFIXED_NAME: Represents a name with a prefix. |
static int | TOK_PROLOG_S: Represents whitespace in the prolog. |
static int | TOK_START_TAG_NO_ATTS: Represents a complete start-tag <name> that doesn't have any attribute specifications. |
static int | TOK_START_TAG_WITH_ATTS: Represents a complete start-tag <name att="val"> that contains one or more attribute specifications. |
static int | TOK_XML_DECL: Represents an XML declaration or text declaration (a processing instruction whose target is xml). |
Constructor Summary | |
---|---|
Tokenizer() | |
Method Summary | |
---|---|
static java.lang.String | getPublicId(char[] buf, int off, int end): Checks that a literal contained in the specified char subarray is a legal public identifier and returns a string with the normalized content of the public id. |
static boolean | matchesXMLString(char[] buf, int off, int end, java.lang.String str): Returns true if the specified char subarray is equal to the string. |
static void | movePosition(char[] buf, int off, int end, Position pos): Moves a position forward. |
static int | skipIgnoreSect(char[] buf, int off, int end): Skips over an ignored conditional section. |
static int | skipS(char[] buf, int off, int end): Skips over XML whitespace characters at the start of the specified subarray. |
static int | tokenizeAttributeValue(char[] buf, int off, int end, Token token): Scans the first token of a char subarray that contains part of a literal attribute value. |
static int | tokenizeCdataSection(char[] buf, int off, int end, Token token): Scans the first token of a char subarray that starts with the content of a CDATA section. |
static int | tokenizeContent(char[] buf, int off, int end, ContentToken token): Scans the first token of a char subarray that contains content. |
static int | tokenizeEntityValue(char[] buf, int off, int end, Token token): Scans the first token of a char subarray that contains part of a literal entity value. |
static int | tokenizeProlog(char[] buf, int off, int end, Token token): Scans the first token of a char subarray that contains part of a prolog. |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
---|
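public static final int TOK_DATA_CHARS
Represents one or more characters of data.
public static final int TOK_DATA_NEWLINE
Represents a newline (CR, LF or CR followed by LF) in data.
public static final int TOK_START_TAG_NO_ATTS
Represents a complete start-tag <name> that doesn't have any attribute specifications.
public static final int TOK_START_TAG_WITH_ATTS
Represents a complete start-tag <name att="val"> that contains one or more attribute specifications.
public static final int TOK_EMPTY_ELEMENT_NO_ATTS
Represents an empty element tag <name/> that doesn't have any attribute specifications.
public static final int TOK_EMPTY_ELEMENT_WITH_ATTS
Represents an empty element tag <name att="val"/> that contains one or more attribute specifications.
public static final int TOK_END_TAG
Represents a complete end-tag </name>.
public static final int TOK_CDATA_SECT_OPEN
Represents the start of a CDATA section <![CDATA[.
public static final int TOK_CDATA_SECT_CLOSE
Represents the end of a CDATA section ]]>.
public static final int TOK_ENTITY_REF
Represents a general entity reference.
public static final int TOK_MAGIC_ENTITY_REF
Represents a general entity reference to one of the 5 predefined entities amp, lt, gt, quot, apos.
public static final int TOK_CHAR_REF
Represents a numeric character reference (decimal or hexadecimal), when the referenced character is less than or equal to 0xFFFF and so is represented by a single char.
public static final int TOK_CHAR_PAIR_REF
Represents a numeric character reference (decimal or hexadecimal), when the referenced character is greater than 0xFFFF and so is represented by a pair of chars.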
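The distinction between TOK_CHAR_REF and TOK_CHAR_PAIR_REF follows from Java's 16-bit char: a character above 0xFFFF needs two chars (a surrogate pair). The sketch below shows this using only java.lang.Character; the class name CharRefDemo is invented for the example, and the tokenizer's own handling of character references may differ.

```java
// Illustration only: shows why a character reference above 0xFFFF
// occupies two Java chars. CharRefDemo is a made-up name.
final class CharRefDemo {
    // Returns the chars that encode the given Unicode code point:
    // one char up to 0xFFFF, a surrogate pair above that.
    static char[] encode(int codePoint) {
        return Character.toChars(codePoint);
    }

    public static void main(String[] args) {
        System.out.println(encode(0x41).length);     // prints 1 (&#x41; is 'A')
        System.out.println(encode(0x1F600).length);  // prints 2 (&#x1F600; needs a pair)
    }
}
```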
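public static final int TOK_PI
Represents a processing instruction.
public static final int TOK_XML_DECL
Represents an XML declaration or text declaration (a processing instruction whose target is xml).
public static final int TOK_COMMENT
Represents a comment <!-- comment -->. This can occur both in the prolog and in content.
public static final int TOK_ATTRIBUTE_VALUE_S
Represents a white space character in an attribute value, excluding white space characters that are part of line boundaries.
public static final int TOK_PARAM_ENTITY_REF
Represents a parameter entity reference in the prolog.
public static final int TOK_PROLOG_S
Represents whitespace in the prolog.
public static final int TOK_DECL_OPEN
Represents <!NAME in the prolog.
public static final int TOK_DECL_CLOSE
Represents > in the prolog.
public static final int TOK_NAME
Represents an unprefixed name in the prolog.
public static final int TOK_PREFIXED_NAME
Represents a name with a prefix.
public static final int TOK_NMTOKEN
Represents a name token in the prolog that is not a name.
public static final int TOK_POUND_NAME
Represents #NAME in the prolog.
public static final int TOK_OR
Represents | in the prolog.
public static final int TOK_PERCENT
Represents a % in the prolog that does not start a parameter entity reference. This can occur in an entity declaration.
public static final int TOK_OPEN_PAREN
Represents a ( in the prolog.
public static final int TOK_CLOSE_PAREN
Represents a ) in the prolog that is not followed immediately by any of *, + or ?.
public static final int TOK_OPEN_BRACKET
Represents [ in the prolog.
public static final int TOK_CLOSE_BRACKET
Represents ] in the prolog.
public static final int TOK_LITERAL
Represents a literal (EntityValue, AttValue, SystemLiteral or PubidLiteral).
public static final int TOK_NAME_QUESTION
Represents a name followed immediately by ?.
public static final int TOK_NAME_ASTERISK
Represents a name followed immediately by *.
public static final int TOK_NAME_PLUS
Represents a name followed immediately by +.
public static final int TOK_COND_SECT_OPEN
Represents <![ in the prolog.
public static final int TOK_COND_SECT_CLOSE
Represents ]]> in the prolog.
public static final int TOK_CLOSE_PAREN_QUESTION
Represents )? in the prolog.
public static final int TOK_CLOSE_PAREN_ASTERISK
Represents )* in the prolog.
public static final int TOK_CLOSE_PAREN_PLUS
Represents )+ in the prolog.
public static final int TOK_COMMA
Represents , in the prolog.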
Constructor Detail |
---|
public Tokenizer()
Method Detail |
---|
public static void movePosition(char[] buf, int off, int end, Position pos)
Moves a position forward. On entry, pos gives the position of the char at index off in buf. On exit, pos will give the position of the char at index end, which must be greater than or equal to off. The chars between off and end must encode one or more complete characters. A carriage return followed by a line feed will be treated as a single line delimiter provided that they are given to movePosition together.
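The line-delimiter rule for movePosition can be sketched as follows. This is not the library's implementation: LineCol is a hypothetical stand-in for Position (whose fields are not shown on this page), the 1-based line and 0-based column are assumptions, and the sketch counts chars rather than complete characters.

```java
// Hypothetical stand-in for the library's Position class; its real
// fields and accessors are not documented on this page.
final class LineCol {
    int line = 1;    // 1-based line number (an assumption of this sketch)
    int column = 0;  // 0-based column within the line (also an assumption)
}

final class MovePositionSketch {
    // Advances pos over buf[off..end), counting CR, LF and CR LF each
    // as a single line delimiter.
    static void move(char[] buf, int off, int end, LineCol pos) {
        while (off < end) {
            char c = buf[off++];
            if (c == '\r') {
                // A following LF is merged only if it lies inside this subarray.
                if (off < end && buf[off] == '\n') {
                    off++;
                }
                pos.line++;
                pos.column = 0;
            } else if (c == '\n') {
                pos.line++;
                pos.column = 0;
            } else {
                pos.column++;
            }
        }
    }

    public static void main(String[] args) {
        char[] buf = "ab\r\ncd".toCharArray();
        LineCol pos = new LineCol();
        move(buf, 0, buf.length, pos);
        System.out.println(pos.line + ":" + pos.column);  // prints 2:2
    }
}
```

In this sketch, splitting the same buffer between the CR and the LF (two calls, one ending at index 3 and one starting there) counts two line delimiters instead of one, which is why the pair has to be passed together.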
public static int tokenizeCdataSection(char[] buf, int off, int end, Token token) throws EmptyTokenException, PartialTokenException, InvalidTokenException, ExtensibleTokenException
Scans the first token of a char subarray that starts with the content of a CDATA section. Returns one of the following:
TOK_DATA_CHARS
TOK_DATA_NEWLINE
TOK_CDATA_SECT_CLOSE
Information about the token is stored in token.
After TOK_CDATA_SECT_CLOSE is returned, the application should use tokenizeContent.
Throws:
EmptyTokenException - if the subarray is empty
PartialTokenException - if the subarray contains only part of a legal token
InvalidTokenException - if the subarray does not start with a legal token or part of one
ExtensibleTokenException - if the subarray encodes just a carriage return ('\r')
See Also:
TOK_DATA_CHARS, TOK_DATA_NEWLINE, TOK_CDATA_SECT_CLOSE, Token, EmptyTokenException, PartialTokenException, InvalidTokenException, ExtensibleTokenException, tokenizeContent(char[], int, int, convert.xml.tok.ContentToken)
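A subarray that is just a carriage return is ambiguous because TOK_DATA_NEWLINE covers CR, LF and CR followed by LF: a line feed arriving in the next chunk would belong to the same token. The sketch below illustrates that ambiguity under stated assumptions. NewlineSketch, scanNewline and the NEED_MORE sentinel are invented for this example; the real methods signal the case by throwing ExtensibleTokenException.

```java
// Illustration only: why a lone trailing '\r' cannot be tokenized yet.
final class NewlineSketch {
    // Sentinel used by this sketch in place of ExtensibleTokenException.
    static final int NEED_MORE = -1;

    // Assumes off < end and buf[off] is '\r' or '\n'. Returns the index
    // just past the newline, or NEED_MORE when the subarray is a lone
    // '\r' that a later '\n' could still extend.
    static int scanNewline(char[] buf, int off, int end) {
        if (buf[off] == '\n') {
            return off + 1;
        }
        // buf[off] is '\r' from here on.
        if (off + 1 == end) {
            return NEED_MORE;
        }
        return buf[off + 1] == '\n' ? off + 2 : off + 1;
    }

    public static void main(String[] args) {
        System.out.println(scanNewline("\r\nx".toCharArray(), 0, 3));  // prints 2
        System.out.println(scanNewline("\r".toCharArray(), 0, 1));     // prints -1
    }
}
```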
public static int tokenizeContent(char[] buf, int off, int end, ContentToken token) throws PartialTokenException, InvalidTokenException, EmptyTokenException, ExtensibleTokenException
Scans the first token of a char subarray that contains content. Returns one of the following:
TOK_START_TAG_NO_ATTS
TOK_START_TAG_WITH_ATTS
TOK_EMPTY_ELEMENT_NO_ATTS
TOK_EMPTY_ELEMENT_WITH_ATTS
TOK_END_TAG
TOK_DATA_CHARS
TOK_DATA_NEWLINE
TOK_CDATA_SECT_OPEN
TOK_ENTITY_REF
TOK_MAGIC_ENTITY_REF
TOK_CHAR_REF
TOK_CHAR_PAIR_REF
TOK_PI
TOK_XML_DECL
TOK_COMMENT
Information about the token is stored in token.
When TOK_CDATA_SECT_OPEN is returned, tokenizeCdataSection should be called until it returns TOK_CDATA_SECT_CLOSE.
Throws:
EmptyTokenException - if the subarray is empty
PartialTokenException - if the subarray contains only part of a legal token
InvalidTokenException - if the subarray does not start with a legal token or part of one
ExtensibleTokenException - if the subarray encodes just a carriage return ('\r')
See Also:
TOK_START_TAG_NO_ATTS, TOK_START_TAG_WITH_ATTS, TOK_EMPTY_ELEMENT_NO_ATTS, TOK_EMPTY_ELEMENT_WITH_ATTS, TOK_END_TAG, TOK_DATA_CHARS, TOK_DATA_NEWLINE, TOK_CDATA_SECT_OPEN, TOK_ENTITY_REF, TOK_MAGIC_ENTITY_REF, TOK_CHAR_REF, TOK_CHAR_PAIR_REF, TOK_PI, TOK_XML_DECL, TOK_COMMENT, ContentToken, EmptyTokenException, PartialTokenException, InvalidTokenException, ExtensibleTokenException, tokenizeCdataSection(char[], int, int, convert.xml.tok.Token)
public static int tokenizeProlog(char[] buf, int off, int end, Token token) throws PartialTokenException, InvalidTokenException, EmptyTokenException, ExtensibleTokenException, EndOfPrologException
Scans the first token of a char subarray that contains part of a prolog. Returns one of the following:
TOK_PI
TOK_XML_DECL
TOK_COMMENT
TOK_PARAM_ENTITY_REF
TOK_PROLOG_S
TOK_DECL_OPEN
TOK_DECL_CLOSE
TOK_NAME
TOK_NMTOKEN
TOK_POUND_NAME
TOK_OR
TOK_PERCENT
TOK_OPEN_PAREN
TOK_CLOSE_PAREN
TOK_OPEN_BRACKET
TOK_CLOSE_BRACKET
TOK_LITERAL
TOK_NAME_QUESTION
TOK_NAME_ASTERISK
TOK_NAME_PLUS
TOK_COND_SECT_OPEN
TOK_COND_SECT_CLOSE
TOK_CLOSE_PAREN_QUESTION
TOK_CLOSE_PAREN_ASTERISK
TOK_CLOSE_PAREN_PLUS
TOK_COMMA
Throws:
EmptyTokenException - if the subarray is empty
PartialTokenException - if the subarray contains only part of a legal token
InvalidTokenException - if the subarray does not start with a legal token or part of one
EndOfPrologException - if the subarray starts with the document element; tokenizeContent should be used on the remainder of the entity
ExtensibleTokenException - if the subarray is a legal token but subsequent chars in the same entity could be part of the token
See Also:
TOK_PI, TOK_XML_DECL, TOK_COMMENT, TOK_PARAM_ENTITY_REF, TOK_PROLOG_S, TOK_DECL_OPEN, TOK_DECL_CLOSE, TOK_NAME, TOK_NMTOKEN, TOK_POUND_NAME, TOK_OR, TOK_PERCENT, TOK_OPEN_PAREN, TOK_CLOSE_PAREN, TOK_OPEN_BRACKET, TOK_CLOSE_BRACKET, TOK_LITERAL, TOK_NAME_QUESTION, TOK_NAME_ASTERISK, TOK_NAME_PLUS, TOK_COND_SECT_OPEN, TOK_COND_SECT_CLOSE, TOK_CLOSE_PAREN_QUESTION, TOK_CLOSE_PAREN_ASTERISK, TOK_CLOSE_PAREN_PLUS, TOK_COMMA, ContentToken, EmptyTokenException, PartialTokenException, InvalidTokenException, ExtensibleTokenException, EndOfPrologException
public static int tokenizeAttributeValue(char[] buf, int off, int end, Token token) throws PartialTokenException, InvalidTokenException, EmptyTokenException, ExtensibleTokenException
Scans the first token of a char subarray that contains part of a literal attribute value. Returns one of the following:
TOK_DATA_CHARS
TOK_DATA_NEWLINE
TOK_ATTRIBUTE_VALUE_S
TOK_MAGIC_ENTITY_REF
TOK_ENTITY_REF
TOK_CHAR_REF
TOK_CHAR_PAIR_REF
Throws:
EmptyTokenException - if the subarray is empty
PartialTokenException - if the subarray contains only part of a legal token
InvalidTokenException - if the subarray does not start with a legal token or part of one
ExtensibleTokenException - if the subarray encodes just a carriage return ('\r')
See Also:
TOK_DATA_CHARS, TOK_DATA_NEWLINE, TOK_ATTRIBUTE_VALUE_S, TOK_MAGIC_ENTITY_REF, TOK_ENTITY_REF, TOK_CHAR_REF, TOK_CHAR_PAIR_REF, Token, EmptyTokenException, PartialTokenException, InvalidTokenException, ExtensibleTokenException
public static int tokenizeEntityValue(char[] buf, int off, int end, Token token) throws PartialTokenException, InvalidTokenException, EmptyTokenException, ExtensibleTokenException
Scans the first token of a char subarray that contains part of a literal entity value. Returns one of the following:
TOK_DATA_CHARS
TOK_DATA_NEWLINE
TOK_PARAM_ENTITY_REF
TOK_MAGIC_ENTITY_REF
TOK_ENTITY_REF
TOK_CHAR_REF
TOK_CHAR_PAIR_REF
Throws:
EmptyTokenException - if the subarray is empty
PartialTokenException - if the subarray contains only part of a legal token
InvalidTokenException - if the subarray does not start with a legal token or part of one
ExtensibleTokenException - if the subarray encodes just a carriage return ('\r')
See Also:
TOK_DATA_CHARS, TOK_DATA_NEWLINE, TOK_MAGIC_ENTITY_REF, TOK_ENTITY_REF, TOK_PARAM_ENTITY_REF, TOK_CHAR_REF, TOK_CHAR_PAIR_REF, Token, EmptyTokenException, PartialTokenException, InvalidTokenException, ExtensibleTokenException
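public static int skipIgnoreSect(char[] buf, int off, int end) throws PartialTokenException, InvalidTokenException
Skips over an ignored conditional section. The subarray starts following the <![ IGNORE [. Returns the index of the char following the closing ]]>.
Throws:
PartialTokenException - if the subarray does not contain the complete ignored conditional section
InvalidTokenException - if the ignored conditional section contains illegal characters
public static java.lang.String getPublicId(char[] buf, int off, int end) throws InvalidTokenException
Checks that a literal contained in the specified char subarray is a legal public identifier and returns a string with the normalized content of the public id.
Throws:
InvalidTokenException - if it is not a legal public identifier
public static boolean matchesXMLString(char[] buf, int off, int end, java.lang.String str)
Returns true if the specified char subarray is equal to the string.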
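The comparison that matchesXMLString is described as performing (true if the char subarray is equal to the string) can be sketched without allocating an intermediate String. MatchSketch and its matches method are illustrative names; the library's own implementation is not shown on this page.

```java
// Illustration only: compares buf[off..end) with str char by char.
final class MatchSketch {
    static boolean matches(char[] buf, int off, int end, String str) {
        int len = str.length();
        // Lengths must agree before any chars are compared.
        if (end - off != len) {
            return false;
        }
        for (int i = 0; i < len; i++) {
            if (buf[off + i] != str.charAt(i)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        char[] buf = "<?xml version".toCharArray();
        System.out.println(matches(buf, 2, 5, "xml"));  // prints true
        System.out.println(matches(buf, 2, 6, "xml"));  // prints false
    }
}
```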
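public static int skipS(char[] buf, int off, int end)
Skips over XML whitespace characters at the start of the specified subarray. Returns the index of the first non-whitespace char, or end if the subarray is all whitespace.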
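A minimal sketch of the behavior described for skipS, assuming that XML whitespace means the four characters of the XML 1.0 S production (space, tab, carriage return, line feed). SkipWhitespaceSketch is a made-up name and this is not the library's code.

```java
// Illustration only: skips leading XML whitespace in buf[off..end).
final class SkipWhitespaceSketch {
    // The XML 1.0 S production: space, tab, carriage return, line feed.
    static boolean isXmlWhitespace(char c) {
        return c == ' ' || c == '\t' || c == '\r' || c == '\n';
    }

    // Returns the index of the first non-whitespace char, or end if
    // every char in the subarray is whitespace.
    static int skip(char[] buf, int off, int end) {
        while (off < end && isXmlWhitespace(buf[off])) {
            off++;
        }
        return off;
    }

    public static void main(String[] args) {
        char[] buf = " \t\nname".toCharArray();
        System.out.println(skip(buf, 0, buf.length));  // prints 3
        System.out.println(skip(buf, 0, 2));           // prints 2 (all whitespace, so end)
    }
}
```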