Dear Community,
Many of you will certainly have to deal with XML parsing here and there. Basically, there are three ways to parse XML in Java: the DOM approach, a SAX parser, and byte code manipulation approaches like JIBX.
I definitely don’t like the JIBX way, so it’s out for me. JDOM is nice for smaller XML files, but to me it remains a quick’n’dirty solution: it is extremely memory-consuming, since it loads the whole document into memory and stores it in lists, even the parts that I don’t need. On top of that, accessing a child element by name is not O(1) but O(n), since not only the names but also the namespaces have to be compared. In my opinion, XML namespaces are the most useless thing in the XML world anyway, though there will be opposing opinions.
I like the SAX parser approach. But there are two disadvantages.
- Initializing the parser takes a lot of lines of code.
- In startElement() and the other callback methods, I have to know where I am in the XML hierarchy to decide what to do with a certain element.
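To make both points concrete, here is a minimal sketch of what this looks like with nothing but the JDK’s SAX classes. It assumes a small XML with root, pats and dogs elements like the example discussed further down; the class names and the dog name Rex are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

class PlainSaxExample
{
    /** Collects the name attribute of every element at root/pats/dogs/dog. */
    static class DogHandler extends DefaultHandler
    {
        // Names of the currently open elements, innermost first.
        private final Deque<String> path = new ArrayDeque<>();
        final List<String> dogNames = new ArrayList<>();

        @Override
        public void startElement( String uri, String localName, String qName, Attributes attributes )
        {
            // Disadvantage 2: the handler has to know the full position itself.
            if ( qName.equals( "dog" ) && isAt( "dogs", "pats", "root" ) )
            {
                dogNames.add( attributes.getValue( "name" ) );
            }
            path.push( qName );
        }

        @Override
        public void endElement( String uri, String localName, String qName )
        {
            path.pop();
        }

        // True if the open ancestors, innermost first, are exactly the given names.
        private boolean isAt( String... innermostFirst )
        {
            if ( path.size() != innermostFirst.length )
            {
                return ( false );
            }
            Iterator<String> it = path.iterator();
            for ( String expected : innermostFirst )
            {
                if ( !expected.equals( it.next() ) )
                {
                    return ( false );
                }
            }
            return ( true );
        }
    }

    static List<String> parseDogNames( String xml ) throws Exception
    {
        // Disadvantage 1: factory, parser, handler and input stream all need setting up.
        SAXParserFactory factory = SAXParserFactory.newInstance();
        SAXParser parser = factory.newSAXParser();
        DogHandler handler = new DogHandler();
        parser.parse( new ByteArrayInputStream( xml.getBytes( StandardCharsets.UTF_8 ) ), handler );
        return ( handler.dogNames );
    }

    public static void main( String[] args ) throws Exception
    {
        String xml = "<root><pats><dogs><dog name=\"Rex\" /></dogs>"
                   + "<cats><cat name=\"Muschi\" /></cats></pats></root>";
        System.out.println( parseDogNames( xml ) ); // prints [Rex]
    }
}
```

Every handler written this way repeats the same path bookkeeping, and that is the part worth factoring out.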
I have written some code that drastically simplifies the whole process. Have a look at the code here.
How does it work? Let’s take a look.
Disadvantage 1 is addressed by providing a SimpleXMLParser class that selects a certain SAXParser (part of the JRE) and initializes it. Of course, this restricts you to a single parser implementation. But hey, why do we need more if one works just fine?
Now for disadvantage 2.
Let’s say your XML looks like this (omitting the header).
###################################
<root>
<pats>
<dogs>
<dog name="..." />
</dogs>
<cats>
<cat name="Muschi" />
<cat name="Pussy" />
</cats>
</pats>
</root>
###################################
So to parse only the dogs out of this data, you have to write an XML handler that checks in the startElement() method whether the current element is a “dog” element AND its parent is a “dogs” element AND that one is inside a “pats” element AND that one is inside a “root” element, which actually IS the root element. OK, these checks have to be done in any case. But we can reduce the number and cost of these checks, and we can reduce the knowledge required by the parser that only wants to get the dogs from the XML.
So you would implement a SimpleXMLHandlerDelegate. The onElementStarted() method would look like this:
###################################
@Override
protected void onElementStarted( XMLPath path, String name, Object object, Attributes attributes ) throws SAXException
{
// Notice that we're querying for level 0 here!
if ( ( path.getLevel() == 0 ) && name.equals( "dog" ) ) // This could even be skipped if you have designed the XML yourself and know for sure that only dog elements are in here.
{
System.out.println( "Found a dog called \"" + attributes.getValue( "name" ) + "\"." );
}
}
###################################
This is everything the dogs parser needs to do and know.
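The level 0 check suggests that a delegate sees positions relative to the element that triggered the delegation. Here is a minimal sketch of that idea in plain SAX. It is not JAGaToo’s implementation: the Delegate interface, the ParentHandler class and the dog name Rex are made up for illustration, and the parent only checks depth and element name rather than the full path.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

class DelegationSketch
{
    /** Hypothetical callback; the level is counted from inside the delegating element. */
    interface Delegate
    {
        void onElementStarted( int relativeLevel, String name, Attributes attributes );
    }

    static class ParentHandler extends DefaultHandler
    {
        private final Delegate dogsDelegate;
        private int depth = 0;         // number of currently open elements
        private int delegateBase = -1; // depth of the delegate's level 0, or -1 while inactive

        ParentHandler( Delegate dogsDelegate )
        {
            this.dogsDelegate = dogsDelegate;
        }

        @Override
        public void startElement( String uri, String localName, String qName, Attributes attributes )
        {
            if ( delegateBase >= 0 )
            {
                // Inside the delegated subtree: forward the event with a relative level.
                dogsDelegate.onElementStarted( depth - delegateBase, qName, attributes );
            }
            else if ( ( depth == 2 ) && qName.equals( "dogs" ) )
            {
                // Children of this element become level 0 for the delegate.
                delegateBase = depth + 1;
            }
            depth++;
        }

        @Override
        public void endElement( String uri, String localName, String qName )
        {
            depth--;
            if ( ( delegateBase >= 0 ) && ( depth < delegateBase ) )
            {
                delegateBase = -1; // the delegating element was closed
            }
        }
    }

    static List<String> collect( String xml ) throws Exception
    {
        List<String> seen = new ArrayList<>();
        Delegate dogs = ( level, name, attributes ) ->
            seen.add( level + ":" + name + ":" + attributes.getValue( "name" ) );
        SAXParserFactory.newInstance().newSAXParser().parse(
            new ByteArrayInputStream( xml.getBytes( StandardCharsets.UTF_8 ) ), new ParentHandler( dogs ) );
        return ( seen );
    }

    public static void main( String[] args ) throws Exception
    {
        System.out.println( collect( "<root><pats><dogs><dog name=\"Rex\" /></dogs>"
                                   + "<cats><cat name=\"Muschi\" /></cats></pats></root>" ) );
    }
}
```

The delegate never learns about root or pats, so the same delegate could be reused wherever a dogs element shows up.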
Now we need a parent handler that navigates to the dogs and then delegates to our dogs handler. This would be a SimpleXMLHandler implementation with the onElementStarted() method as follows.
###################################
@Override
protected void onElementStarted( XMLPath path, String name, Object object, Attributes attributes ) throws SAXException
{
if ( path.isAt( false, "root", "pats" ) && name.equals( "dogs" ) )
{
delegate( dogsHandler );
}
}
###################################
Isn’t this simple? We could also tune the code a little to get rid of some String comparisons. This needs a little more code, but it’s worth it. All you have to do is override the getPathObject() method in our root handler as follows.
###################################
private static enum RootElements
{
root;
}
private static enum Level1Elements
{
pats;
}
private static enum Level2Elements
{
dogs,
cats,
;
}
@Override
protected Object getPathObject( XMLPath path, String element )
{
if ( path.getLevel() == 0 )
{
try
{
return ( RootElements.valueOf( element ) );
}
catch ( IllegalArgumentException e ) // thrown by valueOf() for unknown element names
{
return ( new Object() );
}
}
else if ( path.isAtByObjects( false, RootElements.root ) )
{
try
{
return ( Level1Elements.valueOf( element ) );
}
catch ( IllegalArgumentException e )
{
return ( new Object() );
}
}
else if ( path.isAtByObjects( false, RootElements.root, Level1Elements.pats ) )
{
try
{
return ( Level2Elements.valueOf( element ) );
}
catch ( IllegalArgumentException e )
{
return ( new Object() );
}
}
// Any other position is not mapped to an enum constant.
return ( new Object() );
}
@Override
protected void onElementStarted( XMLPath path, String name, Object object, Attributes attributes ) throws SAXException
{
if ( object == Level2Elements.dogs ) // Simplified and cheaper test
{
delegate( dogsHandler );
}
}
###################################
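Since Enum.valueOf() throws an IllegalArgumentException for unknown names, every unexpected element costs an exception in the version above. A possible variation, sketched here as a standalone class rather than as part of JAGaToo, is to look the names up in a prebuilt map and fall back to a shared placeholder. The names ElementLookup, UNKNOWN and pathObjectFor() are made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

class ElementLookup
{
    enum Level2Elements
    {
        dogs,
        cats,
        ;
    }

    // Shared placeholder for element names that have no enum constant.
    static final Object UNKNOWN = new Object();

    // Built once, so each lookup is a single hash access and no exception is thrown.
    private static final Map<String, Level2Elements> LEVEL2 = new HashMap<>();
    static
    {
        for ( Level2Elements e : Level2Elements.values() )
        {
            LEVEL2.put( e.name(), e );
        }
    }

    static Object pathObjectFor( String element )
    {
        Level2Elements e = LEVEL2.get( element );
        return ( ( e != null ) ? e : UNKNOWN );
    }

    public static void main( String[] args )
    {
        // The handler can then compare by identity instead of by String.
        System.out.println( pathObjectFor( "dogs" ) == Level2Elements.dogs ); // prints true
        System.out.println( pathObjectFor( "birds" ) == UNKNOWN );           // prints true
    }
}
```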
There’s also a SimpleXMLWriter that encapsulates an inverse SAX parser and lets you add elements and data very easily by simply calling the writeElement() method.
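For readers who have not seen event-style XML writing before, the JDK’s own XMLStreamWriter works on a similar principle of emitting elements in document order. The following sketch uses only that JDK class and does not show the SimpleXMLWriter API; the class name WriterSketch and the dog name Rex are made up for illustration.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamWriter;

class WriterSketch
{
    /** Writes a dogs element containing one empty dog element per name. */
    static String writeDogs( String... names ) throws Exception
    {
        StringWriter out = new StringWriter();
        XMLStreamWriter writer = XMLOutputFactory.newInstance().createXMLStreamWriter( out );
        writer.writeStartElement( "dogs" );
        for ( String name : names )
        {
            writer.writeEmptyElement( "dog" );
            writer.writeAttribute( "name", name ); // belongs to the element just started
        }
        writer.writeEndElement();
        writer.flush();
        writer.close();
        return ( out.toString() );
    }

    public static void main( String[] args ) throws Exception
    {
        System.out.println( writeDogs( "Rex" ) );
    }
}
```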
On a side note there’s also a very powerful ini file parser and writer in JAGaToo. If you’re interested, have a look here.
What do you think? Comments and criticism are welcome.
Marvin

