I started writing an XML parser today, and it’s coming along pretty well. It’s part of a college assignment and is purely educational. So far I’ve done some work on the tokenizer, and it already produces a few basic tokens.
<?xml version="1.0" ?>
<test>
</test>
The input above generates these tokens:
XmlToken{type=TAG_BEGIN, text='<', line=2, start=1, end=2}
XmlToken{type=NAME, text='test', line=2, start=2, end=6}
XmlToken{type=TAG_END, text='>', line=2, start=6, end=7}
XmlToken{type=TAG_CLOSE, text='</', line=3, start=1, end=3}
XmlToken{type=NAME, text='test', line=3, start=3, end=7}
XmlToken{type=TAG_END, text='>', line=3, start=7, end=8}
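This is pretty awesome. Next up: support for attributes. Note that I’m omitting the prologue for simplicity, so the <?xml ... ?> declaration on line 1 produces no tokens and the output starts at line 2. This is not meant to be a real parser.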
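For anyone curious how a tokenizer like this could work, here is a minimal sketch in Java. It is not my actual assignment code: the class name XmlTokenizerSketch, the tokenize method, and the TokenType enum are made up for illustration, and the XmlToken fields are simply inferred from the printed output above (1-based columns, with end pointing one past the last character). It also assumes the prologue fits on a single line and only accepts simple ASCII-style names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; names and structure are assumptions, not the original code.
final class XmlTokenizerSketch {
    enum TokenType { TAG_BEGIN, TAG_CLOSE, TAG_END, NAME }

    static final class XmlToken {
        final TokenType type;
        final String text;
        final int line, start, end; // 1-based columns; end is exclusive

        XmlToken(TokenType type, String text, int line, int start) {
            this.type = type;
            this.text = text;
            this.line = line;
            this.start = start;
            this.end = start + text.length();
        }

        @Override
        public String toString() {
            return "XmlToken{type=" + type + ", text='" + text + "', line=" + line
                    + ", start=" + start + ", end=" + end + "}";
        }
    }

    static List<XmlToken> tokenize(String input) {
        List<XmlToken> tokens = new ArrayList<>();
        int pos = 0, line = 1, col = 1;
        while (pos < input.length()) {
            char c = input.charAt(pos);
            String text = null;
            TokenType type = null;
            if (c == '\n') {                       // newline: next line, column resets
                pos++; line++; col = 1;
                continue;
            } else if (Character.isWhitespace(c)) { // other whitespace is skipped
                pos++; col++;
                continue;
            } else if (input.startsWith("<?", pos)) {
                // Skip the prologue entirely (assumed to fit on one line).
                int close = input.indexOf("?>", pos);
                if (close < 0) throw new IllegalArgumentException("Unterminated prologue at line " + line);
                col += close + 2 - pos;
                pos = close + 2;
                continue;
            } else if (input.startsWith("</", pos)) { // must be checked before a plain '<'
                type = TokenType.TAG_CLOSE; text = "</";
            } else if (c == '<') {
                type = TokenType.TAG_BEGIN; text = "<";
            } else if (c == '>') {
                type = TokenType.TAG_END; text = ">";
            } else if (Character.isLetter(c) || c == '_') {
                // Simplified name rule: a letter or '_' followed by letters, digits, '_', '-', '.'
                int stop = pos + 1;
                while (stop < input.length()
                        && (Character.isLetterOrDigit(input.charAt(stop)) || "_-.".indexOf(input.charAt(stop)) >= 0)) {
                    stop++;
                }
                type = TokenType.NAME; text = input.substring(pos, stop);
            } else {
                throw new IllegalArgumentException("Unexpected '" + c + "' at line " + line + ", column " + col);
            }
            tokens.add(new XmlToken(type, text, line, col));
            pos += text.length();
            col += text.length();
        }
        return tokens;
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\" ?>\n<test>\n</test>\n";
        for (XmlToken token : tokenize(xml)) {
            System.out.println(token);
        }
    }
}
```

Running main on the sample document should print the same six tokens listed above. Anything the sketch does not recognize, such as the = and quotes that attributes need, currently throws an exception, which is exactly the gap attribute support would have to fill.\n");
assert tokens.size() == 6;
assert tokens.get(0).toString().equals("XmlToken{type=TAG_BEGIN, text='<', line=2, start=1, end=2}");
assert tokens.get(1).toString().equals("XmlToken{type=NAME, text='test', line=2, start=2, end=6}");
assert tokens.get(2).toString().equals("XmlToken{type=TAG_END, text='>', line=2, start=6, end=7}");
assert tokens.get(3).toString().equals("XmlToken{type=TAG_CLOSE, text='</', line=3, start=1, end=3}");
assert tokens.get(4).toString().equals("XmlToken{type=NAME, text='test', line=3, start=3, end=7}");
assert tokens.get(5).toString().equals("XmlToken{type=TAG_END, text='>', line=3, start=7, end=8}");
</test>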