Hey, have you guys come across this problem? The default Java XML parser fetches the XML file’s DTD from w3.org every single time it runs. I’ve spent the last 3 days trying to figure out why my app takes so long to load, and this turned out to be the cause. Unbeknownst to me, my app was downloading the DTD from the W3C’s site on every run, which caused a delay of about 30 seconds… Geez, that’s frustrating!
And smart people are having this problem too:
http://weblogs.java.net/blog/cayhorstmann/archive/2011/12/12/sordid-tale-xml-catalogs
So if I leave out this line from the XML file:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
then the javax.xml.parsers.SAXParser parses the file straight away.
But if I do that, the document no longer declares its proper DTD, so the better solution is to leave the DOCTYPE in place and set up the SAX parser like this (http://www.w3.org/blog/systeam/2008/02/08/w3c_s_excessive_dtd_traffic/#comment-376):
SAXParserFactory factory = SAXParserFactory.newInstance();
factory.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
As far as I can tell, that setFeature line is not documented or mentioned anywhere on oracle.com; the only place I found it was comment #376 on that w3.org blog post… gah!
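For anyone who wants to try this out, here is a minimal sketch of those two lines in a complete program. It assumes the parser behind SAXParserFactory is the Xerces-based one bundled with the JDK (other implementations may not recognise the feature and throw). The class name NoDtdFetch and the helper rootElementName are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class, not from the original post.
class NoDtdFetch {
    // Xerces-specific feature name quoted in the post.
    static final String LOAD_EXTERNAL_DTD =
            "http://apache.org/xml/features/nonvalidating/load-external-dtd";

    /** Parses xml without loading its external DTD; returns the root element name. */
    static String rootElementName(String xml) {
        final String[] root = new String[1];
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            // Tell the (non-validating) parser not to fetch the external DTD.
            factory.setFeature(LOAD_EXTERNAL_DTD, false);
            SAXParser parser = factory.newSAXParser();
            parser.parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
                @Override
                public void startElement(String uri, String localName,
                                         String qName, Attributes attrs) {
                    // Remember only the first element we see.
                    if (root[0] == null) {
                        root[0] = qName;
                    }
                }
            });
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
        return root[0];
    }

    public static void main(String[] args) {
        String xml = "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Strict//EN\" "
                + "\"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd\">"
                + "<html><head><title>t</title></head><body/></html>";
        System.out.println(rootElementName(xml));
    }
}
```

One thing to keep in mind: with the DTD not loaded, anything declared only in the DTD (named entities, default attribute values) is not available to the parser.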
Apparently the W3C serves up 100 million DTD downloads a day, and the W3C guy says in the comments that 1/4 of these are from Java apps:
http://www.w3.org/blog/systeam/2008/02/08/w3c_s_excessive_dtd_traffic/#comment-359
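If you still want the DTD to be used but without hitting w3.org, a different approach (not the one from the comment above) is to hand the parser a local copy through the standard SAX resolveEntity hook. This is only a sketch: LocalDtdResolver and textContent are hypothetical names, and the localDtd string stands in for wherever you keep your own copy of the DTD.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class, not from the original post.
class LocalDtdResolver {
    static final String XHTML_STRICT = "-//W3C//DTD XHTML 1.0 Strict//EN";

    /** Parses xml, serving localDtd for the XHTML Strict public ID; returns the text content. */
    static String textContent(String xml, String localDtd) {
        StringBuilder text = new StringBuilder();
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(
                    new InputSource(new StringReader(xml)),
                    new DefaultHandler() {
                        @Override
                        public InputSource resolveEntity(String publicId, String systemId) {
                            if (XHTML_STRICT.equals(publicId)) {
                                // Serve the local copy instead of downloading it.
                                InputSource local = new InputSource(new StringReader(localDtd));
                                local.setSystemId(systemId);
                                return local;
                            }
                            // null means default resolution, which may still go to the network.
                            return null;
                        }

                        @Override
                        public void characters(char[] ch, int start, int length) {
                            text.append(ch, start, length);
                        }
                    });
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
        return text.toString();
    }
}
```

In a real app the local copy would more likely come from a file or classpath resource than from a string, and an XML catalog can do the same mapping declaratively.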
I couldn’t believe how silly this problem is, so I felt the need to air my frustration.