I hate XML, but now less than before thanks to SimpleXMLParser
I admit it: I hate XML's dancing orgy of angle brackets, even in Java.
Anyway, everything around me is XML-ized. So in 2006 I developed a small XML parser based on SAX. It was shitty, dirty code for JDK 1.4 that let you parse XML just by defining a method, forgetting about selectors, XPath, X-Wings, TIE fighters and so on...
I called it UltraSmartParser, a shitty name too. Now I have revived it from the tomb of darkness and dressed it up with fancy superpowers. It is on GitHub: https://github.com/daitangio/SimpleXMLParser
To give you a taste of its power, let's look at this code:
[bash]
// log4j 1.x is assumed here (BasicConfigurator, Level).
// Add the SimpleXMLParser import that matches your copy of the project.
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.util.Map;

import org.apache.log4j.BasicConfigurator;
import org.apache.log4j.Level;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLReaderFactory;

public class WordpressExportReader extends SimpleXMLParser {

    public static void main(String[] args) throws SAXException, IOException {
        BasicConfigurator.configure();

        XMLReader sax2Parser = XMLReaderFactory.createXMLReader();
        SimpleXMLParser parser = new WordpressExportReader();
        parser.getLog().setLevel(Level.INFO);
        sax2Parser.setContentHandler(parser);

        File f = new File("c:/jjsoft/gioorgicom.wordpress.2012-08-07.xml");
        FileInputStream is = new FileInputStream(f);
        InputSource s = new InputSource(is);
        sax2Parser.parse(s);
        parser.getLog().info("DONE");
    }

    // State remembered while walking through the current <item>
    private String currentTitle, pid;

    // Catch <rss><channel><item><title>...</title>
    public void do_RSS_CHANNEL_ITEM_TITLE(String title) {
        this.currentTitle = title;
    }

    // Catch <wp:post_id>1551</wp:post_id>
    public void do_RSS_CHANNEL_ITEM_POST_ID(String idz) {
        pid = idz;
    }

    // Catch stuff like
    // <category domain="category" nicename="software"><![CDATA[Software]]></category>
    // <category domain="series" nicename="version-control"><![CDATA[Version Control]]></category>
    public void do_RSS_CHANNEL_ITEM_CATEGORY(Map catAttribs, String cdata) {
        // Constant first, so a <category> without a "domain" attribute does not throw
        if ("series".equals(catAttribs.get("domain"))) {
            getLog().info(" POST:" + pid + ":" + currentTitle + ":" + cdata + ":" + catAttribs.get("nicename"));
        }
    }
}
[/bash]
The original code targeted JDK 1.4, so it is a bit "vintage". The revamped revision you will find on GitHub sports:
- Support for attributes, missing from the original version
- Optimized algorithm
- Stored on GitHub, to share it with you
- Better logging & class/method naming
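
If you wonder how a method name like do_RSS_CHANNEL_ITEM_TITLE can end up being called, here is a minimal sketch of the idea. It is NOT the real SimpleXMLParser source, just my guess at the general technique: keep a stack of the open tags, build a method name from it when a tag closes, and look that method up via reflection. The names PathDispatchSketch, TitleCollector and parse are made up for this example, and attribute callbacks are left out.

```java
import java.io.StringReader;
import java.lang.reflect.Method;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Locale;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch, not the actual SimpleXMLParser implementation.
public class PathDispatchSketch extends DefaultHandler {
    // Upper-cased local names of the currently open elements, outermost first.
    private final Deque<String> path = new ArrayDeque<>();
    // Text seen since the last tag opened or closed.
    private final StringBuilder text = new StringBuilder();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        // Using the local name means <wp:post_id> becomes POST_ID (prefix dropped).
        path.addLast(localName.toUpperCase(Locale.ROOT));
        text.setLength(0);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        // e.g. [RSS, CHANNEL, ITEM, TITLE] -> "do_RSS_CHANNEL_ITEM_TITLE"
        String methodName = "do_" + String.join("_", path);
        path.removeLast();
        try {
            Method callback = getClass().getMethod(methodName, String.class);
            callback.invoke(this, text.toString().trim());
        } catch (NoSuchMethodException e) {
            // Nobody asked for this path: silently skip it.
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Callback " + methodName + " failed", e);
        }
        text.setLength(0);
    }

    // Helper for the demo: parse an XML string with a namespace-aware SAX parser.
    public static void parse(String xml, DefaultHandler handler) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true); // required so localName is filled in
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Demo subclass: a public method named after the path is the only "selector" needed.
    public static class TitleCollector extends PathDispatchSketch {
        public final List<String> titles = new ArrayList<>();

        public void do_RSS_CHANNEL_ITEM_TITLE(String title) {
            titles.add(title);
        }
    }

    public static void main(String[] args) {
        TitleCollector collector = new TitleCollector();
        parse("<rss><channel><title>Blog</title>"
                + "<item><title>Hello</title></item></channel></rss>", collector);
        System.out.println(collector.titles); // prints [Hello]
    }
}
```

Note that the channel's own title is ignored, because its path is RSS_CHANNEL_TITLE and no method with that name exists. A reflection lookup on every closing tag is slow, so a real parser would probably cache the methods it finds.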