I hate XML, but now less than before thanks to SimpleXMLParser

I admit it: I hate XML's dancing orgy of angle brackets, even in Java.

Anyway, everything around me is XML-ized. So in 2006 I developed a small XML parser based on SAX. It was shitty, dirty code for JDK 1.4, but it let you parse XML stuff by just defining a method and forgetting about selectors, XPath, XWing, Tie fighters and so on…

I called it UltraSmartParser, a shitty name too.
Now I have revived it from the tomb of darkness and dressed it up with fancy super powers. It is on GitHub: https://github.com/daitangio/SimpleXMLParser

To give you a taste of its power, let's look at this code:
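[bash]import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.util.Map;

import org.apache.log4j.BasicConfigurator;
import org.apache.log4j.Level;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLReaderFactory;

// SimpleXMLParser comes from the GitHub project above: import it from wherever it lives in your build.
public class WordPressExportReader extends SimpleXMLParser {

    public static void main(String[] args) throws SAXException, IOException {
        BasicConfigurator.configure();

        // Plain SAX setup: SimpleXMLParser is plugged in as the content handler
        XMLReader sax2Parser = XMLReaderFactory.createXMLReader();
        SimpleXMLParser parser = new WordPressExportReader();
        parser.getLog().setLevel(Level.INFO);
        sax2Parser.setContentHandler(parser);
        File f = new File("c:/jjsoft/gioorgicom.wordpress.2012-08-07.xml");
        FileInputStream is = new FileInputStream(f);
        InputSource s = new InputSource(is);
        sax2Parser.parse(s);
        parser.getLog().info("DONE");
    }

    private String currentTitle, pid;

    // Catch <rss><channel><item><title>: the method name mirrors the element path
    public void do_RSS_CHANNEL_ITEM_TITLE(String title) {
        this.currentTitle = title;
    }

    // Catch <wp:post_id>1551</wp:post_id>
    public void do_RSS_CHANNEL_ITEM_POST_ID(String idz) {
        pid = idz;
    }

    // Catch stuff like
    // <category domain="category"
    // nicename="software"><![CDATA[Software]]></category>
    // <category domain="series" nicename="version-control"><![CDATA[Version
    // Control]]></category>
    public void do_RSS_CHANNEL_ITEM_CATEGORY(Map catAttribs, String cdata) {
        // Constant first, so a <category> without a "domain" attribute does not throw
        if ("series".equals(catAttribs.get("domain"))) {
            getLog().info(" POST:" + pid + ":" + currentTitle + ":" + cdata + ":" + catAttribs.get("nicename"));
        }
    }

}[/bash]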
The original code targeted JDK 1.4, so it is a bit “vintage”.
The revamped revision you will find on GitHub sports:

  1. Support for attributes, missing from the original version
  2. Optimized algorithm
  3. Hosted on GitHub, to share it with you
  4. Better logging & class/method naming
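
If you wonder how a method named do_RSS_CHANNEL_ITEM_TITLE can get called without any selector, here is a rough sketch of the idea: keep a stack of open elements while SAX walks the file, glue their upper-cased names into a method name when an element closes, and look that method up by reflection. This is not the actual SimpleXMLParser source. `DispatchSketch`, `Demo`, `run` and `seen` are names I made up for the illustration, and stripping the namespace prefix is only my guess at how `<wp:post_id>` ends up as POST_ID.

```java
import java.io.StringReader;
import java.lang.reflect.Method;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch of name-based dispatch on top of SAX (not the real library code).
public class DispatchSketch extends DefaultHandler {
    // One entry per open element, innermost first.
    private final Deque<String> path = new ArrayDeque<>();
    private final Deque<Map<String, String>> attribs = new ArrayDeque<>();
    private final Deque<StringBuilder> text = new ArrayDeque<>();

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts) {
        // Assumption: drop any namespace prefix, so wp:post_id becomes POST_ID.
        String name = qName.substring(qName.indexOf(':') + 1);
        path.push(name.toUpperCase(Locale.ROOT));
        Map<String, String> m = new LinkedHashMap<>();
        for (int i = 0; i < atts.getLength(); i++) {
            m.put(atts.getQName(i), atts.getValue(i));
        }
        attribs.push(m);
        text.push(new StringBuilder());
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // Plain text and CDATA both arrive here; collect them for the innermost element.
        if (!text.isEmpty()) {
            text.peek().append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String local, String qName) {
        // Build do_RSS_CHANNEL_ITEM_TITLE walking from the root down to this element.
        StringBuilder name = new StringBuilder("do");
        Iterator<String> it = path.descendingIterator();
        while (it.hasNext()) {
            name.append('_').append(it.next());
        }
        path.pop();
        dispatch(name.toString(), attribs.pop(), text.pop().toString());
    }

    private void dispatch(String name, Map<String, String> atts, String cdata) {
        try {
            Method handler;
            Object[] args;
            try {
                // Prefer the (attributes, text) form, then fall back to (text) only.
                handler = getClass().getMethod(name, Map.class, String.class);
                args = new Object[] { atts, cdata };
            } catch (NoSuchMethodException e) {
                handler = getClass().getMethod(name, String.class);
                args = new Object[] { cdata };
            }
            handler.invoke(this, args);
        } catch (NoSuchMethodException e) {
            // Nobody declared a handler for this path: skip the element silently.
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Handler " + name + " failed", e);
        }
    }

    // A tiny subclass in the style of WordPressExportReader; it records what it catches.
    public static class Demo extends DispatchSketch {
        public final List<String> seen = new ArrayList<>();

        public void do_RSS_CHANNEL_ITEM_TITLE(String title) {
            seen.add("title=" + title);
        }

        public void do_RSS_CHANNEL_ITEM_CATEGORY(Map<String, String> atts, String cdata) {
            seen.add(atts.get("domain") + "/" + atts.get("nicename") + "=" + cdata);
        }
    }

    // Parses an XML string with the Demo handler and returns what the handlers recorded.
    public static List<String> run(String xml) throws Exception {
        Demo handler = new Demo();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.seen;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(run("<rss><channel><item><title>Hello</title>"
                + "<category domain=\"series\" nicename=\"version-control\">"
                + "<![CDATA[Version Control]]></category></item></channel></rss>"));
    }
}
```

Looking up the method by reflection on every closing tag is the lazy way to do it. A real implementation would probably cache the lookups, which may well be what the "optimized algorithm" point above is about, but I am only guessing here.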

The first version is called “karmak” because it will be your path to enlightenment…

 
