I hate XML, but now less than before thanks to SimpleXMLParser

I admit it: I hate the dancing orgy of XML angle brackets, even in Java.

Anyway, everything around me is XML-ized. So in 2006 I developed a small XML parser based on SAX. It was shitty, dirty code for JDK 1.4 that let you parse XML stuff by just defining a method and forgetting about selectors, XPath, XWing, TIE fighters and so on…

I called it UltraSmartParser, a shitty name too.
Now I have revived it from the tomb of darkness and dressed it up with fancy superpowers. It is on GitHub: https://github.com/daitangio/SimpleXMLParser

To give you a taste of its power, let's look at this code:

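import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.util.Map;

import org.apache.log4j.BasicConfigurator;
import org.apache.log4j.Level;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLReaderFactory;
// ...plus the import for SimpleXMLParser itself, from the GitHub project above

public class WordPressExportReader extends SimpleXMLParser {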
	public static void main(String[] args) throws SAXException, IOException {
		BasicConfigurator.configure();

		XMLReader sax2Parser = XMLReaderFactory.createXMLReader();
		SimpleXMLParser parser = new WordPressExportReader();
		parser.getLog().setLevel(Level.INFO);
		sax2Parser.setContentHandler(parser);
		File f = new File("c:/jjsoft/gioorgicom.wordpress.2012-08-07.xml");
		FileInputStream is = new FileInputStream(f);
		InputSource s = new InputSource(is);
		sax2Parser.parse(s);
		parser.getLog().info("DONE");
	}

	private String currentTitle,pid;

	public void do_RSS_CHANNEL_ITEM_TITLE(String title) {
		this.currentTitle = title;
	}

	// Catch <wp:post_id>1551</wp:post_id>
	public void do_RSS_CHANNEL_ITEM_POST_ID(String idz){
		pid=idz;
	}

	// Catch stuff like
	// <category domain="category"
	// nicename="software"><![CDATA[Software]]></category>
	// <category domain="series" nicename="version-control"><![CDATA[Version
	// Control]]></category>
	public void do_RSS_CHANNEL_ITEM_CATEGORY(Map catAttribs, String cdata) {
		// Constant first, so a <category> without a "domain" attribute does not blow up
		if("series".equals(catAttribs.get("domain"))){
			getLog().info(" POST:"+ pid+":"+ currentTitle+":" + cdata+ ":"+catAttribs.get("nicename"));
		}
	}

}
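
If you wonder how a method name can replace a selector, here is a rough sketch of the idea. It is not the actual SimpleXMLParser code, just a minimal illustration under my own assumptions: a plain SAX DefaultHandler keeps a stack of the open elements, and at each closing tag it looks up by reflection a public method called do_ plus the upper-cased path. The names PathDispatchSketch and parseTitles are made up for this example, and attributes are left out to keep it short.

```java
import java.io.StringReader;
import java.lang.reflect.Method;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Locale;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch: NOT the real SimpleXMLParser implementation.
public class PathDispatchSketch extends DefaultHandler {
    // Names of the currently open elements, root first.
    private final Deque<String> path = new ArrayDeque<>();
    // Text collected for the element being read.
    private final StringBuilder text = new StringBuilder();
    final List<String> titles = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        // Drop a namespace prefix, so wp:post_id becomes POST_ID.
        String name = qName.substring(qName.indexOf(':') + 1);
        path.addLast(name.toUpperCase(Locale.ROOT));
        text.setLength(0);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        // rss > channel > item > title becomes do_RSS_CHANNEL_ITEM_TITLE
        String methodName = "do_" + String.join("_", path);
        path.removeLast();
        try {
            Method handler = getClass().getMethod(methodName, String.class);
            handler.invoke(this, text.toString().trim());
        } catch (NoSuchMethodException e) {
            // Nobody is interested in this element: skip it.
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Handler " + methodName + " failed", e);
        }
        text.setLength(0);
    }

    // Called only for <title> elements nested in rss/channel/item.
    public void do_RSS_CHANNEL_ITEM_TITLE(String title) {
        titles.add(title);
    }

    public static List<String> parseTitles(String xml) throws Exception {
        PathDispatchSketch handler = new PathDispatchSketch();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.titles;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<rss><channel><title>Blog</title>"
                + "<item><title>Hello</title></item></channel></rss>";
        System.out.println(parseTitles(xml));
    }
}
```

The channel-level title is ignored because no do_RSS_CHANNEL_TITLE method exists; only the item title reaches a handler. Looking methods up at every closing tag is slow, so a real implementation would probably cache them.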

The original code targeted JDK 1.4, so it is a bit “vintage”.
The revamped revision you will find on GitHub sports:

  1. Support for attributes, missed in the original version
  2. Optimized algorithm
  3. Hosted on GitHub, to share it with you
  4. Better logging & class/method naming

The first version is called “karmak” because it will be your path to enlightenment…

 
