I hate XML, but now less than before thanks to SimpleXMLParser

I admit it: I hate XML and its orgy of dancing angle brackets, even in Java.

Anyway, everything around me is XML-ized. So in 2006 I developed a small XML parser based on SAX. It was shitty, dirty code for JDK 1.4 that let you parse XML stuff just by defining a method, forgetting about selectors, XPath, X-Wings, TIE fighters and so on…

I called it UltraSmartParser, a shitty name too.
Now I have revived it from the tomb of darkness and dressed it with fancy superpowers. It is on GitHub: https://github.com/daitangio/SimpleXMLParser

To give you a taste of its power, let's look at this code:

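import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Map;

import org.apache.log4j.BasicConfigurator;
import org.apache.log4j.Level;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLReaderFactory;

// SimpleXMLParser comes from https://github.com/daitangio/SimpleXMLParser:
// add the import for its package here.
public class WordPressExportReader extends SimpleXMLParser {
	public static void main(String[] args) throws SAXException, IOException {
		// Log4j 1.x console configuration
		BasicConfigurator.configure();

		XMLReader sax2Parser = XMLReaderFactory.createXMLReader();
		SimpleXMLParser parser = new WordPressExportReader();
		parser.getLog().setLevel(Level.INFO);
		sax2Parser.setContentHandler(parser);
		File f = new File("c:/jjsoft/gioorgicom.wordpress.2012-08-07.xml");
		// try-with-resources closes the file even if parsing fails
		try (InputStream is = new FileInputStream(f)) {
			sax2Parser.parse(new InputSource(is));
		}
		parser.getLog().info("DONE");
	}

	private String currentTitle, pid;

	// Catch <title> inside <rss><channel><item>
	public void do_RSS_CHANNEL_ITEM_TITLE(String title) {
		this.currentTitle = title;
	}

	// Catch <wp:post_id>1551</wp:post_id>
	public void do_RSS_CHANNEL_ITEM_POST_ID(String idz) {
		pid = idz;
	}

	// Catch stuff like
	// <category domain="category"
	// nicename="software"><![CDATA[Software]]></category>
	// <category domain="series" nicename="version-control"><![CDATA[Version
	// Control]]></category>
	public void do_RSS_CHANNEL_ITEM_CATEGORY(Map catAttribs, String cdata) {
		// "series".equals(...) avoids a NullPointerException when "domain" is missing
		if ("series".equals(catAttribs.get("domain"))) {
			getLog().info(" POST:" + pid + ":" + currentTitle + ":" + cdata + ":" + catAttribs.get("nicename"));
		}
	}

}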
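In the example, each handler name seems to spell out the path of the element it catches: `<title>` inside `<rss><channel><item>` lands in `do_RSS_CHANNEL_ITEM_TITLE`, while `<wp:post_id>` contributes only `POST_ID`, so the namespace prefix appears to be dropped. The sketch below reproduces that mapping as inferred from the example alone; `HandlerNames`, `localPart` and `methodNameFor` are hypothetical names, not part of SimpleXMLParser's API.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical helper (not part of SimpleXMLParser) reproducing the naming
// convention visible in WordPressExportReader: element path -> handler name.
final class HandlerNames {

    private HandlerNames() {
    }

    // "wp:post_id" -> "post_id"; names without a prefix are returned unchanged.
    static String localPart(String qName) {
        return qName.substring(qName.indexOf(':') + 1);
    }

    // ["rss", "channel", "item", "wp:post_id"] -> "do_RSS_CHANNEL_ITEM_POST_ID"
    static String methodNameFor(List<String> path) {
        StringBuilder name = new StringBuilder("do");
        for (String qName : path) {
            name.append('_').append(localPart(qName).toUpperCase(Locale.ROOT));
        }
        return name.toString();
    }

    public static void main(String[] args) {
        System.out.println(methodNameFor(Arrays.asList("rss", "channel", "item", "wp:post_id")));
        // prints do_RSS_CHANNEL_ITEM_POST_ID
        System.out.println(methodNameFor(Arrays.asList("rss", "channel", "item", "title")));
        // prints do_RSS_CHANNEL_ITEM_TITLE
    }
}
```

How the real library treats mixed-case names or repeated prefixes is not shown in the example, so treat this as a reading aid rather than a specification.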

The original code targeted JDK 1.4, so it is a bit “vintage”.
The revamped version you will find on GitHub features:

  1. Support for attributes, missing in the original version
  2. Optimized algorithm
  3. Hosted on GitHub, so I can share it with you
  4. Better logging & class/method naming
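To see how attribute support can fit into the method-per-element idea, here is a minimal, hypothetical sketch built only on the JDK's own SAX classes; it is not SimpleXMLParser's code. Assumptions: on each closing tag it looks for `do_<PATH>(Map, String)` first and falls back to `do_<PATH>(String)`, and it caches the reflective lookups, which is just one guess at what an optimized dispatch could look like. `ReflectiveSaxHandler`, `parse` and `WordPressCategoryDemo` are made-up names.

```java
import java.io.IOException;
import java.io.StringReader;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical method-per-element dispatcher: a sketch, NOT SimpleXMLParser's real code.
abstract class ReflectiveSaxHandler extends DefaultHandler {
    private final Deque<String> path = new ArrayDeque<>();                    // open elements, outermost first
    private final Deque<Map<String, String>> attributes = new ArrayDeque<>(); // attributes of each open element
    private final Deque<StringBuilder> texts = new ArrayDeque<>();            // text of each open element
    private final Map<String, Method> lookups = new HashMap<>();              // assumed optimization: cached lookups

    void parse(String xml) {
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(new InputSource(new StringReader(xml)), this);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        path.addLast(qName);
        Map<String, String> map = new HashMap<>();
        for (int i = 0; i < atts.getLength(); i++) {
            map.put(atts.getQName(i), atts.getValue(i));
        }
        attributes.addLast(map);
        texts.addLast(new StringBuilder());
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        StringBuilder current = texts.peekLast();
        if (current != null) current.append(ch, start, length); // CDATA content arrives here too
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        String name = methodName(); // built while the closing element is still on the path
        path.removeLast();
        Map<String, String> attrs = attributes.removeLast();
        String text = texts.removeLast().toString();
        try {
            Method handler = find(name, Map.class, String.class); // prefer the attribute-aware form
            if (handler != null) {
                handler.invoke(this, attrs, text);
            } else if ((handler = find(name, String.class)) != null) {
                handler.invoke(this, text);
            }
        } catch (IllegalAccessException | InvocationTargetException e) {
            throw new SAXException(e);
        }
    }

    // ["rss", "channel", "item", "wp:post_id"] -> "do_RSS_CHANNEL_ITEM_POST_ID"
    private String methodName() {
        StringBuilder sb = new StringBuilder("do");
        for (String q : path) {
            sb.append('_').append(q.substring(q.indexOf(':') + 1).toUpperCase(Locale.ROOT));
        }
        return sb.toString();
    }

    // Looks up a public handler once per (name, arity); a cached null means "no handler".
    private Method find(String name, Class<?>... params) {
        String key = name + "/" + params.length;
        if (!lookups.containsKey(key)) {
            Method m = null;
            try {
                m = getClass().getMethod(name, params);
                m.setAccessible(true); // lets handlers live in non-public classes
            } catch (NoSuchMethodException noHandler) { /* remember the miss as null */ }
            lookups.put(key, m);
        }
        return lookups.get(key);
    }
}

class WordPressCategoryDemo extends ReflectiveSaxHandler {
    final StringBuilder series = new StringBuilder(); // "nicename:title" of each series category
    public void do_RSS_CHANNEL_ITEM_CATEGORY(Map<String, String> attribs, String cdata) {
        if ("series".equals(attribs.get("domain"))) {
            series.append(attribs.get("nicename")).append(':').append(cdata);
        }
    }
    public static void main(String[] args) {
        WordPressCategoryDemo demo = new WordPressCategoryDemo();
        demo.parse("<rss><channel><item><category domain=\"category\" nicename=\"software\"><![CDATA[Software]]></category>"
                + "<category domain=\"series\" nicename=\"version-control\"><![CDATA[Version Control]]></category>"
                + "</item></channel></rss>");
        System.out.println(demo.series); // prints version-control:Version Control
    }
}
```

Caching misses as well as hits means elements nobody handles (`<rss>`, `<channel>`, …) cost one reflective lookup per path and signature, then only a map access.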

The first version is called “karmak” because it will be your path to enlightenment…

 

Code Zauker 0.0.6 & 0.0.7: Double hit!

So, were you waiting for news?

I'm glad to impress you: I managed to ship two Code Zauker versions in less than a month!

Code Zauker, the yet-another-Google-Code-indexer based on Redis, is happy to show off new features:

  • Better web interface
  • Powerful multi-processor indexer (mczindexer)
  • Better documentation in the README on GitHub, with ready-to-use examples
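The mczindexer itself lives in the Code Zauker project and is not shown in this post, so what follows is only a hypothetical Java sketch of what a multi-processor indexer does: it splits files across one worker thread per CPU core, and every worker fills a shared in-memory index. Assumptions, all mine: the index maps each trigram (three-character substring, in the spirit of Google Code Search-style indexers) to the names of the files containing it, and a `ConcurrentHashMap` stands in for Redis. `ParallelIndexer`, `indexAll` and `search` are made-up names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch of a multi-processor indexer; NOT Code Zauker's code.
// A ConcurrentHashMap stands in for Redis: trigram -> names of files containing it.
final class ParallelIndexer {
    private final Map<String, Set<String>> index = new ConcurrentHashMap<>();

    // Indexes every (file name, content) pair with one worker thread per processor.
    void indexAll(Map<String, String> files) {
        ExecutorService pool = Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
        List<Callable<Void>> tasks = new ArrayList<>();
        for (Map.Entry<String, String> file : files.entrySet()) {
            tasks.add(() -> {
                indexFile(file.getKey(), file.getValue());
                return null;
            });
        }
        try {
            for (Future<Void> done : pool.invokeAll(tasks)) {
                done.get(); // rethrows a failure from any worker
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("indexing interrupted", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("indexing failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    // Adds the file to the set of every trigram it contains (thread-safe via computeIfAbsent).
    private void indexFile(String name, String content) {
        for (int i = 0; i + 3 <= content.length(); i++) {
            index.computeIfAbsent(content.substring(i, i + 3), k -> ConcurrentHashMap.newKeySet()).add(name);
        }
    }

    // Candidate files containing every trigram of the query; queries under 3 chars find nothing.
    Set<String> search(String query) {
        Set<String> result = null;
        for (int i = 0; i + 3 <= query.length(); i++) {
            Set<String> hits = index.getOrDefault(query.substring(i, i + 3), Collections.<String>emptySet());
            if (result == null) {
                result = new HashSet<>(hits);
            } else {
                result.retainAll(hits);
            }
        }
        return result == null ? Collections.<String>emptySet() : result;
    }

    public static void main(String[] args) {
        Map<String, String> files = new HashMap<>();
        files.put("Parser.java", "class Parser { void parse() {} }");
        files.put("README.txt", "run the indexer and enjoy");
        ParallelIndexer indexer = new ParallelIndexer();
        indexer.indexAll(files);
        System.out.println(indexer.search("parse")); // prints [Parser.java]
    }
}
```

A trigram match only says that every three-letter piece of the query occurs somewhere in the file, so the result is a candidate list; a real engine would then re-check the candidates against the actual text.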

Code Zauker is schema-free and needs no special setup on Redis: just run the indexer and enjoy.
You can also point it to a remote Redis server using password authentication, for extra security.

 

Give me all your code! Code Zauker search engine sports a web interface now

Don’t play the stupid game
Cause I’m a different kind of engine
Every search sounds the same
You’ve got to step into my world
Give me all your RAM and give me your code
Give me all your code today

A new, improved version of Code Zauker is out!

Code Zauker is a tiny but speedy search engine tailored to code search. It is backed by Redis, the fastest RAM-based NoSQL engine you have ever seen.

Code Zauker 0.0.5 sports an elegant search interface based on the Twitter Bootstrap CSS layout. So now it is cooler than other double-o search engines!

What do you want in the next release? A parallel indexing engine?! A SQLite back end (there is already an experimental branch for it)?! Give us your feedback!