Date: Sun, 19 May 2013 04:31:26 +0200
- Darwinian Web
There has to be a better way to do XML programming
- Two weeks ago I decided to do some programming for analysis of link patterns among bloggers. I had just given up on Ruby out of frustration over its poor XML libraries, so I decided to try out PHP. I haven't used PHP since the late Nineties, but it is a simple enough language, so browsing a few books showed me what I needed to know. I was able to write the code to parse out the links from Tech Memeorandum and then autodiscover the RSS feeds on these pages without much trouble at all. I'm not a great coder, but I can pick things up fast, and can generally force my way through most programming issues. Then I ran into the XML libraries in PHP and came to a dead halt again. I need to read through the RSS feeds of each blog I find on Tech Memeorandum to find the links to other blogs, and that means parsing the XML of these feeds. I've been beating on this problem on and off for the past week and a half, and am about to give up again. Giving up on a programming problem is not something I do lightly. The whole point of being a programmer is never letting the machine beat you. I also have enough confidence to think that if I'm having so many problems, lots of other people are dealing with the same thing.
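To make the task concrete, here is a minimal sketch of the feed-reading step described above: parse an RSS 2.0 feed and collect the other hosts that its item descriptions link to. It is written in Python with only the standard library, not in the PHP used for the actual attempt, and the function name `outbound_hosts`, the `LinkCollector` class, and the sample feed are all hypothetical. It assumes a well-formed RSS 2.0 feed whose links sit in `<description>`; real feeds vary a good deal more than this.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser
from urllib.parse import urlparse


class LinkCollector(HTMLParser):
    """Collects href values from <a> tags in an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def outbound_hosts(rss_text, own_host):
    """Return the set of hosts, other than own_host, linked from item descriptions."""
    root = ET.fromstring(rss_text)
    hosts = set()
    for item in root.iter("item"):
        # The XML parser has already unescaped the HTML inside <description>.
        description = item.findtext("description") or ""
        collector = LinkCollector()
        collector.feed(description)
        collector.close()
        for href in collector.links:
            host = urlparse(href).netloc.lower()
            if host and host != own_host:
                hosts.add(host)
    return hosts


# Hypothetical sample feed, used only to exercise the sketch.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Blog</title>
    <link>http://blog.example.com/</link>
    <item>
      <title>First post</title>
      <description>&lt;p&gt;See &lt;a href="http://other.example.org/post"&gt;this&lt;/a&gt; and &lt;a href="http://blog.example.com/old"&gt;my old post&lt;/a&gt;.&lt;/p&gt;</description>
    </item>
  </channel>
</rss>"""

if __name__ == "__main__":
    print(outbound_hosts(SAMPLE_FEED, "blog.example.com"))
```

Even this small case needs two parsers, one for the XML and one for the HTML embedded in it, which is part of why the job feels heavier than it should.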
What I've decided to do in response is work with one of my favorite programmers from my Andover.net days to try to build a better language solution for XML processing, with an emphasis on RSS and OPML. A few weeks ago John Casey emailed me after I posted my frustration with Ruby, and asked why I don't just write my own language for this type of work. We've been talking about this ever since, and now I'm ready to go ahead. I'm not capable of writing my own XML parser, at least not one that isn't a horrible hack, but I do know a lot about language design, especially about making programming languages easy to use. John, however, is a great coder, and if he thinks he can write a clean, fast parser, I believe him.
The idea at first will be to create a library of functions that are real smart about RSS and OPML. We're not sure what language this will be working with, but since the library will be written in C, it should be possible to add it to all of the standard Web languages, like Perl, Python, PHP, etc. I'm interested in having the library handle all the standard tasks you would need when working with RSS and OPML, so it should be possible to read multiple feeds and combine them in interesting ways in just a few lines of code. Once this library is built, we can see about possibly extending it into more of a mini-language.
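As an illustration of what "read multiple feeds and combine them" could mean in practice, the sketch below merges the items of several RSS 2.0 feeds into one list, newest first. It is Python using only the standard library; it is not the planned C library, and `read_items`, `merge_feeds`, and the sample feeds are hypothetical names and data made up for the example. It assumes every item carries an RFC 822 `pubDate` with an explicit time zone such as `GMT`; items without a date are skipped.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime


def read_items(rss_text):
    """Parse one RSS 2.0 feed into a list of dicts, one per dated item."""
    root = ET.fromstring(rss_text)
    source = root.findtext("channel/title") or ""
    items = []
    for item in root.iter("item"):
        pub = item.findtext("pubDate")
        if not pub:
            continue  # no date, so the item cannot be placed in the merged order
        items.append({
            "source": source,
            "title": item.findtext("title") or "",
            "link": item.findtext("link") or "",
            "date": parsedate_to_datetime(pub),
        })
    return items


def merge_feeds(feed_texts):
    """Combine the items of several feeds into one list, newest first."""
    merged = []
    for text in feed_texts:
        merged.extend(read_items(text))
    merged.sort(key=lambda entry: entry["date"], reverse=True)
    return merged


# Hypothetical sample feeds, used only to exercise the sketch.
FEED_A = """<rss version="2.0"><channel><title>Blog A</title>
<item><title>A1</title><link>http://a.example.com/1</link>
<pubDate>Mon, 01 May 2006 10:00:00 GMT</pubDate></item>
<item><title>A2</title><link>http://a.example.com/2</link>
<pubDate>Wed, 03 May 2006 09:00:00 GMT</pubDate></item>
</channel></rss>"""

FEED_B = """<rss version="2.0"><channel><title>Blog B</title>
<item><title>B1</title><link>http://b.example.com/1</link>
<pubDate>Tue, 02 May 2006 12:00:00 GMT</pubDate></item>
</channel></rss>"""

if __name__ == "__main__":
    for entry in merge_feeds([FEED_A, FEED_B]):
        print(entry["date"].isoformat(), entry["source"], entry["title"])
```

The goal for the library would be to shrink something like this to the last three lines, with the parsing and date handling hidden behind a single call.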
The working title for this library/language is OPML Script, but that name may change as its functionality expands to more general XML tasks. This will be released under an Open Source license of some type, so it will be available for no charge. John and I will share the ownership of the copyright, although there doesn't seem to be any likelihood of ever making money from it. I've said in the past that I didn't want to get directly involved with any startups for at least a year, but this is something that I need for my own work, so I don't have any choice. If I want something that will let me program in an easy manner, I'm going to have to help build it. We don't have any delivery schedule yet, but we hope to have something we can demonstrate by OPML Camp on May 20th.