
2006-07-21

Validating XML Documents

The other day I was writing a couple of XML documents that had to conform to certain XML Schemata. I wanted a simple command-line tool that would take an XML document and an XML Schema and check that the document was well-formed and actually conformed to the given schema (I did not want to use web-based validation). I could have written such a tool myself, but I was feeling rather lazy and just wanted to download one quickly.

It turned out to be a surprisingly frustrating task and eventually took more time than writing the tool myself would have. Perhaps my Google queries were not good enough, perhaps people are just happy with their respective IDEs, perhaps everyone writes their own little tool around the usual XML parsing libraries, or perhaps people are not so fussy about writing XML documents that strictly conform to the applicable XML Schema. I don't know why, but it took me a while to locate such a tool.
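For comparison, a validator of the kind described above can be sketched with the javax.xml.validation API that ships with the JDK (Java 5 and later). This is only a minimal sketch and not any of the tools discussed in this post: the class name ValidateXml, the validate helper and the output format are made up for illustration.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.transform.Source;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical helper class, not part of any tool mentioned in the post.
class ValidateXml {

    /** Validates a document against an XML Schema and returns the problems found. */
    static List<String> validate(Source schemaSource, Source document)
            throws SAXException, IOException {
        final List<String> problems = new ArrayList<>();

        SchemaFactory factory =
                SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema schema = factory.newSchema(schemaSource); // throws if the schema itself is broken
        Validator validator = schema.newValidator();

        // Collect warnings and errors instead of stopping at the first one.
        validator.setErrorHandler(new ErrorHandler() {
            public void warning(SAXParseException e) { problems.add(describe("warning", e)); }
            public void error(SAXParseException e) { problems.add(describe("error", e)); }
            public void fatalError(SAXParseException e) throws SAXException { throw e; }
        });

        try {
            validator.validate(document);
        } catch (SAXParseException e) {
            // Fatal errors (typically a document that is not well-formed) end the run.
            problems.add(describe("fatal", e));
        }
        return problems;
    }

    private static String describe(String level, SAXParseException e) {
        return level + " at line " + e.getLineNumber()
                + ", column " + e.getColumnNumber() + ": " + e.getMessage();
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("usage: java ValidateXml <schema.xsd> <document.xml>");
            System.exit(2);
        }
        List<String> problems = validate(
                new StreamSource(new File(args[0])),
                new StreamSource(new File(args[1])));
        for (String problem : problems) {
            System.err.println(problem);
        }
        System.out.println(problems.isEmpty() ? "valid" : "invalid");
        System.exit(problems.isEmpty() ? 0 : 1);
    }
}
```

Run as "java ValidateXml schema.xsd document.xml", it should print each problem with its line and column and exit with a non-zero status if the document is invalid. The messages come straight from the underlying parser, so they are unlikely to be any friendlier than the ones the existing tools print.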

I first used Sun's Multi-Schema XML Validator (MSV). It worked for a while but then tripped with a StackOverflowError on a particular XML Schema that I had to use, so I had to abandon it. I next tried XMLStarlet, but the error messages it spewed were a bit confusing and it did not fully support XML Schemata, so I abandoned it as well. I am now using a little tool called "DOMCount" that is included with Apache Xerces. It ostensibly parses a document and prints the number of DOM elements it has encountered, but it also works fairly well as a document validator. Its error messages, while better than those from XMLStarlet, can still be confusing sometimes, but I can live with that for the moment.

While creating these documents from the appropriate XML Schemata, I found xs3p [link currently seems broken] to be really useful. This is a stylesheet that takes an XML Schema and generates a pretty HTML document from it that you can use to understand the original XML Schema and easily navigate through its structure. I used Apache Xalan to generate the HTML documents.
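The transformation step itself can also be driven from a few lines of Java through the standard javax.xml.transform (JAXP) API. The following is a rough sketch rather than the exact command used for this post: the class name SchemaToHtml and the file names in the usage note are assumptions, and TransformerFactory.newInstance() picks whichever XSLT processor is available (Apache Xalan if it is on the classpath, otherwise the JDK's built-in one). Whether xs3p works with every processor has not been checked here.

```java
import java.io.File;
import javax.xml.transform.Result;
import javax.xml.transform.Source;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper class that applies an XSLT stylesheet to an XML file.
class SchemaToHtml {

    /** Applies the stylesheet to the input and writes the result to the output. */
    static void transform(Source stylesheet, Source input, Result output)
            throws TransformerException {
        TransformerFactory factory = TransformerFactory.newInstance();
        Transformer transformer = factory.newTransformer(stylesheet); // compiles the stylesheet
        transformer.transform(input, output);
    }

    public static void main(String[] args) throws TransformerException {
        if (args.length != 3) {
            System.err.println("usage: java SchemaToHtml <stylesheet.xsl> <schema.xsd> <output.html>");
            System.exit(2);
        }
        transform(
                new StreamSource(new File(args[0])),  // e.g. the xs3p stylesheet
                new StreamSource(new File(args[1])),  // the XML Schema to document
                new StreamResult(new File(args[2]))); // the generated HTML page
    }
}
```

Assuming the stylesheet has been saved locally as xs3p.xsl, "java SchemaToHtml xs3p.xsl schema.xsd schema.html" would write the HTML page to schema.html.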

3 comments:

  1. This is not Java but xmllint, from libxml2 tools, always did the job for me.

  2. Thanks. I don't know how I missed it.

    I tried it out and it seems to do the job as well. However, like XMLStarlet, it includes the whole URL of a namespace in error messages about (say) missing elements, which makes them a bit difficult to decipher.

  3. A colleague suggested oXygen once. It appeared to be pretty nifty on a first look.

