Over the last several weeks I’ve been developing XML schemas for a client to support information exchanges between several different organizations, so it was important to make the schemas very explicit and “tight” so that each party can validate XML before or after sending it. The XML documents could be used in conjunction with Web services or as part of an “old-fashioned” file-based exchange. In short, this was a pretty typical system integration task.

The client had already decided to standardize on XML Schema, so using Relax NG or Schematron was not an option.

XML Schema provides a lot of different capabilities, but based on my experience I think it could benefit from some improvements. Here are my random thoughts on this. Now, I don’t claim to be the ultimate XML Schema expert, so take it for what it’s worth.

  • Schema’s verbosity and unwieldy syntax make it a poor candidate for communicating and publishing XML structure and rules to a wide audience of technical people from different organizations who may or may not know XML Schema. For example, minOccurs="0" means “optional field”, which is probably not very intuitive to anyone unfamiliar with the Schema specification. Even after formatting for publishing (e.g., with xsddoc), schemas are still hard to understand. Of course, one can use annotations and try to explain each type in plain English, but then the documentation always tends to get out of sync with the schema.
    The obvious counter-argument here is that XML Schema is designed to be a data modeling/validation tool and as such is not suitable for capturing business requirements, but I just think it would be nice if it could really be used for both, essentially becoming the “system of record” for integrating different systems and organizations.
  • Error messages thrown by XML parsers are far from intuitive (this obviously depends on the parser, and I have not done any comparative analysis). For example, a missing required element results in “Element ‘element name’ is not valid for content model”, where ‘element name’ is the name of the element that follows the missing one. Why can’t the parser simply say “Required element is missing”? Again, this problem is exacerbated when you’re dealing with people with only cursory XML Schema knowledge. I’m not aware of a standard way to customize error messages, so in my case developers will have to do error translation in the code.
  • XML Schema users are forced to use regular expressions to define any more or less complex template for simple types (phone number, SSN, etc.). This poses a problem in an environment where you can’t expect all users to be familiar with regexp syntax. A message like “Value does not match regular expression facet ‘\+?[0-9\-\(\)\s]{1,25}’” could very easily befuddle the uninitiated. I wish there was a simplified templating mechanism, maybe something similar to the “##.##” patterns of java.text.DecimalFormat.
  • Reuse capabilities in XML Schema are not perfect. Extension (xs:extension) only allows appending elements to the end of the sequence. Restriction (xs:restriction) requires repeating the content model of the parent type. This creates very brittle schemas and violates the DRY principle. There is no way to parameterize an XML type. Let’s say there is a “name” type with “first” and “last” elements. When a new person is added, I want the “last” element to be mandatory. In an “update” situation, all fields could be optional. I wish I could make minOccurs a parameter here.
  • XML Schema may seem very OO-like at first glance, but in fact it is missing some important OO-like capabilities. For instance, there is no element-level polymorphism. In the example above, I wanted to change the “behavior” of “last” (some aspect of its definition) in a subtype, and I can’t do that. Inheritance by restriction for complex types (I don’t have a problem with using it for simple types) is IMO counter-intuitive. It means I can have a child type that does not have all the properties of its parent, so there is no way to enforce that the parent’s optional elements are available in all children.
  • Element and type scoping could’ve been more flexible. All declarations are either scoped to the parent element (local) or global. This does not allow me to define a reusable type or group scoped to a certain parent element or to a file (any more or less complex schema has to be broken down into multiple files for manageability’s sake). Say I have a name type for a person’s name (first, middle, last) and a business name type with a different structure. If I want to use the person’s name type for different elements within the Person type, I have to define it as global and name it PersonNameType, essentially embedding the parent’s name into the child’s name. I wish I could simply define NameType and specify what parent type or element it is scoped to.
  • XML Schema is a declarative language and so it lacks programming constructs, which is fine. But there is still a need for a Schematron-like facility or a scripting language for expressing complex rules (such as cross-field validation). Schematron works fine when embedded inside annotations, but it requires a separate validation step and the Schematron XSLT. So it would be great if this capability was supported by the Schema standard and natively understood by XML parsers. This would make schemas truly self-contained.
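
To make the “simplified templating” wish from the list above a bit more concrete, here is a minimal Python sketch that expands a mask into a regular expression that could then be pasted into a pattern facet. The mask syntax (`#` for a digit, `A` for a letter, anything else taken literally) and the names `mask_to_regex` and `matches_mask` are made up for this example; nothing like this exists in XML Schema itself.

```python
import re

# Hypothetical mask syntax (an assumption for this sketch, not part of XML Schema):
#   '#' = one digit, 'A' = one ASCII letter, any other character = itself.
MASK_TOKENS = {"#": "[0-9]", "A": "[A-Za-z]"}


def mask_to_regex(mask):
    """Expand a simple mask into a regular expression string."""
    return "".join(MASK_TOKENS.get(ch, re.escape(ch)) for ch in mask)


def matches_mask(mask, value):
    """Check a value against a mask; as with an XSD pattern, the whole value must match."""
    return re.fullmatch(mask_to_regex(mask), value) is not None


if __name__ == "__main__":
    ssn_mask = "###-##-####"
    print(mask_to_regex(ssn_mask))
    print(matches_mask(ssn_mask, "123-45-6789"))
    print(matches_mask(ssn_mask, "123-456-789"))
```

A mask like “###-##-####” is something a business analyst can read at a glance, while the generated expression stays an implementation detail. The obvious trade-off is that a mask this simple cannot express optional parts or repetition, which is exactly where regular expressions come back in.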

So my wish list is actually quite simple:

  • Make XML schemas easier to understand for non-technical users or people without schema knowledge, perhaps via some intelligent translation mechanism.
  • Make the Schema more powerful by allowing programming constructs, variables and more flexible scoping.
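
As a rough idea of what even a very unintelligent “translation mechanism” could look like, the following Python sketch reads a schema with the standard library’s ElementTree and turns minOccurs/maxOccurs into plain English. The sample schema is invented for illustration (it borrows the “first”/“last” name example from above and adds a repeating “alias” element), and the function names are hypothetical. It only looks at named complex types and their element declarations, so treat it as a starting point rather than a real documentation tool.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# Sample schema made up for this sketch; "alias" is added only to show repetition.
SAMPLE_XSD = """\
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:complexType name="NameType">
    <xs:sequence>
      <xs:element name="first" type="xs:string" minOccurs="0"/>
      <xs:element name="last" type="xs:string"/>
      <xs:element name="alias" type="xs:string" minOccurs="0" maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>
"""


def describe_occurrence(element):
    """Turn minOccurs/maxOccurs (both default to 1) into a plain-English phrase."""
    min_occurs = int(element.get("minOccurs", "1"))
    max_occurs = element.get("maxOccurs", "1")
    required = "required" if min_occurs > 0 else "optional"
    if max_occurs == "unbounded":
        return f"{required}, may repeat any number of times"
    if int(max_occurs) > 1:
        return f"{required}, may repeat up to {max_occurs} times"
    return required


def describe_schema(xsd_text):
    """Return one plain-English line per element declared inside a complex type."""
    root = ET.fromstring(xsd_text)
    lines = []
    for complex_type in root.iter(f"{XS}complexType"):
        type_name = complex_type.get("name", "(anonymous type)")
        for element in complex_type.iter(f"{XS}element"):
            element_name = element.get("name") or element.get("ref")
            lines.append(f"{type_name}.{element_name}: {describe_occurrence(element)}")
    return lines


if __name__ == "__main__":
    for line in describe_schema(SAMPLE_XSD):
        print(line)
```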
