Over the last several weeks I've been developing XML schemas for a client to
support information exchange between several different organizations. It was
important to make the schemas very explicit and "tight" so that each party can
validate XML before sending it or after receiving it. The XML documents could be
used in conjunction with Web services or as part of an "old-fashioned" file-based
exchange. In short, this was a pretty typical system integration task.
The client had already decided to standardize on XML Schema, so using Relax NG
or Schematron was not an option.
XML Schema provides a lot of different capabilities, but based on my experience I
think that it could benefit from some improvements. Here are my random thoughts
on this. Now, I don't claim to be the ultimate XML Schema expert, so take it for
what it's worth.
-
Schema's verbosity and unwieldy syntax make it a poor candidate for
communicating and publishing XML structure and rules to a wide audience of
technical people from different organizations who may or may not know XML
Schema. For example, minOccurs="0" means "optional field", which is probably not
very intuitive to anyone unfamiliar with the Schema specification. Even after
formatting for publishing (e.g., by using xsddoc), schemas are still hard to
understand. Of course, one can use annotations and try to explain each type in
plain English, but then the documentation always tends to get out of sync.
The obvious counter-argument here is that XML Schema is designed to be a data
modeling/validation tool and as such is not suitable for capturing business
requirements, but I just think that it would be nice if it could really be used
for both, essentially becoming the "system of record" for integrating different
systems and organizations.
-
Error messages thrown by XML parsers are far from intuitive (this
obviously depends on the parser, and I have not done any comparative analysis).
For example, a missing required element results in "Element 'element name' is not
valid for content model", where 'element name' is the name of the element
following the missing required element. Why can't the parser simply say
"Required element is missing"? Again, this problem is exacerbated when you're
dealing with people with only cursory XML Schema knowledge. I'm not aware of a
standard way to customize error messages, so in my case developers will have to
do error translation in code.
-
XML Schema users are forced to use regular expressions to define any more or
less complex template for simple types (phone number, SSN, etc.). This poses a
problem in an environment where you can't expect all users to be familiar with
regexp syntax. A message like "Value does not match regular expression
facet '\+?[0-9\-\(\)\s]{1,25}'" could very easily befuddle the uninitiated. I
wish there were a simplified templating mechanism, maybe something similar to
a java.text.DecimalFormat pattern such as "##.##".
-
Reuse capabilities in XML Schema are not perfect. Extension ("xs:extension") only
allows appending elements to the end of the sequence. Restriction
("xs:restriction") requires repeating the content model of the parent type. This
creates very brittle schemas and violates the DRY principle. There is no way to
parameterize an XML type. Let's say there is a "name" type with "first" and "last"
elements. When a new person is added, I want the "last" element to be mandatory.
In an "update" situation all fields could be optional. I wish I could make
"minOccurs" a parameter here.
-
XML Schema may seem very OO-like at first glance, but in fact it is missing
some important OO-like capabilities. For instance, there is no element-level
polymorphism. In the example above, I wanted to change the "behavior" of "last"
(some aspect of this type definition) in a subtype, and I can't do that.
Inheritance by restriction for complex types (I don't have a problem with using
it for simple types) is IMO counter-intuitive. It means I can have a child that
does not have all the properties of its parent, so there is no way to guarantee
that the parent's optional elements are available in all children.
-
Element and type scoping could've been more flexible. All declarations are
either parent element-scoped (local) or global. This does not allow me to define
a reusable type or group scoped to a certain parent element or to a file (any
more or less complex schema has to be broken down into multiple files for
manageability's sake). Say I have a name type for a person's name (first, middle,
last) and a business name type with a different structure. If I want to use the
person's name type for different elements within the Person type, I have to
define it as global and name it PersonNameType, essentially embedding the parent's
name into the child's name. I wish I could simply define NameType and specify what
parent type or element it is scoped to.
-
XML Schema is a declarative language and so it lacks programming constructs,
which is fine. But there is still a need for a Schematron-like facility or a
scripting language for expressing complex rules (such as cross-field validation).
Schematron works fine when embedded inside annotations, but it requires a
separate validation step and the Schematron XSLT. So it would be great if this
capability were supported by the Schema standard and natively understood by XML
parsers. This would make schemas truly self-contained.
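To make the cross-field point concrete, here is a minimal sketch of the kind of separate validation step I mean, written in Python rather than Schematron purely for illustration. The document shape (a "person" element with an "action" attribute and a "name" child holding "first" and "last") and both rules are assumptions I made up for the example, loosely based on the add/update scenario above; they are not part of any real schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical cross-field rules as (description, predicate) pairs.
# The element and attribute names are assumptions made for illustration.
RULES = [
    (
        "last name is required when action is 'add'",
        lambda p: p.get("action") != "add"
        or bool((p.findtext("name/last") or "").strip()),
    ),
    (
        "at least one of first or last name must be supplied",
        lambda p: any(
            (p.findtext(f"name/{tag}") or "").strip() for tag in ("first", "last")
        ),
    ),
]


def check_person(xml_text):
    """Return the descriptions of all rules that the document violates."""
    person = ET.fromstring(xml_text)
    return [description for description, holds in RULES if not holds(person)]
```

Something like this has to run after ordinary schema validation, which is exactly the extra step I would like the Schema standard to make unnecessary.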
So my wish list is actually quite simple:
-
Make XML schemas easier to understand for non-technical users or people without
schema knowledge, perhaps via some intelligent translation mechanism.
-
Make the Schema more powerful by allowing programming constructs, variables and
more flexible scoping.
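As a rough illustration of the first wish, the sketch below walks a schema and turns occurrence constraints into plain English. It is only a toy under stated assumptions: the function name describe_elements and the sample NameType schema are made up for this example, and it looks only at named xs:element declarations and their minOccurs attribute (which defaults to 1), ignoring everything else a real translator would need to handle.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"


def describe_elements(xsd_text):
    """Return one plain-English line per named xs:element declaration."""
    lines = []
    for element in ET.fromstring(xsd_text).iter(XS + "element"):
        name = element.get("name")
        if name is None:  # skip references such as <xs:element ref="..."/>
            continue
        # XML Schema treats a missing minOccurs as 1, i.e. the element is required.
        min_occurs = element.get("minOccurs", "1")
        presence = "optional" if min_occurs == "0" else "required"
        lines.append(f"{name}: {presence}")
    return lines


# Hypothetical schema fragment, modeled on the name example discussed earlier.
SAMPLE_XSD = """
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:complexType name="NameType">
    <xs:sequence>
      <xs:element name="first" type="xs:string" minOccurs="0"/>
      <xs:element name="middle" type="xs:string" minOccurs="0"/>
      <xs:element name="last" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>
"""

print("\n".join(describe_elements(SAMPLE_XSD)))
```

For the sample schema this lists "first" and "middle" as optional and "last" as required, which is closer to what I would want to hand to someone who has never read the Schema specification.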