What’s Missing from XML Schema

Over the last several weeks I’ve been developing XML schemas for a
client to support information exchanges between several different organizations,
so it was important to make the schemas very explicit and “tight” so that each
party can validate XML before or after sending it. The XML documents could be
used in conjunction with Web services or as part of an “old-fashioned” file-based
exchange. In short, this was a pretty typical system integration task.

The client had already decided to standardize on XML Schema, so using Relax NG
or Schematron was not an option.

XML Schema provides a lot of different capabilities, but based on my experience I
think that it could benefit from some improvements. Here are my random thoughts
on this. Now, I don’t claim to be the ultimate XML Schema expert, so take it for
what it’s worth.

Schema’s verbosity and unwieldy syntax make it a poor candidate for
communicating and publishing XML structure and rules to a wide audience of
technical people from different organizations who may or may not know XML
Schema. For example, minOccurs="0" means “optional field”, which is probably not
very intuitive to anyone unfamiliar with the Schema specification. Even after
being formatted for publishing (e.g., by using xsddoc),
schemas are still hard to understand. Of course, one can use annotations and
try to explain each type in plain English, but then the documentation always
tends to get out of sync with the schema.

The obvious counter-argument here is that XML Schema is designed to be a data
modeling/validation tool and as such is not suitable for capturing business
requirements, but I just think that it would be nice if it could really be used
for both, essentially becoming the “system of record” for integrating different
systems and organizations.

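
One way to picture a plain-English view of a schema is a small script that reads the XSD and spells out what the occurrence attributes mean. What follows is only a rough sketch under stated assumptions: the NameType fragment is made up for illustration, and the wording of the output is one arbitrary choice, not the output of any existing tool.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# Hypothetical schema fragment, made up for illustration.
SAMPLE_XSD = """
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:complexType name="NameType">
    <xs:sequence>
      <xs:element name="first" type="xs:string" minOccurs="0"/>
      <xs:element name="last" type="xs:string"/>
      <xs:element name="alias" type="xs:string" minOccurs="0" maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>
"""


def describe_occurrence(min_occurs, max_occurs):
    """Turn minOccurs/maxOccurs values into a plain-English phrase."""
    required = "optional" if min_occurs == "0" else "required"
    if max_occurs == "unbounded":
        return required + ", may repeat"
    if int(max_occurs) > 1:
        return f"{required}, up to {max_occurs} times"
    return required


def summarize(xsd_text):
    """List the element declarations of every named complex type."""
    root = ET.fromstring(xsd_text)
    lines = []
    for ctype in root.iter(XS + "complexType"):
        type_name = ctype.get("name")
        if type_name is None:
            continue  # anonymous types are skipped in this sketch
        for elem in ctype.iter(XS + "element"):
            name = elem.get("name") or elem.get("ref")
            # Both attributes default to 1 in XML Schema.
            phrase = describe_occurrence(elem.get("minOccurs", "1"),
                                         elem.get("maxOccurs", "1"))
            lines.append(f"{type_name}.{name}: {phrase}")
    return lines


print("\n".join(summarize(SAMPLE_XSD)))
```

Because the summary is generated from the schema itself, it cannot drift out of sync the way hand-written annotations do, although it covers only a small corner of what a schema can express.
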
Error messages produced by XML parsers are far from intuitive (this
obviously depends on the parser, and I have not done any comparative analysis).
For example, a missing required element results in “Element ‘element name’ is not
valid for content model”, where ‘element name’ is the name of the element
following the missing required element. Why can’t the parser simply say
“Required element is missing”? Again, this problem is exacerbated when you’re
dealing with people who have only cursory XML Schema knowledge. I’m not aware of a
standard way to customize error messages, so in my case developers will have to
do error translation in the code.

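
Error translation in code could look roughly like the sketch below. The patterns are assumptions modeled on the two messages quoted in this post; real parsers word their errors differently, so both the patterns and the friendlier wording are hypothetical and would need adjusting for a given parser.

```python
import re

# Hypothetical patterns based on the messages quoted in this post.
TRANSLATIONS = [
    (re.compile(r"Element '(?P<name>[^']+)' is not valid for content model"),
     "A required element is missing before <{name}>, or <{name}> is out of order."),
    (re.compile(r"does not match regular expression facet"),
     "The value is not in the expected format."),
]


def translate(parser_message):
    """Return a friendlier message, or the original text if nothing matches."""
    for pattern, template in TRANSLATIONS:
        match = pattern.search(parser_message)
        if match:
            return template.format(**match.groupdict())
    return parser_message


print(translate("Element 'last' is not valid for content model"))
```

Unrecognized messages pass through unchanged, so nothing the parser reports gets lost.
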
XML Schema users are forced to use regular expressions to define any more or
less complex template for simple types (phone number, SSN, etc.). This poses a
problem in an environment where you can’t expect all users to be familiar with
regexp syntax. A message like “Value does not match regular expression
facet ‘\+?[0-9\-\(\)\s]{1,25}’” could very easily befuddle the uninitiated. I
wish there were a simplified templating mechanism, maybe something similar to
the “##.##” patterns used with java.text.MessageFormat.

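
To make the idea of a simplified template concrete, here is a minimal sketch that turns one into a regular expression. The placeholder rules ('#' for one digit, '@' for one letter, anything else taken literally) are invented for this example and are not part of XML Schema or of any Java API.

```python
import re


def template_to_regex(template):
    """Translate a simplified template into a regular expression.

    '#' stands for one digit and '@' for one ASCII letter; every other
    character must appear literally. These rules are made up for this sketch.
    """
    parts = []
    for ch in template:
        if ch == "#":
            parts.append("[0-9]")
        elif ch == "@":
            parts.append("[A-Za-z]")
        else:
            parts.append(re.escape(ch))
    return "".join(parts)


def matches(template, value):
    """Check that the whole value fits the template."""
    return re.fullmatch(template_to_regex(template), value) is not None


print(matches("###-##-####", "123-45-6789"))
print(matches("##.##", "12x34"))
```

A template like “###-##-####” is something a person with no regexp background can read, and an error message could quote it directly instead of the generated expression.
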
Reuse capabilities in XML Schema are not perfect. “Extension” only allows
appending elements to the end of the sequence. “Restriction” requires repeating the content
model of the parent type. This creates very brittle schemas and violates the DRY
principle. There is no way to parameterize an XML type. Let’s say there is a “name”
type with “first” and “last” elements. When a new person is added, I want the “last”
element to be mandatory. In an “update” situation all fields could be optional. I
wish I could make “minOccurs” a parameter here.

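
Since the schema language has no such parameter, one workaround is to generate the type definitions from a template before publishing the schema. The sketch below illustrates that preprocessing idea only; the type names AddNameType and UpdateNameType are hypothetical, and the fragment is cut down to the two elements from the example.

```python
import xml.etree.ElementTree as ET

XS_NS = "http://www.w3.org/2001/XMLSchema"


def name_type(type_name, last_min_occurs):
    """Return a complexType definition with minOccurs of 'last' as a parameter."""
    return (
        f'<xs:complexType xmlns:xs="{XS_NS}" name="{type_name}">\n'
        f'  <xs:sequence>\n'
        f'    <xs:element name="first" type="xs:string" minOccurs="0"/>\n'
        f'    <xs:element name="last" type="xs:string" minOccurs="{last_min_occurs}"/>\n'
        f'  </xs:sequence>\n'
        f'</xs:complexType>'
    )


def min_occurs_of(xsd_fragment, element_name):
    """Read back minOccurs for one element (it defaults to 1 when absent)."""
    root = ET.fromstring(xsd_fragment)
    for elem in root.iter(f"{{{XS_NS}}}element"):
        if elem.get("name") == element_name:
            return elem.get("minOccurs", "1")
    raise KeyError(element_name)


# Hypothetical type names: one variant per usage context.
ADD_NAME = name_type("AddNameType", 1)
UPDATE_NAME = name_type("UpdateNameType", 0)

print(ADD_NAME)
```

The content model is written once, but the result is still two separate global types as far as the schema is concerned, so this only moves the duplication out of the hand-maintained source.
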
XML Schema may seem very OO-like at first glance, but in fact it is missing
some important OO-like capabilities. For instance, there is no element-level
polymorphism. In the example above, I wanted to change the “behavior” of “last”
(some aspect of this type definition) in a subtype, and I can’t do that.
Inheritance by restriction for complex types (I don’t have a problem with using
it for simple types) is IMO counter-intuitive. So now I can have a child which
does not have all the properties of its parent, and so there is no way to
guarantee that the parent’s optional elements are allowed in every child.

Element and type scoping could’ve been more flexible. All declarations are
either scoped to the parent element (local) or global. This does not allow me to define
a reusable type or group scoped to a certain parent element or to a file (any
more or less complex schema would have to be broken down into multiple files for
manageability’s sake). So say I have a name type for a person’s name (first, middle,
last) and a business name type with a different structure. If I want to use the
person’s name type for different elements within the Person type, I will have to
define it as global and name it PersonNameType, essentially embedding the parent’s name
into the child’s name. I wish I could simply define NameType and specify what
parent type or element it is scoped to.

XML Schema is a declarative language and so it lacks programming constructs,
which is fine. But there is still a need for a Schematron-like facility or a
scripting language for expressing complex rules (such as cross-field validation).
Schematron works fine when embedded inside annotations, but it requires a
separate validation step and the Schematron XSLT. So it would be great if this
capability were supported by the Schema standard and natively understood by XML
parsers. This would make schemas truly self-contained.
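
For a sense of what a cross-field rule looks like when it has to live outside the schema, here is a minimal sketch of a check that would run as a separate step after schema validation. The element names (period, start, end) and the rule itself are hypothetical and chosen only for illustration.

```python
import xml.etree.ElementTree as ET


def check_date_range(doc_text):
    """Return a list of rule violations; an empty list means the rule holds."""
    root = ET.fromstring(doc_text)
    errors = []
    for period in root.iter("period"):
        start = period.findtext("start")
        end = period.findtext("end")
        # ISO 8601 dates (YYYY-MM-DD) compare correctly as plain strings.
        if start and end and end < start:
            errors.append(f"period ends ({end}) before it starts ({start})")
    return errors


# Hypothetical documents: the second one breaks the rule.
GOOD = "<order><period><start>2005-10-01</start><end>2005-10-16</end></period></order>"
BAD = "<order><period><start>2005-10-16</start><end>2005-10-01</end></period></order>"

print(check_date_range(GOOD))
print(check_date_range(BAD))
```

Each element here can be perfectly valid on its own, which is exactly why a per-element schema declaration cannot catch the problem.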

So my wish list is actually quite simple:

Make XML schemas easier to understand for non-technical users or people without
schema knowledge, perhaps via some intelligent translation mechanism.

Make the Schema more powerful by allowing programming constructs, variables and
more flexible scoping.

Posted on: October 16, 2005 | Posted in: XML