Category Archives: XML

Easy-to-Use Alternative to DataPower Deployment Policies

DataPower deployment policies allow for “tweaking” device and domain configuration for different environments. Let’s face it, there are always differences between environments. Sometimes, these differences are small, such as different back-end hosts, and in other cases these differences could be significant, such as different security policies.

Deployment policies do a decent job of dealing with differences between environments. Deployment policies support changing, adding and deleting configuration, so it is possible to implement fairly complex transformations.

However, dealing with deployment policies can be confusing. A deployment policy's match rules utilize XPath, but the syntax of the rules is not pure XPath. Consider this simple deployment policy match rule:


*/*/protocol/http?Name=personServiceHTTP&Property=LocalPort

The part before the "?" looks like XPath. But what schema is this XPath based on? There is no "protocol" element in the DataPower XML management schema. The part after the "?", which uses name-value parameters, is even odder. Why use this instead of proper XPath? After all, DataPower has an XML-processing engine with full XPath support, so it would certainly be more logical to rely on XML standards.

The bottom line is that while deployment policies are useful, they have limitations. They have to be developed using the Deployment Policy builder in the WebGUI, and they can only be applied to configuration elements supported in the WebGUI. For example, creating a deployment policy that updates the RemoteEndpointPort of a Web Services Proxy proves to be a non-trivial task.

This is why we added support for XPath-based transformations to our DPBuddy DataPower management tool. Instead of dealing with the obscure syntax of DataPower deployment policies, developers can simply look at the configuration export file and specify an XPath expression against this file. It is also possible to specify filters based on the types and names of DataPower objects.
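
To give a feel for the idea, here is a hypothetical sketch (using plain lxml in Python, not DPBuddy's actual implementation; the element names are assumed from a typical DataPower export file) of how the port change from the match rule above becomes a single XPath expression against the export:

    from lxml import etree

    # Sketch only: change the local port of an HTTP handler by applying
    # a plain XPath to the configuration export file. Element names are
    # assumptions; verify them against your own export.
    doc = etree.parse("export.xml")
    for port in doc.xpath("//HTTPSourceProtocolHandler[@name='personServiceHTTP']/LocalPort"):
        port.text = "8080"
    doc.write("export-updated.xml", xml_declaration=True, encoding="UTF-8")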



WebSphere 7 Supports Properties-Based Configuration

IBM WebSphere 7 (currently in beta) comes with a property-file-based configuration tool that provides a “human-consumable” interface to the currently XML-based configuration repository of the application server. This is further proof that XML is simply not the right mechanism for managing the configuration of complex software products.

From the release notes:


Properties (name/value pairs) files are more consumable for human administrators than a mix of XML and other formats spread across multiple configuration directories.

Kudos to IBM for recognizing that.

It is still not clear though how hierarchical relationships between configuration objects will be supported.

Back in the WAS 6 world, I’ve been using a simple Jython script that converts Python named parameters into the wsadmin format. This is an example of a resource described in this format:


 WASConfig.DataSource(parent="testJDBCProvider", name="testDS", jndiName="jdbc/testDS",
                      description="Test DataSource", propertySet=dict(
                          resourceProperties=[
                              dict(name="dbName", value="testDB", type="java.lang.String"),
                              dict(name="connectionAttribute", value="", type="java.lang.String")
                          ]))


I think that a slightly more streamlined Python-based format would be superior to properties.
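
To see why hierarchy is the sticking point, consider a hypothetical flattening of the data source above into name/value pairs (this is my own sketch, not the actual WAS 7 property format): nesting has to be encoded in key names, and repeating groups need synthetic indexes.

    # Hypothetical flattening, not the actual WAS 7 format
    dataSource.testDS.parent=testJDBCProvider
    dataSource.testDS.jndiName=jdbc/testDS
    dataSource.testDS.description=Test DataSource
    dataSource.testDS.resourceProperties.1.name=dbName
    dataSource.testDS.resourceProperties.1.value=testDB
    dataSource.testDS.resourceProperties.1.type=java.lang.String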

Ant Scripts without XML

It’s pretty easy to create an Ant file for a simple project. A simple Ant script typically contains the ubiquitous “init”, “compile”, “test”, “war” (or “jar”), “build” targets, all wired together. It’s easy to change and easy to understand, and the script’s flow has a declarative, rule-based feel to it. The problem is, projects and their build files rarely stay simple for long. Soon we need to add a “validate.xml” target, junit reports, deployment to the application server and so on. Then we begin supporting multiple environments; we discover that our local desktop configuration is different from how the integration environment is configured, so we add numerous property files, “ftp” tasks and multiple “copy” targets for various application files. Before we know it, the build script becomes a convoluted mess of XML tags and there is nothing declarative about it anymore; it has morphed into a full-fledged, very procedural program. Perhaps we even had to resort to using the ant-contrib “if” and “for” tasks to implement procedural logic in Ant. And nothing is uglier than an “if” with complex conditions expressed in XML.

A better approach would be to implement the “procedural” portion of the build script in Java or in any of the scripting languages that Ant supports. The problem is, configuring and invoking Ant tasks from Java or a scripting language leads to verbose code. For example:


    # configuring and running Ant's "exec" task through its Java API
    execTask = project.createTask("exec")
    execTask.setOutputproperty(outputPropertyName)
    execTask.setErrorProperty(errorPropertyName)
    execTask.setResultProperty(resultPropertyName)
    execTask.setExecutable(execName)
    arg = execTask.createArg()
    arg.setLine(paramString)

Doing the same thing in XML is shorter and cleaner:


<exec executable="${execName}" outputProperty="p1" 
    errorProperty="p2" resultProperty="p3">
    <arg line="${params}" />
</exec>

So what can we do to make the task invocation syntax more concise and easier to understand? In fact, the syntax can be drastically simplified with the help of simple Ant “adapters” developed for popular scripting languages, since Groovy, Ruby and Python all have fairly intuitive syntax for lists, dictionaries and other data structures. I developed such an adapter for Jython. It uses Python named arguments and dictionary syntax, so executing a “copy” task looks like this:


pant=PAnt(project)
pant.exTask("copy", todir="testdir",  fileset={"dir":".","includes":"*.xml"} )

“PAnt” is the name of the “Ant adapter” class for Jython. The class configures and executes Ant tasks based on the provided arguments, using the Ant task configuration API.
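
To give a flavor of how such an adapter can be put together, here is a simplified, hypothetical sketch (not the actual PAnt source, which also handles lists of nested elements, text content and more): Ant’s IntrospectionHelper maps keyword arguments to task attributes and dictionaries to nested elements.

    from org.apache.tools.ant import IntrospectionHelper

    # Simplified, hypothetical sketch of a PAnt-like adapter
    class PAntSketch:
        def __init__(self, project):
            self.project = project

        def execTask(self, name, **kwargs):
            task = self.project.createTask(name)
            helper = IntrospectionHelper.getHelper(task.getClass())
            for attr, value in kwargs.items():
                if isinstance(value, dict):
                    # a dictionary becomes a nested element, e.g., fileset={...}
                    element = helper.createElement(self.project, task, attr)
                    nested = IntrospectionHelper.getHelper(element.getClass())
                    for n, v in value.items():
                        nested.setAttribute(self.project, element, n, str(v))
                else:
                    helper.setAttribute(self.project, task, attr, str(value))
            task.execute()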

The “pant” module also comes with a simple helper function called “nested” so that named arguments can be used consistently for nested elements. With the syntax highlighting supported by most editors/IDEs (e.g., you can try PyDev for Jython/Python development), it allows for a better visual distinction between attribute names and values:


pant.exTask("copy", todir="testdir",  fileset=nested(dir=".", includes="*.xml") )

To use “PAnt” from Ant, you can develop custom tasks using “scriptdef” or simply embed Python code directly into a target:


    <target name="test.pant" >
        <script language="jython">
from pant import *
pant=PAnt(project)
pant.execTask(name="echo", message="foo")
        </script>
    </target>

The “pant” module itself is just a few lines of code, as you can see from its source. Don’t forget to properly set up your “python.path” if you want to make it globally available to all your Ant scripts.

There is also an open-source project, Gant, that provides similar (in fact, much more extensive) capabilities for Groovy, but I have not had a chance to play with it; I specifically wanted to use Python/Jython because Jython can also be used for WebSphere Application Server administration.

In my mind, a scripting language-based approach to writing build files produces much more flexible, easier-to-understand and easier-to-maintain scripts. When you start implementing your Ant logic in Python, you’ll see that Ant targets become much more coarse-grained, since there is no need to create internal targets (the ones that are never invoked by the users) to simulate subroutines, or conditional targets to simulate “if” statements. It is also nice to be able to use all the capabilities of a full-blown programming language as opposed to the limited subset of procedural tasks that Ant provides (such as “condition”). Being able to use properly scoped variables instead of inherently global Ant properties is another great benefit.

At the same time, it is still possible to use Ant targets for expressing a flow that wires together the major functions of the build script. I would prefer something less XML-like for this purpose too, but that’s a task for another day.

Please refer to our official PAnt project page for more information and to download PAnt.

Why XML is Bad for Humans

XML is everywhere these days. It is used for passing data around, for specifying metadata and even as a programming language for tools such as Ant and Jetty.

When XML is generated by various development and run-time tools (e.g., for serializing Java objects into SOAP), its complexity and readability don’t matter much since humans have to deal with raw XML only occasionally (e.g., to troubleshoot a problem).

However, more often than not, XML is written directly by developers, mostly with the help of a validating XML editor/IDE (that is, if developers are lucky and a Schema/DTD is available). WSDL (in the case of the WSDL-to-Java approach), XML Schema and Ant build files are just a few examples where this is the case.

Using XML as a mark-up language for otherwise mostly text documents (e.g., XHTML) is not a totally bad idea. However, XML is ill-suited for specifying complex metadata with dynamic dependencies, for wiring command-based logic (e.g., Ant) or for defining domain-specific languages. That is, ill-suited for humans.

For starters, XML is unlike any other programming language (or a natural language). Consider a basic XML construct: <name>value</name>. In any other language it would’ve been written as “name=value” (or “name:=value”, or something similar). An assignment is a construct familiar to most of us from math, even though we may not understand the intricacies of r-value versus l-value. It is intuitive. XML relegates this basic construct to attributes, which can only be used as part of an element. Using attributes, a simple assignment can be expressed as <name value="my value" />, which is a bit easier to understand than a purely element-based construct. However, the “value” attribute still seems kind of redundant.

Another annoying feature of XML is closing tags. Closing tags are what makes XML verbose. (Interestingly, SGML, which XML is derived from, did not require closing tags, so one could write something like <TAG/this/.) In most programming languages we express grouping and nesting using brackets or parentheses or braces. This is true for function arguments, arrays, lists, maps/dictionaries, tuples, you name it, in any modern programming language. XML’s creators for some reason decided that repeating the name of a variable (tag) is the way to go. This is a great choice for XML parsers but a poor alternative when XML is written/read by humans.

Closing tags do help when the nesting level runs deep. But they hurt in cases when there is a need to express a simple construct with just a few (or one!) data items. The problem is, our brain can only process a limited number of items at a time, so intermixing data that needs to be processed with tags that serve as delimiters for this data makes comprehension more difficult. For simple lists, a comma-delimited format could be a better choice in many situations.

In general, repeating the same set of tags over and over again to define repetitive groups makes XML difficult to read, especially when each element contains just text:

    <welcome-file-list>
        <welcome-file>index.html</welcome-file>
        <welcome-file>index.htm</welcome-file>
        <welcome-file>index.jsp</welcome-file>
    </welcome-file-list>

Compare this with a simple property/comma-delimited format:


welcome-file-list= index.html, index.htm, index.jsp

Finally, what's up with angle brackets? I suppose brackets could be justified when an element has multiple attributes. In many cases, however, elements don't have attributes, and so an angle bracket is simply a way to distinguish a tag name from data. This is, again, counter-intuitive and different from many modern programming languages. Normally, variable names are not bracketed or quoted; instead, values are. Also, if there was a need to use a special symbol for denoting variables, wouldn't "$" or "${}" be a more intuitive option for most of us?

Of course, XML has many advantages, the key one being that it is very easy to develop grammars for XML documents (using DTD or Schema). Another one is the fact that grammars are extensible via namespaces. Finally, any XML grammar can be parsed by any XML parser; to a parser an XML document is just a hierarchy of elements.

This simplicity, however, comes at a great price. The expressiveness of XML is extremely limited. It has only a small number of constructs and no operators. While this is adequate for its role as a markup language for text files, it puts a lot of constraints on any more-or-less complex metadata format, let alone something requiring procedural logic, such as Ant or XSLT. As a result, the intuitiveness of XML-based grammars suffers.

I'm not saying that we must stop using XML altogether. It has its place. But we should not be applying it blindly just because it's the only widely available tool for creating domain-specific languages. For starters, BNF/EBNF should be part of any developer's arsenal (along with ANTLR). And good old name/value pair and comma-delimited formats should be seriously considered for simple situations that do not require support for hierarchical structures.

It Takes a Mental Shift to Benefit from XML Appliances

This article on techtarget is a great illustration of my point from the previous post about the importance of proper design patterns and techniques for being able to benefit from XML appliance capabilities.

When implementing Web services, Java developers tend to think in terms of the Java classes that XML documents map to. Using XSLT (or even Schema validation) for implementing part of their processing logic is not on their list, because the common thinking is that it is too expensive to do in Java.

With XML appliances the situation is exactly the opposite. XSLT all of a sudden becomes one of the best-performing parts of an application (although I would imagine that using Java hardware acceleration, such as the one provided by Azul, might once again change that). This could be a serious “paradigm shift” for many developers and architects.

Another obstacle to more effective usage of appliances could simply be the lack of XSLT skills. XSLT is essentially a functional language, and so it comes with a learning curve attached, especially for complex transformations. It is important to have a good knowledge of XSLT to understand what kind of work can be “outsourced” to an appliance. Not that many developers have this knowledge today, but perhaps that will change with more widespread use of XML appliances.

JSON Pros and Cons

JSON is a simple object serialization approach based on the JavaScript object initializer syntax. The code for an initializer (object literal) is put into a string and then interpreted using the JavaScript eval() function or a JSON parser (which is very lightweight):

var serializedObj='{firstName:"john", lastName:"doe"}';
...
// This is just an example; a JSON parser should be used instead
// to avoid the security vulnerabilities of "eval"
var obj = eval("(" + serializedObj + ")");
document.getElementById("firstName").innerHTML=obj.firstName;

JSON is used extensively in various AJAX frameworks and toolkits to provide easy object serialization for remote calls. It is supported by both GWT and Dojo. There is a growing realization that perhaps JSON should be considered as an option when implementing SOA; for example, Dion Hinchcliffe recently published a blog entry suggesting that JSON (and other Web 2.0 technologies) must be seriously considered by SOA architects.

So what are the benefits of using JSON relative to XML (for this comparison I want to focus just on XML and stay away from SOAP)? Here is my brief take on that; there are also numerous other articles and posts on the subject.

JSON strengths:

  • JSON: Fully automated way of de-serializing/serializing JavaScript objects, with minimum to no coding.
    XML: Developers have to write JavaScript code to serialize/de-serialize to/from XML.

  • JSON: Very good support by all browsers.
    XML: While XML parsers are built into all modern browsers, cross-browser XML parsing can be tricky, e.g., see this article.

  • JSON: Concise format thanks to its name/value pair-based approach.
    XML: Verbose format because of tags and namespaces.

  • JSON: Fast object de-serialization in JavaScript (based on anecdotal evidence).
    XML: Slower de-serialization in JavaScript (based on anecdotal evidence).

  • JSON: Supported by many AJAX toolkits and JavaScript libraries.
    XML: Not very well supported by AJAX toolkits.

  • JSON: Simple API (available for JavaScript and many other languages).
    XML: More complex APIs.

JSON weaknesses:

  • JSON: No support for formal grammar definition, hence interface contracts are hard to communicate and enforce.
    XML: XML Schema or DTD can be used to define grammars.

  • JSON: No namespace support, hence poor extensibility.
    XML: Good namespace support, with many different extensibility options in Schema.

  • JSON: Limited development tools support.
    XML: Supported by a wide array of development and other (e.g., transformation) tools.

  • JSON: Narrow focus, used for RPC only, primarily with JavaScript clients (although one can argue that this is one of its strengths).
    XML: Broad focus – can be used for RPC, EDI, metadata, you name it.

  • JSON: No support in Web services-related products (application servers, ESBs, etc.), at least not yet.
    XML: Supported by all Web services products.
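
As a small illustration of the “simple API” point above, here is a complete round trip using Python’s standard json module (Python serves here merely as an example of a non-JavaScript language with a JSON library):

    import json

    # Serialize a dictionary to a JSON string and parse it back;
    # no hand-written (de)serialization code is required.
    person = {"firstName": "john", "lastName": "doe"}
    serialized = json.dumps(person)
    restored = json.loads(serialized)
    print(restored["firstName"])   # prints: john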

So the bottom line is that JSON and XML are, of course, two very different technologies; XML is much broader in scope, so I’m not even sure if comparing them side by side is fair.

As an object serialization technology for AJAX (or should it now be called AJAJ, since we’ve replaced XML with JSON?), JSON looks very appealing. Anybody who has ever tried parsing SOAP directly in a browser (while having to support multiple browsers) can attest that this is not a straightforward task. JSON simplifies this task dramatically. So I think that ESB vendors should definitely start thinking about adding JSON to the list of formats they support.

One of the keys to SOA success is that it should be easy to consume a service, i.e., the entry barrier for service consumers must be low to support “grass-roots” SOA adoption. While a top-down SOA effort may succeed, it will certainly take longer than a bottom-up (“grass-roots”) approach where developers are able to consume services as they see fit. AJAX/JSON fits this bill perfectly – it is easily understood by developers and it does not require any Web services-specific tools or infrastructure.

So overall I’m pretty enthusiastic about JSON.

Schema Compliance is the Key to Interoperability

Good Web services interoperability is an absolute must for a successful SOA implementation, but why has interoperability been so difficult to achieve?

I think that the inability to comply with a published Web services contract expressed via its WSDL/Schema could be one of the leading causes of interoperability problems (I use the term “interoperability” pretty broadly here).

For example:

  • Many “wrapper” service implementations use positional parameter binding, as I described in my previous post. This allows a service provider to accept messages that will never validate against the schema. Then certain changes in the implementation (a different binding mechanism, switching from “wrapper” to “non-wrapper”) could all of a sudden start causing issues for clients that have never had any problems before.

  • Some Web services clients always generate messages using qualified local elements, thus ignoring what is in the schema (the default is actually “unqualified”). This may work for some binding frameworks but not for others.

  • Different JAX-RPC implementations handle “xsd:anyType” differently. Some generate “Object” bindings, some use SOAPElement, and some actually allow using DOM’s “Document” or “Element” (as an extension to JAX-RPC). This would be OK if the serialization/de-serialization process did not change depending on the binding, but that unfortunately is not always the case.

  • Handling of “nillable” (xsi:nil) and required elements varies. If an element is declared as required but nillable, it must be provided as part of a message. Instead, some Web service clients omit an element if it is “null” in Java.

My biggest pet peeve is that most JAX-RPC implementations today don’t even support schema validation out of the box (hopefully this will change in JAX-WS). I understand that full validation against a schema can be expensive, but can we at least handle required parameters/fields as they are defined in the schema? A type mismatch will certainly cause an error, so why not a missing mandatory field?

Note that I’m not talking about WS-I profiles, .NET/Java interop, or differences in WS-* implementations. I’ve seen all of the above problems while working with different Web services implementations in Java (mostly JAX-RPC).

And it is not just about interoperability – reliance on subtle conventions as opposed to a published contract makes architectures more fragile and more tightly coupled.

So my advice is simple – write a handler (at least for development/test environments) and enable schema validation on the server and, under some circumstances (e.g., when you don’t control the service), on the client. This will save you from many interoperability problems, guaranteed.
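
The validation step itself is easy to express. Here is a minimal sketch using lxml in Python (the file names are placeholders); in a JAX-RPC/JAX-WS stack the equivalent check would live inside a server-side or client-side handler:

    from lxml import etree

    # Validate an incoming message against the service schema and report
    # every violation. File names here are hypothetical placeholders.
    schema = etree.XMLSchema(etree.parse("service.xsd"))
    message = etree.parse("request.xml")
    if not schema.validate(message):
        for error in schema.error_log:
            print(error.message)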

Web Services and Publishing Schema

Good Web service design starts with a schema. The binding, port type and all the other parts of a WSDL file are usually not interesting at all – 99.9% of all services have trivial SOAP bindings, no headers and no declared faults. Also, the majority of Web services today are document-style with one “part” per message. So the schemas of the input and output messages are really the key to understanding what a service is about. In other words, the schema truly defines the contract of a service.

There is no substitute for the schema – you can’t generate a good schema from source code or a UML class diagram, since they do not convey optionality, restrictions, patterns, element model (sequence or choice) and other aspects. XML Schema is verbose and complex (and that’s a topic for another post), but it is pretty expressive as far as data typing goes.

So I usually start a service design with a well-documented schema. The question is, however, how to then publish it so other people can review it (you know, the people who represent service-consuming applications – these folks are kind of important). Reviewing a schema in a “raw” format is out of the question, especially if business people are involved. So nicely formatted HTML documentation that can present the schema in a readable format (ideally, translating schema lingo into plain English, e.g., minOccurs="0" could translate into “optional”) is the way to go.

I suspect that SOA registries, such as Infravio, can take care of this problem. But oftentimes there is a need to prototype a solution quickly, before the SOA infrastructure is put into place. Also, in this day and age, open source/free/inexpensive tools might be a requirement for many projects.

The best-known Schema documentation generator is xnsdoc. It is not free, but 49 EUR seems like a reasonable price. xnsdoc has an open-source XSLT-based counterpart with fewer capabilities. xnsdoc does produce nice documentation along the lines of the JavaDoc format (although I’m not convinced that this is the best choice – remember, the documentation has to be suitable for non-developers); however, in my view it does not do much in terms of explaining the schema in plain English. In other words, the assumption is that users are familiar with Schema concepts. I also found that the support for groups and multi-file schemas had some problems (but keep in mind that I did my testing almost a year ago, so please check the latest version).

A better tool that I found is Bluetetra XSDDoc (99 USD). In my view, it provides more user-friendly schema documentation. You can see some examples here. Unfortunately, it is not clear if XSDDoc is still actively supported – its website provides only minimal information and the product has not been updated in a while.

I still think that there is a need for a more interactive, Wiki-style schema publishing tool that would allow users to review schemas expressed in layman’s terms and comment on elements and types.

What’s Missing from XML Schema

Over the last several weeks I've been working on developing XML schemas for a client to support information exchanges between several different organizations, so it was important to make the schemas very explicit and "tight" so that each party can validate XML before or after sending it. The XML documents could be used in conjunction with Web services or as part of an "old-fashioned" file-based exchange. In short, this was a pretty typical system integration task.

The client had already decided to standardize on XML Schema, so using Relax NG
or Schematron was not an option.

XML Schema provides a lot of different capabilities, but based on my experience I think that it could benefit from some improvements. Here are my random thoughts on this. Now, I don't claim to be the ultimate XML Schema expert, so take it for what it's worth.

  • Schema's verbosity and unwieldy syntax make it a poor candidate for communicating and publishing XML structure and rules to a wide audience of technical people from different organizations who may or may not know XML Schema. For example, minOccurs="0" means "optional field", which is probably not very intuitive to anyone unfamiliar with the Schema specification. Even after formatting the schema for publishing (e.g., by using xsddoc), schemas are still hard to understand. Of course, one can use annotations and try to explain each type in plain English, but then the documentation always tends to get out of sync.

    The obvious counter-argument here is that XML Schema is designed to be a data modeling/validation tool and as such is not suitable for capturing business requirements, but I just think that it would be nice if it could really be used for both, essentially becoming the "system of record" for integrating different systems and organizations.

  • Error messages thrown by XML parsers are far from intuitive (this obviously depends on the parser, and I have not done any comparative analysis). For example, a missing required element results in "Element 'element name' is not valid for content model", where 'element name' is the name of the element following the missing required element. Why can't the parser simply say "Required element is missing"? Again, this problem is exacerbated when you're dealing with people with only cursory XML Schema knowledge. I'm not aware of a standard way to customize error messages, so in my case developers will have to do error translation in the code.
  • XML Schema users are forced to use regular expressions for defining any more or less complex template for simple types (phone number, SSN, etc.). This poses a problem in an environment where you can't expect all users to be familiar with regexp syntax. When you get a message like "Value does not match regular expression facet '\+?[0-9\-\(\)\s]{1,25}'", it can very easily befuddle the uninitiated. I wish there was a simplified templating mechanism, maybe something similar to java.text.DecimalFormat patterns ("##.##").
  • Reuse capabilities in XML Schema are not perfect. Extension only allows appending elements to the end of a sequence. Restriction requires repeating the content model of the parent type. This creates very brittle schemas and violates the DRY principle. There is also no way to parameterize an XML type. Let's say there is a "name" type with "first" and "last" elements. When a new person is added, I want the "last" element to be mandatory. In an "update" situation, all fields could be optional. I wish I could make "minOccurs" a parameter here.
  • XML Schema may seem very OO-like at first glance, but in fact it is missing some important OO capabilities. For instance, there is no element-level polymorphism. In the example above, I wanted to change the "behavior" of "last" (some aspect of this type's definition) in a subtype, and I can't do that. Inheritance by restriction for complex types (I don't have a problem with using it for simple types) is IMO counter-intuitive. With it, I can have a child that does not have all the properties of its parent, and so there is no way to enforce optional elements for all children.
  • Element and type scoping could've been more flexible. All declarations are either parent element-scoped (local) or global. This does not allow me to define a reusable type or a group scoped to a certain parent element or to a file (any more or less complex schema has to be broken down into multiple files for manageability's sake). Say I have a name type for a person's name (first, middle, last) and a business name type with a different structure. If I want to use the person's name type for different elements within the Person type, I have to define it as global and name it PersonNameType, essentially embedding the parent's name into the child's name. I wish I could simply define NameType and specify what parent type or element it is scoped to.
  • XML Schema is a declarative language, and so it lacks programming constructs, which is fine. But there is still a need for a Schematron-like facility or a scripting language for expressing complex rules (such as cross-field validation). Schematron works fine when embedded inside annotations, but it requires a separate validation step and the Schematron XSLT. So it would be great if this capability were supported by the Schema standard and natively understood by XML parsers. This would make schemas truly self-contained.

So my wish list is actually quite simple:

  • Make XML schemas easier to understand for non-technical users or people without schema knowledge, perhaps via some intelligent translation mechanism.
  • Make the Schema language more powerful by allowing programming constructs, variables and more flexible scoping.
