XML is everywhere these days. It is used for passing data around, for specifying metadata and even as a programming language for tools such as Ant and Jetty.

When XML is generated by various development and run-time tools (e.g., for serializing Java objects into SOAP), its complexity and readability don’t matter much since humans have to deal with raw XML only occasionally (e.g., to troubleshoot a problem).

However, more often than not, XML is written directly by developers, mostly with the help of a validating XML editor/IDE (that is, if developers are lucky and a Schema/DTD is available). WSDL (in the case of the WSDL-to-Java approach), XML Schema and Ant build files are just a few examples of this.

Using XML as a mark-up language for otherwise mostly text documents (e.g., XHTML) is not a totally bad idea. However, XML is ill-suited for specifying complex metadata with dynamic dependencies, for wiring command-based logic (e.g., Ant) or for defining domain-specific languages. That is, ill-suited for humans.

For starters, XML is unlike any other programming language (or a natural language). Consider a basic XML construct: <name>value</name>. In any other language it would’ve been written as “name=value” (or “name:=value”, or something similar). An assignment is a construct familiar to most of us from math, even though we may not understand the intricacies of r-value versus l-value. It is intuitive. XML relegates this basic construct to attributes that can only be used as part of an element. Using attributes, a simple assignment can be expressed as <name value="my value" />, which is a bit easier to understand than a purely element-based construct. However, the “value” attribute still seems kind of redundant.
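
To make the comparison concrete, here is a minimal Python sketch that reads the same name/value pair from the three spellings discussed above. It uses the standard library's xml.etree.ElementTree; the function names are made up for illustration and are not part of any tool mentioned in this post.

```python
import xml.etree.ElementTree as ET


def from_element(text):
    # <name>value</name>: the tag is the name, the text content is the value
    elem = ET.fromstring(text)
    return elem.tag, elem.text


def from_attribute(text):
    # <name value="..."/>: the tag is the name, the value sits in an attribute
    elem = ET.fromstring(text)
    return elem.tag, elem.get("value")


def from_assignment(text):
    # name=value: split on the first "=" only, then trim whitespace
    name, _, value = text.partition("=")
    return name.strip(), value.strip()


pairs = [
    from_element("<name>my value</name>"),
    from_attribute('<name value="my value" />'),
    from_assignment("name=my value"),
]
print(pairs)
```

All three calls produce the same pair; only the amount of syntax wrapped around it differs.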

Another annoying feature of XML is closing tags. Closing tags are what make XML verbose. (Interestingly, SGML, from which XML is derived, did not require closing tags, so one could write something like <TAG/this/.) In most programming languages we express grouping and nesting using brackets, parentheses or braces. This is true for function arguments, arrays, lists, maps/dictionaries, tuples, you name it, in any modern programming language. For some reason, the creators of XML decided that repeating the name of a variable (tag) was the way to go. This is a great choice for XML parsers but a poor one when XML is written or read by humans.

Closing tags do help when the nesting level runs deep. But they hurt when there is a need to express a simple construct with just a few (or one!) data items. The problem is that our brain can only process a limited number of items at a time, so intermixing the data that needs to be processed with the tags that delimit it makes comprehension more difficult. For simple lists, a comma-delimited format could be a better choice in many situations.

In general, repeating the same set of tags over and over again to define repetitive groups makes XML difficult to read, especially when each element contains just text:

    <welcome-file-list>
        <welcome-file>index.html</welcome-file>
        <welcome-file>index.htm</welcome-file>
        <welcome-file>index.jsp</welcome-file>
    </welcome-file-list>

Compare this with a simple property/comma-delimited format:

    welcome-file-list=index.html, index.htm, index.jsp
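
As a rough illustration that nothing is lost in the shorter form, the following Python sketch reads the welcome-file list from both representations and ends up with the same list. The helper names are hypothetical, and the property parser is an assumption about how such a line might be read, not a description of what any servlet container actually does.

```python
import xml.etree.ElementTree as ET

XML_TEXT = """
<welcome-file-list>
    <welcome-file>index.html</welcome-file>
    <welcome-file>index.htm</welcome-file>
    <welcome-file>index.jsp</welcome-file>
</welcome-file-list>
"""

PROPERTY_TEXT = "welcome-file-list= index.html, index.htm, index.jsp"


def files_from_xml(text):
    # Collect the text of every <welcome-file> child, in document order
    root = ET.fromstring(text.strip())
    return [(child.text or "").strip()
            for child in root if child.tag == "welcome-file"]


def files_from_property(line):
    # Everything after the first "=" is a comma-delimited list of values
    _, _, value = line.partition("=")
    return [item.strip() for item in value.split(",") if item.strip()]


print(files_from_xml(XML_TEXT))
print(files_from_property(PROPERTY_TEXT))
```

The trade-off is that the one-line form has no room for attributes or nesting, which is exactly why it suits flat lists like this one.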

Finally, what's up with angle brackets? I suppose brackets could be justified when an element has multiple attributes. In many cases, however, elements don't have attributes, so an angle bracket is simply a way to distinguish a tag name from data. This is, again, counter-intuitive and different from many modern programming languages. Normally, variable names are not bracketed or quoted; values are. Also, if there were a need for a special symbol to denote variables, wouldn't "$" or "${}" be a more intuitive option for most of us?

Of course, XML has many advantages, the key one being that it is very easy to develop grammars for XML documents (using DTD or Schema). Another one is the fact that grammars are extensible via namespaces. Finally, any XML grammar can be parsed by any XML parser; to a parser an XML document is just a hierarchy of elements.
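
The "just a hierarchy of elements" point can be sketched in a few lines of Python. The walker below knows nothing about any particular grammar, yet it handles both an Ant-like fragment and the welcome-file list. Both fragments are simplified illustrations rather than complete, valid files, and the outline function is a made-up helper.

```python
import xml.etree.ElementTree as ET


def outline(elem, depth=0):
    """Return one indented line per element, ignoring what the tags mean."""
    lines = ["  " * depth + elem.tag]
    for child in elem:
        lines.extend(outline(child, depth + 1))
    return lines


# Two unrelated vocabularies, one generic parser and one generic walker
ant_like = ET.fromstring(
    '<project><target name="build"><javac srcdir="src"/></target></project>'
)
web_like = ET.fromstring(
    "<welcome-file-list><welcome-file>index.html</welcome-file></welcome-file-list>"
)

print("\n".join(outline(ant_like)))
print("\n".join(outline(web_like)))
```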

This simplicity, however, comes at a great price. The expressiveness of XML is extremely limited. It has only a small number of constructs and no operators. While it's adequate for its role as a markup language for text files, it puts a lot of constraints on any more-or-less complex metadata format, let alone something requiring procedural logic, such as Ant or XSLT. As a result, the intuitiveness of XML-based grammars suffers.

I'm not saying that we must stop using XML altogether. It has its place. But we should not be applying it blindly just because it's the only widely available tool for creating domain-specific languages. For starters, BNF/EBNF should be part of any developer's arsenal (along with ANTLR). And the good old name/value pair and comma-delimited formats should be seriously considered for simple situations that do not require support for hierarchical structures.

19 thoughts on “Why XML is Bad for Humans”

  1. IMHO the worst abuse is when XML is used as a programming language. I suppose intentions were good in the case of Ant (i.e., specify your build actions descriptively). Unfortunately, Ant targets are really procedure steps. Then, to make matters worse, tasks (and other objects) can be added as extensions, which totally side-steps the DTD concept.

    All that richness in XML garb really makes for a love/hate relationship with Ant.

  2. Not to mention the human readable elements and attributes, which really aren’t supposed to be read by humans, and the programs, that will read them wouldn’t care whether it says “welcome-file” or “x”, but combined with the end tags and often distributed over networks XML documents become too fat to fit the tubes.

    And while I’m ranting: Type checking is thrown out the window – all attributes are strings.

    Ordered lists become a nuisance, because the order in which you write the XML is not guaranteed to be the order in which it is read, thus you need to add a list index to each element, and actually have to fetch all elements before you’re guaranteed to have an overview, e.g. whether the list is 1 or 0 indexed.

  3. Hi Chris,

    Of course, Ant is a classic example of XML misuse (isn’t it ironic that in the good old days developers complained a lot about brittle make syntax so Ant was born out of that frustration – and it was a hit, because XML is so unambiguous ), but I think that even traditional uses of XML, such as Schema and WSDL, are not necessarily ideal XML applications (especially WSDL); DSL IMO would’ve been a better fit.

  4. Treating XML structure as assignment seems to me a little broken. An XML file is a tree of tagged data, not a sequence of imperative assignments. If I write:

    Joe Bloggs
    3.14

    Mary Hogg
    3.75

    That really isn’t equivalent to a sequence of four assignments or even two assignments. It’s one single data structure with two records, each having two subordinate records or data members.

  5. Why do you think XML should be more like programming languages? I don’t see how that would be an improvement. And when considering how you talk about assignment, I’m led to think a big part of your problem with XML is that you are expecting it to be something that very few other people are expecting it to be.

  6. Ever worked with the Spring.NET IOC XML configuration on a sizable app? UUUUGGGGHHH!!!!

  7. Unfortunately, XML has become to platform neutrality (e.g., SOA, containers, etc.) what Windows has become to desktops and laptops: a necessary, though sometimes evil, standard. Any old text-based language that expresses data structure in some coherent and consistent way would have worked. We just happened to be “blessed” with the timing of XML at the end of the distributed object era, which was the beginning of the container and wiring era (e.g., Servlets, Spring, SCA, etc.). XML just naturally happened along at the right time–1999-2000. Do you recall the era of vendor-defined text configuration files? That’s a much worse UGH! I’ll take Spring.NET IOC XML configuration any day in preference to that. –Bryan

    I’m a Java guy, so I’ve never used Spring.NET but I’m sure your UGH assessment is appropriate.

  8. @For starters, XML is unlike any other programming language (or a natural language). Consider a basic XML construct: <name>value</name>. In any other language it would’ve been written as “name=value” (or “name:=value”, or something similar). An assignment is a construct familiar to most of us from math, even though we may not understand the intricacies of r-value versus l-value. It is intuitive. XML relegates this basic construct to attributes that can only be used as part of an element.

    I recommend that anyone who creates an XML vocabulary for a DSL read Elliotte Rusty Harold’s Effective XML. Also, Microsoft created a pretty awesome XML language that is very similar to Lisp s-expressions: XAML. If only Spring and Spring.NET understood how awesome this format was. It also includes built-in support for data binding. However, it breaks from traditional SAX and DOM parsing models, so people outside .NET see it as non-standards conforming, regardless of its goodness.

  9. Closing tags aren’t even useful for deeply nested data. Any half-decent text editor can highlight matching braces. Closing tags would be useful if XML permitted overlapping elements, which of course it doesn’t.
