All posts by Alexander Ananiev

Ant Jython Tasks (PAnt Tasks)

PAnt build tool comes with several Ant tasks to facilitate the use of Jython/Python from Ant.

PAnt tasks have a number of advantages over the built-in <script language="jython"> way of invoking Jython from Ant:

* More graceful exception handling. Jython code invoked using “script” generates a long error stack that contains the full stack trace of the “script” task itself. Sifting through the traces and trying to distinguish the Java trace from the Python trace is quite painful. The PAnt “jython” task produces a brief, readable, Python-only error stack.
* You can use Ant properties as parameters (“jython” task makes them available in the local namespace of the calling script).
* Convenience “import” attribute.
* “jythonInit” task allows for setting python.path using Ant path structure.
* Jython interpreter is initialized once per Ant project. All scripts invoked from the same Ant project reuse the same built-in namespace. So you can define variables and imports in one call and use them in a subsequent call.
* Task name (the name that prefixes all console output from Ant for a given task) is generated automatically based on the supplied Python code.
* “verbose.jython” property triggers verbose output for jython-related tasks only. This is much easier than trying to scan through hundreds of lines of general “ant -v” verbose log.
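
The "initialized once, shared namespace" behavior from the list above can be illustrated in plain Python. This is only a minimal sketch, not PAnt's implementation: the `SharedInterpreter` class and its `run` method are hypothetical names, and Ant properties are modelled as an ordinary dictionary.

```python
class SharedInterpreter(object):
    """Hypothetical stand-in for the single interpreter kept per Ant project."""

    def __init__(self):
        # One namespace that outlives individual script invocations.
        self.namespace = {}

    def run(self, code, properties=None):
        properties = properties or {}
        # Ant-style properties are visible as local names for this call only.
        local_names = dict(properties)
        exec(code, self.namespace, local_names)
        # Keep whatever the snippet defined so that later calls can use it.
        for name, value in local_names.items():
            if name not in properties:
                self.namespace[name] = value


interp = SharedInterpreter()
# First call: reads a property and defines a variable.
interp.run("seen = testProp; s = 'test'", {"testProp": "value from Ant"})
# Second call: reuses the variable defined by the first call.
interp.run("message = 'Var created earlier: ' + s")
print(interp.namespace["message"])
```

The second snippet finds `s` even though it was defined by a different call, which is the effect the shared namespace is meant to provide.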

Example:

Ant code:






print "Property from ant:", testProp
# define a var that we can use in other scripts
s="test"



print "Var created earlier: ",s




“testmodule” python code:


from pant.pant import project 
def test (prop):
    print "Passed parameter: ",prop
    print "Test property: ", project.properties["testProp"]

Please refer to this build.xml file for more examples.

The tasks can be used independently of PAnt python code.

PAnt Ant Tasks Reference

Getting Started

Download PAnt, extract pant.jar and create “taskdef” as described here

“jythonInit” Task

The task initializes the Jython interpreter. Because of the overhead, the interpreter is initialized only once even if jythonInit is invoked multiple times; repeated calls are simply ignored.
jythonInit automatically adds pant.pant module to PYTHONPATH.

Attributes:

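* pythonPathRef – reference to an Ant path-like structure that defines python.path. Required if the nested “pythonPath” element was not provided.
* cachedir – directory used for caching packages (optional). Defaults to ${java.io.tmpdir}/jython_cache (note – this is different from default jython behavior).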

Nested elements:

pythonPath – python.path to use defined using Ant path-like structure. Required if “pythonPathRef” attribute was not provided.

Special properties:

log.python.path – if set to “true”, jythonInit will print python path to Ant log. Default: false.

“jython” Task

Invokes python code.
Note: by default, jython does not print python stack trace in case of an exception. To see the trace, run Ant in verbose mode using “-v” or use “-Dverbose.jython=true” property.

Attributes:

* exec – Python code snippet to execute. Typically, this is a function from a module available from python.path. This has to be a single line, e.g., mod.fun() although you could combine multiple statements separated by “;”. Required if “execfile” was not provided.
* import – a convenience attribute for providing an “import” statement. Its only purpose is to make the task invocation more readable. Alternatively, you can have “import” as part of the “exec”, e.g., exec="import mod;mod.fun()". Optional.
* execfile – path to a python script file. Required if “exec” was not provided.
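
As a rough illustration of how the “import” and “exec” attributes relate, the sketch below joins them into the single-line form shown above. The `compose_statement` function is a hypothetical helper written for this example, not part of PAnt, and the standard `math` module stands in for a user module.

```python
def compose_statement(exec_code, import_module=None):
    """Prefix the exec snippet with an import when one was requested."""
    if import_module:
        return "import %s;%s" % (import_module, exec_code)
    return exec_code


# Equivalent of import="math" exec="value = math.sqrt(16)".
statement = compose_statement("value = math.sqrt(16)", import_module="math")
namespace = {}
exec(statement, namespace)
print(statement)
```

Both forms end up executing the same single line of Python, which is why the attribute is described as a readability convenience only.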

Nested elements:

Inline text with python code.

Special properties:

verbose.jython – if set to “true”, jython will print additional information about executing python code to Ant log. Default: false.

“pimport” Task

Creates Ant targets from a python module. Functions that will be used as targets have to be marked using “@target” decorator as described here.
Python module name is used as Ant project name. Target overriding works the same way as with Ant’s import task. In other words, targets defined using pimport will override targets previously defined using “import” or “pimport” tasks.

Attributes:
module – python module to create targets from. The module has to be available from python.path specified using jythonInit.
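
To show the general idea of marking functions as targets, here is a minimal sketch of a registering decorator. It is an assumption about how such a mechanism could work, not PAnt's actual “@target” implementation; the `TARGETS` registry and the sample functions are hypothetical.

```python
TARGETS = {}


def target(func):
    """Register func as a build target under its own name (illustrative only)."""
    # A later registration with the same name replaces the earlier one,
    # which mimics the overriding behavior described above.
    TARGETS[func.__name__] = func
    return func


@target
def compile_sources():
    return "compiling"


def helper():
    # Not decorated, so it never becomes a target.
    return "not a target"


def make_clean(message):
    def clean():
        return message
    return clean


target(make_clean("base clean"))
target(make_clean("overriding clean"))
print(sorted(TARGETS))
```

Only decorated functions end up in the registry, and the second `clean` wins over the first.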

WebSphere 7 Supports Properties-Based Configuration

IBM WebSphere 7 (currently in beta) comes with a property-file-based configuration tool that provides a “human-consumable” interface to the currently XML-based configuration repository of the application server. This is another proof that XML is simply not the right mechanism for managing configuration of complex software products.

From the release notes:


Properties (name/value pairs) files are more consumable for human administrators than a mix of XML and other formats spread across multiple configuration directories.

Kudos to IBM for recognizing that.

It is still not clear though how hierarchical relationships between configuration objects will be supported.

Back in WAS 6 world, I’ve been using a simple jython script that converts python named parameters into wsadmin format. This is an example of a resource described in this format:


 WASConfig.DataSource(parent="testJDBCProvider", name="testDS", jndiName="jdbc/testDS",
                              description="Test DataSource", propertySet=dict(
                              resourceProperties=[
                                  dict(name="dbName", value="testDB", type="java.lang.String" ),
                                  dict(name="connectionAttribute",value="", type="java.lang.String")
                               ]))
    


I think that a slightly more streamlined python-based format will be superior to properties.
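
The conversion script itself is not shown here, so the following is only a guess at the idea: turning nested Python dictionaries into the nested lists of [name, value] pairs that wsadmin's Jython API works with. The `to_attrs` function is a hypothetical name, keys are sorted purely to make the output predictable, and passing the result to wsadmin is left out.

```python
def to_attrs(value):
    """Recursively convert dicts into lists of [name, value] pairs."""
    if isinstance(value, dict):
        return [[name, to_attrs(item)] for name, item in sorted(value.items())]
    if isinstance(value, (list, tuple)):
        return [to_attrs(item) for item in value]
    return value


attrs = to_attrs(dict(
    name="testDS",
    jndiName="jdbc/testDS",
    propertySet=dict(resourceProperties=[
        dict(name="dbName", value="testDB", type="java.lang.String"),
    ]),
))
print(attrs)
```

The nesting of the Python description carries over directly, which is what makes this kind of format convenient for hierarchical configuration objects.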

Jython in WebSphere Portal

Most developers and administrators working with WebSphere Application Server (WAS) know that both JACL and Jython can be used for various WAS administration and configuration tasks. However, JACL has always been the preferred choice, simply because it is the default language used by the product’s admin tool (wsadmin) and also because JACL examples and documentation are more complete.

Using JACL might have been a valid option just a few years back (when WAS just came out) given the uncertainty surrounding the Jython project. Today, however, Jython is clearly alive and well; an alpha version supporting Python 2.5 was announced recently. Therefore there is really no point in using JACL any longer, except maybe for shops with a large collection of existing JACL scripts. JACL syntax is quite arcane compared with Python and the language is clearly not as widely used.

IBM confirmed this view by releasing a JACL-to-Jython converter a couple of years back.

Unfortunately, up until recently, Jython was not officially supported in another IBM product, WebSphere Portal, which comes with the wpscript tool for managing pages, deployable modules and other portal artifacts.

But since portal scripting relies on wsadmin’s shell, Jython is in fact fully supported by the product; it’s just not documented.
All that you need to do to switch to jython is to invoke wsadmin with “-lang jython” and “-wsadmin_classpath ” followed by the list of portal jars (you can copy the classpath from SCRPATH variable definition in wpscript.sh).

As an example, I put together a simple Jython script for cleaning up a portal page hierarchy. Removing pages before applying an XMLAccess script with page definitions lets you start portal configuration from a clean “known” state. Very often, especially in a development environment, an application’s page hierarchy gets polluted with various “test” pages created by developers. The script gets rid of them.

In WebSphere Portal 6.1 Jython is finally made a first-class citizen. The product’s documentation proclaims that JACL support will be phased out and that jython is the way to go. Surprisingly, though, all examples still use good old JACL. I assume it’s just a matter of time before they are converted.

Yet Another Build Server

Thoughtworks has finally released a successor to their venerable Cruise Control – Cruise build server. The UI certainly looks nice and it seems quite flexible. There is even a free version (which is limited to two computers), which is great.

What is not clear though is how this product is different from AntHill, Buildforge, Bamboo, TeamCity, Gauntlet and the like. The field is certainly becoming crowded – and I haven’t even mentioned numerous open-source contenders. All these products seem to be doing the same thing – organizing your build scripts, interfacing with version control, running builds on a distributed “build server farm”, collecting statistics, publishing reports and providing UI for all these functions.

All these features are important and useful. Ironically, however, what build servers don’t do is automatically build or deploy your software. You still have to write Ant or Maven scripts, define and manage configuration parameters (using properties, XML, environment variables) and deal with different environments (if I’m not mistaken, AntHill is the only product that has an explicit concept of an environment). For a complex project this could be a lot of work. Granted, every project is unique (if not, just use the default Maven configuration and you’re done), so this could be a tough nut to crack. It should however be possible for a build server to have enough intelligence to infer how to build a project directly from the code base.

Going beyond Builds with Build Servers

A build server provides great benefits to a development team, including:

* Convenient way of sharing build results and artifacts.
* Ability to run builds without having to mess with command line parameters.
* Tight integration with version control.
* History and statistics.
* Accountability and audit (e.g., who ran what build).

Build servers such as Hudson, Continuum or LuntBuild are used mostly for continuous integration and development builds. But they can be used for much more than that. A build server provides a generic framework for performing any kind of deployment or operations-related activity, including deployment of an application into acceptance and production environments, configuration of an app server (e.g., setting up data sources and JMS destinations) and even app server monitoring.

Several things make this possible. Build servers have flexible scheduling capabilities that can run any kind of repetitive activity, such as “pinging” a server every 15 minutes. Integration with version control makes it easier to manage the scripts. Very often, scripts for administration or configuration tasks are created in an ad-hoc fashion with multiple versions scattered around different servers and directories. A build server is able to check out a specific version of a script from a version control system and execute it. In other words, a build server helps treat admin scripts the same way as business applications. Additionally, build servers can publish logs. This is powerful, since it lets anyone (who has permissions) see results of configuration activities or monitoring scripts. Finally, build servers do a great job of notifying all interested parties via e-mail, RSS, IM and other means of communication.

Security could be another important benefit of leveraging a build server outside of development builds. Usually, access to production environments is tightly controlled. In many organizations, developers have no access to production whatsoever. Build server can help enforce this policy yet give developers visibility into the details of production deployment. This can be done by creating a separate build configuration just for production deployments.

All the benefits that I mentioned in the beginning of this post also apply to non-build activities. For example, the ability to see history of all production deployments can be very valuable.

Not all build servers are created equal, especially open source ones. Security, for one, could be supported and implemented very differently in different products. Support for various version control features, such as labels and branches, also varies. In other words, careful evaluation is required before embarking on an effort of expanding the build server beyond development. But more importantly, this effort requires buy-in from operations and support organization and other groups that will become users of the build server.

As an example, with one of our clients we’ve been able to implement a number of interesting non-build functions using a build server (we started with LuntBuild and now migrated to Hudson):

* Deployment of pre-built EAR/WAR files into test and production environments.
* Configuration of application servers. In our case, most of the app server resources are shared, so we run the configuration script separately from application deployments. In fact, we treat configuration as a separate “unit of release” with its own lifecycle.
* Monitoring of application servers. We have a script which tests health of application servers using JMX.
* Log scanning. We’ve implemented a custom log scanning tool which “greps” logs for various patterns. The build server sends notification if a pattern is found in the logs.
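
The log scanning idea from the list above can be sketched in a few lines of Python. This is not the custom tool mentioned there, just a minimal illustration; the `scan_log` function, the sample log lines and the patterns are all made up for the example.

```python
import re


def scan_log(lines, patterns):
    """Return (line_number, pattern, line) for every line matching a pattern."""
    compiled = [re.compile(pattern) for pattern in patterns]
    hits = []
    for number, line in enumerate(lines, 1):
        for regex in compiled:
            if regex.search(line):
                hits.append((number, regex.pattern, line.rstrip("\n")))
    return hits


sample_log = [
    "INFO  server started\n",
    "ERROR java.lang.OutOfMemoryError: Java heap space\n",
    "WARN  connection pool is 90% full\n",
]
hits = scan_log(sample_log, [r"OutOfMemoryError", r"pool is \d+% full"])
for hit in hits:
    print(hit)
```

A build job wrapping a script like this could fail (and therefore trigger a notification) whenever the list of hits is not empty.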

Many of these functions can be accomplished using various commercial system management and monitoring products. These products however can get expensive and very often organizations simply don’t have enough licenses to cover all environments. So a combination of custom scripts with a build server provides a viable alternative.

Secret Weapon of LAMP Applications

I’m surprised by the low traction of LAMP applications in the enterprise (I use the LAMP acronym loosely as a catch-all for PHP, Ruby and Python apps). For most large organizations Java EE still reigns supreme. While developers and analysts debate performance and “enterprisey” features (or the lack thereof) of the LAMP stack, there is one aspect that is often overlooked – LAMP infrastructure is much simpler than a typical Java EE application server, hence its operations and maintenance are greatly simplified. And of course operations and maintenance are a big chunk of the IT budget of any organization; in many shops they are actually the biggest part of the budget (60% is the average according to Forrester).

It usually goes like this. Data center operations are outsourced, and so data center personnel have to provide the first line of defense for all production problems. Data center folks are not developers, so NullPointerException does not mean much to them. But they have to be able to figure out who to call when they see the NullPointerException.

Here is an example of an error message from an application running under WebSphere Portal. This message is 318 lines long and is completely inscrutable to all but a hardcore WebSphere Portal developer or administrator. The most ironic part is that in spite of multiple “caused by” in the text, the message tells us nothing about the actual root cause of the problem, which, most likely, is a classloader conflict. As a sidebar, why can’t an app server at least give me a warning during deployment about a potential class loader problem (especially since all the app servers, even tomcat, add dozens of jars to the classpath)?

On the other hand, here is an example of an error message I get from a python app implemented using django:


    Validating models...
    djtest.order: name 'test' is not defined
    1 error found.   

So which error message do you think will be more palatable to an operations person looking at the logs?

I know I’ve exaggerated a bit – my django app is extremely simplistic at this point. However, it is true that many Java EE app servers and applications do an extremely poor job of exception logging.

An even more obvious benefit of LAMP is the availability of the source code. In the Java EE world, the common practice is not to include the source code in WAR and JAR files. In many instances, the code is compiled without debug information. Even if the source code and line numbers are available, finding the right file takes some digging since we have to deal with a multitude of JAR, EAR and WAR files. Not to mention that the same class can reside in multiple jars.

So if I were the person who had to respond to the “site is down” calls late at night, I’d vote for PHP.

Using Commons Logging from Ant

Here is a common problem with custom Ant tasks. A typical task is implemented using multiple classes, so classes that don’t extend org.apache.tools.ant.Task class don’t necessarily have to have a dependency on Ant APIs. For example, it is pretty easy to pass Ant properties in a hashtable instead of passing the entire project object. This makes the custom task’s code more generic and reusable.

One little issue that still remains is logging. Ant users are accustomed to running Ant with -verbose option which tells Ant tasks to print more detailed information. Oftentimes, -verbose is the only way to debug a build.

Unfortunately, using Ant logging requires access to Project or Task objects. As a result, the dependency on Ant APIs permeates the code that otherwise could have remained generic.

My solution for this is to use a simple class that implements org.apache.commons.logging.Log interface so that we can use jakarta commons logging (JCL) APIs instead of using Ant logging directly.
The class is called AntCommonsLogger.

To initialize AntCommonsLogger, users can either invoke the “antCommonsLoggerInit” task at the beginning of a project or call AntCommonsLogger.init(getProject()) somewhere in their custom task class. After that, AntCommonsLogger becomes the default logger so that any class can use the familiar commons logging pattern without any changes:


private static Log logger = LogFactory.getLog(CustomTask.class.getName());
...
logger.info("message");
logger.debug("message");
logger.trace("message");


“info” messages (i.e. calls to logger.info()) display during normal Ant execution, “debug” messages display if “-verbose” was specified and “trace” messages display if “-debug” was specified. This is a bit counterintuitive but this is the best we could do given that Ant’s “verbose” does not have a direct counterpart in JCL.
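
The level mapping just described can be written down as a small table. The sketch below is in Python rather than Java purely for brevity, and it is not the AntCommonsLogger code: the numeric values mirror Ant's Project.MSG_INFO, MSG_VERBOSE and MSG_DEBUG constants, while `JCL_TO_ANT` and `is_displayed` are hypothetical names. Only the three levels mentioned above are covered.

```python
# Values of Ant's Project.MSG_INFO, MSG_VERBOSE and MSG_DEBUG.
MSG_INFO, MSG_VERBOSE, MSG_DEBUG = 2, 3, 4

# Commons logging level -> Ant message level, as described in the text.
JCL_TO_ANT = {
    "info": MSG_INFO,
    "debug": MSG_VERBOSE,
    "trace": MSG_DEBUG,
}


def is_displayed(jcl_level, ant_threshold=MSG_INFO):
    """True if a message at jcl_level shows up at the given Ant log level."""
    return JCL_TO_ANT[jcl_level] <= ant_threshold


for level in ("info", "debug", "trace"):
    print(level,
          is_displayed(level),               # plain "ant"
          is_displayed(level, MSG_VERBOSE),  # "ant -verbose"
          is_displayed(level, MSG_DEBUG))    # "ant -debug"
```

The off-by-one feel of the mapping (JCL “debug” appearing under Ant's “-verbose”) is visible directly in the table.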

To begin using AntCommonsLogger, download myarch-antutil and add antCommonsLoggerInit task definition to your project, e.g.:


    <taskdef resource="com/myarch/antutil/tasks.properties">
        <classpath>
            <pathelement path="${basedir}/myarch-antutil.jar"/>
            <!-- lib.dir must contain commons-logging -->
            <fileset dir="${lib.dir}" />
        </classpath>
    </taskdef>

You also need to add myarch-antutil.jar and commons-logging.jar to the classpath of your custom tasks.

Note that calling antCommonsLoggerInit makes AntCommonsLogger the default logger for this JVM instance. This means that all Java classes invoked by this Ant script (e.g., using the “java” task) that use JCL will use AntCommonsLogger instead of java.util logging or log4j. If this is not what you want, call AntCommonsLogger.init(getProject()) at the beginning of your custom task and AntCommonsLogger.restorePreviousDefault() at the end.

Download myarch-antutil

Jython Ant Wrapper Examples

Somebody asked me about examples for PAnt Jython wrapper. Here are some. I’ll be updating this page with more examples in the future.

Following is a simple echo task. Note that pant.py has to be on your “python.path”. You can set it by adding it
to your ANT_OPTS environment variable (ANT_OPTS=-Dpython.path=python_path).



        <script language="jython">
from pant import *
pant=PAnt(project)
pant.execTask(name="echo", message="foo")
        </script>

Following is a copy task example. Note the use of the “nested” function to denote nested elements. “expandproperties” is assigned
an empty dictionary since it does not have any attributes.



pant = PAnt( project )
pant.copy(todir="${test.prop}", fileset=nested(dir=".", include=nested(name="*.xml")), 
          filterchain=nested(expandproperties={}) ) 

Following is an example of the Exec task. Multiple “env” elements
are distinguished by adding the suffix “_number”. The suffix can in fact be anything; PAnt
simply ignores the substring starting with the underscore.
This example also demonstrates how you can mix and match python variables and Ant properties
in the same piece of code.


shellFile="myshell.sh"
commandLine="options"
pant.exec(dir="${bin.dir}", executable=shellFile,  failonerror="true", resultproperty="result.code",
          env_1=nested(key="key1", value='${val1}'),
          env_2=nested(key="key2", value='${val2}'),
          env_3=nested(key="key3", value='${val3}'),
          arg=nested(line=commandLine) )
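
The suffix convention can be sketched as follows. This is an illustration of the idea only, not PAnt's code: `nested` here simply returns a dictionary, and `element_name` and `collect_elements` are hypothetical helpers.

```python
def nested(**attributes):
    """Describe a nested element as a plain dictionary of its attributes."""
    return dict(attributes)


def element_name(keyword):
    """Drop everything from the first underscore on: 'env_1' -> 'env'."""
    return keyword.split("_", 1)[0]


def collect_elements(**kwargs):
    """Return (element name, attributes) pairs for dictionary-valued arguments."""
    elements = []
    for keyword in sorted(kwargs):
        value = kwargs[keyword]
        if isinstance(value, dict):
            elements.append((element_name(keyword), value))
    return elements


elements = collect_elements(
    dir="bin",
    env_1=nested(key="key1", value="v1"),
    env_2=nested(key="key2", value="v2"),
)
print(elements)
```

Python does not allow the same keyword argument twice, which is why some distinguishing suffix is needed to pass several elements with the same name.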


Update:

MyArch Jython Task provides tighter integration of Ant and Jython. You may want to use it together with PAnt.

Please refer to our official PAnt project page for more information and to download PAnt

Ant Scripts without XML – Jython Ant Wrapper

In my previous post I blogged about my attempts to replace XML-based syntax for invoking Ant tasks using Jython scripts. But I wasn’t fully satisfied with the result – I did not like the fact that I had to pass a task name as a parameter to PAnt.execTask, e.g. pant.execTask("mkdir", dir=buildDir). It just was not intuitive enough. In Ant a task is equivalent to a subroutine, so I really wanted to use the task name as a function name. So I played a bit with dynamic dispatching in Jython and after a simple override of __getattr__ I was able to invoke Ant tasks using this syntax:


pant.mkdir(dir=buildDir)

PAnt (my Ant wrapper) treats any method call as a request to execute an Ant task (except for explicit calls to “execTask” method). This allows for an elegant (in my mind) and concise syntax.
The updated PAnt script can be downloaded from here.
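
For readers unfamiliar with the technique, here is a minimal sketch of __getattr__-based dispatch. It is not the PAnt source: the `PAntSketch` class is hypothetical, and its `execTask` only records calls instead of configuring and running a real Ant task.

```python
class PAntSketch(object):
    """Illustrative wrapper: any unknown method name is treated as a task name."""

    def __init__(self):
        self.calls = []

    def execTask(self, name, **attributes):
        # A real wrapper would configure and execute the Ant task here.
        self.calls.append((name, attributes))

    def __getattr__(self, task_name):
        # Only called for names not found normally, so execTask is unaffected.
        if task_name.startswith("__"):
            raise AttributeError(task_name)

        def invoke(**attributes):
            return self.execTask(task_name, **attributes)

        return invoke


pant = PAntSketch()
pant.mkdir(dir="build")
pant.execTask("echo", message="foo")
print(pant.calls)
```

Because `__getattr__` is consulted only when normal attribute lookup fails, explicit calls to `execTask` keep working alongside the shorter syntax.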

Meanwhile, I’ve started using this wrapper in earnest in a large-scale enterprise build system that I’m working on. So far, I’ve been absolutely thrilled with the results. This makes writing any non-trivial Ant target so much easier. I’m really hoping that this could make build script development less dull and daunting.


Ant Scripts without XML

It’s pretty easy to create an Ant file for a simple project. A simple Ant script typically contains the ubiquitous “init”, “compile”, “test”, “war” (or “jar”) and “build” targets, all wired together. It’s easy to change and easy to understand, and the script’s flow has a declarative, rule-based feel to it. The problem is, projects and their build files rarely stay simple for long. Soon we need to add a “validate.xml” target, junit reports, deployment to our application server and so on. Then we begin supporting multiple environments; we discover that our local desktop configuration is different from how the integration environment is configured, so we add numerous property files, “ftp” tasks and multiple “copy” targets for various application files. Before we know it, the build script becomes a convoluted mess of XML tags and there is nothing declarative about it anymore; it has morphed into a full-fledged, very procedural program. Perhaps we even had to resort to using the ant-contrib “if” and “for” tasks to implement procedural logic in Ant. And nothing is uglier than an “if” with complex conditions expressed in XML.

A better approach would be to implement “procedural” portion of the build script in Java or any of the scripting languages that Ant supports. The problem is, configuring and invoking Ant tasks from Java or a scripting language leads to verbose code. For example:


    execTask = project.createTask("exec")
    execTask.setOutputproperty(outputPropertyName)
    execTask.setErrorProperty(errorPropertyName)
    execTask.setResultProperty(resultPropertyName)
    execTask.setExecutable(execName)
    arg=execTask.createArg()
    arg.setLine(paramString)

Doing the same thing in XML is shorter and cleaner:


<exec executable="${execName}" outputProperty="p1" 
    errorProperty="p2" resultProperty="p3">
    <arg line="${params}" />
</exec>

So what can we do to make task invocation syntax more concise and easier to understand? In fact, the syntax could be drastically simplified with the help of simple Ant “adapters” that can be developed for popular scripting languages since Groovy, Ruby and python all have fairly intuitive syntax for supporting lists, dictionaries and other data structures. I developed such an adapter for jython. It uses python named arguments and dictionary syntax, so executing a “copy” task looks like this:


pant=PAnt(project)
pant.execTask("copy", todir="testdir",  fileset={"dir":".","includes":"*.xml"} )

“PAnt” is the name of the “ant adapter” class for Jython. The class configures and executes Ant tasks based on the provided arguments using Ant task configuration API.

“pant” module also comes with a simple helper function called “nested” so that named arguments can be consistently used for nested elements. With syntax highlighting supported by most editors/IDEs (e.g., you can try PyDev for jython/python development), it allows for better visual distinctions between attribute names and values:


pant.execTask("copy", todir="testdir",  fileset=nested(dir=".", includes="*.xml") )
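
To make the correspondence between named arguments and Ant XML concrete, the sketch below renders such a call as the equivalent XML element. It illustrates the mapping only; PAnt itself configures tasks through Ant's API rather than generating XML, and `nested` and `task_to_element` here are hypothetical stand-ins written for this example.

```python
import xml.etree.ElementTree as ET


def nested(**attributes):
    """Describe a nested element as a dictionary of attributes and children."""
    return dict(attributes)


def task_to_element(name, **spec):
    """Plain values become attributes, dictionaries become nested elements."""
    element = ET.Element(name)
    for key, value in spec.items():
        if isinstance(value, dict):
            element.append(task_to_element(key, **value))
        else:
            element.set(key, str(value))
    return element


copy = task_to_element(
    "copy",
    todir="testdir",
    fileset=nested(dir=".", includes="*.xml"),
    filterchain=nested(expandproperties={}),
)
print(ET.tostring(copy, encoding="unicode"))
```

An empty dictionary produces an element without attributes, which matches how attribute-less elements such as “expandproperties” are written in a build file.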

To use “PAnt” from Ant, you can develop custom tasks using “scriptdef” or simply embed python code directly into a target:


    <target name="test.pant" >
        <script language="jython">
from pant import *
pant=PAnt(project)
pant.execTask(name="echo", message="foo")
        </script>
    </target>

The “pant” module itself is just a few lines of code as you can see from its code. Don’t forget to properly setup your “python.path” if you want to make it globally available to all your Ant scripts.

There is also an open-source project Gant that provides similar (in fact, much more extensive) capabilities for Groovy, but I have not had a chance to play with it; I specifically wanted to use python/jython because jython can also be used for WebSphere Application Server administration.

In my mind, a scripting-language-based approach to writing build files makes for scripts that are much more flexible and easier to understand and maintain. When you start implementing your Ant logic in python, you’ll see that Ant targets become much more coarse-grained, since there is no need to create internal targets (the ones that are never invoked by the users) to simulate subroutines or conditional targets to simulate “if” statements. It is also nice to be able to use all the capabilities of a full-blown programming language as opposed to the limited subset of procedural tasks that Ant provides (such as “condition”). Being able to use properly scoped variables instead of inherently global Ant properties is another great benefit.

At the same time, it is still possible to use Ant targets for expressing a flow wiring together major functions of the build script. I would prefer something less XML-like for this purpose too, but that’s a task for another day.


Why XML is Bad for Humans

XML is everywhere these days. It is used for passing data around, for specifying metadata and even as a programming language for tools such as Ant and Jetty.

When XML is generated by various development and run-time tools (e.g., for serializing Java objects into SOAP), its complexity and readability don’t matter much since humans have to deal with raw XML only occasionally (e.g., to troubleshoot a problem).

However, more often than not, XML is written directly by developers, mostly with the help of a validating XML editor/IDE (that is, if developers are lucky and a Schema/DTD is available). WSDL (in the case of the WSDL-to-Java approach), XML schema and Ant build files are just a few examples of this.

Using XML as a mark-up language for otherwise mostly text documents (e.g., XHTML) is not a totally bad idea. However, XML is ill-suited for specifying complex metadata with dynamic dependencies, for wiring command-based logic (e.g., Ant) or for defining domain-specific languages. That is, ill-suited for humans.

For starters, XML is unlike any other programming language (or a natural language). Consider a basic XML construct: <name>value</name>. In any other language it would’ve been written as “name=value” (or “name:=value”, or something similar). An assignment is a construct familiar to most of us from math, even though we may not understand the intricacies of r-value versus l-value. It is intuitive. XML relegates this basic construct to attributes that can only be used as part of an element. Using attributes, a simple assignment can be expressed as <name value="my value" />, which is a bit easier to understand than a purely element-based construct. However, the “value” attribute still seems kind of redundant.

Another annoying feature of XML is closing tags. Closing tags are what make XML verbose. (Interestingly, SGML, which XML is derived from, did not require closing tags, so one could write something like <TAG/this/.) In most programming languages we express grouping and nesting using brackets, parentheses or braces. This is true for function arguments, arrays, lists, maps/dictionaries, tuples, you name it, in any modern programming language. XML creators for some reason decided that repeating the name of a variable (tag) is the way to go. This is a great choice for XML parsers but a poor alternative when XML is written/read by humans.

Closing tags do help when the nesting level runs deep. But they hurt when there is a need to express a simple construct with just a few (or one!) data items. The problem is, our brain can only process a limited number of items at a time, so intermixing data that needs to be processed with tags that serve as delimiters for this data makes comprehension more difficult. For simple lists, a comma-delimited format could be a better choice in many situations.

In general, repeating the same set of tags over and over again to define repetitive groups makes XML difficult to read, especially when each element contains just text:

    <welcome-file-list>
        <welcome-file>index.html</welcome-file>
        <welcome-file>index.htm</welcome-file>
        <welcome-file>index.jsp</welcome-file>
    </welcome-file-list>

Compare this with a simple property/comma delimited format:


welcome-file-list= index.html, index.htm, index.jsp
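
Reading such a line takes very little code as well. The following is a minimal sketch; `parse_list_property` is a hypothetical helper, and it assumes that neither the name nor the values contain “=” or “,”.

```python
def parse_list_property(line):
    """Split 'name= a, b, c' into the name and a list of trimmed values."""
    name, _, raw_values = line.partition("=")
    values = [item.strip() for item in raw_values.split(",") if item.strip()]
    return name.strip(), values


name, values = parse_list_property(
    "welcome-file-list= index.html, index.htm, index.jsp")
print(name, values)
```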

Finally, what's up with angle brackets? I suppose brackets could be justified when an element has multiple attributes. In many cases, however, elements don't have attributes and so an angle bracket is simply a way to distinguish a tag name from data. This is, again, counter-intuitive and different from many modern programming languages. Normally, variable names are not bracketed or quoted; instead, values are. Also, if there was a need to use a special symbol for denoting variables, wouldn't using "$" or "${}" be a more intuitive option for most of us?

Of course, XML has many advantages, the key one being that it is very easy to develop grammars for XML documents (using DTD or Schema). Another one is the fact that grammars are extensible via namespaces. Finally, any XML grammar can be parsed by any XML parser; to a parser an XML document is just a hierarchy of elements.

This simplicity, however, comes at a great price. The expressiveness of XML is extremely limited. It only has a limited number of constructs and no operators. While it's adequate for its role as a markup language for text files, it puts a lot of constraints on any more-or-less complex metadata format, let alone something requiring procedural logic, such as Ant or XSLT. As a result, the intuitiveness of XML-based grammars suffers.

I'm not saying that we must stop using XML altogether. It has its place. But we should not be applying it blindly just because it's the only widely available tool for creating domain-specific languages. For starters, BNF/EBNF, should be part of any developer's arsenal (along with ANTLR). And good old name/value pair and comma-delimited formats should be seriously considered for simple situations that do not require support for hierarchical structures.

You Ain’t Gonna Need ESB

Bobby Woolf posted a great article about a widespread problem plaguing many SOA implementations: over-engineering of the SOA infrastructure, meaning that people roll out products that are not actually required to implement their business services. He specifically talks about ESBs, but I would say that the “you ain’t gonna need it” principle should be applied to any component of a SOA stack. For example, why implement a super-expensive BPM suite (or a BPEL engine) when an organization is simply trying to build some data services? Or why pay for a registry if there are only a handful of services in place?

What many people don’t realize is just how many SOA-related features are already provided by application servers. The open-source GlassFish server supports JAXR Web services repositories, XSLT mediations, and JBI integration, and provides extensive Web services monitoring capabilities. It can be augmented by several open-source JBI-based ESBs, such as OpenESB. Other application servers provide a similar set of features, although using different technologies (e.g., SCA instead of JBI in IBM products). With an application server being so powerful, there should be a very good reason and business rationale for upgrading to commercial SOA/integration products.

In my mind, the right strategy for any SOA is to first implement some services that provide immediate value to the business stakeholders. These services should be implemented using the products already owned and used by the organization (or using open-source ones). After the initial success of this implementation, the organization can evaluate commercial products to see how they would help implement the next set of business requirements/goals more efficiently.

Is The End of SOAP Dominance Nearing?

SOAP-based services currently dominate the enterprise landscape. The main reasons for this are:

  • SOAP's tight coupling with WSDL. Until recently, SOAP was the only supported WSDL binding. WSDL, with all of its issues (such as its convoluted structure), remains the only widely accepted vendor-neutral way of defining services.
  • In the Java world, SOAP was promoted by adding support for it to J2EE/Java EE via the JAX-RPC and JAX-WS specifications. There is a way to create RESTful services using JAX-WS; however, this approach has not gained wide acceptance for a number of reasons (the lack of a standardized WSDL binding being one of them).
  • Solid support by software vendors. SOAP is supported in all application servers, and there are plenty of development tools (including Eclipse and NetBeans) that can help with creating SOAP-based services.
  • Availability of a number of WS-* specifications that only work with SOAP. WS-* specs, such as WS-Security and WS-ReliableMessaging, rely heavily on SOAP headers for passing message metadata.

SOAP, however, is far from perfect. It adds complexity and overhead and makes the implementation of message intermediaries (e.g., gateway products) more difficult: the XML message has to be parsed in order to obtain any information about it, such as the operation name. At the same time, the uptake of WS-* standards has been very slow, perhaps with the exception of WS-Security, so the benefit of WS-* integration with SOAP has failed to materialize. Another supposed SOAP benefit is protocol neutrality, i.e., a SOAP message looks the same over HTTP, JMS, RMI or any other supported protocol. However, most services are implemented using HTTP or, in some cases, JMS without any headers (unless WS-* is used), so this benefit has limited value.
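The point about intermediaries can be illustrated with a small Python sketch. It assumes a SOAP 1.1 message in the document/literal "wrapped" style, where the first child of the Body is named after the operation; the message, its service namespace, and the helper names are all made up for the example.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace

# Hypothetical message; the "orders" namespace and operation are invented.
SOAP_MESSAGE = """<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <ord:getOrderStatus xmlns:ord="http://example.com/orders">
      <ord:orderId>42</ord:orderId>
    </ord:getOrderStatus>
  </soap:Body>
</soap:Envelope>"""


def soap_operation(message):
    """Parse the whole envelope and return the local name of the
    first element inside soap:Body (assumed to be the operation)."""
    root = ET.fromstring(message)
    body = root.find("{%s}Body" % SOAP_NS)
    if body is None or len(body) == 0:
        return None
    tag = body[0].tag  # "{namespace}localName"
    return tag.rsplit("}", 1)[-1]


def http_operation(request_line):
    """For a plain HTTP service, the method and path are available
    from the request line without touching the payload."""
    method, path, _version = request_line.split(" ", 2)
    return method, path


if __name__ == "__main__":
    print(soap_operation(SOAP_MESSAGE))
    print(http_operation("GET /orders/42/status HTTP/1.1"))
```

A gateway routing the SOAP call has to run an XML parser over the payload before it can make any decision, whereas the plain HTTP call can be routed from the first line of the request.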

So it is fair to say that in many instances SOAP itself does not add any value to a Web services implementation. Nevertheless, it continues to be used for the reasons listed above. But this situation may change in the future.

WSDL 2.0 fully supports HTTP binding. WSDL 2.0 stirred a lot of controversy (I personally think that it’s a step forward from WSDL 1); however, it was finally accepted as a W3C recommendation in June, so we can fully expect that vendor support will follow.

Additionally, the JAX-RS specification was created and included in Java EE 6. This will also promote the development of RESTful services behind the firewall.

Currently, there is a preconceived notion that REST is only good for the Internet/Web 2.0 environment, whereas enterprise services should all be using SOAP. The emergence of these two new standards will help change this notion.

Ideally, developers should be able to use the simplest binding that gets the job done. HTTP/REST can be a good choice for many “query-style” services. Services that accept more complex data can use “Plain Old XML” over HTTP POST. Finally, services that have to rely on WS-* specifications will use SOAP. In all these cases, developers should be able to use the same set of APIs for invoking the service and the same set of annotations (obviously, with different parameters) for creating the service.
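As a rough illustration of the three styles, the Python sketch below builds (but does not send) the same hypothetical "order status" call as a query-style GET, a Plain Old XML POST, and a SOAP 1.1 POST. The endpoint URL, element names, and function names are assumptions made for the example, not part of any real service.

```python
from urllib.parse import urlencode
from urllib.request import Request

BASE = "http://example.com/orders"  # hypothetical endpoint

PAYLOAD = "<getOrderStatus><orderId>%s</orderId></getOrderStatus>"


def query_request(order_id):
    """Query-style call: everything fits in the URL."""
    return Request(BASE + "?" + urlencode({"orderId": order_id}), method="GET")


def pox_request(order_id):
    """Plain Old XML: the bare payload is POSTed as the body."""
    body = (PAYLOAD % order_id).encode("utf-8")
    return Request(BASE, data=body,
                   headers={"Content-Type": "application/xml"}, method="POST")


def soap_request(order_id):
    """SOAP 1.1: the same payload wrapped in an Envelope and Body."""
    envelope = ('<soap:Envelope xmlns:soap='
                '"http://schemas.xmlsoap.org/soap/envelope/">'
                "<soap:Body>" + PAYLOAD % order_id + "</soap:Body>"
                "</soap:Envelope>")
    return Request(BASE, data=envelope.encode("utf-8"),
                   headers={"Content-Type": "text/xml; charset=utf-8",
                            "SOAPAction": '""'},
                   method="POST")


if __name__ == "__main__":
    for req in (query_request(42), pox_request(42), soap_request(42)):
        print(req.get_method(), req.full_url, len(req.data or b""))
```

The business payload is identical in the last two cases; only the wrapping and headers differ, which is why a common API over all three bindings seems feasible.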

Perhaps this is how Web service development will be done in the future, but we are certainly not there yet.

It Takes a Mental Shift to Benefit from XML Appliances

This article on TechTarget is a great illustration of my point from the previous post about the importance of proper design patterns and techniques for benefiting from XML appliance capabilities.

When implementing Web services, Java developers tend to think in terms of the Java classes that XML documents map to. Using XSLT (or even Schema) to implement part of their processing logic is not on their list because the common thinking is that it is too expensive to do in Java.

With XML appliances the situation is exactly the opposite. XSLT all of a sudden becomes one of the best-performing parts of an application (although I would imagine that using Java hardware acceleration, such as the one provided by Azul, might once again change that). This could be a serious “paradigm shift” for many developers and architects.

Another obstacle to more effective usage of appliances could simply be the lack of XSLT skills. XSLT is essentially a functional language and so it comes with a learning curve attached, especially for complex transformations. It is important to have a good knowledge of XSLT to understand what kind of work can be “outsourced” to an appliance. Not that many developers have this knowledge today, but perhaps it will change with more widespread use of XML appliances.

Improve Your Application Performance with XML Appliance

XML appliances are capable of extremely fast XML parsing and transformation (sometimes the term “wire-speed” is used). The speed is achieved by using hardware acceleration, specially written XML parsers and XSLT engines (you won’t find Xerces on these devices), and an optimized operating system (usually a trimmed-down version of Linux or BSD).

How fast are XML appliances? I’m not aware of published benchmarks; however, I did have a chance to conduct informal performance testing of a DataPower XI50 for a client, and the results were quite impressive; for example, we saw little to no overhead when validating medium-size XML schemas.

Clearly, offloading as much XML processing as possible to an appliance could be a performance booster. So what kind of processing could be done on the device? Here are my recommendations based on the capabilities of the DataPower appliance; the situation could be slightly different with devices from other vendors.
