XML appliances are capable of extremely fast XML parsing and transformation (sometimes the term "wire-speed" is used). The speed is achieved through hardware acceleration, specially written XML parsers and XSLT engines (you won't find Xerces on these devices), and an optimized operating system (usually a trimmed-down version of Linux or BSD).
How fast are XML appliances? I'm not aware of any published benchmarks; however, I did have a chance to conduct informal performance testing of a DataPower XI50 for a client, and the results were quite impressive. For example, we saw little to no overhead when validating against medium-size XML schemas.
Clearly, offloading as much XML processing as possible to an appliance could be a performance booster. So what kind of processing could be done on the device? Here are my recommendations based on the capabilities of the DataPower appliance; the situation could be slightly different with devices from other vendors.
"Shredding"â€ could also be very effective for high-throughput XML processing. Shredding is the process of breaking up large documents into smaller fragments and sending them off to the application for concurrent processing. Shredding helps dealing with the documents consisting of a large number of repetitive groups. Each group/fragment can be processed independently without waiting for the completion of the processing of the entire document. Note that shredding is different from "splitting" which allows for processing of different logical parts (different types) of the document independently (as opposed to repetitive groups). Shredding is the most useful for processing large bathes of data of the same type.
Unfortunately, shredding does not seem to be supported by DataPower, although the appliance does support streaming XML processing, which would've worked very nicely in conjunction with shredding logic. Perhaps devices from other vendors do provide this support.
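To make the idea of shredding a bit more concrete, here is a minimal sketch of it done in application code rather than on an appliance. It is written in Python with only the standard library and is not how any particular device implements it. The sample document, the `order` element name, and the `shred`, `process_fragment` and `process_batch` helpers are all made up for illustration. The document is parsed as a stream, each repeating group is cut out as its own small fragment, and the fragments are handed to a pool of workers.

```python
import io
import xml.etree.ElementTree as ET
from concurrent.futures import ThreadPoolExecutor

# Hypothetical sample batch: many repetitive <order> groups under one root.
SAMPLE = b"""<orders>
  <order id="1"><amount>10.50</amount></order>
  <order id="2"><amount>20.00</amount></order>
  <order id="3"><amount>5.25</amount></order>
</orders>"""


def shred(source, tag):
    """Yield each repeating <tag> element as a standalone serialized fragment."""
    # iterparse reads the document incrementally instead of building it all first.
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == tag:
            yield ET.tostring(elem)
            elem.clear()  # drop the fragment's children to keep memory use low


def process_fragment(fragment):
    """Stand-in for the application's per-fragment logic (hypothetical)."""
    order = ET.fromstring(fragment)
    return order.get("id"), float(order.findtext("amount"))


def process_batch(source, tag="order", workers=4):
    """Shred the document and process the fragments on a worker pool."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in the same order the fragments were produced.
        return list(pool.map(process_fragment, shred(source, tag)))


results = process_batch(io.BytesIO(SAMPLE))
print(results)
```

In a real deployment the fragments would more likely be posted to a queue or to separate service instances than to threads in one process; in CPython in particular, threads will not speed up CPU-bound work because of the global interpreter lock. The point of the sketch is only the shape of the technique: stream, cut out each repeating group, dispatch.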
The bottom line is that just plugging the appliance in and using it in gateway mode is not necessarily going to improve XML processing performance by much. Web services (and any other application components that interface via XML, such as XML-based batch file processing components) have to be designed with the appliance in mind. During the design effort, developers need to identify pre-processing or post-processing logic that can be implemented in XSLT rather than making it all the application's responsibility. Anything implemented in XSLT can then be moved to the appliance to improve processing speed and throughput.
You may want to look at vtd-xml as the state of the art in XML processing; it consumes far less memory than DOM.