Batch processing (a.k.a. “bulk processing”) is dull and boring compared to the new world of SOA, Software as a Service and Web 2.0. It’s hardly ever mentioned these days, so one can get the impression that batch processing has all but disappeared from the enterprise and been replaced by “enterprise mashups” or, at the very least, Web services.
Of course, nothing could be further from the truth. While Web services are indeed beginning to play a key role in many organizations, batch files remain one of the most widespread forms of system integration because of the large number of “legacy” systems that rely on them. At the same time, batch processing presents certain challenges to developers. These include high volume/high throughput requirements, since batch processing is inherently “spiky”; the ability to deal with failures, which usually requires checkpoint/restart capabilities; and many others.
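To make the checkpoint/restart idea a bit more concrete, here is a minimal sketch in plain Java. Everything in it (the `CheckpointedBatch` class, the `commitInterval` parameter, the in-memory checkpoint) is my own illustration and not taken from any particular framework; a real job would persist the checkpoint somewhere durable.

```java
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch: remembers how far a batch got so a failed run can resume.
final class CheckpointedBatch {
    // Index of the next record to process. Kept in memory here for brevity;
    // a real job would store it in a database or a file.
    private int checkpoint = 0;

    int checkpoint() {
        return checkpoint;
    }

    // Processes records in chunks of commitInterval, starting at the last checkpoint.
    // If the handler throws, the checkpoint stays at the last fully processed chunk.
    void run(List<String> records, int commitInterval, Consumer<String> handler) {
        if (commitInterval < 1) {
            throw new IllegalArgumentException("commitInterval must be >= 1");
        }
        int position = checkpoint;
        while (position < records.size()) {
            int end = Math.min(position + commitInterval, records.size());
            for (int i = position; i < end; i++) {
                handler.accept(records.get(i));
            }
            position = end;
            checkpoint = position; // "commit" the chunk
        }
    }
}
```

Calling `run` again after a failure resumes from the last committed chunk rather than from the beginning of the file. Note that records from a partially processed chunk are handed to the handler a second time, so in this sketch the handler either has to be idempotent or its work has to be rolled back together with the chunk.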
The way batch files are processed varies greatly from system to system; in most cases developers end up creating their own custom frameworks to handle batch processing requirements.
In many cases, ETL tools become part of the solution, since most ETL tools have built-in batch processing capabilities. As a result, batch processing has become the domain of proprietary solutions and expensive tools. Although batch processing has been part of the IT landscape for decades, there has been no framework or platform that would introduce some notion of uniformity and save developers from having to reinvent the wheel (I’m talking about distributed systems; batch processing on the mainframe is a different story).
The Spring Batch framework aims to change that. It builds upon the “template” pattern widely used in Spring to support JDBC, JMS, transactions and other APIs. Spring Batch supports processing of batches and also provides “retry” capabilities. Details are pretty sketchy at this point (there is almost no documentation), but some information is available in this blog entry.
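Since there is so little documentation, I can only illustrate the general idea. The sketch below shows what a “template” with retry looks like in plain Java: the template owns the retry loop and the caller supplies only the callback. The `SimpleRetryTemplate` class and its `execute` method are hypothetical names of my own, not the Spring Batch API.

```java
import java.util.function.Supplier;

// Hypothetical helper, not part of Spring Batch: the template owns the retry
// loop, and the caller passes in the operation to attempt.
final class SimpleRetryTemplate {
    private final int maxAttempts;

    SimpleRetryTemplate(int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        this.maxAttempts = maxAttempts;
    }

    // Runs the callback until it succeeds or maxAttempts is reached,
    // then rethrows the last failure.
    <T> T execute(Supplier<T> callback) {
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return callback.get();
            } catch (RuntimeException e) {
                last = e; // remember the failure and try again
            }
        }
        throw last;
    }
}
```

A caller would write something like `new SimpleRetryTemplate(3).execute(() -> loadRecord(id))`, where `loadRecord` stands for whatever flaky operation is being retried. A production version would probably also wait between attempts and retry only on exceptions known to be transient.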
I think this is great news. Spring is very widely adopted; some even claim it is becoming the “new Java EE”. So Spring has enough clout to become the new “batch processing container” that can be embedded in any modern application server.
In general, I think that we should treat batch processing as an integral part of SOA and enterprise architecture. Batch processing is not going away any time soon and, let’s face it, some batch processes will never be replaced. We can even see some new sources of batch processing; for example, an RSS feed, if it’s big enough, is no different from a traditional batch file in terms of its processing requirements. A batch feed is just another source of events for an enterprise, and these events can be processed the same way as events from other sources. From an SOA perspective, it should not matter whether a service was invoked directly by its client or by a batch file processing routine (which can generate events processed by SOA). In reality, however, this concept has some challenges. The order of records in a batch file is one of them, since this order is often significant. Performance/throughput is another important consideration: translating a batch record into a “canonical” XML representation and invoking a remote service for each record could be a costly proposition.
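Here is a rough sketch of what I mean by a batch file being just another event source. The `RecordEvent` and `BatchEventSource` names are hypothetical; the point is only that the batch routine feeds the same handler a direct client would call, and that each event carries a sequence number so the original record order is not lost.

```java
import java.util.List;
import java.util.function.Consumer;

// Hypothetical event type: one per batch record, carrying its position in the file.
final class RecordEvent {
    final long sequence;
    final String payload;

    RecordEvent(long sequence, String payload) {
        this.sequence = sequence;
        this.payload = payload;
    }
}

// Hypothetical adapter that turns the lines of a batch file into events.
final class BatchEventSource {
    // Emits one event per non-blank line, in file order, to the given service handler.
    static void publish(List<String> lines, Consumer<RecordEvent> service) {
        long sequence = 0;
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // skip blank lines without consuming a sequence number
            }
            service.accept(new RecordEvent(sequence++, trimmed));
        }
    }
}
```

The handler here is an in-process call. If every event were converted to XML and sent to a remote service instead, the per-record cost mentioned above would show up in exactly this loop.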
This is why we need frameworks like Spring Batch that can help us deal with these problems. It would also be nice if ESB vendors incorporated batch processing support into their products; after all, dealing with different transports and data formats is one of the key capabilities that an ESB is supposed to provide.