Archive for February, 2009

Using Jython 2.2.1 with WSAdmin Tool

Posted on 02/05/2009, Alexander Ananiev

In one of my previous posts I complained that the wsadmin tool (the main WebSphere Application Server administration tool) still relies on Jython 2.1, which is quite old.

The issue became critical when I realized that Jython automatically recompiles classes that were compiled with a different Jython version. In my case, I was using Jython 2.2 for my Ant scripts and Jython 2.1 for my wsadmin scripts, and some of the code was shared. As a result, concurrent builds stepped on each other, each one dynamically recompiling the shared classes with its own Jython version. The error looked something like this:


File "<string>", line 5, in ?
java.lang.ArrayIndexOutOfBoundsException: -4
at org.python.core.imp.createFromPyClass(Unknown Source)

Bugs like that are always fun to troubleshoot.

Going back to 2.1 was not an option for me, since I use closures and new-style classes in quite a few places. So I tried putting Jython 2.2.1 on the wsadmin classpath, and it worked without a hitch with the thin wsadmin client. All of my wsadmin scripts work without a problem.

Here is a sample wsadmin.bat file that I use. This file uses the thin client jars. Note how in line 85 my jython.jar (which is Jython 2.2.1) is added to the classpath so that it overrides the Jython packages supplied with WAS.
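One way to confirm that wsadmin really picked up the newer jython.jar is to check the interpreter version at the start of a script. The sketch below is hypothetical (the function names are illustrative, not part of wsadmin) and relies only on `sys.version`, whose leading token is the dotted version number. It deliberately sticks to old Python syntax so that it should load under Jython 2.1 as well as 2.2.1.

```python
import sys

def parse_version(version_string):
    """Turn '2.2.1' (or '2.2.1 (extra build info)') into a tuple of ints, e.g. (2, 2, 1)."""
    dotted = version_string.split()[0]
    numbers = []
    for part in dotted.split('.'):
        # Keep only the leading digits, so '1rc1' becomes 1.
        digits = ''
        for ch in part:
            if not ch.isdigit():
                break
            digits = digits + ch
        if not digits:
            break
        numbers.append(int(digits))
    return tuple(numbers)

def check_jython(minimum=(2, 2), version_string=None):
    """Return a true value if the running (or given) interpreter is at least `minimum`."""
    if version_string is None:
        version_string = sys.version
    return parse_version(version_string) >= minimum

# Hypothetical guard at the top of a wsadmin script.
print('Interpreter version: %s' % sys.version.split()[0])
if not check_jython():
    print('WARNING: this script needs Jython 2.2 or later (closures, new-style classes)')
```

If the warning appears when the script runs under wsadmin, the WAS-supplied Jython 2.1 is still ahead of the custom jython.jar on the classpath.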

One possible downside of this approach is running into issues with IBM support if you need to open a PMR related to wsadmin scripting.

Value of IT Automation

Posted on 02/01/2009, Alexander Ananiev

Intuitively, we all understand that automating IT operations, including builds, deployments, configurations, upgrades and so on, is a good thing. We all know that humans make mistakes, and mistakes can be costly when they affect a large group of people (e.g., a large user community) or otherwise result in lost revenue for a business.

But how error-prone are human actions, really? Note that we’re not talking about the normal error rate in software development or other creative fields; it is very difficult, if not impossible, to prevent errors when a task can’t be formalized. On the other hand, a large area of IT is related to operations and maintenance, which involves mostly predictable and repetitive tasks. There are many tools, from simple scripts to super-expensive enterprise products, that automate these types of tasks. Knowing the probability of human error would help us estimate the potential benefits of these tools and, consequently, assess the return on investment.

The classic formula for calculating human reliability can be found here. Without going too much into the math, empirically we can ascertain the following:

  • Every action performed by a human has a probability of error. It is never zero.
  • Most tasks (at least, in IT) consist of multiple steps (actions). E.g., a change may have to be made on multiple servers.
  • The likelihood of error grows with the number of steps, roughly in proportion to it as long as the per-step error probability is small.
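These observations can be made concrete with a simplified model, assumed here and not necessarily identical to the classic formula referenced above: each step fails independently with the same probability p, so a task of n steps is error-free only if every step is, and the chance of at least one error is 1 - (1 - p)^n. For small n·p this is close to n·p.

```python
def task_error_probability(step_error, steps):
    """Chance that a task of `steps` steps contains at least one error.

    Assumes every step fails independently with the same probability `step_error`.
    """
    if not 0.0 <= step_error <= 1.0:
        raise ValueError("step_error must be between 0 and 1")
    if steps < 0:
        raise ValueError("steps must be non-negative")
    # The task succeeds only if every single step succeeds.
    return 1.0 - (1.0 - step_error) ** steps

# A 1% per-step error rate over 10 steps gives about a 9.6% chance of a mistake,
# close to the linear estimate of 10 * 1% = 10%.
for n in (1, 10, 50, 100):
    print("%3d steps: %.1f%%" % (n, 100 * task_error_probability(0.01, n)))
```

At 100 steps the same 1% per-step rate yields roughly a 63% chance of at least one error: lower than a strictly proportional rule would predict, but still far from negligible.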

So it should not come as a surprise that the probability of error can be quite high for complex tasks. According to the data published on Ray Panko’s website, 28 experienced users had an average error rate of 33% on a task involving 14 command-line steps. Another interesting tidbit from the same site is a 50% error rate in following a checklist. Unfortunately, the details of these studies are not documented on the site.
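Under the same simplified model (independent steps, equal error probability per step), and assuming the 33% figure means that a third of the attempts contained at least one error, which the site does not spell out, the reported rate corresponds to a per-step error probability of under 3%. Inverting 1 - (1 - p)^n for p gives:

```python
def per_step_error(task_error, steps):
    """Per-step error probability implied by an observed task error rate.

    Inverts task_error = 1 - (1 - p) ** steps, assuming independent,
    equally likely step errors.
    """
    if not 0.0 <= task_error < 1.0:
        raise ValueError("task_error must be in [0, 1)")
    if steps <= 0:
        raise ValueError("steps must be positive")
    return 1.0 - (1.0 - task_error) ** (1.0 / steps)

# 33% of attempts with an error across a 14-step task (interpretation assumed above).
p = per_step_error(0.33, 14)
print("Implied per-step error rate: %.1f%%" % (100 * p))  # prints 2.8%
```

In other words, even a per-step error rate that sounds small produces a one-in-three chance of a botched task once enough steps are chained together.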

Of course, many of these errors can be caught and corrected via testing. It is common knowledge that every change has to be accompanied by some verification or “smoke” testing.

But some changes are impossible or very expensive to validate. Imagine having to change the JVM maximum heap size to prevent an application from running out of memory. Imagine also that this is a high-volume application that runs on four different servers, and that one server out of four was not updated by mistake. You are not going to find out about it until the application starts crashing on that server under load, which is the worst possible time to deal with the issue. Now, what if the parameter that had to be changed was some obscure garbage collection setting meant to improve the application’s performance? Users would experience intermittent performance issues, but nothing would explicitly point to the offending server, and discovering the root cause of the problem could take quite a while. In short, some errors are only discovered by users, at which point the cost of fixing them is substantial.
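The missed-server scenario is the kind of error a trivial automated check catches. Below is a hypothetical sketch: it assumes the current setting has already been collected from each server (for example, by a wsadmin script), and the server names and heap sizes are made up for illustration.

```python
def find_mismatches(expected, actual_by_server, servers):
    """Return (server, actual) pairs for servers whose setting differs from `expected`.

    A server missing from `actual_by_server` is reported with None, since an
    unreported value is as suspicious as a wrong one.
    """
    mismatches = []
    for server in servers:
        actual = actual_by_server.get(server)
        if actual != expected:
            mismatches.append((server, actual))
    return mismatches

# Hypothetical max heap sizes (MB) after a change meant to set 2048 everywhere.
servers = ["server1", "server2", "server3", "server4"]
max_heap_mb = {"server1": 2048, "server2": 2048, "server3": 1024, "server4": 2048}

for server, actual in find_mismatches(2048, max_heap_mb, servers):
    print("%s: expected 2048, found %s" % (server, actual))
```

Iterating over the list of servers that should have been changed, rather than over whatever values came back, means a server that was skipped entirely still shows up in the report.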

I think that we tend to overestimate the reliability of human actions and underestimate the cost of fixing errors. After all, how hard could it be for an experienced administrator to run a few commands? It may well be easy, but that does not make the administrator’s actions any more reliable.

The bottom line is that almost any investment in IT automation is well worth it. Unfortunately, this view is not universally accepted. Many organizations still live in the stone age when it comes to automation, but that’s a subject for another post.
