All posts by Alexander Ananiev

WebSphere Administration: Finding Cluster Members

There are many instances when it is necessary to find members (servers) that constitute a cluster. For example, to script starting and stopping an application, we need to know names of servers and nodes where the application is installed. Or we may want to restart all members of a cluster after some configuration change.

As always, this is easy to do when you know what you’re doing. A query-like activity in WAS environment usually heavily relies on AdminConfig object. Our example is no exception:


import sys

"""
List application servers that belong to a particular cluster
Cluster name is passed as a parameter to the script
"""
cluster_name=sys.argv[0]

# Get configID of the cluster
cluster_conf_id = AdminConfig.getid("/ServerCluster:"+cluster_name )
if not cluster_conf_id:
    raise "Cluster %s does not exist!" % cluster_name

# Get confids of the cluster members
member_conf_ids = AdminConfig.showAttribute(cluster_conf_id, "members")
# AdminConfig returns the list in [], get rid of the brackets
member_conf_ids = member_conf_ids[1:-1]
print "Cluster %s has following members:" % cluster_name  
# split by space
for member_conf_id in member_conf_ids.split():
    # Obtain server name and node name
    member_name=AdminConfig.showAttribute(member_conf_id, "memberName")
    node_name=AdminConfig.showAttribute(member_conf_id, "nodeName")
    print node_name+"/"+member_name


Now we can do a lot of useful things with node_name and member_name variables; for example, we can try to get an MBean of this server and check its state. Or we can attempt to restart it. I will cover this in one of the future posts.

This post is part of the series on WebSphere Application Server administration. Please subscribe to this blog if you’d like to receive updates.

WebSphere Administration: How to Force Node Synchronization

It is often desirable to “push” configuration changes to nodes in the cell without having to wait for the periodic node sync process to kick in. This becomes a requirement if you want to run some kind of automated testing as part of your configuration/deployment process (which is a very good practice). For example, you may have a build process which runs HTTPUnit tests immediately after the change.

In this case, you really don’t want to rely on any artificial “sleeps” and delays in your script, so triggering synchronization programmatically is the best option.

This is actually quite simple to accomplish, as illustrated by the following script:


"""
Sync configuration changes with nodes
"""

# Commit changes to the configuration repository 
AdminConfig.save()

# Obtain deployment manager MBean
dm=AdminControl.queryNames("type=DeploymentManager,*")

# "syncActiveNodes" can only be run on the deployment manager's MBean, 
# it will fail in standalone environment
if dm:
    print "Synchronizing configuration repository with nodes. Please wait..."
    # Force sync with all currently active nodes
    nodes=AdminControl.invoke(dm, "syncActiveNodes", "true")
    print "The following nodes have been synchronized: "+str(nodes)
else:
    print "Standalone server, no nodes to sync"


This post is part of the series on WebSphere Application Server administration. Please subscribe to this blog if you’d like to receive updates.

WebSphere Administration: AdminControl vs AdminConfig

AdminContol and AdminConfig are two key scripting objects that any WAS administrator must know by heart.

The objects are available from wsadmin WAS administration tool.

AdminConfig provides an interface to the WAS configuration repository available under profile_root/config. All AdminConfig services are provided by the Deployment Manager (assuming you’re connecting remotely); there is no dependency on node agents.

AdminControl provides an interface to JMX managed beans (MBeans) in your WAS environment. In general, MBeans are only available for the resources that are running. For example, if an application server is down, the corresponding MBean will not be available.

AdminConfig and AdminContol could complement each other nicely, especially when you need to find out the status of “things” in a WAS cell. To illustrate the point, here is a simple script that prints the status of all application servers within a cell:


"""
List all application servers in a cell and print their status (up or down)
"""

# Get the string with config ids of all serves
server_confids=AdminConfig.list("Server")
#Iterate over all servers - config ids are separated by \n 
for server_confid in server_confids.split("\n"):
    server_confid=server_confid.strip()
    # obtain the type - types are APPLICATION_SERVER, DEPLOYMENT_MANAGER, NODE_AGENT, WEB_SERVER
    server_type=AdminConfig.showAttribute(server_confid, "serverType")
    
    # we are only interested in application servers
    if server_type == "APPLICATION_SERVER":
        # obtain the name of the server
        server_name=AdminConfig.showAttribute(server_confid, "name")
        print "Checking the state of the server %s of type %s" % (server_name, server_type) 
        
        #Now we can try to get the mbean of the server. We could've used AdminConfig.getObjectName() as 
        #a shortcut, but for the sake of the example we'll use AdminControl explicitly
        
        # Query to search for mbeans - note the wildcard at the end to match the rest of mbean parameters
        # Note that we're simplifying a little bit since we're ignoring node names -
        # server names may not be unique within cell  
        server_query="type=Server,name=%s,*" % server_name
        server_mbean=AdminControl.queryNames( server_query )
        # AdminContol returns None if there is no running mbean for this server
        if server_mbean:
            print "Server is up"
        else:
            print "Server is down!"

This post is part of the series on WebSphere Application Server administration. Please subscribe to this blog if you’d like to receive updates.

Using Jython 2.2.1 with WSAdmin Tool

In one of the previous posts I “complained”:/was-70-is-still-on-jython-21 that wsadmin tool (which is the main WebSphere Application Server administration tool) still relies on Jython 2.1, which is quite old.

The issue became critical when I realized that jython automatically re-compiles classes compiled with a different jython version. In my case, I was using Jython 2.2 for my Ant scripts and Jython 2.1 for wsadmin scripts. Some of the code was shared. This led to the situation of concurrent builds stepping on each other by dynamically re-compiling classes with different jython versions. The error looked something like that:


File "", line 5, in ?
java.lang.ArrayIndexOutOfBoundsException: -4
at org.python.core.imp.createFromPyClass(Unknown Source)

Bugs like that are always fun to troubleshoot.

Going back to 2.1 was not an option for me since I use closures and “new classes” in quite a few places. So I tried putting jython 2.2.1 on wsadmin classpath and it worked without a hitch with “thin wsadmin client”:/wsadmin-thin-client. All my of wsadmin scripts work without a problem.

Here is a “sample wsadmin.bat file”:/files/wsadmin/wsadmin_thin.bat that I use. This file utilizes thin client jars. Note how in line 85 my jython.jar (which is jython 2.2.1) is added to the classpath so it would override jython packages supplied with WAS.

One possible downside of this approach is running into issues with IBM support in case if you need to open a PMR related to wsadmin scripting.

Value of IT Automation

Intuitively we all understand that automating IT operations, including builds, deployments, configurations, upgrades and so on is a good thing. We all know that humans make mistakes and mistakes can be costly when they affect a large group of people (e.g., a large user community) or otherwise result in lost revenue to a business.

But how error-prone human actions really are? Note that we’re not talking about normal error rate in software development or other creative fields. Clearly, it is very difficult if not impossible to eliminate errors from occurring when the task can’t be formalized. On the other hand, a large area of IT is related to operations and maintenance and it involves mostly predictive and repetitive tasks. There are many tools, from simple scripts to super-expensive enterprise products that deal with automating these types of tasks. Knowing the probability of human error would help us estimate potential benefits from these tools and, consequently, assess the return on investment.

The classic formula for calculating human reliability can be found “here”:http://www.ida.liu.se/~eriho/HRA_M.htm. Without going too much into the math, empirically we can ascertain the following:

* Every action performed by a human has a probability of error. It is never zero.
* Most tasks (at least, in IT) consist of multiple steps (actions). E.g., a change may have to be made on multiple severs.
* The likelihood of error goes up proportionally to the number of steps.

So it should not come as a surprise that the probability of error could be quite high for complex task. According to the data published on “Ray Panko’s website”:http://panko.shidler.hawaii.edu/HumanErr/Multiple.htm, 28 experienced users on average had 33% error rate in a task involving 14 command-line-based steps. Another interesting tidbit from the same site is 50% error rate in following a checklist. It is unfortunate that the details of these studies are not documented on the site.

Of course, many of these errors can be caught and corrected via testing. It is common knowledge that every change has to be accompanied by some verification or “smoke” testing.

But some changes are impossible or very expensive to validate. Imagine having to change JVM maximum heap size to prevent an application running out of memory. Imagine also that this is a high volume application that runs on four different servers. Imagine further that one server out of four was not updated by mistake. You are not going to find out about it until the application starts crashing on that server under load – and this will be the worst time for dealing with this issue. Now, what if a parameter that had to be changed was some obscure garbage collection setting that was going to improve application’s performance. Users will be experiencing intermittent performance issues but there will be nothing explicitly pointing to the offending server. Discovering the root cause of the problem could take quite a while. The bottom line is that some errors can only be discovered by users at which point the cost of fixing them is going to be substantial.

I think that we tend to overestimate the reliability of human actions and underestimate the cost of fixing errors. After all, how hard could it be for an experience administrator to run a few commands? And it could be very easy but it still does not make the actions of this administrator any more reliable.

The bottom line that almost any investment in IT automation is well worth it. Unfortunately, this view is not uniformly accepted. Many organizations still live in stone age when it comes to automation, but that’s a subject for another post.

DataPower News

I’ve long been a “fan of XML Appliances”:/is-xml-appliance-an-ultimate-esb. Looks like IBM customers like the appliance idea as well. IBM said that DataPower was one of the top-selling products and also “announced”:http://www.infoq.com/articles/cuomo-websphere-trends-2009 that “DataPower-lution”:http://www.ibm.com/developerworks/blogs/page/gcuomo?entry=datapower_lution is one of their strategic directions for 2009. Basically, more and more edge functions will be moving into the appliance. And why not use the same device for XML acceleration, load balancing, crypto acceleration, caching, perhaps even WebSEAL replacement (it’s just a fancy reverse proxy after all). We’ll see how this vision plays out.

In a related news, there is finally a “DataPower book”:http://www.ibm.com/developerworks/blogs/page/woolf?entry=ibm_websphere_datapower_soa_appliance1 and it’s 960 pages long. And this is before IBM started adding all these edge functions to the device :).

Exception Handling in WSAdmin Scripts

Using AdminTask in wsadmin often results in getting ugly stack traces. In fact, AdminTask always produces a full java stack trace even when the error is fairly innocuous (e.g., a resource name was mistyped). The stack trace in this situation is pretty useless; it could actually confuse operations staff as it might be interpreted as a problem in IBM code.

It is, in fact, quite easy to deal with this situation in Jython and suppress the annoying stack trace:


import sys
from com.ibm.ws.scripting import ScriptingException
...
    try:
        AdminTask....
    except ScriptingException:
        # note that we can't use "as" because of python 2.1 version, have to use sys
        print "Error:\n"+str(sys.exc_info()[1])

Stack Traces and Consulting Rates

Long and unwieldy stack traces is a common occurrence when dealing with Java EE application servers. Here is “an example”:/files/wps_error_example.txt. Many (if not all) of these products re-throw the same exception multiple times which complicates things even further. Figuring out the root cause of an exception is a major undertaking.

Of course, most of the trace is useless when using proprietary products since it points to classes that you don’t have the source code for. And not only you – level 1 of support most likely can’t get to the source code either. As a result, 90% of the trace has little to none immediate value.

As a rule, the more complex the product, the longer the stack trace. Makes sense, right? You’ve got more layers and components, each layer thinks that it is its duty to dump the whole thing to the log and re-throw.

May be we should start using stack traces as a code complexity metric. It would be much more telling than cyclomatic complexity.

I also think that there is a correlation between an average length of a stack trace and an average consulting rate that users of the product pay for development and support. So may be at the end of the day, developers and administrators should not grumble about it to much and I should just shut up.

WAS 7.0 is Still On Jython 2.1

I’ve started playing with WebSphere Application Sever 7.0 (the latest and greatest) and to my surprise discovered that it still “uses Jython/Python 2.1”:http://publib.boulder.ibm.com/infocenter/wasinfo/v7r0/index.jsp?topic=/com.ibm.websphere.soafep.multiplatform.doc/info/ae/ae/cxml_jython.html as a scripting language for its wsadmin tool.

Jython 2.2 has been around for quite a while and, from my experience, is very stable. So it’s odd that IBM choose not to upgrade it. The difference between 2.1 and 2.2 is quite significant, the biggest selling point of 2.2 is “new style classes”:http://docs.python.org/reference/datamodel.html#newstyle with a unified type model. Python 2.2 also supports properties. I don’t believe there is closures support in 2.1 either.

Why is it important? Well, using Java for WAS administration is hard; the API is obscure and poorly documented. This makes Jython the only game in town (with JACL deprecated some time back). So being able to use a modern version of Jython is highly desirable.

I’m still hoping that IBM might upgrade jython as part of one the minor upgrades; WAS 8.0 is probably a long time away.

Using Thin WSAdmin Client with WPS/WESB

Quick tips for those wishing to use “thin wsadmin client”:http://publib.boulder.ibm.com/infocenter/wasinfo/v6r1/index.jsp?topic=/com.ibm.websphere.express.doc/info/exp/ae/txml_adminclient.html with process server/WESB: add com.ibm.soacore.runtime_6.1.0.jar to the classpath.

Normally, you need the following jars for the client to work: com.ibm.ws.admin.client_6.1.0.jar, com.ibm.ws.security.crypto_6.1.0.jar com.ibm.ws.webservices.thinclient_6.1.0.jar.

However, if you want to use any of the SCA commands available in WESB, the extra jar is needed. Otherwise, you’ll be getting “class not found” for SCACommandException. The jar is available under wesb_root/plugins.

I don’t think it’s documented in infocenter.

Building Windows NT

I’ve been reading a relatively old but nevertheless fascinating book called “Showstopper”:http://www.amazon.com/Show-Stopper-Breakneck-Generation-Microsoft/dp/0029356717/ref=sr_1_9?ie=UTF8&s=books&qid=1228778060&sr=8-9 about development of Windows NT. I was struck by the author’s account of NT’s build process, specifically its low degree of automation.

NT was obviously a high-intensity, almost a death march kind of project and so the builds had to be churned out at a quick pace:

…the number of builds grew from a couple to a half dozen some week…

This may not sound like much, but since NT was getting quite big and complex, it kept the guys in the build lab busy. The builds were so critical that at some point the technical lead of the project, Dave Cutler, had to take over the build lab. This, however, did not improve the way builds were done. One of the members of the build team remembers:

He is not giving us the list, he’s basically saying, ‘Go to this directory and sync this file.’ He’s saying, ‘Pick up this file, do this, do that’.

The release process was pretty haphazard too according to another team member:

We have all these cowboy developers, just slinging code like crazy, calling out: “We need another build!”

And, of course, continuous integration was not invented yet:

We’d think we were all done for the day, then test the build and it wouldn’t boot. We’d run around looking for the programmer whose code broke it.

I don’t think this situation was unique to Microsoft back then. But I also think that the attitude toward CM and development process automation has changed over the last 16 years. Today, automated builds is pretty much the norm for all but the smallest projects. Continuous integration and automated testing is becoming widespread. There is a dizzying array of build systems, “build servers”:/yet-another-build-server, version control systems and other CM and development tools.

There is a long way to go however. Implementing solid build/deployment and release management automation is still hard. Most large projects end up having to dedicate multiple highly skilled people to solving this problem. Home-grown script-based automation is still pretty much the state of the art. This is going to change. The tools will become more intelligent and advanced. I hope it won’t take another 16 years.

XML Alternatives and YAML

The need for a more human-friendly alternative to XML is apparent to many people, myself included. This is the reason why quite a few different “light-weight markup languages”:http://en.wikipedia.org/wiki/List_of_lightweight_markup_languages have been created over the last several years. I guess they are called “lightweight” because they don’t use XML-like tags that tend to clutter documents. I’ve looked at several of them and found “YAML”:http://yaml.org to be the most mature out of the bunch as well as quite human-readable (as opposed to, say, JSON) and easy to understand. You can find some very good side-by-side XML vs. YAML comparison “here”:http://yaml.kwiki.org/index.cgi?YamlExamples or “here”:http://www.ibm.com/developerworks/xml/library/x-matters23.html, the difference in readability is stunning.

From what I understand, YAML is popular in the Ruby world and it is used for various “PHP projects”:http://www.symfony-project.org/book/1_0/08-Inside-the-Model-Layer. However, it is almost unknown in Java/J2EE circles. Which is a shame. While annotations somewhat limited the spread of “XML hell” in Java applications, XML still remains a de-facto file configuration format. I would venture to say that except for few outliers, YAML would be a better option as a format for configuration files. Why is it the case? One reason is that YAML format simplifies application support. Developers often say that they don’t care about readability of XML since they use IDE or editors that hide the complexity of XML. Indeed, being able to work with XML in a nice tree view-based editor is appealing. But this does not work when application configuration needs to be quickly analyzed and potentially updated on some remote machine that most likely only has VI or notepad (which is usually the case in production environments, which I find very ironic – shouldn’t the production machine have the most advanced editors and analysis tools to make troubleshooting as efficient as possible?) in response to some production problem. For configuration files, readability and ease of understanding is the key.

Of course, there is also an old trusty property/name-value format. It is, however, very limited, since it does not support any kind of nesting or scoping. So all properties become essentially global and haven’t we learned already that global variables is not a good thing?

YAML, on the other hand, allows for expressing arbitrary complex models. Anything that can be expressed in XML can also be expressed using YAML.

On the downside, YAML does not have a very broad ecosystem. There are not that many “editors that support YAML”:http://www.digitalhobbit.com/archives/2005/09/15/yaml-editor-support/. There is a “YAML Eclipse plugin”:http://code.google.com/p/yamleditor/, but in only gives color highlighting, no validation (here is “another plugin”:http://noy.cc/symfoclipse/download.html which I have not tried yet). There is no metadata support, at least for Java, although there is a “schema validator”:http://www.kuwata-lab.com/kwalify/ for Ruby (its Java port seems to be dead). There is also no XSLT equivalent.

There are two YAML parsers for Java – “jvyaml”:https://jvyaml.dev.java.net/ and “JYaml”:http://jyaml.sourceforge.net/index.html. They kinda work, but there is certainly room for improvement in terms of error messages and just the ability to detect and reject an incorrect document. Since YAML is supposed to be a language with minimal learning curve, the parsing has to be intuitive and bulletproof.

I still think that despite the shortcomings YAML is the way to go. Perhaps I will give a closer look to one of these parsers and see if I can tweak it a bit.

Why are Environments So Poorly Supported?

A concept of “environment” permeates software development lifecycle. No application is released into production directly from developers PCs. There has to be a place where an application can go through various stages of testing. We use different environments for that purpose, e.g., “QA environment” or “acceptance environment”.

An “environment” is just a collection of resources which could include middleware and OS/filesystem resources. In the simplest case, an environment for a J2EE web application consists of a single application server. Complex applications consisting of multiple components could utilize many different resources, including several different middleware products (e.g., app server, Web server, messaging infrastructure, ESB).

For any IT organization it is important to know how their resources are used. ITIL has a concept of “CMDB”:http://en.wikipedia.org/wiki/CMDB that’s supposed to contain all IT resources. However, the granularity of CMDB implementations is usually too high (typically, a server level) which makes it difficult to use for software development. Also, CMDB is not really integrated with development processes and tools, it’s kind of a thing on its own.

Ideally, the environment concept must be supported by development, version control, change management and build/deploy tools. Environment metadata can be used to automatically install an application in a given environment. Testing tools can use this metadata to generate “smog” tests or to adjust existing test cases (e.g., by using different URLs/endpoints). There should be a capability to produce various reports showing what version of what application is installed in what environment.

Sadly, all these wonderful features are mostly missing from modern development and CM tools. Developers rely on scripting and informal use of environment variables. Essentially, today, each application has its own “selfish” view of what an environment is. This makes providing consistent operations and support difficult. This is especially true in virtualized environments where each logical environment may consist of many different VMs.

Case in point are build servers and build tools. I looked at several build servers and found explicit environment support only in “AntHill”:http://www.anthillpro.com/html/products/anthillpro/default.html. All the others I looked at (several, I won’t name them) omit the environment concept completely (except for some lame support of environment variables). To me, this is really odd. While build servers have their root in continues integration, their key selling point in an enterprise is actually release management (at lease for commercial products; there are many great open source build servers to choose from if continuous integration is the only goal). So how can a release process be automated if the foundation of this process is missing entirely from the tool that’s supposed to help with the automation?

Build tools are the same way. There is some deployment support in both Maven (“Cargo plugin”:http://cargo.codehaus.org/Maven2+plugin) and Ant, but no way of supporting environment as an entity.

Will Recession Provide a Boost to SOA?

There is no doubt that IT budgets “will be cut next year”:http://blogs.zdnet.com/BTL/?p=10403. The size of the cut is hard to predict, but I think that it could exceed the Gartner’s forecast.

This, however, could finally give a shot in the arm to the long stagnated SOA projects. What makes me say that? Well, as I’ve been “advocating all along”:/soa-need-not-be-costly, SOA represents an inexpensive way of adding flexibility and achieving new goals. SOA does not have to require significant new investments. Done smartly, it could be a great way of implementing new functionality without ripping the existing systems apart, which is undoubtedly won’t fly in light of highly constrained budgets.

To prove this point, in spite of lackluster economy, “the demand for SOA architects has been booming”:http://blogs.zdnet.com/service-oriented/?p=1179.

There is one caveat to this view – organizations need to stop approaching SOA from the infrastructure standpoint (i.e., what ESB product do I need?) and start using it as a solution to real business needs. I’m not entirely sure that it’s going to happen, but bad economy could be just the motivation for that.

Updated Jython Ant Task

I’ve updated Ant Jython task with a number of new features:

* Jython path is now handled by a separate JythonPath task.

* Jython interpreter is now scoped to Ant project. This means that you can have multiple Jython calls withing the same project that share common imports and variables.

* Jython task now supports nested text, similar to the “script” task.