Abstract: A method for enhancing reliability while upgrading a software program implemented in a clustered computer system from a first version to a second version. The software program is implemented as software modules running on a plurality of computers coupled in a cluster configuration in the clustered computer system. The method includes ascertaining a certification level associated with each of the software modules. If a given software module has a first certification level, the method includes limiting the load level on that software module to a first load level. If a given software module has a second certification level, the method includes allowing the load level on that software module to reach a second load level higher than the first load level.
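The certification-gated load limiting described above can be illustrated with a minimal Python sketch. Everything here is an assumption for illustration: the names `CertificationLevel`, `SoftwareModule`, `load_limit`, and `try_route` are hypothetical, load is modeled as a simple count of in-flight requests, and the caps of 10 and 100 are arbitrary. The abstract only states that the second load level is higher than the first.

```python
from dataclasses import dataclass
from enum import Enum


class CertificationLevel(Enum):
    # Hypothetical levels: FIRST = not yet certified (e.g. just upgraded),
    # SECOND = certified for full production load.
    FIRST = 1
    SECOND = 2


# Assumed load caps (in-flight requests); only their ordering comes from the abstract.
LOAD_CAP = {CertificationLevel.FIRST: 10, CertificationLevel.SECOND: 100}


@dataclass
class SoftwareModule:
    name: str
    version: str
    certification: CertificationLevel
    current_load: int = 0


def load_limit(module: SoftwareModule) -> int:
    """Return the maximum load permitted for a module given its certification level."""
    return LOAD_CAP[module.certification]


def try_route(module: SoftwareModule) -> bool:
    """Route one request to the module only if it stays within its load limit."""
    if module.current_load < load_limit(module):
        module.current_load += 1
        return True
    return False


if __name__ == "__main__":
    upgraded = SoftwareModule("orders", "2.0", CertificationLevel.FIRST)
    accepted = sum(try_route(upgraded) for _ in range(15))
    print(accepted, upgraded.current_load)
```

Running the usage example routes 15 requests to a module at the first certification level; only 10 are accepted and the rest would be sent elsewhere. Promoting the module to `CertificationLevel.SECOND` raises its cap, which mirrors the idea of exposing a newly upgraded module to more traffic only once it is trusted.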
Abstract: A method for maintaining a predefined acceptable fault tolerance level for a plurality of software modules implementing a software program running on a first plurality of computers coupled together in a cluster configuration in a first cluster of a clustered computer system. The first plurality of computers is coupled to a first intelligent director agent. The method includes tracking, using the first intelligent director agent, the status of the software modules running on the first plurality of computers. The method also includes ascertaining a fault tolerance level associated with the software program by examining the status of those software modules. If the fault tolerance level is below the predefined acceptable fault tolerance level, the method further includes searching the first plurality of computers for a first suitable computer on which to load another module of the software program.
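The director agent's track, assess, and replenish loop described above can be sketched in Python. The abstract does not define how the fault tolerance level is computed or what makes a computer "suitable". This sketch assumes the level equals the number of running modules minus one (the number of module failures the program can absorb). It also assumes a suitable computer is one that is alive, not already hosting a module, and least loaded. `Computer` and `DirectorAgent` are hypothetical names, and setting `hosts_module` merely stands in for actually loading a module.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Computer:
    name: str
    alive: bool = True
    load: float = 0.0           # assumed utilization metric in [0, 1]
    hosts_module: bool = False  # True if a module of the program runs here


@dataclass
class DirectorAgent:
    """Hypothetical stand-in for the first intelligent director agent."""
    computers: List[Computer]
    acceptable_level: int  # predefined acceptable fault tolerance level

    def fault_tolerance_level(self) -> int:
        # Assumption: level = running modules - 1, i.e. how many module
        # failures the program survives while at least one module stays up.
        running = sum(1 for c in self.computers if c.alive and c.hosts_module)
        return running - 1

    def find_suitable_computer(self) -> Optional[Computer]:
        # Assumption: "suitable" = alive, not hosting a module, least loaded.
        candidates = [c for c in self.computers if c.alive and not c.hosts_module]
        return min(candidates, key=lambda c: c.load, default=None)

    def maintain(self) -> Optional[Computer]:
        """If below the acceptable level, place another module on a suitable computer."""
        if self.fault_tolerance_level() >= self.acceptable_level:
            return None
        target = self.find_suitable_computer()
        if target is not None:
            target.hosts_module = True  # placeholder for loading the module
        return target


if __name__ == "__main__":
    agent = DirectorAgent(
        [
            Computer("a", hosts_module=True),
            Computer("b", alive=False, hosts_module=True),  # failed node
            Computer("c", load=0.7),
            Computer("d", load=0.2),
        ],
        acceptable_level=1,
    )
    chosen = agent.maintain()
    print(chosen.name if chosen else None, agent.fault_tolerance_level())
```

In the usage example, computer `b` has failed, so only one module is running and the level drops to 0, below the acceptable level of 1. The agent picks `d` as the least-loaded live candidate, after which the level is back to 1. If no candidate existed in the first cluster, `maintain` returns `None`; a fuller design might then look beyond the first plurality of computers, which this sketch does not attempt.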