Rules-Based, Mode-Driven Manager for Timer Bounded Arbitration Protocol Based Resource Control

Info

Publication number: 20110113228
Type: Application
Filed: Oct 18, 2010
Publication Date: May 12, 2011
Applicant: QUANTUM CORPORATION (San Jose, CA)
Inventor: William J. MIDDLECAMP (Apple Valley, MN)
Application Number: 12/906,220

Abstract

An example apparatus includes a processor, a memory, and an interface that connects the processor, the memory, and a set of components. The set of components includes a first component configured to acquire a mode from members of an HA cluster and a second component configured to enforce mode pairing rules for members of the HA cluster. Once the desired mode pairing has been determined, a third component takes actions configured to either achieve the mode pairing according to rules for members of the HA cluster or to selectively force a hardware reset of one or more members of the HA cluster upon determining that a split brain scenario is possible based, at least in part, on the mode of the members of the HA cluster. The example apparatus therefore implements a rules-based manager for timer bounded arbitration protocol based resource control.

Description

CROSS REFERENCE TO RELATED APPLICATIONS

This application claims the benefit of U.S. Provisional Patent Application 61/259,271 filed Nov. 9, 2009.

BACKGROUND

Networked systems of computers allow parallel and distributed computing. Networked systems of computers may act as clusters, where the computers are nodes in the clusters. Clusters may function collectively as operational groups of servers. The nodes of a cluster or other multi-node computer system may function together to achieve high server system performance, availability, and reliability. High availability (HA) systems allow a standby server to take over in the event of an undesired failure on a primary server. Goals for an HA system include providing safety and uninterrupted operation. If a primary server fails, failover should occur automatically and operations should resume on a secondary server. However, at any point in time, only one of the primary server or the secondary server should have write access to certain items. For example, at any point in time, only one server should have write access to file system metadata, to prevent corruption of the metadata. The undesirable state in which both servers have write access may be referred to as a split brain scenario (SBS).

Conventional systems may have employed protocols and techniques for preventing the multiple-writer access that leads to an SBS. However, these conventional systems may have had default settings that, under certain circumstances, unintentionally allowed both the primary server and the secondary server to have write access, resulting in an SBS. Additionally, these systems may have had settings that unintentionally led to unnecessary hardware resets when an ambiguous or non-deterministic state was encountered. One occurrence that could lead to an undesired hardware reset is a communications network breakdown or slowdown. When synchronizing communications are lost, a hardware reset may be forced even though all parts of the system except the communications network are healthy and single writer access is still in place. Operation of an HA cluster also includes times when the protection mechanism must be stopped, which requires stopping all but one of the processors to avoid an SBS. A state-based system is insufficient to protect against an SBS because, under certain types of equipment failures or operational mistakes, one or more processors may change state without another processor being aware of that state change.

BRIEF DESCRIPTION OF THE DRAWINGS

The accompanying drawings, which are incorporated in and constitute a part of the specification, illustrate various example methods, apparatus, and other example embodiments of various aspects of the invention described herein. It will be appreciated that the illustrated element boundaries (e.g., boxes, groups of boxes, other shapes) in the figures represent one example of the boundaries of the elements. One of ordinary skill in the art will appreciate that in some examples one element may be designed as multiple elements or that multiple elements may be designed as one element. In some examples, an element shown as an internal component of another element may be implemented as an external component and vice versa. Furthermore, elements may not be drawn to scale.

FIG. 1 illustrates a pair of metadata controllers (MDCs) used in a system for maintaining single writer access between a pair of HA servers using a rules-based, mode-driven manager (RBM) for timer bounded arbitration protocol based resource control.

FIG. 2 illustrates a method for maintaining single writer access between a pair of HA servers using an RBM for timer bounded arbitration protocol based resource control.

FIG. 3 illustrates a system for maintaining single writer access between a pair of HA servers using an RBM for timer bounded arbitration protocol based resource control.

FIG. 4 illustrates an apparatus for maintaining single writer access between a pair of HA servers using an RBM for timer bounded arbitration protocol based resource control.

DETAILED DESCRIPTION

Example apparatuses and methods work with apparatuses and methods that prevent a split brain scenario (SBS) where both servers in a high availability (HA) pair could have write access to resources (e.g., file system metadata, databases) for which there should only be single writer access. Example apparatuses and methods implement a rules-based, mode-driven manager (RBM) that enforces rules for modes into which HA cluster members can be placed. The modes are controlled by pairing rules. The modes are created by the RBM issuing commands to cluster members. The RBM collects mode and/or status information in order to determine which commands, if any, to issue to control modes into which cluster members are placed. In some circumstances, the RBM may trigger a hardware reset of one or both members of an HA cluster after acquiring information (e.g., mode, status) and attempting to alter modes by taking actions.

The mode and/or status information is collected at least when a decision point in the management chain of events for an HA cluster (e.g., subsystem startup) occurs. Upon detecting a decision point, an RBM running on one server attempts to communicate with a peer server in the HA cluster to ensure that mode-pairing rules are followed and to ensure that the cluster is not exposed to split brain scenario risk by operational mistakes.

The RBM can be viewed as a cluster manager that automates pairwise operation of an HA cluster. The RBM facilitates normal operation, reconfiguration, maintenance, operation outside HA mode, and so on. The RBM facilitates configuring metadata controllers (MDCs) to operate properly together to avoid a split brain scenario. The RBM makes it possible to turn HA protection off and then to turn it back on. To provide this functionality while avoiding a split brain scenario, the RBM is configured to acquire a substantially instantaneous picture of the operating status of the servers in a system. Substantially instantaneous information is desired because asynchronous processes operate in the HA system. The RBM is also configured to act on the information it acquires. Therefore, the RBM is not just a status-collection machine, but an entity that can force mode transitions according to rules that keep cluster members in pair-wise modes that help avoid a split brain scenario. In one example, an RBM can receive three different statuses from a server, and can force a server into one of six different modes by causing up to seven different actions.

FIG. 1 illustrates a pair of metadata controllers (MDCs) used in a system for maintaining single writer access between a pair of HA servers. Thus, FIG. 1 provides context for the example RBM for timer bounded arbitration (ARB) protocol based resource control described herein. The pair of MDCs includes an active MDC 100 and a standby MDC 150.

Active MDC 100 includes an active file system portmapper (FSMPM) 110, an active file system manager (FSM) 120, and an active dead-man timer 130. The standby MDC 150 includes a standby FSMPM 160, a standby FSM 170, and a standby dead-man timer 180. The active FSMPM 110 is connected to the active FSM 120 by a socket. Likewise, the standby FSMPM 160 is connected to the standby FSM 170 by a socket.

The active FSMPM 110 is configured to provide a heartbeat signal to a coordinator (not shown). When the active FSMPM 110 provides a heartbeat signal, the timer bounded ARB protocol may define an amount of elapsed time between heartbeats. Therefore, one way that the standby FSMPM 160 can determine that the active FSM 120 is not operating properly is by monitoring the heartbeat signal.

The active FSM 120 is configured to maintain ownership of an ARB block 140 by periodically writing the ARB block 140 according to the timer bounded ARB protocol. The active dead-man timer 130 is reset upon successfully writing the ARB block 140. Before the expiration of the active dead-man timer 130, the active FSMPM 110 may request permission to reset the active dead-man timer 130. If permission is granted, the active FSM 120 may attempt to maintain control by updating the ARB block 140 before the active dead-man timer 130 expires.

In this manner, the active FSM 120 can negotiate for additional time to retain control of the ARB block 140. Also, if permission is granted, the standby FSM 170 will not attempt to establish control of the ARB block 140 for a predetermined amount of time. Therefore, the negotiation gives the active FSM 120 the opportunity to keep control of the ARB block 140 in situations where a hardware reset is unnecessary (e.g., a minor system delay or slowdown). Furthermore, single writer access is maintained via the negotiation.

The active FSMPM 110 can also be configured to selectively force an election of an FSM to replace the active FSM 120 as the single writer upon a determination that the active FSM 120 has exited. Therefore, the active FSMPM 110 may establish the standby FSM 170 as the replacement for the active FSM 120. Accordingly, an activation command may be sent to the standby FSM 170.

The standby FSM 170 monitors the ARB block 140 for a safety period of time before writing to the ARB block 140. During the safety period, the standby FSM 170 monitors the ARB block 140 to ensure that the active FSM 120 is not writing to it. Once the safety period expires and it is determined that the active FSM 120 has not written to the ARB block 140, the standby FSM 170 may write to the ARB block 140. In this way, single writer access is maintained during the transition of control from the active FSM 120 to the standby FSM 170.
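The safety-period check described above can be sketched as a single decision function. This is a minimal Python sketch, not the patented implementation: the function name and its parameters are hypothetical, and it assumes the standby records the times at which it observed the active FSM update the ARB block.

```python
def standby_may_take_over(observed_writes, safety_start, safety_period, now):
    """Decide whether the standby FSM may begin writing the ARB block.

    observed_writes: times (seconds) at which the standby saw the active FSM
    update the ARB block. All names here are hypothetical.
    """
    if now < safety_start + safety_period:
        return False  # safety period not over yet: keep monitoring
    # Any update by the active FSM since monitoring began means it may still
    # be the writer, so taking over would risk two writers.
    return not any(safety_start <= t <= now for t in observed_writes)
```

For example, with monitoring started at 0 s and a 10 s safety period, an ARB update observed at 3 s blocks the takeover, while an empty observation list permits it once 10 s have elapsed.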

The active MDC 100 violates the timer bounded ARB protocol when the active FSM 120 has not exited before the active dead-man timer 130 expires. If the timer bounded ARB protocol is violated, a hardware reset will be forced, and an election is held to select a standby MDC 150 to replace the active MDC 100.

The standby FSMPM 160 is configured to selectively activate the standby FSM 170 to take control of the ARB block 140 after being elected to replace the active MDC 100. The standby FSM 170 is configured to acquire ownership of the ARB block 140, to maintain ownership of the ARB block 140 by periodically writing the ARB block 140 according to the timer bounded ARB protocol, and to reset the standby dead-man timer 180 upon successfully writing the ARB block 140. In one embodiment, the active MDC 100 and the standby MDC 150 reside on separate pieces of computer hardware and communicate over a computer network.

FIG. 1 illustrates the active dead-man timer 130 and the standby dead-man timer 180 as being internal to the active MDC 100 and the standby MDC 150, respectively. One skilled in the art will appreciate that the dead-man timers 130 and 180 may be part of an MDC or may be external to but used by an MDC. For example, a dead-man timer may be external to a process and/or hardware implementing an active MDC 100 or a standby MDC 150. Therefore, in different examples, the active dead-man timer 130 can be, but is not limited to being, a kernel timer, an operating system timer, or a timer associated with computer hardware (e.g., a peripheral component interconnect express (PCIE) card) operatively connected to an interface visible to the active MDC 100. In one embodiment, there is one dead-man timer per active FSM 120. Similarly, the standby dead-man timer 180 can be, but is not limited to being, a kernel timer, an operating system timer, or a timer associated with computer hardware (e.g., a PCIE card) operatively connected to an interface visible to the standby MDC 150. In one embodiment, there is one dead-man timer per standby FSM 170.

The active MDC 100 and the standby MDC 150 participate in the timer bounded ARB protocol. Functioning of the protocol, and thus of the active MDC 100 and the standby MDC 150, is enhanced by the example RBMs described herein. In one example, the timer bounded ARB protocol includes controlling the active FSM 120 to write the ARB block 140 once per FSM write period. The periodic writing indicates continued ownership of the ARB block 140. When a write to the ARB block 140 succeeds, the active FSM 120 resets the active dead-man timer 130 to a reset threshold period. Recall that it is the expiration of the active dead-man timer 130 that forces a hardware reset of the active MDC 100.
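The write-and-reset rule can be illustrated with a simulated clock. In this minimal Python sketch, `DeadManTimer`, `run_active_fsm`, and the 12-second reset threshold used in the example are illustrative assumptions; only the 5-second write period comes from this description, and a real dead-man timer would be a kernel, operating system, or hardware timer rather than a Python object.

```python
from dataclasses import dataclass


@dataclass
class DeadManTimer:
    """Hypothetical dead-man timer: expiry would force a hardware reset."""
    threshold: float      # reset threshold period in seconds (assumed value)
    deadline: float = 0.0

    def reset(self, now: float) -> None:
        self.deadline = now + self.threshold

    def expired(self, now: float) -> bool:
        return now >= self.deadline


def run_active_fsm(write_results, write_period, timer):
    """Simulate one ARB block write attempt per FSM write period.

    write_results: iterable of booleans, True if that write succeeded.
    Returns the simulated time at which the timer expired (a forced
    hardware reset), or None if it never expired.
    """
    now = 0.0
    timer.reset(now)
    for ok in write_results:
        now += write_period
        if timer.expired(now):
            return now          # protocol violated: hardware reset forced
        if ok:
            timer.reset(now)    # a successful write re-arms the timer
    return None
```

With these assumed values, `run_active_fsm([True, False, False, True], 5.0, DeadManTimer(12.0))` returns `20.0`: the write at 5 s re-arms the timer until 17 s, the next two writes fail, and the timer has expired by the fourth write period.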

One skilled in the art will appreciate that a failover system for a pair of HA servers can be arranged in different environments and may experience different operating conditions, different communication conditions, and other different factors. Therefore, the timer bounded ARB protocol may use different time delays. In one embodiment, the FSM write period used with the active dead-man timer 130 is 5 seconds. The active FSMPM 110 associated with the active FSM 120 sends a request to the standby FSMPM 160 asking permission to restart the active dead-man timer 130. The active FSMPM 110 may measure the round trip time of the request and reset the active dead-man timer 130 if permission was granted in less than one second.
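The permission round trip can be sketched as follows. This Python sketch assumes two hypothetical callables, `request_permission` (standing in for the FSMPM-to-FSMPM request) and `reset_timer` (standing in for the timer driver); the one-second bound is taken from the embodiment above. The clock is passed in as a parameter so the timing rule can be exercised with fake times.

```python
import time

MAX_ROUND_TRIP_S = 1.0  # permission must arrive in under one second


def try_extend_timer(request_permission, reset_timer, clock=time.monotonic):
    """Ask the standby FSMPM for permission and reset the dead-man timer
    only if permission was granted within MAX_ROUND_TRIP_S.

    request_permission and reset_timer are hypothetical stand-ins.
    """
    start = clock()
    granted = request_permission()
    elapsed = clock() - start
    if granted and elapsed < MAX_ROUND_TRIP_S:
        reset_timer()
        return True
    return False  # late or refused: the timer keeps running toward expiry
```

Using a monotonic clock avoids misjudging the round trip if the wall-clock time is adjusted during the request.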

In one example, an HA manager collects and reports operating information about the servers in the HA pair. The RBM may use this operating information to improve operation of the servers in the HA pair. The operating information may include the modes and statuses of the individual HA servers. Example RBMs control the modes and statuses of the servers in the HA pair so that the two servers never both have write access to file system metadata.

The RBM may also change the operating modes and/or statuses of the HA servers in the HA pair. The RBM has a set of actions available to it. The set of actions facilitates putting individual servers into pairs of operating modes according to pairing rules to avoid an SBS.

FIG. 2 illustrates a method 200. Computer executable instructions may be stored on a non-transitory computer readable medium. The instructions, when executed by a computer in an HA cluster, control the computer to perform method 200. Method 200 begins, at 205, when a decision point in the lifecycle of an HA cluster is detected. Method 200 proceeds, at 210, to acquire data describing an operating condition of a set of servers comprising the HA cluster. The data includes a mode of a server in the HA cluster.

Method 200 also includes, at 220, controlling at least one member of the HA cluster to selectively change mode to a target paired mode. The target paired mode is selected based, at least in part, on mode pairing rules associated with the HA cluster.

In method 200, the mode is one of: a default mode, a single mode, a config (configuration) mode, a locked mode, a peerdown mode, and a failed startup mode. In the default mode, HA monitoring for a server is ON and SMITH (Shoot Myself In The Head) reset is enabled. In the single mode, HA monitoring for a server is OFF and a peer server is communicating and in locked mode, or not communicating and designated peerdown. In the configuration mode, HA monitoring for a server is OFF and a peer server is communicating and in locked mode, or not communicating and designated peerdown. In the locked mode, a storage area network (SAN) application on the server is stopped and prevented from starting. In the peerdown mode, a peer server is OFF. In the failed startup mode, attempts to start a SAN application are blocked until a failure indicator is cleared.

In one example, the data may also include a status of the server in the HA cluster. The status is one of: unknown, stopped, running, and primary. The unknown status is reported when a peer server is not communicating. The stopped status is reported when a status command returns a first pre-determined code. The running status is reported when a status command returns a second, different pre-determined code. The primary status is reported when the server status is running and the FSMPM (file system port mapper) in the server is in the primary state.
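The status rules above map naturally onto a small decision function. In this minimal Python sketch, `Status` and `derive_status` are hypothetical names, and the two pre-determined codes are assumed to be `False` and `True` (the example values given later in this description for the `DSM_control status` command).

```python
from enum import Enum
from typing import Optional


class Status(Enum):
    UNKNOWN = "unknown"
    STOPPED = "stopped"
    RUNNING = "running"
    PRIMARY = "primary"


def derive_status(reachable: bool, status_code: Optional[bool],
                  fsmpm_primary: bool) -> Status:
    """Map raw observations of a server to one of the four statuses.

    status_code: result of the status command; False and True stand in for
    the first and second pre-determined codes (assumed encoding).
    """
    if not reachable or status_code is None:
        return Status.UNKNOWN        # peer server not communicating
    if status_code is False:
        return Status.STOPPED        # first pre-determined code
    # Second code: running, or primary if the FSMPM is in the primary state.
    return Status.PRIMARY if fsmpm_primary else Status.RUNNING
```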

In one example, controlling at least one member of the HA cluster to selectively change mode comprises issuing one or more of: a status, stop, start, configuration, clear, primary, and force reset command. The status command causes members of the HA cluster to report their status. The stop command causes the non-primary server in the HA cluster to be transitioned to locked mode, the primary server in the HA cluster to be transitioned to configuration mode, HA monitoring to be turned off on the primary server in the HA cluster, the SAN application to be stopped, and both servers in the HA cluster to be transitioned to default mode. The start command causes the stop command to be run to transition the cluster to default-default mode if necessary, the SAN application and HA monitor to be started on the local server in the HA cluster, SMITH reset to be enabled on the local server in the HA cluster, the SAN application and HA monitor to be started on the peer server in the HA cluster, and SMITH reset to be enabled on the peer server in the HA cluster. The config command causes the peer server in the HA cluster to be transitioned to the locked mode and the local server in the HA cluster to be transitioned to the configuration mode. The clear command clears an indicator that was set by failure of a start command. The primary command sets the status of the FSMPM on the local server in the HA cluster to primary. The force reset command triggers an immediate HA reset of the local server.

Method 200 facilitates persisting modes for members of the HA cluster. Therefore, method 200 may also include storing values for the modes associated with members of the HA cluster to maintain modes through a hardware reset. Part of the persistence can involve monitoring a file that indicates that a previous initialization of the HA cluster has failed. Since the file is monitored, in one example method 200 may also include granting permission, prior to initialization of the HA cluster, for the HA cluster to initialize.

In one example, the allowed set of mode pairs includes: default-default, default-locked, default-peerdown, single-peerdown, single-locked, config-locked, config-peerdown, locked-default, locked-single, locked-config, and locked-locked. The prohibited set of paired modes may include: single-default, single-single, single-config, config-default, config-single, and config-config.
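The allowed and prohibited lists can be encoded directly as lookup tables. This Python sketch assumes each pair is written as (local mode, peer mode), which matches how peerdown and locked describe the peer elsewhere in this description; `Mode`, `ALLOWED`, `PROHIBITED`, and `classify` are hypothetical names. Pairs that appear in neither list (for example, those involving the failed startup mode) are reported as unlisted rather than guessed.

```python
from enum import Enum


class Mode(Enum):
    DEFAULT = "default"
    SINGLE = "single"
    CONFIG = "config"
    LOCKED = "locked"
    PEERDOWN = "peerdown"
    FAILED_STARTUP = "failed startup"


def _pairs(text):
    """Parse 'default-locked, ...' into a set of (local, peer) Mode pairs."""
    result = set()
    for item in text.split(","):
        local, peer = item.strip().split("-")
        result.add((Mode(local), Mode(peer)))
    return result


ALLOWED = _pairs("default-default, default-locked, default-peerdown, "
                 "single-peerdown, single-locked, config-locked, config-peerdown, "
                 "locked-default, locked-single, locked-config, locked-locked")
PROHIBITED = _pairs("single-default, single-single, single-config, "
                    "config-default, config-single, config-config")


def classify(local: Mode, peer: Mode) -> str:
    if (local, peer) in ALLOWED:
        return "allowed"
    if (local, peer) in PROHIBITED:
        return "prohibited"
    return "unlisted"   # e.g. pairs involving failed startup mode
```

Every prohibited pair puts a server with HA monitoring off (single or config) next to a peer that is neither locked nor certified as down, which is exactly the configuration that could yield two writers.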

To review the actions caused by the commands, the status command causes cluster members to report their status. The stop command causes a series of actions: the non-primary server is transitioned to locked mode, the primary server is transitioned to configuration mode, HA monitoring is turned off on the primary server, the SAN application is stopped, and then both servers are transitioned to default mode. The start command also causes a series of actions: the stop command is run to transition the cluster to default-default mode if necessary, then SMITH reset is enabled on the local server and the SAN application and HA monitor are started on the local server. Then the SMITH reset is enabled on the peer server and the SAN application and HA monitor are started on the peer server. The config command transitions the peer server to the locked mode and transitions the local server to the configuration mode. The clear command clears an indicator that was set by failure of a start command. The primary command sets the status of the FSMPM on the local server to primary. The force reset command triggers an immediate HA reset.
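The ordering of the stop and start commands can be modeled on an in-memory view of a two-server cluster. This Python sketch is hypothetical throughout (the dictionary layout and function names are assumptions); it follows the step order given above and records each step so the sequence can be inspected. Note that the intermediate locked-config state reached during the stop command is an allowed pair.

```python
def stop_command(cluster):
    """Hypothetical model of the stop command's ordered actions.

    cluster: {'primary': name, 'modes': {name: mode}, 'ha_monitor': {name: bool},
              'san_running': {name: bool}, 'smith': {name: bool}}
    """
    steps = []
    primary = cluster["primary"]
    for server in cluster["modes"]:
        if server != primary:
            cluster["modes"][server] = "locked"      # non-primary -> locked
            steps.append(f"{server}: locked")
    cluster["modes"][primary] = "config"             # locked-config is allowed
    steps.append(f"{primary}: config")
    cluster["ha_monitor"][primary] = False
    steps.append(f"{primary}: HA monitoring off")
    for server in cluster["san_running"]:
        cluster["san_running"][server] = False
    steps.append("SAN application stopped")
    for server in cluster["modes"]:
        cluster["modes"][server] = "default"
    steps.append("all servers: default")
    return steps


def start_command(cluster, local, peer):
    """Hypothetical model of the start command, following the order in the text."""
    steps = []
    if any(mode != "default" for mode in cluster["modes"].values()):
        steps += stop_command(cluster)               # reach default-default first
    for server in (local, peer):                     # local server first, then peer
        cluster["smith"][server] = True
        cluster["san_running"][server] = True
        cluster["ha_monitor"][server] = True
        steps.append(f"{server}: SMITH on, SAN application and HA monitor started")
    return steps
```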

The pairing rules describe mode configurations for a primary server and a secondary server of the HA pair that allow single server write access to system resources. Thus, an RBM selectively sets the modes and statuses of the paired HA servers. Setting the modes and statuses overrides default behaviors of the paired HA servers, putting them into allowed mode pairs while preventing them from entering prohibited mode pairs. The RBM changes the modes and statuses of the servers in the pair of HA servers so that only a single server has write access to the system resources (e.g., file metadata) at a given time. Likewise, the RBM sets the modes and statuses according to pairing rules to avoid unintentional hardware resets.

Example apparatuses and methods rely on pairing rules that define valid mode combinations of paired HA servers. The RBM monitors the operating information of paired HA servers to ensure that the paired HA servers are operating in valid mode combinations. If the RBM detects that paired servers are not operating in a valid mode combination, the RBM will take an action to attempt to force the paired servers into a valid mode combination if possible. If the RBM is unable to move the paired HA servers to a valid mode combination, the RBM may trigger a hardware reset in one or both of the paired HA servers to prevent SBS.
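One pass of this check-correct-reset behavior can be sketched as below. This Python sketch is an assumption-laden illustration: `enforce` is a hypothetical name, the corrective actions are supplied by the caller as an ordered list of callables, and `force_reset` merely stands in for triggering a hardware reset. The allowed set is copied from the pairing rules above as (local, peer) pairs.

```python
ALLOWED = {
    ("default", "default"), ("default", "locked"), ("default", "peerdown"),
    ("single", "peerdown"), ("single", "locked"), ("config", "locked"),
    ("config", "peerdown"), ("locked", "default"), ("locked", "single"),
    ("locked", "config"), ("locked", "locked"),
}


def enforce(get_modes, corrective_actions, force_reset):
    """One pass of a hypothetical RBM enforcement step.

    get_modes() returns the current (local, peer) mode pair; corrective_actions
    is an ordered list of callables that try to move the pair toward a valid
    combination; force_reset() stands in for triggering a hardware reset.
    """
    if get_modes() in ALLOWED:
        return "ok"
    for action in corrective_actions:
        action()
        if get_modes() in ALLOWED:      # re-read: modes can change asynchronously
            return "corrected"
    force_reset()                       # no valid pair reached: reset to prevent an SBS
    return "reset"
```

Re-reading the modes after every action, rather than trusting the action's intended effect, reflects the point made in the background that a purely state-based view can miss state changes on the other server.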

FIG. 3 illustrates an HA cluster manager apparatus 300. Apparatus 300 implements an RBM 310 for timer bounded arbitration protocol based resource control. RBM 310 includes a rules logic 320, a mode selection logic 330, and an action logic 340.

Rules logic 320 is configured to acquire substantially instantaneous information about an HA cluster. The information includes at least a mode and status for members of the HA cluster. The HA cluster may include, for example, primary server 350 and secondary server 360. RBM 310 manages the HA cluster to prevent a split brain scenario with regard to file system resource 370. While RBM 310 is illustrated as separate from the primary server 350 and the secondary server 360, in different examples the RBM 310 may be implemented in one or both of the primary server 350 and the secondary server 360.

The mode selection logic 330 is configured to select a mode for a member of the HA cluster. The mode is selected to make the HA cluster comply with a set of allowed paired modes and to prevent the HA cluster from attaining a prohibited paired mode.

The action logic 340 is configured to prevent a split brain scenario in the HA cluster by transforming an HA cluster member mode. The action logic 340 changes the cluster member mode by causing the performance of one or more of: a status action, a stop action, a start action, a configuration action, a clear action, a primary action, and a force hardware reset action.

In one example, control of and write access to system resources (e.g., file system resource 370) are regulated through the RBM 310. The RBM 310 monitors and sets operating modes, and the modes persist across system reboots. Therefore, if the paired HA servers are rebooted, they do not encounter an ambiguous or non-deterministic state when initialized, and are not subject to an unnecessary forced hardware reset upon initialization.
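Persisting a mode so that it survives a reboot can be done with a write-then-rename pattern, which prevents a crash mid-write from leaving a half-written file. In this Python sketch, the JSON layout, the file path, and the fallback to default mode when no file exists are all assumptions; the fallback mirrors the rule that an unknown peer is assumed to be in default mode.

```python
import json
import os
import tempfile


def save_mode(path: str, mode: str) -> None:
    """Durably record this server's mode (hypothetical on-disk format)."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)   # temp file on the same filesystem
    with os.fdopen(fd, "w") as f:
        json.dump({"mode": mode}, f)
        f.flush()
        os.fsync(f.fileno())                    # data on disk before the rename
    os.replace(tmp, path)                       # atomic swap into place


def load_mode(path: str, fallback: str = "default") -> str:
    """Read the persisted mode; assume default mode if none was saved."""
    try:
        with open(path) as f:
            return json.load(f)["mode"]
    except FileNotFoundError:
        return fallback
```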

Operating modes for members of the HA cluster may include default, single, config (configuration), locked, peerdown, and failed startup. These are the modes that can be assigned to a server in the HA cluster. The RBM 310 may employ a distributed application that puts individual servers into pairs of operating modes according to rules that prevent a split brain scenario. The RBM 310 may need to suspend SMITH resets to facilitate configuration changes, to restart a cluster without incurring a SMITH reset, and for other reasons. When suspending SMITH resets is necessary, the RBM uses mode pairing rules to ensure that one of the servers stops and stays stopped until the RBM tells it to restart. Therefore, a component of a SAN file system (e.g., StorNext) application can start only after the RBM gives its permission.

At decision points in the management chain of events (e.g., component startup actions), the RBM 310 attempts to communicate across a network (e.g., LAN) to a peer server computer to ensure that mode-pairing rules are followed and to ensure that the cluster is not exposed to split brain scenario risk by operational mistakes. In one example, the RBM 310 monitors operating states of the SAN file system on both servers in an HA pair and outputs modes and statuses.

In default mode, HA monitoring is turned on. When the peer server is not available for communication, the peer server is assumed to be in default mode. In default mode, SMITH reset is enabled and thus a server can force a hardware reset on itself.

In single mode, HA monitoring is turned off. For this server to be in single mode, its paired peer server must be communicating and in locked mode, or not communicating and certified as being in peerdown mode. This mode is meant for extended production operations without a redundant server, for example when one server is being repaired or replaced. When the peer server is about to be restored to service, the operating server can be transitioned from single to default mode without stopping an associated SAN file system (e.g., StorNext) application. In the single mode, SMITH reset is disabled for single-server operation.

In the config mode, HA monitoring is turned off. In this mode, a peer server must be communicating and in locked mode, or not communicating and certified as being in the peerdown mode. The config mode is intended for re-configuration and other non-production service operations. When returning to production service and the default mode, an associated SAN file system (e.g., StorNext) application must be stopped to ensure that SAN file system processes can be started correctly upon returning to default mode.
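The shared precondition for single and config mode, together with the rule that an unreachable peer is assumed to be in default mode, can be sketched as follows. Both function names are hypothetical, and treating a reachable peer that reports no mode as default is an added assumption.

```python
from typing import Optional


def peer_mode_as_seen(reachable: bool, reported_mode: Optional[str],
                      peerdown_declared: bool) -> str:
    """Mode the local RBM should assume for its peer (hypothetical helper)."""
    if reachable:
        return reported_mode or "default"   # assumption: no report -> default
    # An unreachable peer is assumed to be in default mode unless an
    # administrator has certified it as powered off with the peerdown command.
    return "peerdown" if peerdown_declared else "default"


def may_enter(target: str, reachable: bool, reported_mode: Optional[str] = None,
              peerdown_declared: bool = False) -> bool:
    """Check the precondition for the local server to enter single or config mode."""
    if target not in ("single", "config"):
        raise ValueError("only single and config mode are gated here")
    peer = peer_mode_as_seen(reachable, reported_mode, peerdown_declared)
    if reachable:
        return peer == "locked"       # communicating peer must be locked
    return peer == "peerdown"         # silent peer must be certified peerdown
```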

In the locked mode, an associated SAN file system application (e.g., StorNext) is stopped and prevented from starting on the local server in a pair of paired HA servers. When the peer server is operating in single or config mode, locked mode allows the RBM to actively query the locked server to confirm that it is stopped. Communication with the locked node must continue, so this mode is effective when the associated SAN file system (e.g., StorNext) application is stopped for a short period and the node will not be rebooted. If communication is lost, the peer node assumes this node is in default mode, which facilitates avoiding split brain scenarios. Locked mode can be set programmatically, allowing the RBM to put a cluster into the config mode automatically.

In the peerdown mode, the peer server is turned off and must not be communicating with the local server's RBM subsystem. Therefore, this mode is effective when the peer server is powered down. This mode is declared by the peerdown command on a working server to give information about the non-working peer server. By setting this mode, an administrator certifies the off status of the peer, which the RBM cannot verify by itself. This allows the local server to be in single or config mode. If the peer starts communicating while this mode is set, the setting is immediately erased, the local mode is set to default to restore HA monitoring, and an associated SAN file system (e.g., StorNext) application is shut down, which can trigger an HA reset. The peerdown mode is changed to default mode with the peerup command. The peerdown and peerup commands should not be automated because they require external knowledge about the peer server's condition and operator awareness of the requirement to keep the peer server turned off.
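The reaction to a peer that reappears while peerdown is set can be sketched as a single handler. In this Python sketch, `LocalState` and `on_peer_heard` are hypothetical; the handler only updates an in-memory record, whereas a real RBM would also have to stop the SAN application, which may in turn trigger an HA reset.

```python
from dataclasses import dataclass


@dataclass
class LocalState:
    """Hypothetical local view held by one server's RBM."""
    mode: str = "single"
    peerdown_declared: bool = True
    ha_monitoring: bool = False
    san_running: bool = True


def on_peer_heard(state: LocalState) -> list:
    """Apply the peerdown rule when a peer assumed to be off starts communicating."""
    if not state.peerdown_declared:
        return []                     # nothing to undo
    state.peerdown_declared = False   # the certification is erased immediately
    state.mode = "default"            # default mode restores HA monitoring
    state.ha_monitoring = True
    state.san_running = False         # SAN application shut down; may trigger an HA reset
    return ["peerdown cleared", "mode default", "HA monitoring on", "SAN stopped"]
```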

In the failed startup mode, a previous attempt to start an associated SAN file system application (e.g., StorNext) with a command (e.g., service cvfs start) has failed before completion. Attempts to start the SAN file system application are blocked until this status is cleared by running the clear command.
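The RBM's permission check in front of the SAN application start, including the failed startup indicator and the clear command, can be sketched as a small gate object. This Python sketch is hypothetical: `StartGate`, its methods, and the `launch` callable (standing in for a command such as `service cvfs start`) are assumptions, and the failed startup mode is modeled as a flag rather than as a full mode.

```python
class StartGate:
    """Hypothetical RBM permission check in front of the SAN application start."""

    def __init__(self, mode: str = "default"):
        self.mode = mode
        self.failed_startup = False   # failure indicator (failed startup mode)

    def permitted(self) -> bool:
        return self.mode != "locked" and not self.failed_startup

    def start(self, launch) -> bool:
        """Run launch() if permitted; a start that raises sets the failure indicator."""
        if not self.permitted():
            return False
        try:
            launch()
        except Exception:
            self.failed_startup = True  # block further starts until cleared
            return False
        return True

    def clear(self) -> None:
        """Equivalent of the clear command: remove the failure indicator."""
        self.failed_startup = False
```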

The RBM collects server statuses along with server modes to measure the operating condition of an HA cluster. Statuses may include stopped, running, primary and unknown. The stopped status is reported when a status command (e.g., DSM_control status) has returned a first pre-determined code (e.g., false). The running status is reported when a status command (e.g., DSM_control status) has returned a second, different pre-determined code (e.g., true). The primary status is reported when the server status is running and the FSMPM is in the primary state. This combination indicates that the HA shared FSM has been activated. The unknown status is reported when attempts to communicate with the peer server fail.

Therefore the RBM controls modes into which HA cluster members can be forced. The modes are controlled by pairing rules. The modes are created by the RBM issuing commands. The RBM collects status information in order to determine which commands, if any, to issue to control modes. In some circumstances, the RBM may force a hardware reset of one or both members of an HA cluster.

FIG. 4 illustrates a computer 400 that facilitates maintaining single writer access between a pair of HA servers by participating in a timer bounded ARB protocol for resource control. Computer 400 includes a processor 402 and a memory 404 that are operably connected by a bus 412. In one example, the computer 400 may include a first component 406, a second component 408, and a third component 410. Additionally, the computer 400 may be associated with a process 414 and data 416.

The first component 406 is configured to acquire a mode from a member of an HA cluster. The second component 408 is configured to determine a desired mode pairing for the member of the HA cluster. The third component 410 is configured to take an action configured to either achieve the desired mode pairing for the member of the HA cluster or to selectively force a hardware reset of the member of the HA cluster upon determining that a split brain scenario is possible based, at least in part, on the mode of the member of the HA cluster.

The modes include: a default mode, a single mode, a config mode, a locked mode, a peerdown mode, and a failed startup mode. In the default mode, HA monitoring for a member of the HA cluster is ON and SMITH reset is enabled. In the single mode, HA monitoring for a member of the HA cluster is OFF and a peer member of the HA cluster is either communicating and in locked mode, or not communicating and in peerdown mode. In the configuration mode, HA monitoring for a member of the HA cluster is OFF and a peer member of the HA cluster is either communicating and in locked mode, or not communicating and in peerdown mode. In the locked mode, a SAN application on the member of the HA cluster is stopped and prevented from starting. In the peerdown mode, a peer member of the HA cluster is OFF. In the failed startup mode, attempts to start the SAN application are blocked until a failure indicator is cleared.

The first component 406 may also be configured to acquire a status from the member of the HA cluster. The status may be one of: unknown, stopped, running, and primary. The unknown status is reported when the member of the HA cluster is not communicating. The stopped status is reported when a status command returns a first pre-determined code. The running status is reported when a status command returns a second, different pre-determined code. The primary status is reported when the member's status is running and the FSMPM (file system port mapper) on the member is in the primary state.
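A minimal Python sketch of how the first component might classify a member's status. The text says only that the stopped and running codes are pre-determined and differ, so `STOPPED_CODE` and `RUNNING_CODE` are placeholder values; treating an unrecognized code as unknown is also an assumption.

```python
from typing import Optional

# Placeholder values: the text states only that the two codes are pre-determined and differ.
STOPPED_CODE = 1
RUNNING_CODE = 0


def member_status(communicating: bool, status_code: Optional[int], fsmpm_primary: bool) -> str:
    """Map observations of one HA cluster member onto unknown/stopped/running/primary."""
    if not communicating:
        return "unknown"
    if status_code == STOPPED_CODE:
        return "stopped"
    if status_code == RUNNING_CODE:
        # Primary is a refinement of running: the FSMPM must also be in the primary state.
        return "primary" if fsmpm_primary else "running"
    return "unknown"  # assumption: an unrecognized code is reported as unknown
```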

The third component 410 may perform actions including status, stop, start, configuration, clear, primary, and force reset. The status action causes a cluster member to report its status. The stop action causes the non-primary member of the HA cluster to be transitioned to locked mode, the primary member to be transitioned to configuration mode, HA monitoring to be turned off on the primary member, the SAN application to be stopped, and both members to be transitioned to default mode. The start action runs the stop action and also causes the SAN application and HA monitor to be started on the local member of the HA cluster, SMITH reset to be enabled on the local member, the SAN application and HA monitor to be started on the peer member, and SMITH reset to be enabled on the peer member. The configuration action causes the peer member of the HA cluster to be transitioned to locked mode and the local member to be transitioned to configuration mode. The clear action clears an indicator that was set by failure of a start command. The primary action sets the status of the FSMPM on the local member of the HA cluster to primary. The force reset action triggers an immediate HA reset.
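The effect of the stop, start, configuration, clear, and primary actions on a two-member cluster can be modeled in Python as below. This is a sketch under stated assumptions: the `Member` and `Cluster` classes are hypothetical, mode changes are recorded in a log instead of being sent to real servers, the order in which the two members return to default mode is not specified in the text, and when neither member reports primary the sketch treats the peer as primary. The status and force reset actions are omitted because they depend on real hardware and communication.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Member:
    name: str
    mode: str = "default"
    ha_monitoring: bool = True
    smith_enabled: bool = True
    san_running: bool = False
    fsmpm_primary: bool = False
    failed_start: bool = False  # indicator set when a start command fails


@dataclass
class Cluster:
    local: Member
    peer: Member
    log: List[str] = field(default_factory=list)

    def _set_mode(self, member: Member, mode: str) -> None:
        member.mode = mode
        self.log.append(f"{member.name}->{mode}")

    def stop(self) -> None:
        # Assumption: if neither member is primary, the peer is treated as primary.
        if self.local.fsmpm_primary:
            primary, other = self.local, self.peer
        else:
            primary, other = self.peer, self.local
        # Steps follow the order listed in the text.
        self._set_mode(other, "locked")    # non-primary member locked
        self._set_mode(primary, "config")  # primary member to configuration mode
        primary.ha_monitoring = False      # HA monitoring off on the primary
        primary.san_running = False        # SAN application stopped
        other.san_running = False
        self._set_mode(other, "default")   # both members back to default mode
        self._set_mode(primary, "default")

    def start(self) -> None:
        self.stop()
        for member in (self.local, self.peer):  # local member first, then the peer
            member.san_running = True
            member.ha_monitoring = True
            member.smith_enabled = True

    def configuration(self) -> None:
        self._set_mode(self.peer, "locked")
        self._set_mode(self.local, "config")

    def clear(self) -> None:
        self.local.failed_start = False

    def primary(self) -> None:
        self.local.fsmpm_primary = True
```

Locking the non-primary member before the primary leaves configuration mode means there is no step in the stop sequence at which both members could write.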

The third component 410 may also be configured to force a hardware reset upon determining that the HA cluster is in a prohibited paired mode and is at risk of an SBS.
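The prohibited pairings listed in claims 12 and 18 can be checked before a hardware reset is forced. The Python sketch below (function names hypothetical) performs that check and also shows that the list has a compact reading: each prohibited pairing puts one member in single or configuration mode, where HA monitoring is OFF, while its peer is neither locked nor peerdown. Evaluating the pairing from both members' points of view is an assumption of the sketch.

```python
from itertools import product
from typing import Set, Tuple

Pairing = Tuple[str, str]  # (local member mode, peer member mode)

# Prohibited pairings as listed in claims 12 and 18.
PROHIBITED: Set[Pairing] = {
    ("single", "default"), ("single", "single"), ("single", "config"),
    ("config", "default"), ("config", "single"), ("config", "config"),
}

HA_OFF_MODES = {"single", "config"}                    # HA monitoring is OFF in these modes
UNFENCED_PEER_MODES = {"default", "single", "config"}  # peer neither locked nor peerdown


def derived_prohibited() -> Set[Pairing]:
    """Every pairing of an HA-off member with an unfenced peer."""
    return set(product(HA_OFF_MODES, UNFENCED_PEER_MODES))


def should_force_reset(local: str, peer: str) -> bool:
    """Assumption: the pairing is checked from each member's point of view."""
    return (local, peer) in PROHIBITED or (peer, local) in PROHIBITED
```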

Generally describing an example configuration of the computer 400, the processor 402 may be any of a variety of processors, including dual-microprocessor and other multiprocessor architectures. The memory 404 may include volatile memory (e.g., RAM (random access memory)) and/or non-volatile memory (e.g., ROM (read only memory)). The memory 404 can store a process 414 and/or data 416, for example. The process 414 may be an RBM process and the data 416 may be coordination and control data.

The bus 412 may be a single internal bus interconnect architecture and/or other bus or mesh architectures. While a single bus is illustrated, it is to be appreciated that the computer 400 may communicate with various devices, logics, and peripherals using other buses (e.g., PCIe (peripheral component interconnect express), IEEE 1394, USB (universal serial bus), Ethernet). The bus 412 can be of various types including, for example, a memory bus, a memory controller, a peripheral bus, an external bus, a crossbar switch, and/or a local bus.

The following includes definitions of selected terms employed herein. The definitions include various examples and/or forms of components that fall within the scope of a term and that may be used for implementation. The examples are not intended to be limiting.

References to “one embodiment”, “an embodiment”, “one example”, “an example”, and other similar terms indicate that the embodiment(s) or example(s) so described may include a particular feature, structure, characteristic, property, element, or limitation, but that not every embodiment or example necessarily includes that particular feature, structure, characteristic, property, element or limitation. Furthermore, repeated use of the phrase “in one embodiment” or “in one example” does not necessarily refer to the same embodiment or example.

To the extent that the term “includes” or “including” is employed in the detailed description or the claims, it is intended to be inclusive in a manner similar to the term “comprising” as that term is interpreted when employed as a transitional word in a claim.

To the extent that the term “or” is employed in the detailed description or claims (e.g., A or B) it is intended to mean “A or B or both”. When the applicants intend to indicate “only A or B but not both” then the term “only A or B but not both” will be employed. Thus, use of the term “or” herein is the inclusive, and not the exclusive use. See Bryan A. Garner, A Dictionary of Modern Legal Usage 624 (2d ed. 1995).

While example apparatus, methods, and articles of manufacture have been illustrated by describing examples, and while the examples have been described in considerable detail, it is not the intention of the applicants to restrict or in any way limit the scope of the appended claims to such detail. It is, of course, not possible to describe every conceivable combination of components or methodologies for purposes of describing the systems, methods, and so on described herein. Therefore, the invention is not limited to the specific details, the representative apparatus, and illustrative examples shown and described. Thus, this application is intended to embrace alterations, modifications, and variations that fall within the scope of the appended claims.

Claims

1. A non-transitory computer readable medium storing computer executable instructions that, when executed by a computer in a high availability (HA) cluster, control the computer to perform a method, the method comprising:

upon detecting an occurrence of a decision point in the lifecycle of the HA cluster, acquiring data describing an operating condition of a set of servers comprising the HA cluster, the data including a mode of a server in the HA cluster;

controlling at least one member of the HA cluster to selectively change mode to a target paired mode, where the target paired mode is selected based, at least in part, on mode pairing rules associated with the HA cluster; and

selectively forcing a hardware reset of one or more members of the HA cluster upon determining that the mode pairing rules have been violated.

2. The computer readable medium of claim 1, where the mode is one of: a default mode, a single mode, a configuration mode, a locked mode, a peerdown mode, and a failed startup mode.

3. The computer readable medium of claim 2, where:

in the default mode, HA monitoring for a server is ON and SMITH (Shoot Myself In The Head) reset is enabled;

in the single mode, HA monitoring for a server is OFF and a peer server is communicating and in locked mode;

in the configuration mode, HA monitoring for a server is OFF and a peer server is communicating and in locked mode;

in the locked mode, a storage area network (SAN) application on the server is stopped and prevented from starting;

in the peerdown mode, a peer server is OFF; and

in the failed startup mode, attempts to start an SAN application are blocked until a failure indicator is cleared.

4. The computer readable medium of claim 1, the data also including a status of the server in the HA cluster.

5. The computer readable medium of claim 4, where the status is one of: unknown, stopped, running, and primary.

6. The computer readable medium of claim 5, where:

the unknown status is reported when the server is not communicating;

the stopped status is reported when a status command returns a first pre-determined code;

the running status is reported when a status command returns a second, different pre-determined code; and

the primary status is reported when the server's status is running and the FSMPM (file system port mapper) in the server is in the primary state.

7. The computer readable medium of claim 1, where controlling at least one member of the HA cluster to selectively change mode comprises issuing one or more of: a status, stop, start, config, clear, primary, and force reset command.

8. The computer readable medium of claim 7, where:

the status command causes members of the HA cluster to report their status;

the stop command causes the non-primary server in the HA cluster to be transitioned to locked mode, the primary server in the HA cluster to be transitioned to configuration mode, HA monitoring to be turned off on the primary server in the HA cluster, the SAN application to be stopped, and both servers in the HA cluster to be transitioned to default mode;

the start command also causes the stop command to be run, the SAN application and HA monitor to be started on the local server in the HA cluster, SMITH reset to be enabled on the local server in the HA cluster, the SAN application and HA monitor to be started on the peer server in the HA cluster, and SMITH reset to be enabled on the peer server in the HA cluster;

the configuration command causes the peer server in the HA cluster to be transitioned to the locked mode and the local server in the HA cluster to be transitioned to the configuration mode;

the clear command clears an indicator that was set by failure of a start command;

the primary command sets the status of the FSMPM on the local server in the HA cluster to primary; and

the force reset command triggers an immediate HA reset of one or more servers in the HA cluster.

9. The computer readable medium of claim 1, the method further comprising:

storing values for the modes associated with members of the HA cluster so that the modes are maintained through a hardware reset.

10. The computer readable medium of claim 1, the method further comprising:

monitoring a file that indicates that a previous initialization of the HA cluster has failed; and

granting permission, prior to initialization of the HA cluster, for the HA cluster to initialize.

11. The computer readable medium of claim 1, the method further comprising controlling members of the HA cluster to be in a mode pairing selected from an allowed set of mode pairings comprising:

default-default, default-locked, default-peerdown, single-peerdown, single-locked, config-locked, locked-default, locked-single, locked-config, and locked-locked.

12. The computer readable medium of claim 11, the method further comprising controlling members of the HA cluster to not be in a mode pairing selected from a prohibited set of paired mode states comprising:

single-default, single-single, single-config, config-default, config-single, and config-config.

13. An apparatus, comprising:

a processor,

a memory, and

an interface that connects the processor, the memory, and a set of components, the set of components comprising: a first component configured to acquire a mode from a member of an HA cluster; a second component configured to determine a desired mode pairing for the member of the HA cluster; and a third component configured to take an action configured to either achieve the desired mode pairing for the member of the HA cluster or to selectively force a hardware reset of the member of the HA cluster upon determining that a split brain scenario is possible based, at least in part, on the mode of the member of the HA cluster.

14. The apparatus of claim 13, the mode being one of:

a default mode, a single mode, a configuration mode, a locked mode, a peerdown mode, and a failed startup mode, where:

in the default mode, HA monitoring for a member of the HA cluster is ON and SMITH reset is enabled;

in the single mode, HA monitoring for a member of the HA cluster is OFF and a peer member of the HA cluster is either communicating and in locked mode or not communicating and in peerdown mode;

in the configuration mode, HA monitoring for a member of the HA cluster is OFF and a peer member of the HA cluster is either communicating and in locked mode or not communicating and in peerdown mode;

in the locked mode, an SAN application on the member of the HA cluster is stopped and prevented from starting;

in the peerdown mode, a peer member of the HA cluster is OFF; and

in the failed startup mode, attempts to start the SAN application are blocked until a failure indicator is cleared.

15. The apparatus of claim 14, the first component being configured to acquire a status from the member of the HA cluster, the status being one of:

unknown, stopped, running, and primary, and where:

the unknown status is reported when the member of the HA cluster is not communicating;

the stopped status is reported when a status command returns a first pre-determined code;

the running status is reported when a status command returns a second, different pre-determined code; and

the primary status is reported when the status of the member of the HA cluster is running and the FSMPM in the member of the HA cluster is in the primary state.

16. The apparatus of claim 15, where the action performed by the third component is one of:

status, stop, start, configuration, clear, primary, and force reset, and where:

the status action causes a cluster member to report status;

the stop action causes the non-primary member of the HA cluster to be transitioned to locked mode, the primary member of the HA cluster to be transitioned to configuration mode, HA monitoring to be turned off on the primary member of the HA cluster, the SAN application to be stopped, and both members of the HA cluster to be transitioned to default mode;

the start action also causes the stop command to be run, the SAN application and HA monitor to be started on the local member of the HA cluster, SMITH reset to be enabled on the local member of the HA cluster, the SAN application and HA monitor to be started on the peer member of the HA cluster, and SMITH reset to be enabled on the peer member of the HA cluster;

the configuration action causes the peer member of the HA cluster to be transitioned to the locked mode and the local member of the HA cluster to be transitioned to the configuration mode;

the clear action clears an indicator that was set by failure of a start command;

the primary action sets the status of the FSMPM on the local member of the HA cluster to primary; and

the force reset action triggers an immediate HA reset.

17. The apparatus of claim 16, the desired mode pairings comprising:

default-default, default-locked, default-peerdown, single-peerdown, single-locked, config-locked, locked-default, locked-single, locked-config, and locked-locked.

18. The apparatus of claim 17, where prohibited mode pairings comprise:

single-default, single-single, single-config, config-default, config-single, and config-config.

19. The apparatus of claim 18, the third component being configured to force a hardware reset upon determining that the HA cluster is in a prohibited paired mode and is in danger of a split brain scenario (SBS).

20. A high availability (HA) cluster manager apparatus, comprising:

a logic configured to acquire a substantially instantaneous state of an HA cluster, the state comprising at least a mode and status for members of the HA cluster;

a mode rules logic configured to select a mode for a member of the HA cluster, the mode being selected to make the HA cluster comply with a set of allowed mode pairings and to prevent the HA cluster from attaining a prohibited mode pairing, and

an action logic configured to prevent a split brain scenario in the HA cluster by transforming an HA cluster member state by performing one or more of: a status action, a stop action, a start action, a configuration action, a clear action, a primary action, and a force hardware reset action.
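Claims 9 and 10 recite persisting member modes across a hardware reset and consulting a file that records a failed initialization. The following is a minimal Python sketch. It assumes a local state directory and hypothetical file names (`rbm_modes.json`, `rbm_failed_start`), and it assumes that permission to initialize is granted only while the failure file is absent.

```python
import json
import os
import tempfile

# Hypothetical file names; the claims do not name these files.
MODE_FILE = "rbm_modes.json"
FAILED_START_FILE = "rbm_failed_start"


def save_modes(state_dir: str, modes: dict) -> None:
    """Persist member modes so they can be restored after a hardware reset."""
    path = os.path.join(state_dir, MODE_FILE)
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(modes, f)
        f.flush()
        os.fsync(f.fileno())  # push the data to stable storage before renaming
    os.replace(tmp, path)     # atomic on POSIX: readers see the old or the new file


def load_modes(state_dir: str) -> dict:
    """Return the persisted modes, or an empty dict if none were saved."""
    path = os.path.join(state_dir, MODE_FILE)
    if not os.path.exists(path):
        return {}
    with open(path) as f:
        return json.load(f)


def may_initialize(state_dir: str) -> bool:
    """Grant permission to initialize only if no failed-initialization file exists."""
    return not os.path.exists(os.path.join(state_dir, FAILED_START_FILE))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        save_modes(d, {"local": "locked", "peer": "default"})
        print(load_modes(d), may_initialize(d))
```

Writing to a temporary file and renaming it is a common way to avoid a torn mode file if a reset lands in the middle of a write.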