Novell SUSE LINUX ENTERPRISE SERVER 10 SP2 HEARTBEAT Manual


SUSE Linux Enterprise Server 10 SP2
Heartbeat
May 08, 2008
www.novell.com


Summary of Contents for Novell SUSE LINUX ENTERPRISE SERVER 10 SP2 HEARTBEAT

  • Page 1 SUSE Linux Enterprise Server 10 SP2 www.novell.com Heartbeat May 08, 2008...
  • Page 2 The express authorization of Novell, Inc. must be obtained prior to any other use of any manual or part thereof. For Novell trademarks, see the Novell Trademark and Service Mark list http://www.novell...
  • Page 3: Table Of Contents

    Contents: About This Guide; 1 Overview; Product Features; Product Benefits; Cluster Configurations
  • Page 4 Specifying Resource Failback Nodes (Resource Stickiness); Configuring Resource Monitoring; Starting a New Cluster Resource; Removing a Cluster Resource
  • Page 5: About This Guide

    About This Guide Heartbeat is open source clustering software for Linux. Heartbeat ensures high availability and manageability of critical network resources including data, applications, and services. It is a multinode clustering product that supports failover, failback, and migration (load balancing) of individually managed cluster resources. This guide is intended for administrators given the task of building Linux clusters.
  • Page 6 Configuring and Managing Cluster Resources Managing resources encompasses much more than just the initial configuration. Learn how to use the Heartbeat graphical user interface to configure and manage resources. Manual Configuration of a Cluster Managing resources encompasses much more than just the initial configuration. Learn how to use the Heartbeat command line tools to configure and manage resources.
  • Page 7: Documentation Updates

    (page vi), let us know on which aspects of Heartbeat this guide should elaborate. For the latest version of this documentation, see the SLES 10 SP2 documentation Web site at http://www.novell.com/documentation/sles10. 3 Documentation Conventions The following typographical conventions are used in this manual: •...
  • Page 9: Overview

    Overview Heartbeat is an open source server clustering system that ensures high availability and manageability of critical network resources including data, applications, and services. It is a multinode clustering product for Linux that supports failover, failback, and migration (load balancing) of individually managed cluster resources. Heartbeat is shipped with SUSE Linux Enterprise Server 10 and provides you with the means to make virtual machines (containing services) highly available.
  • Page 10: Product Benefits

    • A single point of administration through either a graphical Heartbeat tool or a command line tool. Both tools let you configure and monitor your Heartbeat cluster. • The ability to tailor a cluster to the specific applications and hardware infrastructure that fit your organization.
  • Page 11
    • Low cost of operation
    • Scalability
    • Disaster recovery
    • Data protection
    • Server consolidation
    • Storage consolidation
    Shared disk fault tolerance can be obtained by implementing RAID on the shared disk subsystem. The following scenario illustrates some of the benefits Heartbeat can provide. Suppose you have configured a three-server cluster, with a Web server installed on each of the three servers in the cluster.
  • Page 12 Suppose Web Server 1 experiences hardware or software problems and the users depending on Web Server 1 for Internet access, e-mail, and information lose their connections. The following figure shows how resources are moved when Web Server 1 fails. Figure 1.2 Three-Server Cluster after One Server Fails Web Site A moves to Web Server 2 and Web Site B moves to Web Server 3.
  • Page 13: Cluster Configurations

    Now suppose the problems with Web Server 1 are resolved, and Web Server 1 is returned to a normal operating state. Web Site A and Web Site B can either automatically fail back (move back) to Web Server 1, or they can stay where they are. This is dependent on how you configured the resources for them.
  • Page 14 Figure 1.3 Typical Fibre Channel Cluster Configuration Although Fibre Channel provides the best performance, you can also configure your cluster to use iSCSI. iSCSI is an alternative to Fibre Channel that can be used to create a low-cost SAN. The following figure shows how a typical iSCSI cluster configuration might look.
  • Page 15 Figure 1.4 Typical iSCSI Cluster Configuration Although most clusters include a shared disk subsystem, it is also possible to create a Heartbeat cluster without a shared disk subsystem. The following figure shows how a Heartbeat cluster without a shared disk subsystem might look. Figure 1.5 Typical Cluster Configuration Without Shared Storage Overview...
  • Page 16: Heartbeat Cluster Components

    1.4 Heartbeat Cluster Components
    The following components make up a Heartbeat version 2 cluster:
    • From 2 to 16 Linux servers, each containing at least one local disk device.
    • Heartbeat software running on each Linux server in the cluster.
    •...
  • Page 17 Figure 1.6 Heartbeat Architecture Messaging and Infrastructure Layer The primary or first layer is the messaging/infrastructure layer, also known as the Heartbeat layer. This layer contains components that send out the Heartbeat messages containing “I'm alive” signals, as well as other information. The Heartbeat program resides in the messaging/infrastructure layer.
  • Page 18 to all of its members. It performs this task based on the information it gets from the Heartbeat layer. The logic that takes care of this task is contained in the Cluster Consensus Membership service, which provides an organized cluster topology overview (node-wise) to the cluster components in the higher layers.
  • Page 19 Policy Engine (PE) and Transition Engine (TE) Whenever the Designated Coordinator needs to make a cluster-wide change (react to a new CIB), the Policy Engine is used to calculate the next state of the cluster and the list of (resource) actions required to achieve it. The commands computed by the Policy Engine are then executed by the Transition Engine.
  • Page 20 Designated Coordinator. You can use either tool on any node in the cluster, and the local CIB will relay the requested changes to the Designated Coordinator. The Designated Coordinator will then replicate the CIB change to all cluster nodes and will start the transition procedure.
  • Page 21: Installation And Setup

    Installation and Setup A Heartbeat cluster can be installed and configured using YaST. During the Heartbeat installation, you are prompted for information that is necessary for Heartbeat to function properly. This chapter contains information to help you install, set up, and configure a Heartbeat cluster.
  • Page 22: Hardware Requirements

    .html] for more information. 2.1 Hardware Requirements The following list specifies hardware requirements for a Heartbeat cluster. These requirements represent the minimum hardware configuration. Additional hardware might be necessary, depending on how you intend to use your Heartbeat cluster.
  • Page 23: Preparations

    • The disks contained in the shared disk system should be configured to use mirroring or RAID to add fault tolerance to the shared disk system. Hardware-based RAID is recommended. Software-based RAID1 is not supported for all configurations. • If you are using iSCSI for shared disk system access, ensure that you have properly configured iSCSI initiators and targets.
  • Page 24 3 Click OK, and then repeat Step 2 (page 15) to add the other nodes in the cluster to the /etc/hosts file on this server. 4 Repeat Step 1 (page 15) through Step 3 (page 16) on each server in your Heartbeat cluster.
  • Page 25: Installing Heartbeat

    2.5 Installing Heartbeat 1 Start YaST and select Miscellaneous > High Availability or enter yast2 heartbeat to start the YaST Heartbeat module. It lets you create a new cluster or add new nodes to an existing cluster. 2 On the Node Configuration screen, add a node to the cluster by specifying the node name of the node you want to add, then click Add.
  • Page 26 3 On the Authentication Keys screen, specify the authentication method the cluster will use for communication between cluster nodes, and if necessary an Authenti- cation Key (password). Then click Next. Figure 2.2 Authentication Keys Both the MD5 and SHA1 methods require a shared secret, which is used to protect and authenticate messages.
  • Page 27 Figure 2.3 Media Configuration This provides a way for cluster nodes to signal that they are alive to other nodes in the cluster. For proper redundancy, you should specify at least two Heartbeat media if possible. Choose at least one Heartbeat medium, and if possible, more than one: •...
  • Page 28 NOTE: UDP Port Settings Note that the UDP port setting applies only to broadcast media, not to the other media you may use. When editing UDP ports manually in /etc/ha.d/ha.cf, the udpport entry must precede the bcast entry it belongs to; otherwise, Heartbeat will ignore the port setting.
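    A hedged sketch of the corresponding /etc/ha.d/ha.cf fragment, assuming broadcast over eth0 on the default Heartbeat port, places the entries in this order:

        udpport 694
        bcast eth0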
  • Page 29: Configuring Stonith

    To start Heartbeat on the other servers in the cluster when they are booted, enter chkconfig heartbeat on at the server console of each of those servers. You can also enter chkconfig heartbeat off at the server console to have Heartbeat not start automatically when the server is rebooted. 6 To configure Heartbeat on the other nodes in the cluster, run /usr/lib/heartbeat/ha_propagate on the Heartbeat node you just configured.
  • Page 30 For Heartbeat, STONITH must be configured as a cluster resource. After reviewing Chapter 4, Configuring and Managing Cluster Resources (page 29), continue with Configuring STONITH as a Cluster Resource (page 22). Procedure 2.3 Configuring STONITH as a Cluster Resource 1 Start the HA Management Client and log in to the cluster as described in Section 4.1, “Graphical HA Management Client”...
  • Page 31 11 In the clone_max field, enter the number of nodes in the cluster that will run the STONITH service. Be mindful of the number of concurrent connections your STONITH device supports. 12 Enter the Clone or Master/Slave ID. 13 Click Add to add the STONITH resource to the cluster. It now appears below the Resources entry in the left pane.
  • Page 33: Setting Up A Simple Resource

    Setting Up a Simple Resource Once your cluster is installed and set up as described in Chapter 2, Installation and Setup (page 13), you can start adding resources to your configuration. Configure resources either with the Heartbeat GUI or manually by using the command line tools. In the following, find an example of how to configure an IP address as a resource either manually or with the Heartbeat GUI.
  • Page 34 4 Enter a Resource ID (name) for the IP address resource. For example, ipaddress1. 5 In the Type section of the page, scroll down the list and select IPaddr (OCF Resource Agent) as the resource type. 6 In the Parameters section of the page, find the line that was added for the IP address resource, click the line once, then click the line again under the Value heading to open a text field.
  • Page 35: Manual Configuration Of A Resource

    3 In the new window, select ipaddress1 as Resource to migrate and select node1 from the To Node drop-down list. 3.2 Manual Configuration of a Resource Resources are any type of service that a computer provides. Heartbeat controls resources through RAs (Resource Agents), which are LSB scripts, OCF scripts, or legacy Heartbeat 1 resources.
  • Page 36 In this example, the RA attribute ip is set to 10.10.0.1. For the IPaddr RA, this RA attribute is mandatory, as can be seen in Appendix A, HB OCF Agents (page 109). NOTE When configuring a resource with Heartbeat, the same resource should not be initialized by init.
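    The XML the example refers to is not reproduced in this summary; a minimal sketch of such a primitive, with illustrative IDs, might look like this:

        <primitive id="ip_resource" class="ocf" provider="heartbeat" type="IPaddr">
          <instance_attributes id="ia-ip_1">
            <attributes>
              <nvpair id="ip-nv-1" name="ip" value="10.10.0.1"/>
            </attributes>
          </instance_attributes>
        </primitive>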
  • Page 37: Configuring And Managing Cluster Resources

    Configuring and Managing Cluster Resources Cluster resources must be created for every resource or application you run on servers in your cluster. Cluster resources can include Web sites, e-mail servers, databases, file systems, virtual machines, and any other server-based applications or services you want to make available to users at all times.
  • Page 38 Do this on every node where you will run the HA Management Client utility. To start the HA Management Client, enter hb_gui at the command line of a Linux server or workstation. Log in to the cluster by selecting Connection > Login. You are prompted for a username and password.
  • Page 39: Creating Cluster Resources

    Depending on which entry you select in the left pane, several tabs appear in the right pane of the main window. For example, if you select the topmost entry, linux-ha, you can access three tabs on the right in the main window, allowing you to view general Information on the cluster or to change certain options and aspects on the Configurations and Advanced tabs.
  • Page 40 5 Select the resource Type from the list. To display a brief description of that resource type, double-click any of the resource types in the list. 6 Conditional: After selecting a resource type, a line for that resource type might be added to the Parameters section of the screen.
  • Page 41: Configuring Resource Constraints

    4.3 Configuring Resource Constraints Resource constraints let you specify which cluster nodes resources will run on, what order resources will load, and what other resources a specific resource is dependent on. For information on configuring resource constraints, see Constraints [http:// linux-ha.org/ClusterInformationBase/SimpleExamples] on the High Availability Linux Web site.
  • Page 42: Specifying Resource Failback Nodes (Resource Stickiness)

    6 If you need to modify the score, select the added constraint in the left pane of the main window and enter a new Score value in the Attributes section in the right pane. 7 On the left side of the page, select the resource constraint you just created and click Add Expression.
  • Page 43: Configuring Resource Monitoring

    Value is 0: This is the default. The resource will be placed optimally in the system. This may mean that it is moved when a “better” or less loaded node becomes available. This option is almost equivalent to automatic failback, except that the resource may be moved to a node that is not the one it was previously active on.
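    To set this cluster-wide from the command line instead of the GUI, a sketch using crm_attribute (the value 100 is only an example) might be:

        crm_attribute -t crm_config -n default-resource-stickiness -v 100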
  • Page 44: Starting A New Cluster Resource

    2 Select the resource, click the Operations tab, then click Add Operation. 3 Select Monitor as the operation name. 4 Add the desired values in the Interval, Timeout, and Start Delay fields, and a description if desired. 5 Click OK, then click Apply to start the monitoring operation. If you do not configure resource monitoring, resource failures after a successful start will not be communicated, and the cluster will always show the resource as healthy.
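    In the CIB, the resulting operation might be expressed as in this hedged sketch (IDs and timings are illustrative), nested in the primitive's operations element:

        <operations>
          <op id="ipaddress1-monitor" name="monitor" interval="10s" timeout="30s" start_delay="5s"/>
        </operations>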
  • Page 45: Configuring A Heartbeat Cluster Resource Group

    4.9 Configuring a Heartbeat Cluster Resource Group Some cluster resources are dependent on other components or resources, and require that each component or resource is started in a specific order and runs together on the same server. An example of this is a Web server that requires an IP address and a file system.
  • Page 46 The Ordered value specifies whether the resources in the group will load in the order you specified. The Collocated value specifies whether the resources in the group will run on the same server. 5 Specify a resource name (ID) for the IP address resource portion of the group. 6 In the Type section of the page, scroll down the list and select IPaddr2 (OCF Resource Agent) as the resource type.
  • Page 47 5 In the Parameters section of the page, find the line that was added for the file system resource, click the line once, then click the line again under the Value heading to open a text field. 6 Add the name of the file system. For example, Reiser. 7 Click the Add Parameter button and select Device as the name.
  • Page 48: Configuring A Heartbeat Clone Resource

    4.10 Configuring a Heartbeat Clone Resource You may want certain resources to run simultaneously on multiple nodes in your cluster. To do this, you must configure a resource as a clone. Examples of resources that might be configured as clones include STONITH and cluster file systems like OCFS2. You can clone any resource, provided it is supported by the resource’s resource agent.
  • Page 49: Migrating A Cluster Resource

    6 Select the resource in the left pane and click Resource > Start to start the resource. Starting the clone resource will cause it to start on the nodes that have been specified with its resource location constraints. After creating the resource, you must create location constraints for the resource. The location constraints determine which nodes the resource can run on.
  • Page 51: Manual Configuration Of A Cluster

    Manual Configuration of a Cluster Manual configuration of a Heartbeat cluster is often the most effective way of creating a reliable cluster that meets specific needs. Because of the extensive configurability of Heartbeat and the range of needs it can meet, it is not possible to document every possible scenario.
  • Page 52: Configuration Basics

    5.1 Configuration Basics The cluster is divided into two main sections, configuration and status. The status section contains the history of each resource on each node and based on this data, the cluster can construct the complete current state of the cluster. The authoritative source for the status section is the local resource manager (lrmd) process on each cluster node.
  • Page 53: Updating The Configuration

    Details on all the available options can be obtained using the crm_mon --help command. 5.1.2 Updating the Configuration There is a basic warning for updating the cluster configuration: WARNING: Rules For Updating the Configuration Never edit the cib.xml file manually, otherwise the cluster will refuse to use the configuration.
  • Page 54: Using Xml

    5.1.3 Quickly Deleting Part of the Configuration
    Sometimes it is necessary to delete an object quickly. This can be done in three easy steps:
    1 Identify the object you wish to delete, for example:
    cibadmin -Q | grep stonith
    <nvpair id="cib-bootstrap-options-stonith-action" name="stonith-action" value="reboot"/>...
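    The remaining steps are cut off by the page break; presumably the matching object is then removed with cibadmin's delete operation, roughly as in this sketch (the nvpair id is taken from the example above, and the exact flags should be checked against cibadmin(8) later in this manual):

        cibadmin --cib_delete -X '<nvpair id="cib-bootstrap-options-stonith-action"/>'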
  • Page 55: Configuring Resources

    crm_resource --locate --resource my-test-rsc
    5.1.5 Testing Your Configuration
    It is not necessary to modify a real cluster in order to test the effect of the configuration changes. Do the following to test your modifications:
    1 Save the current configuration to a temporary file:
    cibadmin --cib_query >...
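    The procedure is truncated here; it plausibly continues by editing the temporary copy and checking it with crm_verify (documented later in this manual), along these lines:

        cibadmin --cib_query > /tmp/cib-test.xml
        vi /tmp/cib-test.xml
        crm_verify -V -x /tmp/cib-test.xml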
  • Page 56 force-reload, and status as explained in http://www.linux-foundation.org/spec/refspecs/LSB_1.3.0/gLSB/gLSB/iniscrptact.html. The configuration of those services is not standardized. If you intend to use an LSB script with Heartbeat, make sure that you understand how the respective script is configured. Often you can find some documentation on this in the documentation of the respective package in /usr/share/doc/packages/<package_name>.
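    To check by hand how an init script reports its state, a sketch (the service name is an example); per the LSB specification, status exits 0 when the service is running and 3 when it is not:

        /etc/init.d/nfsserver status
        echo $?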
  • Page 57 The IPv4 address to be configured in dotted quad notation, for example "192.168.1.1".
    </longdesc>
    <shortdesc lang="en">IPv4 address</shortdesc>
    <content type="string" default="" />
    </parameter>
    </resource-agent>
    This is part of the IPaddr RA. The information about how to configure the parameter of this RA can be read as follows: Root element for each output.
  • Page 58 <primitive id="filesystem_resource" class="ocf" provider="heartbeat" type="Filesystem">
      <instance_attributes id="ia-filesystem_1">
        <attributes>
          <nvpair id="filesystem-nv-1" name="device" value="/dev/drbd0"/>
          <nvpair id="filesystem-nv-2" name="directory" value="/srv/failover"/>
          <nvpair id="filesystem-nv-3" name="fstype" value="reiserfs"/>
        </attributes>
      </instance_attributes>
    </primitive>
    Configuring drbd Before starting with the drbd Heartbeat configuration, set up a drbd device manually. Basically this is configuring drbd in /etc/drbd.conf and letting it synchronize. The exact procedure for configuring drbd is described in the Storage Administration Guide.
  • Page 59 <nvpair id="drbd-nv-5" name="drbd_resource" value="r0"/>
        </attributes>
      </instance_attributes>
    </primitive>
    </master_slave>
    The master element of this resource is master_slave. The complete resource is later accessed with the ID drbd_resource. clone_max defines how many masters and slaves may be present in the cluster. clone_node_max is the maximum number of clones (masters or slaves) that are allowed to run on a single node.
  • Page 60: Configuring Constraints

    Configure the IP address completely with the Heartbeat RA configuration. No additional modification is necessary in the system. The IP address RA is an OCF RA.
    <group id="nfs_group">
      <primitive id="nfs_resource" class="lsb" type="nfsserver"/>
      <primitive id="ip_resource" class="ocf" provider="heartbeat" type="IPaddr">
        <instance_attributes id="ia-ipaddr_1">
          <attributes>
            <nvpair id="ipaddr-nv-1"...
  • Page 61 5.3.1 Locational Constraints This type of constraint may be added multiple times for each resource. All rsc_location constraints are evaluated for a given resource. A simple example that increases the probability to run a resource with the ID filesystem_1 on the node with the name earth to 100 would be the following:
    <rsc_location id="filesystem_1_location"...
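    The constraint is cut off at the page break; a hedged reconstruction, with a rule and expression consistent with the CIB syntax used elsewhere in this manual, would be:

        <rsc_location id="filesystem_1_location" rsc="filesystem_1">
          <rule id="filesystem_1_location_rule" score="100">
            <expression id="filesystem_1_location_expr" attribute="#uname" operation="eq" value="earth"/>
          </rule>
        </rsc_location>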
  • Page 62 For a master slave configuration, it is necessary to know if the current node is a master in addition to running the resource locally. This can be checked with an additional to_role or from_role attribute. 5.3.3 Ordering Constraints Sometimes it is necessary to provide an order in which services must start. For example, you cannot mount a file system before the device is available to a system.
  • Page 63: Configuring Crm Options

    • The NFS server as well as the IP address start after the file system is mounted.
    <rsc_order id="nfs_second" from="nfs_group" action="start" to="filesystem_resource" to_action="start" type="after"/>
    • The NFS server as well as the IP address must be on the same node as the file system.
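    The colocation constraint itself falls on the next page and is not reproduced here; a hedged sketch of what it likely looks like in this syntax:

        <rsc_colocation id="nfs_with_filesystem" from="nfs_group" to="filesystem_resource" score="INFINITY"/>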
  • Page 64 symmetric_cluster (boolean, default=TRUE)
    If true, resources are permitted to run anywhere by default. Otherwise, explicit constraints must be created to specify where they can run.
    stonith_enabled (boolean, default=FALSE)
    If true, failed nodes are fenced.
    no_quorum_policy (enum, default=stop)
    ignore: Pretend to have quorum.
    freeze: Do not start any resources not currently in the partition.
  • Page 65 or configuration change). This option is almost equivalent to auto_failback off, except that the resource may be moved to nodes other than the one on which it was previously active.
    -INFINITY: Resources always move away from their current location.
    is_managed_default (boolean, default=TRUE)
    Unless the resource's definition says otherwise:
    TRUE: Resources are started, stopped, monitored, and moved as necessary.
  • Page 66 Policy Engine (7) Policy Engine — Policy Engine Options Synopsis [no-quorum-policy=enum] [symmetric-cluster=boolean] [stonith-enabled=boolean] [stonith-action=enum] [default-resource-stickiness=integer] [default-resource-failure-stickiness=integer] [is-managed-default=boolean] [cluster-delay=time] [default-action-timeout=time] [stop-orphan-resources=boolean] [stop-orphan-actions=boolean] [pe-error-series-max=integer] [pe-warn-series-max=integer] [pe-input-series-max=integer] [startup-fencing=boolean] Description This is a fake resource that details the options that can be configured for the Policy Engine.
  • Page 67: For More Information

    default-resource-stickiness=
    default-resource-failure-stickiness=
    is-managed-default=Should the cluster start/stop resources as required
    Should the cluster start/stop resources as required
    cluster-delay=Round trip delay over the network (excluding action execution)
    The "correct" value will depend on the speed and load of your network and cluster nodes.
  • Page 69: Managing A Cluster

    Managing a Cluster Heartbeat ships with a comprehensive set of tools that assist you in managing your cluster from the command line. This chapter introduces the tools needed for managing the cluster configuration in the CIB and the cluster resources. Other command line tools for managing resource agents or tools used for debugging and troubleshooting your setup are covered in Chapter 7, Creating Resources...
  • Page 70 Managing Configuration Changes The crm_diff command assists you in creating and applying XML patches. This can be useful for visualizing the changes between two versions of the cluster configuration or saving changes so they can be applied at a later time using cibadmin(8) (page 64).
  • Page 71 this command with extreme caution. For more information, refer to crm_uuid(8) (page 98). Managing a Node's Standby Status The crm_standby command can manipulate a node's standby attribute. Any node in standby mode is no longer eligible to host resources and any resources that are there must be moved.
  • Page 72 cibadmin (8) cibadmin — read, modify, or administer Heartbeat Cluster Information Base
    Synopsis
    cibadmin (--cib_query|-Q) -[Vrwlsmfbp] [-i xml-object-id|-o xml-object-type] [-t t-flag-whatever] [-h hostname]
    cibadmin (--cib_create|-C) -[Vrwlsmfbp] [-X xml-string] [-x xml-filename] [-t t-flag-whatever] [-h hostname]
    cibadmin (--cib_replace|-R) -[Vrwlsmfbp] [-i xml-object-id|-o xml-object-type] [-X xml-string] [-x xml-filename] [-t t-flag-whatever] [-h hostname]
  • Page 73 Description The cibadmin command is the low-level administrative command for manipulating the Heartbeat CIB. Use it to dump all or part of the CIB, update all or part of it, modify all or part of it, delete the entire CIB, or perform miscellaneous CIB administrative operations.
  • Page 74 --cib_erase, -E Erase the contents of the entire CIB. --cib_query, -Q Query a portion of the CIB. --cib_create, -C Create a new CIB from the XML content of the argument. --cib_replace, -R Recursively replace an XML object in the CIB. --cib_update, -U Recursively update an object in the CIB.
  • Page 75 --xml-file filename, -x filename Specify XML from a file on which cibadmin should operate. It must be a complete tag or XML fragment. --xml_pipe, -p Specify that the XML on which cibadmin should operate comes from standard input. It must be a complete tag or XML fragment. Specialized Options --cib_bump, -B Increase the epoch version counter in the CIB.
  • Page 76 IMPORTANT Use this option with utmost care and avoid using it at a time when (part of) the cluster does not have quorum. Otherwise, this results in divergence between a small subset of the cluster nodes and the majority of the cluster. The worst case is if the majority of the cluster is also running (as in a parti- tioned cluster) and is also making changes.
  • Page 77 <primitive id="R_10.10.10.101" class="ocf" type="IPaddr2" provider="heartbeat">
      <instance_attributes id="RA_R_10.10.10.101">
        <attributes>
          <nvpair id="R_ip_P_ip" name="ip" value="10.10.10.101"/>
          <nvpair id="R_ip_P_nic" name="nic" value="eth0"/>
        </attributes>
      </instance_attributes>
    </primitive>
    Then issue the following command:
    cibadmin --obj_type resources -U -x foo
    To change the IP address of the IPaddr2 resource previously added, issue the command below:
    cibadmin -M -X '<nvpair id="R_ip_P_ip"...
  • Page 78 To replace the CIB with a new manually-edited version of the CIB, use the following command: cibadmin -R -x $HOME/cib.xml Files /var/lib/heartbeat/crm/cib.xml—the CIB (minus status section) on disk. See Also crm_resource(8) (page 89), crmadmin(8) (page 71), lrmadmin(8), heartbeat(8) Author cibadmin was written by Andrew Beekhof. This manual page was originally written by Alan Robertson.
  • Page 79 crmadmin (8) crmadmin — control the Cluster Resource Manager Synopsis crmadmin [-V|-q] [-i|-d|-K|-S|-E] node crmadmin [-V|-q] -N -B crmadmin [-V|-q] -D crmadmin -v crmadmin -? Description crmadmin was originally designed to control most of the actions of the CRM daemon. However, the largest part of its functionality has been made obsolete by other tools, such as crm_attribute and crm_resource.
  • Page 80 NOTE Increase the level of verbosity by providing additional instances. --quiet, -q Do not provide any debug information at all and reduce the output to a minimum. --bash-export, -B Create bash export entries of the form export uname=uuid. This applies only to the crmadmin -N node command.
  • Page 81 --election node, -E node Initiate an election from the specified node. WARNING Use this with extreme caution. This action is normally initiated internally and may have unintended side effects. --dc_lookup, -D Query the uname of the current DC. The location of the DC is only of significance to the crmd internally and is rarely useful to administrators except when deciding on which node to examine the logs.
  • Page 82 crm_attribute (8) crm_attribute — manipulate attributes in the CIB Synopsis crm_attribute [options] Description The crm_attribute command queries and manipulates node attributes and cluster configuration options that are used in the CIB. Options --help, -? Print a help message. --verbose, -V Turn on debug information.
  • Page 83 --attr-id string, -i string For advanced users only. Identifies the id attribute. --attr-value string, -v string Value to set. This is ignored when used with -G. --inhibit-policy-engine, -! For advanced users only. --node-uuid node_uuid, -u node_uuid Specify the UUID of the node to change. --node-uname node_uname, -U node_uname Specify the uname of the node to change.
  • Page 84 Examples Query the value of the location attribute in the nodes section for the host myhost in the CIB: crm_attribute -G -t nodes -U myhost -n location Query the value of the cluster-delay attribute in the crm_config section in the CIB: crm_attribute -G -t crm_config -n cluster-delay Query the value of the cluster-delay attribute in the crm_config section in the...
  • Page 85 Author crm_attribute was written by Andrew Beekhof. Managing a Cluster...
  • Page 86 crm_diff (8) crm_diff — identify changes to the cluster configuration and apply patches to the con- figuration files Synopsis crm_diff [-?|-V] [-o filename] [-O string] [-p filename] [-n filename] [-N string] Description The crm_diff command assists in creating and applying XML patches. This can be useful for visualizing the changes between two versions of the cluster configuration or saving changes so they can be applied at a later time using cibadmin.
  • Page 87 --cib, -c Compare or patch the inputs as a CIB. Always specify the base version with -o and provide either the patch file or the second version with -p or -n, respectively. --stdin, -s Read the inputs from stdin. Examples Use crm_diff to determine the differences between various CIB configuration files and to create patches.
  • Page 88 See Also cibadmin(8) (page 64) Author crm_diff was written by Andrew Beekhof. Heartbeat...
  • Page 89 crm_failcount (8) crm_failcount — manipulate the failcount attribute on a given resource Synopsis crm_failcount [-?|-V] -D -u|-U node -r resource crm_failcount [-?|-V] -G -u|-U node -r resource crm_failcount [-?|-V] -v string -u|-U node -r resource Description Heartbeat implements a sophisticated method to compute and force failover of a resource to another node in case that resource tends to fail on the current node.
  • Page 90 NOTE Increase the level of verbosity by providing additional instances. --quiet, -Q When doing an attribute query using -G, print just the value to stdout. Use this option with -G. --get-value, -G Retrieve rather than set the preference. --delete-attr, -D Specify the attribute to delete.
  • Page 91 Files /var/lib/heartbeat/crm/cib.xml—the CIB (minus status section) on disk. Editing this file directly is strongly discouraged. See Also crm_attribute(8) (page 74), cibadmin(8) (page 64), and the Linux High Availability FAQ Web site [http://www.linux-ha.org/v2/faq/forced_failover] Author crm_failcount was written by Andrew Beekhof. Managing a Cluster...
  • Page 92 crm_master (8) crm_master — determine which resource instance to promote to master Synopsis crm_master [-V|-Q] -D [-l lifetime] crm_master [-V|-Q] -G [-l lifetime] crm_master [-V|-Q] -v string [-l string] Description crm_master is called from inside the resource agent scripts to determine which re- source instance should be promoted to master mode.
  • Page 93: Environment Variables

    --quiet, -Q When doing an attribute query using -G, print just the value to stdout. Use this option with -G. --get-value, -G Retrieve rather than set the preference to be promoted. --delete-attr, -D Delete rather than set the attribute. --attr-id string, -i string For advanced users only.
  • Page 94 crm_mon (8) crm_mon — monitor the cluster's status Synopsis crm_mon [-V] -d -p filename -h filename crm_mon [-V] [-1|-n|-r] -h filename crm_mon [-V] [-n|-r] -X filename crm_mon [-V] [-n|-r] -c|-1 crm_mon [-V] -i interval crm_mon -? Description The crm_mon command allows you to monitor your cluster's status and configuration. Its output includes the number of nodes, uname, uuid, status, the resources configured in your cluster, and the current status of each.
  • Page 95 --group-by-node, -n Group resources by node. --inactive, -r Display inactive resources. --as-console, -c Display the cluster status on the console. --simple-status, -s Display the cluster status once as a simple one line output (suitable for nagios). --one-shot, -1 Display the cluster status once on the console then exit (does not use ncurses). --as-html filename, -h filename Write the cluster's status to the specified file.
  • Page 96 Display your cluster's status on the console just once then exit: crm_mon -1 Display your cluster's status and group resources by node: crm_mon -n Display your cluster's status, group resources by node, and include inactive resources in the list: crm_mon -n -r Write your cluster's status to an HTML file: crm_mon -h filename Run crm_mon as a daemon in the background, specify the daemon's pid file for easier...
  • Page 97 crm_resource (8) crm_resource — interact with the Cluster Resource Manager Synopsis crm_resource [-?|-V|-S] -L|-Q|-W|-D|-C|-P|-p [options] Description The crm_resource command performs various resource-related actions on the cluster. It can modify the definition of configured resources, start and stop resources, and delete and migrate resources between nodes. --help, -? Print the help message.
  • Page 98 --locate, -W Locate a resource. Requires: -r --migrate, -M Migrate a resource from its current location. Use -H to specify a destination. If -H is not specified, the resource is forced to move by creating a rule for the current location and a score of -INFINITY. NOTE This prevents the resource from running on this node until the constraint is removed with -U.
  • Page 99 --refresh, -R Refresh the CIB from the LRM. Optional: -H --set-parameter string, -p string Set the named parameter for a resource. Requires: -r, -v. Optional: -i, -s --get-parameter string, -g string Get the named parameter for a resource. Requires: -r. Optional: -i, -s --delete-parameter string, -d string Delete the named parameter for a resource.
  • Page 100 --force, -f Force the resource to move by creating a rule for the current location and a score of -INFINITY This should be used if the resource's stickiness and constraint scores total more than INFINITY (currently 100,000). NOTE This prevents the resource from running on this node until the constraint is removed with -U.
  • Page 101 Migrate a resource away from its current location: crm_resource -M -r my_first_ip Migrate a resource to a specific location: crm_resource -M -r my_first_ip -H c001n02 Allow a resource to return to its normal location: crm_resource -U -r my_first_ip NOTE The values of resource_stickiness and default_resource_stickiness may mean that it does not move back.
  • Page 102 Files /var/lib/heartbeat/crm/cib.xml—the CIB (minus status section) on disk. Editing this file directly is strongly discouraged. See Also cibadmin(8) (page 64), crmadmin(8) (page 71), lrmadmin(8), heartbeat(8) Author crm_resource was written by Andrew Beekhof. Heartbeat...
  • Page 103 crm_standby (8) crm_standby — manipulate a node's standby attribute to determine whether resources can be run on this node Synopsis crm_standby [-?|-V] -D -u|-U node -r resource crm_standby [-?|-V] -G -u|-U node -r resource crm_standby [-?|-V] -v string -u|-U node -r resource [-l string] Description The crm_standby command manipulates a node's standby attribute.
  • Page 104 --quiet, -Q When doing an attribute query using -G, print just the value to stdout. Use this option with -G. --get-value, -G Retrieve rather than set the preference. --delete-attr, -D Specify the attribute to delete. --attr-value string, -v string Specify the value to use. This option is ignored when used with -G. --attr-id string, -i string For advanced users only.
  • Page 105 Query the standby status of a node: crm_standby -G -U node1 Remove the standby property from a node: crm_standby -D -U node1 Have a node go to standby for an indefinite period of time: crm_standby -v true -l forever -U node1 Have a node go to standby until the next reboot of this node: crm_standby -v true -l reboot -U node1 Files...
  • Page 106 crm_uuid (8) crm_uuid — get a node's UUID Synopsis crm_uuid [-w|-r] Description UUIDs are used to identify cluster nodes to ensure that they can always be uniquely identified. The crm_uuid command displays and modifies the UUID of the node on which it is run.
  • Page 107 associated with the old UUID value. Do not change the UUID unless you changed all references to it as well. --read, -r Read the UUID value and print it to stdout. See Also /var/lib/heartbeat/hb_uuid Author crm_uuid was written by Andrew Beekhof. Managing a Cluster...
  • Page 108 crm_verify (8) crm_verify — check the CIB for consistency Synopsis crm_verify [-V] -x file crm_verify [-V] -X string crm_verify [-V] -L|-p crm_verify [-?] Description crm_verify checks the configuration database (CIB) for consistency and other problems. It can be used to check a file containing the configuration or it can connect to a running cluster.
  • Page 109 --live-check, -L Connect to the running cluster and check the CIB. --crm_xml string, -X string Check the configuration in the supplied string. Pass complete CIBs only. --xml-file file, -x file Check the configuration in the named file. --xml-pipe, -p Use the configuration piped in via stdin. Pass complete CIBs only. --dtd-file string, -D string Use the given DTD file instead of /usr/share/heartbeat/crm.dtd.
  • Page 110 Author crm_verify was written by Andrew Beekhof. Heartbeat...
  • Page 111: Creating Resources

    Creating Resources All tasks that should be managed by a cluster must be available as a resource. There are two major groups that should be distinguished: resource agents and STONITH agents. For both categories, you can add your own agents, extending the abilities of the cluster to your own needs.
  • Page 112: Resource Agents

    7.2 Resource Agents All services that are controlled by the cluster have a corresponding RA that handles the changes and monitoring of this service. These RAs are available in three different flavors: Heartbeat 1 RAs All the resources from Heartbeat 1 are still available in Heartbeat 2. However, it is recommended to migrate your configurations to Heartbeat 2 OCF RAs if possible.
  • Page 113 When implementing your own OCF RA, provide several actions for this agent. More details about writing OCF resource agents can be found at http://www.linux-ha.org/OCFResourceAgent. Find special information about several concepts of Heartbeat 2 at http://linux-ha.org/v2/Concepts. Creating Resources...
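    As a rough illustration only (not an official template), a minimal OCF RA skeleton in shell could provide the start, stop, monitor, and meta-data actions like this; the flag file and agent name are made up for the sketch:

        #!/bin/sh
        # Minimal OCF resource agent sketch; state is tracked with a flag file.
        OCF_SUCCESS=0; OCF_ERR_UNIMPLEMENTED=3; OCF_NOT_RUNNING=7
        STATE=${HA_RSCTMP:-/var/run}/sketch-ra.state

        case "$1" in
          start)    touch "$STATE"; exit $OCF_SUCCESS ;;
          stop)     rm -f "$STATE"; exit $OCF_SUCCESS ;;
          monitor)  test -f "$STATE" && exit $OCF_SUCCESS; exit $OCF_NOT_RUNNING ;;
          meta-data)
            # A real agent prints full meta-data per the OCF DTD; this is abridged.
            echo '<?xml version="1.0"?>'
            echo '<resource-agent name="sketch">'
            echo '  <actions>'
            echo '    <action name="start" timeout="20s"/>'
            echo '    <action name="stop" timeout="20s"/>'
            echo '    <action name="monitor" timeout="20s" interval="10s"/>'
            echo '  </actions>'
            echo '</resource-agent>'
            exit $OCF_SUCCESS ;;
          *)        exit $OCF_ERR_UNIMPLEMENTED ;;
        esac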
  • Page 115: Troubleshooting

    Troubleshooting Especially when starting to experiment with Heartbeat, strange problems may occur that are not easy to understand. However, there are several utilities that may be used to take a closer look at the Heartbeat internal processes. What is the state of my cluster? To check the current state of your cluster, use the program crm_mon.
  • Page 116 If this fails, it is very likely that you missed some mandatory variable or just mistyped a parameter. I just get a failed message. Is it possible to get more information? You may always add the -V parameter to your commands. If you do that multiple times, the debug output becomes very verbose.
  • Page 117: A Hb Ocf Agents

    HB OCF Agents All OCF agents require several parameters to be set when they are started. The following overview shows how to manually operate these agents. The data that is available in this appendix is directly taken from the meta-data invocation of the respective RA. Find all these agents in /usr/lib/ocf/resource.d/heartbeat/.
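    To drive an agent by hand, its parameters are passed as OCF_RESKEY_ environment variables, as in this sketch (the address is an example):

        OCF_RESKEY_ip=10.0.0.99 /usr/lib/ocf/resource.d/heartbeat/IPaddr start
        OCF_RESKEY_ip=10.0.0.99 /usr/lib/ocf/resource.d/heartbeat/IPaddr monitor
        OCF_RESKEY_ip=10.0.0.99 /usr/lib/ocf/resource.d/heartbeat/IPaddr stop

    The meta-data invocation mentioned above needs no parameters: /usr/lib/ocf/resource.d/heartbeat/IPaddr meta-data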
  • Page 118 ocf:apache (7) ocf:apache — Apache web server Synopsis OCF_RESKEY_configfile=string [OCF_RESKEY_httpd=string] [OCF_RESKEY_port=integer] [OCF_RESKEY_statusurl=string] [OCF_RESKEY_options=string] [OCF_RESKEY_testregex=string] apache [start | stop | status | monitor | meta-data | validate-all] Description This is the resource agent for the Apache web server. This resource agent operates both version 1.x and version 2.x Apache servers.
  • Page 119 OCF_RESKEY_testregex=test regular expression Regular expression to match in the output of statusurl. It is case insensitive. HB OCF Agents...
  • Page 120 ocf:AudibleAlarm (7) ocf:AudibleAlarm — AudibleAlarm resource agent Synopsis [OCF_RESKEY_nodelist=string] AudibleAlarm [start | stop | restart | status | monitor | meta-data | validate-all] Description Resource script for AudibleAlarm. It sets an audible alarm running by beeping at a set interval. Supported Parameters OCF_RESKEY_nodelist=Node list The node list that should never sound the alarm.
  • Page 121 ocf:ClusterMon (7) ocf:ClusterMon — ClusterMon resource agent Synopsis [OCF_RESKEY_user=string] [OCF_RESKEY_update=integer] [OCF_RESKEY_extra_options=string] OCF_RESKEY_pidfile=string OCF_RESKEY_htmlfile=string ClusterMon [start | stop | monitor | meta-data | validate-all] Description This is a ClusterMon Resource Agent. It outputs the current cluster status to an HTML file. Supported Parameters OCF_RESKEY_user=The user we want to run crm_mon as The user we want to run crm_mon as OCF_RESKEY_update=Update interval
  • Page 122 ocf:db2 (7) ocf:db2 — db2 resource agent Synopsis [OCF_RESKEY_instance=string] [OCF_RESKEY_admin=string] db2 [start | stop | status | monitor | validate-all | meta-data | methods] Description Resource script for db2. It manages a DB2 Universal Database instance as an HA resource.
  • Page 123 ocf:Delay (7) ocf:Delay — Delay resource agent Synopsis [OCF_RESKEY_startdelay=integer] [OCF_RESKEY_stopdelay=integer] [OCF_RESKEY_mondelay=integer] Delay [start | stop | status | monitor | meta-data | validate-all] Description This script is a test resource for introducing delay. Supported Parameters OCF_RESKEY_startdelay=Start delay How long in seconds to delay on start operation. OCF_RESKEY_stopdelay=Stop delay How long in seconds to delay on stop operation.
  • Page 124 ocf:drbd (7) ocf:drbd — This resource agent manages a Distributed Replicated Block Device (DRBD) object as a master/slave resource. DRBD is a mechanism for replicating storage; please see the documentation for setup details. Synopsis OCF_RESKEY_drbd_resource=string [OCF_RESKEY_drbdconf=string] [OCF_RESKEY_clone_overrides_hostname=boolean] [OCF_RESKEY_clone_max=integer] [OCF_RESKEY_clone_node_max=integer] [OCF_RESKEY_master_max=integer] [OCF_RESKEY_master_node_max=integer] drbd [start | promote | demote | notify | stop | monitor | monitor | meta-data | validate-all] Description
  • Page 125 OCF_RESKEY_clone_node_max=Number of nodes Clones per node. Do not fiddle with the default. OCF_RESKEY_master_max=Number of primaries Maximum number of active primaries. Do not fiddle with the default. OCF_RESKEY_master_node_max=Number of primaries per node Maximum number of primaries per node. Do not fiddle with the default. HB OCF Agents...
  • Page 126 ocf:Dummy (7) ocf:Dummy — Dummy resource agent Synopsis OCF_RESKEY_state=string Dummy [start | stop | monitor | reload | migrate_to | migrate_from | meta-data | validate-all] Description This is a Dummy Resource Agent. It does absolutely nothing except keep track of whether it is running or not.
  • Page 127 ocf:eDir88 (7) ocf:eDir88 — eDirectory resource agent Synopsis OCF_RESKEY_eDir_config_file=string [OCF_RESKEY_eDir_monitor_ldap=boolean] [OCF_RESKEY_eDir_monitor_idm=boolean] [OCF_RESKEY_eDir_jvm_initial_heap=integer] [OCF_RESKEY_eDir_jvm_max_heap=integer] [OCF_RESKEY_eDir_jvm_options=string] eDir88 [start | stop | monitor | meta-data | validate-all] Description Resource script for managing an eDirectory instance. Manages a single instance of eDirectory as an HA resource. The "multiple instances" feature of eDirectory has been added in version 8.8.
  • Page 128 Supported Parameters OCF_RESKEY_eDir_config_file=eDir config file Path to configuration file for eDirectory instance. OCF_RESKEY_eDir_monitor_ldap=eDir monitor ldap Should we monitor if LDAP is running for the eDirectory instance? OCF_RESKEY_eDir_monitor_idm=eDir monitor IDM Should we monitor if IDM is running for the eDirectory instance? OCF_RESKEY_eDir_jvm_initial_heap=DHOST_INITIAL_HEAP value Value for the DHOST_INITIAL_HEAP java environment variable.
  • Page 129 ocf:Evmsd (7) ocf:Evmsd — Evmsd resource agent Synopsis Evmsd [start | stop | monitor | meta-data] Description This is an Evmsd Resource Agent. Supported Parameters HB OCF Agents...
  • Page 130 ocf:EvmsSCC (7) ocf:EvmsSCC — EVMS SCC resource agent Synopsis EvmsSCC [start | stop | notify | status | monitor | meta-data] Description Resource script for EVMS shared cluster container. It runs evms_activate on one node in the cluster. Supported Parameters Heartbeat...
  • Page 131 ocf:Filesystem (7) ocf:Filesystem — Filesystem resource agent Synopsis [OCF_RESKEY_device=string] [OCF_RESKEY_directory=string] [OCF_RESKEY_fstype=string] [OCF_RESKEY_options=string] [OCF_RESKEY_ocfs2_cluster=string] [OCF_RESKEY_ocfs2_configfs=string] Filesystem [start | stop | notify | monitor | validate-all | meta-data] Description Resource script for Filesystem. It manages a Filesystem on a shared storage medium. Supported Parameters OCF_RESKEY_device=block device The name of block device for the filesystem, or -U, -L options for mount, or NFS...
  • Page 132 OCF_RESKEY_ocfs2_configfs=OCFS2 configfs root Mountpoint of the cluster hierarchy below configfs. You should not need to specify this. Heartbeat...
  • Page 133 ocf:ICP (7) ocf:ICP — ICP resource agent Synopsis [OCF_RESKEY_driveid=string] [OCF_RESKEY_device=string] ICP [start | stop | status | monitor | validate-all | meta-data] Description Resource script for ICP. It manages an ICP Vortex clustered host drive as an HA resource. Supported Parameters OCF_RESKEY_driveid=ICP cluster drive ID The ICP cluster drive ID.
  • Page 134 ocf:ids (7) ocf:ids — OCF resource agent for IBM's database server called Informix Dynamic Server (IDS) Synopsis [OCF_RESKEY_informixdir=string] [OCF_RESKEY_informixserver=string] [OCF_RESKEY_onconfig=string] [OCF_RESKEY_dbname=string] [OCF_RESKEY_sqltestquery=string] ids [start | stop | status | monitor | validate-all | meta-data | methods | usage] Description OCF resource agent to manage an IBM Informix Dynamic Server (IDS) instance as a High-Availability resource.
  • Page 135 at '/etc/'. If this parameter is unspecified the script will try to get the value from the shell environment. OCF_RESKEY_dbname= database to use for monitoring, defaults to 'sysmaster' This parameter defines which database to use in order to monitor the IDS instance. If this parameter is unspecified the script will use the 'sysmaster' database as a default.
  • Page 136 ocf:IPaddr2 (7) ocf:IPaddr2 — Manages virtual IPv4 addresses Synopsis OCF_RESKEY_ip=string [OCF_RESKEY_nic=string] [OCF_RESKEY_cidr_netmask=string] [OCF_RESKEY_broadcast=string] [OCF_RESKEY_iflabel=string] [OCF_RESKEY_lvs_support=boolean] [OCF_RESKEY_mac=string] [OCF_RESKEY_clusterip_hash=string] [OCF_RESKEY_arp_interval=integer] [OCF_RESKEY_arp_count=integer] [OCF_RESKEY_arp_bg=string] [OCF_RESKEY_arp_mac=string] IPaddr2 [start | stop | status | monitor | meta-data | validate-all] Description This Linux-specific resource manages IP alias IP addresses. It can add an IP alias, or remove one.
  • Page 137 OCF_RESKEY_broadcast=Broadcast address Broadcast address associated with the IP. If left empty, the script will determine this from the netmask. OCF_RESKEY_iflabel=Interface label You can specify an additional label for your IP address here. This label is appended to your interface name. If a label is specified in nic name, this parameter has no effect.
  • Page 138 ocf:IPaddr (7) ocf:IPaddr — Manages virtual IPv4 addresses Synopsis OCF_RESKEY_ip=string [OCF_RESKEY_nic=string] [OCF_RESKEY_cidr_netmask=string] [OCF_RESKEY_broadcast=string] [OCF_RESKEY_iflabel=string] [OCF_RESKEY_lvs_support=boolean] [OCF_RESKEY_local_stop_script=string] [OCF_RESKEY_local_start_script=string] [OCF_RESKEY_ARP_INTERVAL_MS=integer] [OCF_RESKEY_ARP_REPEAT=integer] [OCF_RESKEY_ARP_BACKGROUND=boolean] [OCF_RESKEY_ARP_NETMASK=string] IPaddr [start | stop | monitor | validate-all | meta-data] Description This script manages IP alias IP addresses. It can add an IP alias, or remove one.
  • Page 139 OCF_RESKEY_broadcast=Broadcast address Broadcast address associated with the IP. If left empty, the script will determine this from the netmask. OCF_RESKEY_iflabel=Interface label You can specify an additional label for your IP address here. OCF_RESKEY_lvs_support=Enable support for LVS DR Enable support for LVS Direct Routing configurations. In case an IP address is stopped, only move it to the loopback device to allow the local node to continue to service requests, but no longer advertise it on the network.
  • Page 140 ocf:IPsrcaddr (7) ocf:IPsrcaddr — IPsrcaddr resource agent Synopsis [OCF_RESKEY_ipaddress=string] IPsrcaddr [start | stop | stop | monitor | vali- date-all | meta-data] Description Resource script for IPsrcaddr. It manages the preferred source address modification. Supported Parameters OCF_RESKEY_ipaddress=IP address The IP address. Heartbeat...
  • Page 141 ocf:IPv6addr (7) ocf:IPv6addr — manages IPv6 alias Synopsis [OCF_RESKEY_ipv6addr=string] IPv6addr [start | stop | status | monitor | validate-all | meta-data] Description This script manages IPv6 alias addresses. It can add an IPv6 alias, or remove one. Supported Parameters OCF_RESKEY_ipv6addr=IPv6 address The IPv6 address this RA will manage
  • Page 142 ocf:iscsi (7) ocf:iscsi — iscsi resource agent Synopsis [OCF_RESKEY_portal=string] OCF_RESKEY_target=string [OCF_RESKEY_discovery_type=string] [OCF_RESKEY_iscsiadm=string] [OCF_RESKEY_udev=string] iscsi [start | stop | status | monitor | validate-all | methods | meta-data] Description OCF Resource Agent for iSCSI. Add (start) or remove (stop) iSCSI targets. Supported Parameters OCF_RESKEY_portal=portal The iSCSI portal address in the form: {ip_address|hostname}[":"port]...
  • Page 143 ocf:Ldirectord (7) ocf:Ldirectord — Wrapper OCF Resource Agent for ldirectord Synopsis OCF_RESKEY_configfile=string [OCF_RESKEY_ldirectord=string] Ldirectord [start | stop | monitor | meta-data | validate-all] Description It's a simple OCF RA wrapper for ldirectord and uses the ldirectord interface to create the OCF-compliant interface. You gain monitoring of ldirectord. Be warned: Asking ldirectord status is an expensive action.
  • Page 144 ocf:LinuxSCSI (7) ocf:LinuxSCSI — LinuxSCSI resource agent Synopsis [OCF_RESKEY_scsi=string] LinuxSCSI [start | stop | methods | status | monitor | meta-data | validate-all] Description This is a resource agent for LinuxSCSI. It manages the availability of a SCSI device from the point of view of the Linux kernel. It makes Linux believe the device has gone away, and it can make it come back again.
  • Page 145 ocf:LVM (7) ocf:LVM — LVM resource agent Synopsis [OCF_RESKEY_volgrpname=string] LVM [start | stop | status | monitor | methods | meta-data | validate-all] Description Resource script for LVM. It manages a Linux Volume Manager volume (LVM) as an HA resource. Supported Parameters OCF_RESKEY_volgrpname=Volume group name The name of the volume group.
  • Page 146 ocf:MailTo (7) ocf:MailTo — MailTo resource agent Synopsis [OCF_RESKEY_email=string] [OCF_RESKEY_subject=string] MailTo [start | stop | status | monitor | meta-data | validate-all] Description This is a resource agent for MailTo. It sends email to a sysadmin whenever a takeover occurs. Supported Parameters OCF_RESKEY_email=Email address The email address of sysadmin.
  • Page 147 ocf:ManageRAID (7) ocf:ManageRAID — Manages RAID devices Synopsis [OCF_RESKEY_raidname=string] ManageRAID [start | stop | status | monitor | validate-all | meta-data] Description Manages starting, stopping and monitoring of RAID devices which are preconfigured in /etc/conf.d/HB-ManageRAID. Supported Parameters OCF_RESKEY_raidname=RAID name Name (case sensitive) of RAID to manage. (preconfigured in /etc/conf.d/HB- ManageRAID) HB OCF Agents...
  • Page 148 ocf:ManageVE (7) ocf:ManageVE — OpenVZ VE resource agent Synopsis [OCF_RESKEY_veid=integer] ManageVE [start | stop | status | monitor | validate-all | meta-data] Description This OCF-compliant resource agent manages OpenVZ VEs and thus requires a proper OpenVZ installation including a recent vzctl util. Supported Parameters OCF_RESKEY_veid=OpenVZ ID of VE OpenVZ ID of virtual environment (see output of vzlist -a for all assigned IDs)
  • Page 149 ocf:mysql (7) ocf:mysql — MySQL resource agent Synopsis [OCF_RESKEY_binary=string] [OCF_RESKEY_config=string] [OCF_RESKEY_datadir=string] [OCF_RESKEY_user=string] [OCF_RESKEY_group=string] [OCF_RESKEY_log=string] [OCF_RESKEY_pid=string] [OCF_RESKEY_socket=string] [OCF_RESKEY_test_table=string] [OCF_RESKEY_test_user=string] [OCF_RESKEY_test_passwd=string] [OCF_RESKEY_enable_creation=integer] [OCF_RESKEY_additional_parameters=integer] mysql [start | stop | status | monitor | validate-all | meta-data] Description Resource script for MySQL. It manages a MySQL Database instance as an HA resource. Supported Parameters OCF_RESKEY_binary=MySQL binary Location of the MySQL binary
  • Page 150 OCF_RESKEY_log=MySQL log file The logfile to be used for mysqld. OCF_RESKEY_pid=MySQL pid file The pidfile to be used for mysqld. OCF_RESKEY_socket=MySQL socket The socket to be used for mysqld. OCF_RESKEY_test_table=MySQL test table Table to be tested in monitor statement (in database.table notation) OCF_RESKEY_test_user=MySQL test user MySQL test user OCF_RESKEY_test_passwd=MySQL test user password...
  • Page 151 ocf:o2cb (7) ocf:o2cb — OCFS2 membership layer manager. Synopsis OCF_RESKEY_netdev=string OCF_RESKEY_port=string [OCF_RESKEY_ocfs2_cluster=string] o2cb [start | stop | notify | monitor | validate-all | meta-data] Description This script manages the Oracle Cluster membership layer. It obsoletes manual configuration of the nodes in /etc/ocfs2/cluster.conf, and automates the discovery of the IP addresses used by o2cb.
  • Page 152 ocf:oracle (7) ocf:oracle — oracle resource agent Synopsis OCF_RESKEY_sid=string [OCF_RESKEY_home=string] [OCF_RESKEY_user=string] [OCF_RESKEY_ipcrm=string] oracle [start | stop | status | monitor | validate-all | methods | meta-data] Description Resource script for oracle. Manages an Oracle Database instance as an HA resource. Supported Parameters OCF_RESKEY_sid=sid The Oracle SID (aka ORACLE_SID).
  • Page 153 fail. There are some precautions, however, to prevent stepping on other people's toes. There is also a dumpinstipc option which will make us print the IPC objects which belong to the instance. Use it to see if we parse the trace file correctly. Three settings are possible: - none: don't mess with IPC and hope for the best (beware: you'll probably be out of luck, sooner or later) - instance: try to figure out the IPC stuff which belongs to the instance and remove only those (default;
  • Page 154 ocf:oralsnr (7) ocf:oralsnr — oralsnr resource agent Synopsis OCF_RESKEY_sid=string [OCF_RESKEY_home=string] [OCF_RESKEY_user=string] OCF_RESKEY_listener=string oralsnr [start | stop | status | monitor | validate-all | meta-data | methods] Description Resource script for Oracle Listener. It manages an Oracle Listener instance as an HA resource.
  • Page 155 ocf:pgsql (7) ocf:pgsql — pgsql resource agent Synopsis [OCF_RESKEY_pgctl=string] [OCF_RESKEY_start_opt=string] [OCF_RESKEY_ctl_opt=string] [OCF_RESKEY_psql=string] [OCF_RESKEY_pgdata=string] [OCF_RESKEY_pgdba=string] [OCF_RESKEY_pghost=string] [OCF_RESKEY_pgport=string] [OCF_RESKEY_pgdb=string] [OCF_RESKEY_logfile=string] [OCF_RESKEY_stop_escalate=string] pgsql [start | stop | status | monitor | meta-data | validate-all | methods] Description Resource script for PostgreSQL. It manages a PostgreSQL database instance as an HA resource. Supported Parameters OCF_RESKEY_pgctl=pgctl Path to pg_ctl command.
  • Page 156 OCF_RESKEY_pgdba=pgdba User that owns PostgreSQL. OCF_RESKEY_pghost=pghost Hostname/IP address where PostgreSQL is listening OCF_RESKEY_pgport=pgport Port where PostgreSQL is listening OCF_RESKEY_pgdb=pgdb Database that will be used for monitoring. OCF_RESKEY_logfile=logfile Path to PostgreSQL server log output file. OCF_RESKEY_stop_escalate=stop escalation Number of retries (using -m fast) before resorting to -m immediate
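A sketch of a pgsql primitive using a few of the parameters above; the paths, user, and IDs are placeholders:

  <primitive id="rsc_pgsql" class="ocf" type="pgsql" provider="heartbeat">
    <instance_attributes id="rsc_pgsql_ia">
      <attributes>
        <!-- placeholder paths and database owner -->
        <nvpair id="rsc_pgsql_pgctl" name="pgctl" value="/usr/bin/pg_ctl"/>
        <nvpair id="rsc_pgsql_pgdata" name="pgdata" value="/var/lib/pgsql/data"/>
        <nvpair id="rsc_pgsql_pgdba" name="pgdba" value="postgres"/>
      </attributes>
    </instance_attributes>
    <operations>
      <op id="rsc_pgsql_mon" name="monitor" interval="30s" timeout="30s"/>
    </operations>
  </primitive>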
  • Page 157 ocf:pingd (7) ocf:pingd — pingd resource agent Synopsis [OCF_RESKEY_pidfile=string] [OCF_RESKEY_user=string] [OCF_RESKEY_dampen=integer] [OCF_RESKEY_set=integer] [OCF_RESKEY_name=integer] [OCF_RESKEY_section=integer] [OCF_RESKEY_multiplier=integer] [OCF_RESKEY_host_list=integer] pingd [start | stop | monitor | meta-data | validate-all] Description This is a pingd Resource Agent. It records (in the CIB) the current number of ping nodes a node can connect to.
  • Page 158 OCF_RESKEY_section=Section name The section in which to place the value. Rarely needs to be specified. OCF_RESKEY_multiplier=Value multiplier The number by which to multiply the number of connected ping nodes OCF_RESKEY_host_list=Host list The list of ping nodes to count. Defaults to all configured ping nodes. Rarely needs to be specified.
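Since each node records its own connectivity, pingd is normally cloned so one copy runs per node; a sketch under that assumption (addresses, multiplier, and IDs are invented for illustration):

  <clone id="pingd_clone">
    <primitive id="pingd_rsc" class="ocf" type="pingd" provider="heartbeat">
      <instance_attributes id="pingd_rsc_ia">
        <attributes>
          <!-- placeholder ping targets outside the cluster -->
          <nvpair id="pingd_rsc_hosts" name="host_list" value="10.0.0.1 10.0.0.2"/>
          <!-- each reachable ping node contributes 100 to the recorded value -->
          <nvpair id="pingd_rsc_mult" name="multiplier" value="100"/>
        </attributes>
      </instance_attributes>
    </primitive>
  </clone>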
  • Page 159 ocf:portblock (7) ocf:portblock — portblock resource agent Synopsis [OCF_RESKEY_protocol=string] [OCF_RESKEY_portno=integer] [OCF_RESKEY_action=string] portblock [start | stop | status | monitor | meta-data | validate-all] Description Resource script for portblock. It is used to temporarily block ports using iptables. Supported Parameters OCF_RESKEY_protocol=protocol The protocol to be blocked/unblocked.
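For example, a primitive that blocks TCP port 3306 during a takeover could be declared as follows (protocol, port, and IDs are illustrative choices, not defaults):

  <primitive id="rsc_portblock" class="ocf" type="portblock" provider="heartbeat">
    <instance_attributes id="rsc_portblock_ia">
      <attributes>
        <!-- block TCP port 3306 while this resource is started -->
        <nvpair id="rsc_portblock_proto" name="protocol" value="tcp"/>
        <nvpair id="rsc_portblock_port" name="portno" value="3306"/>
        <nvpair id="rsc_portblock_action" name="action" value="block"/>
      </attributes>
    </instance_attributes>
  </primitive>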
  • Page 160 ocf:Pure-FTPd (7) ocf:Pure-FTPd — OCF Resource Agent compliant FTP script. Synopsis OCF_RESKEY_script=string OCF_RESKEY_conffile=string OCF_RESKEY_daemon_type=string [OCF_RESKEY_pidfile=string] Pure-FTPd [start | stop | monitor | validate-all | meta-data] Description This script manages Pure-FTPd in an Active-Passive setup. Supported Parameters OCF_RESKEY_script=Script name with full path The full path to the Pure-FTPd startup script.
  • Page 161 ocf:Raid1 (7) ocf:Raid1 — RAID1 resource agent Synopsis [OCF_RESKEY_raidconf=string] [OCF_RESKEY_raiddev=string] [OCF_RESKEY_homehost=string] Raid1 [start | stop | status | monitor | validate- all | meta-data] Description Resource script for RAID1. It manages a software Raid1 device on a shared storage medium. Supported Parameters OCF_RESKEY_raidconf=RAID config file The RAID configuration file.
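A minimal sketch of a Raid1 primitive, assuming an mdadm-style configuration file and MD device; both values are placeholders:

  <primitive id="rsc_raid1" class="ocf" type="Raid1" provider="heartbeat">
    <instance_attributes id="rsc_raid1_ia">
      <attributes>
        <!-- placeholder RAID configuration file and block device -->
        <nvpair id="rsc_raid1_conf" name="raidconf" value="/etc/mdadm.conf"/>
        <nvpair id="rsc_raid1_dev" name="raiddev" value="/dev/md0"/>
      </attributes>
    </instance_attributes>
  </primitive>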
  • Page 162 ocf:rsyncd (7) ocf:rsyncd — OCF Resource Agent compliant rsync daemon script. Synopsis [OCF_RESKEY_binpath=string] [OCF_RESKEY_conffile=string] [OCF_RESKEY_bwlimit=string] rsyncd [start | stop | monitor | validate-all | meta-data] Description This script manages the rsync daemon Supported Parameters OCF_RESKEY_binpath=Full path to the rsync binary The rsync binary path.
  • Page 163 ocf:SAPDatabase (7) ocf:SAPDatabase — SAP database resource agent Synopsis OCF_RESKEY_SID=string OCF_RESKEY_DIR_EXECUTABLE=string OCF_RESKEY_DBTYPE=string OCF_RESKEY_NETSERVICENAME=string OCF_RESKEY_DBJ2EE_ONLY=boolean OCF_RESKEY_JAVA_HOME=string OCF_RESKEY_STRICT_MONITORING=boolean OCF_RESKEY_AUTOMATIC_RECOVER=boolean OCF_RESKEY_DIR_BOOTSTRAP=string OCF_RESKEY_DIR_SECSTORE=string OCF_RESKEY_PRE_START_USEREXIT=string OCF_RESKEY_POST_START_USEREXIT=string OCF_RESKEY_PRE_STOP_USEREXIT=string OCF_RESKEY_POST_STOP_USEREXIT=string SAPDatabase [start | stop | status | monitor | validate-all | meta-data | methods] Description Resource script for SAP databases. It manages a SAP database of any type as an HA resource.
  • Page 164 OCF_RESKEY_DBJ2EE_ONLY=only JAVA stack installed If you do not have an ABAP stack installed in the SAP database, set this to TRUE OCF_RESKEY_JAVA_HOME=Path to Java SDK This is only needed if the DBJ2EE_ONLY parameter is set to true. Enter the path to the Java SDK which is used by the SAP WebAS Java OCF_RESKEY_STRICT_MONITORING=Activates application level monitoring This controls how the resource agent monitors the database.
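Tying the mandatory parameters together, a hypothetical SAPDatabase primitive might look like this (the SID and database type are invented examples):

  <primitive id="rsc_sapdb" class="ocf" type="SAPDatabase" provider="heartbeat">
    <instance_attributes id="rsc_sapdb_ia">
      <attributes>
        <!-- placeholder SAP system ID and database type -->
        <nvpair id="rsc_sapdb_sid" name="SID" value="HA1"/>
        <nvpair id="rsc_sapdb_dbtype" name="DBTYPE" value="ORA"/>
      </attributes>
    </instance_attributes>
  </primitive>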
  • Page 165 ocf:SAPInstance (7) ocf:SAPInstance — SAP instance resource agent Synopsis OCF_RESKEY_InstanceName=string OCF_RESKEY_DIR_EXECUTABLE=string OCF_RESKEY_DIR_PROFILE=string OCF_RESKEY_START_PROFILE=string OCF_RESKEY_START_WAITTIME=string OCF_RESKEY_AUTOMATIC_RECOVER=boolean OCF_RESKEY_PRE_START_USEREXIT=string OCF_RESKEY_POST_START_USEREXIT=string OCF_RESKEY_PRE_STOP_USEREXIT=string OCF_RESKEY_POST_STOP_USEREXIT=string SAPInstance [start | recover | stop | status | monitor | validate-all | meta-data | methods] Description Resource script for SAP. It manages a SAP Instance as an HA resource. Supported Parameters OCF_RESKEY_InstanceName=instance name: SID_INSTANCE_VIR-HOSTNAME The fully qualified SAP instance name.
  • Page 166 OCF_RESKEY_START_WAITTIME=Check the successful start after that time (do not wait for J2EE-Addin) After that time in seconds a monitor operation is executed by the resource agent. If the monitor returns SUCCESS, the start is treated as SUCCESS. This is useful to resolve timing problems with e.g. the J2EE-Addin instance. OCF_RESKEY_AUTOMATIC_RECOVER=Enable or disable automatic startup recovery The SAPInstance resource agent tries to recover a failed start attempt automatically one time.
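A sketch combining these parameters; the instance name, wait time, and monitor timings are placeholders chosen for illustration:

  <primitive id="rsc_sapinst" class="ocf" type="SAPInstance" provider="heartbeat">
    <instance_attributes id="rsc_sapinst_ia">
      <attributes>
        <!-- placeholder SID_INSTANCE_VIR-HOSTNAME -->
        <nvpair id="rsc_sapinst_name" name="InstanceName" value="HA1_DVEBMGS00_sapha1"/>
        <!-- run the deciding monitor 600 seconds after start -->
        <nvpair id="rsc_sapinst_wait" name="START_WAITTIME" value="600"/>
        <nvpair id="rsc_sapinst_rec" name="AUTOMATIC_RECOVER" value="true"/>
      </attributes>
    </instance_attributes>
    <operations>
      <op id="rsc_sapinst_mon" name="monitor" interval="120s" timeout="60s"/>
    </operations>
  </primitive>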
  • Page 167 ocf:SendArp (7) ocf:SendArp — SendArp resource agent Synopsis [OCF_RESKEY_ip=string] [OCF_RESKEY_nic=string] SendArp [start | stop | monitor | meta-data | validate-all] Description This script sends out gratuitous ARP for an IP address Supported Parameters OCF_RESKEY_ip=IP address The IP address for sending ARP packets. OCF_RESKEY_nic=NIC The NIC for sending ARP packets.
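Both parameters map directly onto a primitive; the address and interface below are placeholders:

  <primitive id="rsc_sendarp" class="ocf" type="SendArp" provider="heartbeat">
    <instance_attributes id="rsc_sendarp_ia">
      <attributes>
        <!-- placeholder address and interface to advertise -->
        <nvpair id="rsc_sendarp_ip" name="ip" value="10.0.0.10"/>
        <nvpair id="rsc_sendarp_nic" name="nic" value="eth0"/>
      </attributes>
    </instance_attributes>
  </primitive>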
  • Page 168 ocf:ServeRAID (7) ocf:ServeRAID — ServeRAID resource agent Synopsis [OCF_RESKEY_serveraid=integer] [OCF_RESKEY_mergegroup=integer] ServeRAID [start | stop | status | monitor | validate-all | meta-data | methods] Description Resource script for ServeRAID. It enables/disables shared ServeRAID merge groups. Supported Parameters OCF_RESKEY_serveraid=serveraid The adapter number of the ServeRAID adapter. OCF_RESKEY_mergegroup=mergegroup The logical drive under consideration.
  • Page 169 ocf:SphinxSearchDaemon (7) ocf:SphinxSearchDaemon — searchd resource agent Synopsis OCF_RESKEY_config=string [OCF_RESKEY_searchd=string] [OCF_RESKEY_search=string] [OCF_RESKEY_testQuery=string] SphinxSearchDaemon [start | stop | monitor | meta-data | validate-all] Description This is a searchd Resource Agent. It manages the Sphinx Search Daemon. Supported Parameters OCF_RESKEY_config=Configuration file searchd configuration file OCF_RESKEY_searchd=searchd binary searchd binary OCF_RESKEY_search=search binary...
  • Page 170 ocf:Stateful (7) ocf:Stateful — Example stateful resource agent Synopsis OCF_RESKEY_state=string Stateful [start | stop | monitor | meta-data | validate-all] Description This is an example resource agent that implements two states Supported Parameters OCF_RESKEY_state=State file Location to store the resource state in.
  • Page 171 ocf:SysInfo (7) ocf:SysInfo — SysInfo resource agent Synopsis [OCF_RESKEY_pidfile=string] [OCF_RESKEY_delay=string] SysInfo [start | stop | monitor | meta-data | validate-all] Description This is a SysInfo Resource Agent. It records (in the CIB) various attributes of a node. Sample Linux output: arch: i686 os: Linux-2.4.26-gentoo-r14 free_swap: 1999 cpu_info: Intel(R) Celeron(R) CPU 2.40GHz cpu_speed: 4771.02 cpu_cores: 1 cpu_load: 0.00 ram_total: 513 ram_free: 117 root_free: 2.4 Sample Darwin output: arch: i386 os: Darwin-8.6.2 cpu_info: Intel Core Duo cpu_speed: 2.16 cpu_cores: 2 cpu_load: 0.18...
  • Page 172 ocf:tomcat (7) ocf:tomcat — tomcat resource agent Synopsis OCF_RESKEY_tomcat_name=string OCF_RESKEY_script_log=string [OCF_RESKEY_tomcat_stop_timeout=integer] [OCF_RESKEY_tomcat_suspend_trialcount=integer] [OCF_RESKEY_tomcat_user=string] [OCF_RESKEY_statusurl=string] OCF_RESKEY_java_home=string OCF_RESKEY_catalina_home=string OCF_RESKEY_catalina_pid=string tomcat [start | stop | status | monitor | meta- data | validate-all] Description Resource script for tomcat. It manages a Tomcat instance as an HA resource. Supported Parameters OCF_RESKEY_tomcat_name=The name of the resource The name of the resource...
  • Page 173 OCF_RESKEY_statusurl=URL for state confirmation URL for state confirmation OCF_RESKEY_java_home=Java home directory Home directory of the Java installation OCF_RESKEY_catalina_home=Home directory of Tomcat Home directory of Tomcat OCF_RESKEY_catalina_pid=Tomcat PID file Name of the Tomcat PID file
  • Page 174 ocf:VIPArip (7) ocf:VIPArip — Virtual IP Address by RIP2 protocol Synopsis OCF_RESKEY_ip=string [OCF_RESKEY_nic=string] VIPArip [start | stop | monitor | validate-all | meta-data] Description Virtual IP Address by RIP2 protocol. This script manages an IP alias in a different subnet with quagga/ripd. It can add an IP alias, or remove one. Supported Parameters OCF_RESKEY_ip=The IP address in a different subnet The IPv4 address in a different subnet, for example "192.168.1.1".
  • Page 175 ocf:WAS6 (7) ocf:WAS6 — WAS6 resource agent Synopsis [OCF_RESKEY_profile=string] WAS6 [start | stop | status | monitor | validate-all | meta-data | methods] Description Resource script for WAS6. It manages a WebSphere Application Server (WAS6) as an HA resource. Supported Parameters OCF_RESKEY_profile=profile name The WAS profile name.
  • Page 176 ocf:WAS (7) ocf:WAS — WAS resource agent Synopsis [OCF_RESKEY_config=string] [OCF_RESKEY_port=integer] WAS [start | stop | status | monitor | validate-all | meta-data | methods] Description Resource script for WAS. It manages a WebSphere Application Server (WAS) as an HA resource. Supported Parameters OCF_RESKEY_config=configuration file The WAS configuration file.
  • Page 177 ocf:WinPopup (7) ocf:WinPopup — WinPopup resource agent Synopsis [OCF_RESKEY_hostfile=string] WinPopup [start | stop | status | monitor | validate-all | meta-data] Description Resource script for WinPopup. It sends a WinPopup message to a sysadmin's workstation whenever a takeover occurs. Supported Parameters OCF_RESKEY_hostfile=Host file The file containing the hosts to send WinPopup messages to.
  • Page 178 ocf:Xen (7) ocf:Xen — Manages Xen DomUs Synopsis [OCF_RESKEY_xmfile=string] [OCF_RESKEY_allow_migrate=boolean] [OCF_RESKEY_allow_mem_management=boolean] [OCF_RESKEY_reserved_Dom0_memory=string] [OCF_RESKEY_monitor_scripts=string] Xen [start | stop | migrate_from | migrate_to | status | monitor | meta-data | validate-all] Description Resource Agent for the Xen Hypervisor. Manages Xen virtual machine instances by mapping cluster resource start and stop to Xen create and shutdown, respectively.
  • Page 179 OCF_RESKEY_monitor_scripts=list of space-separated monitor scripts To additionally monitor services within the unprivileged domain, add this parameter with a list of scripts to monitor. NB: In this case make sure to set the start-delay of the monitor operation to at least the time it takes for the DomU to start all services.
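Following the note above, this sketch delays the first monitor until the DomU has presumably booted; the file names, the 180s delay, and the start_delay attribute spelling are assumptions made for illustration:

  <primitive id="rsc_xen_vm1" class="ocf" type="Xen" provider="heartbeat">
    <instance_attributes id="rsc_xen_vm1_ia">
      <attributes>
        <!-- placeholder domain configuration and in-DomU check script -->
        <nvpair id="rsc_xen_vm1_xmfile" name="xmfile" value="/etc/xen/vm/vm1"/>
        <nvpair id="rsc_xen_vm1_scripts" name="monitor_scripts" value="/usr/local/bin/check_vm1"/>
      </attributes>
    </instance_attributes>
    <operations>
      <!-- give the DomU time to start all services before the first monitor -->
      <op id="rsc_xen_vm1_mon" name="monitor" interval="60s" timeout="60s" start_delay="180s"/>
    </operations>
  </primitive>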
  • Page 180 ocf:Xinetd (7) ocf:Xinetd — Xinetd resource agent Synopsis [OCF_RESKEY_service=string] Xinetd [start | stop | restart | status | monitor | validate-all | meta-data] Description Resource script for Xinetd. It starts/stops services managed by xinetd. Note that the xinetd daemon itself must be running: we are not going to start it or stop it ourselves. Important: if the services managed by the cluster are the only ones enabled, specify the -stayalive option for xinetd; otherwise it will exit when Heartbeat stops.
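A minimal sketch; the service name is a placeholder for whichever xinetd-managed service the cluster should control:

  <primitive id="rsc_xinetd_svc" class="ocf" type="Xinetd" provider="heartbeat">
    <instance_attributes id="rsc_xinetd_svc_ia">
      <attributes>
        <!-- the service name as known to xinetd (placeholder) -->
        <nvpair id="rsc_xinetd_svc_name" name="service" value="rsync"/>
      </attributes>
    </instance_attributes>
  </primitive>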
  • Page 181: Terminology

  Terminology

cluster
A high-performance cluster is a group of computers (real or virtual) sharing application load to get things done fast. A high availability cluster is designed primarily to secure the highest possible availability of services.

cluster partition
Whenever communication fails between one or more nodes and the rest of the cluster, a cluster partition occurs.
  • Page 182 Distributed replicated block device (drbd)
DRBD is a block device designed for building high availability clusters. The whole block device is mirrored via a dedicated network and is seen as a network RAID-1.

failover
Occurs when a resource or node fails on one machine and the affected resources are started on another node.
  • Page 183 node
Any computer (real or virtual) that is a member of a cluster and invisible to the user.

pingd
The ping daemon. It continuously contacts one or more servers outside the cluster with ICMP pings.

policy engine (PE)
The policy engine computes the actions that need to be taken to implement policy changes in the CIB.
  • Page 184 source agents, LSB resource agents (standard LSB init scripts), and Heartbeat resource agents (Heartbeat v1 resources).

Single Point of Failure (SPOF)
A single point of failure (SPOF) is any component of a cluster that, should it fail, triggers the failure of the entire cluster.

split brain
A scenario in which the cluster nodes are divided into two or more groups that do not know of each other (either through a software or hardware failure).
