SUSE Linux Enterprise High Availability Extension 11
High Availability Guide
February 18, 2010
www.novell.com

Summary of Contents for Novell LINUX ENTERPRISE 11 - HIGH AVAILABILITY

  • Page 1 SUSE Linux Enterprise High Availability Extension www.novell.com High Availability Guide February 18, 2010...
  • Page 2 SUSE®, openSUSE®, the openSUSE® logo, Novell®, the Novell® logo, the N® logo, are registered trademarks of Novell, Inc. in the United States and other countries. Linux* is a registered trademark of Linus Torvalds. All other third party trademarks are the property of their respective owners. A trademark symbol (®...
  • Page 3: Table Of Contents

    Contents About This Guide Part I Installation and Setup 1 Conceptual Overview Product Features ......Product Benefits .
  • Page 4 Part II Configuration and Administration 4 Configuring Cluster Resources with the GUI Linux HA Management Client ....Creating Cluster Resources ..... Creating STONITH Resources .
  • Page 5 8 Fencing and STONITH Classes of Fencing ......Node Level Fencing ......STONITH Configuration .
  • Page 6 14.2 Configuring the DRBD Service ....14.3 Testing the DRBD Service ..... 14.4 Troubleshooting DRBD .
  • Page 7: About This Guide

    About This Guide SUSE® Linux Enterprise High Availability Extension is an integrated suite of open source clustering technologies that enables you to implement highly available physical and virtual Linux clusters. For quick and efficient configuration and administration, the High Availability Extension includes both a graphical user interface (GUI) and a command line interface (CLI).
  • Page 8 • To report bugs for a product component or to submit enhancement requests, please use https://bugzilla.novell.com/. If you are new to Bugzilla, you might find the Bug Writing FAQs helpful, available from the Novell Bugzilla home page. • We want to hear your comments and suggestions about this manual and the other documentation included with this product.
  • Page 9 • ls, --help: commands, options, and parameters • user: users or groups • Alt , Alt + F1 : a key to press or a key combination; keys are shown in uppercase as on a keyboard • File, File > Save As: menu items, buttons •...
  • Page 11: Part I Installation And Setup

    Part I. Installation and Setup...
  • Page 13: Conceptual Overview

    Conceptual Overview SUSE® Linux Enterprise High Availability Extension is an integrated suite of open source clustering technologies that enables you to implement highly available physical and virtual Linux clusters, and to eliminate single points of failure. It ensures the high availability and manageability of critical network resources including data, applications, and services.
  • Page 14 Multi-node active cluster, containing up to 16 Linux servers. Any server in the cluster can restart resources (applications, services, IP addresses, and file systems) from a failed server in the cluster. Flexible Solution The High Availability Extension ships with OpenAIS messaging and membership layer and Pacemaker Cluster Resource Manager.
  • Page 15: Product Benefits

    User-friendly Administration For easy configuration and administration, the High Availability Extension ships with both a graphical user interface (like YaST and the Linux HA Management Client) and a powerful unified command line interface. Both approaches provide a single point of administration for effectively monitoring and administering your cluster.
  • Page 16 • Storage consolidation Shared disk fault tolerance can be obtained by implementing RAID on the shared disk subsystem. The following scenario illustrates some of the benefits the High Availability Extension can provide. Example Cluster Scenario Suppose you have configured a three-server cluster, with a Web server installed on each of the three servers in the cluster.
  • Page 17 Figure 1.2 Three-Server Cluster after One Server Fails Web Site A moves to Web Server 2 and Web Site B moves to Web Server 3. IP addresses and certificates also move to Web Server 2 and Web Server 3. When you configured the cluster, you decided where the Web sites hosted on each Web server would go should a failure occur.
  • Page 18: Cluster Configurations

    back (move back) to Web Server 1, or they can stay where they are. This depends on how you configured the resources for them. Migrating the services back to Web Server 1 will incur some downtime, so the High Availability Extension also allows you to defer the migration until a period when it will cause little or no service interruption.
  • Page 19 Figure 1.3 Typical Fibre Channel Cluster Configuration Although Fibre Channel provides the best performance, you can also configure your cluster to use iSCSI. iSCSI is an alternative to Fibre Channel that can be used to create a low-cost Storage Area Network (SAN). The following figure shows how a typical iSCSI cluster configuration might look.
  • Page 20 Figure 1.4 Typical iSCSI Cluster Configuration Although most clusters include a shared disk subsystem, it is also possible to create a cluster without a shared disk subsystem. The following figure shows how a cluster without a shared disk subsystem might look. Figure 1.5 Typical Cluster Configuration Without Shared Storage High Availability Guide...
  • Page 21: Architecture

    1.4 Architecture This section provides a brief overview of the High Availability Extension architecture. It identifies and provides information on the architectural components, and describes how those components interoperate. 1.4.1 Architecture Layers The High Availability Extension has a layered architecture. Figure 1.6, “Architecture” (page 11) illustrates the different layers and their associated components.
  • Page 22 Messaging and Infrastructure Layer The primary or first layer is the messaging/infrastructure layer, also known as the OpenAIS layer. This layer contains components that send out the messages containing “I'm alive” signals, as well as other information. The program of the High Availability Extension resides in the messaging/infrastructure layer.
  • Page 23 taining a list of (resource) actions and dependencies to achieve the next cluster state. The PE runs on every node to speed up DC failover. Local Resource Manager (LRM) The LRM calls the local Resource Agents (see Section “Resource Layer” (page 13)) on behalf of the CRM.
  • Page 24: What's New

    they will be relayed to the DC. The DC will then replicate the CIB change to all cluster nodes. Based on the information in the CIB, the PE then computes the ideal state of the cluster and how it should be achieved and feeds a list of instructions to the DC. The DC sends commands via the messaging/infrastructure layer which are received by the crmd peers on other nodes.
  • Page 25 1.5.1 New Features and Functions Added Migration Threshold and Failure Timeouts The High Availability Extension now comes with the concept of a migration threshold and failure timeout. You can define a number of failures for resources, after which they will migrate to a new node. By default, the node will no longer be allowed to run the failed resource until the administrator manually resets the resource’s failcount.
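
    To illustrate these two options, the following crm shell fragment is a minimal sketch (the resource name myResource, the IP address, and the exact values are only examples): it allows three failures before migration and lets the failcount expire after ten minutes.

      crm(live)configure# primitive myResource ocf:heartbeat:IPaddr \
        params ip=192.168.1.99 \
        meta migration-threshold=3 failure-timeout=600s
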
  • Page 26 Triggering Recurring Actions at Known Times By default, recurring actions are scheduled relative to when the resource started, but this is not always desirable. To specify a date/time that the operation should be relative to, set the operation’s interval-origin. The cluster uses this point to cal- culate the correct start-delay such that the operation will occur at origin + (interval * N).
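
    A sketch of such an operation, assuming the crm shell passes the interval-origin operation attribute through unchanged (the resource name, device, and date are placeholders): a daily monitor anchored at approximately 02:00.

      crm(live)configure# primitive myFS ocf:heartbeat:Filesystem \
        params device=/dev/sdb1 directory=/srv/data fstype=ext3 \
        op monitor interval=24h interval-origin="2010-01-01 02:00:00"
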
  • Page 27 Validating and Parsing XML The cluster configuration is written in XML. Instead of a Document Type Definition (DTD), now a more powerful RELAX NG schema is used to define the pattern for the structure and content. libxml2 is used as parser. id Fields id fields are now XML IDs which have the following limitations: •...
  • Page 29: Getting Started

    Getting Started The following sections describe the system requirements and the preparations to take before installing the High Availability Extension, followed by a short overview of the basic steps to install and set up a cluster. 2.1 Hardware Requirements The following list specifies hardware requirements for a cluster based on SUSE® Linux Enterprise High Availability Extension.
  • Page 30: Software Requirements

    nodes that are thought to be dead or behaving in a strange manner. Resetting non-heartbeating nodes is the only reliable way to ensure that no data corruption is performed by nodes that hang and only appear to be dead. For more information, refer to Chapter 8, Fencing and STONITH (page 81).
  • Page 31: Preparations

    For more information, refer to the SUSE Linux Enterprise Server Administration Guide, chapter Time Synchronization with NTP, available at http://www.novell.com/documentation. The cluster nodes will use the time server as their time synchronization source. 2.5 Overview: Installing and Setting...
  • Page 32 4. Adding and configuring cluster resources, either with a graphical user interface (GUI) or from command line. For detailed information, see Chapter 4, Configuring Cluster Resources with the GUI (page 31) or Chapter 5, Configuring Cluster Resources From Command Line (page 59). To protect your data from possible corruption by means of fencing and STONITH, make sure to configure STONITH devices as resources.
  • Page 33: Installation And Basic Setup With Yast

    Installation and Basic Setup with YaST There are several ways to install the software needed for High Availability clusters: either from a command line, using zypper, or with YaST which provides a graphical user interface. After installing the software on all nodes that will be part of your cluster, the next step is to initially configure the cluster so that the nodes can communicate with each other.
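
    For the command line route, a sketch of the zypper call (the pattern name ha_sles is an assumption and may differ depending on your installation source):

      # install the High Availability pattern (hypothetical pattern name)
      zypper install -t pattern ha_sles
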
  • Page 34: Initial Cluster Setup

    1 Start YaST and select Software > Software Management to open the YaST package manager. 2 From the Filter list, select Patterns and activate the High Availability pattern in the pattern list. 3 Click Accept to start the installation of the packages. 3.2 Initial Cluster Setup After having installed the HA packages, you can configure the initial cluster setup with YaST.
  • Page 35 Define the Bind Network Address, the Multicast Address and the Multicast Port to use for all cluster nodes. 3 Specify a unique Node ID for every cluster node. It is recommended to start at 4 In the Security category, define the authentication settings for the cluster. If Enable Security Authentication is activated, HMAC/SHA1 authentication is used for communication between the cluster nodes.
  • Page 36 5 In the Service category, choose whether you want to start OpenAIS on this cluster server each time it is booted. If you select Off, you must start OpenAIS manually each time this cluster server is booted. To start OpenAIS manually, use the rcopenais start command. To start OpenAIS immediately, click Start OpenAIS Now.
  • Page 37: Bringing The Cluster Online

    7 After the initial configuration is done, you need to transfer the configuration to the other nodes in the cluster. The easiest way to do so is to copy the /etc/ais/openais.conf file to the other nodes in the cluster. As each node needs to have a unique node ID, make sure to adjust the node ID accordingly after copying the file.
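
    A minimal sketch of this step, assuming two additional nodes named node2 and node3 that are reachable as root over SSH:

      # copy the OpenAIS configuration from the node that was configured with YaST
      scp /etc/ais/openais.conf root@node2:/etc/ais/openais.conf
      scp /etc/ais/openais.conf root@node3:/etc/ais/openais.conf
      # then edit the copied file on node2 and node3 and set a unique node ID on each
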
  • Page 39: Part Ii Configuration And Administration

    Part II. Configuration and Administration...
  • Page 41: Configuring Cluster Resources With The Gui

    Configuring Cluster Resources with the GUI The main purpose of an HA cluster is to manage user services. Typical examples of user services are an Apache web server or a database. From the user's point of view, the services do something specific when ordered to do so. To the cluster, however, they are just resources which may be started or stopped—the nature of the service is irrelevant to the cluster.
  • Page 42: Linux Ha Management Client

    4.1 Linux HA Management Client When starting the Linux HA Management Client you need to connect to a cluster. NOTE: Password for the hacluster User The installation creates a Linux user named hacluster. Prior to using the Linux HA Management Client, you must set the password for the hacluster user.
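
    Setting that password is a single command, run as root on every node from which you want to log in:

      passwd hacluster
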
  • Page 43: Creating Cluster Resources

    After being connected, the main window opens: Figure 4.2 Linux HA Management Client - Main Window The Linux HA Management Client lets you add and modify resources, constraints, configurations etc. It also provides functionalities for managing cluster components like starting, stopping or migrating resources, cleaning up resources, or setting nodes to standby.
  • Page 44 Group Groups contain a set of resources that need to be located together, start sequentially and stop in the reverse order. For more information, refer to Section 4.10, “Config- uring a Cluster Resource Group” (page 49). Clone Clones are resources that can be active on multiple hosts. Any resource can be cloned, provided the respective resource agent supports it.
  • Page 45 3f Activate Add monitor operation if you want the cluster to monitor if the re- source is still healthy. 4 Click Forward. The next window shows a summary of the parameters that you have already defined for that resource. All required Instance Attributes for that resource are listed.
  • Page 46 Instance Attributes Instance attributes are parameters for certain resource classes that determine how they behave and which instance of a service they control. For more information, refer to Section 17.5, “Instance Attributes” (page 199). Operations The monitor operations added for a resource. These instruct the cluster to make sure that the resource is still healthy.
  • Page 47: Creating Stonith Resources

    3 To add a new meta attribute or instance attribute, select the respective tab and click Add. 4 Select the Name of the attribute you want to add. A short Description is displayed. 5 If needed, specify an attribute Value. Otherwise the default value of that attribute will be used.
  • Page 48: Configuring Resource Constraints

    3b From the Class list, select the resource agent class stonith. 3c From the Type list, select the STONITH plug-in for controlling your STONITH device. A short description for this plug-in is displayed below. 3d Below Options, set the Initial state of resource. 3e Activate Add monitor operation if you want the cluster to monitor the fencing device.
  • Page 49 Resource Collocation Collocational constraints that tell the cluster which resources may or may not run together on a node. Resource Order Ordering constraints to define the sequence of actions. When defining constraints, you also need to deal with scores. Scores of all kinds are integral to how the cluster works.
  • Page 50 6 Select the Resource for which to define the constraint. The list shows the IDs of all resources that have been configured for the cluster. 7 Set the Score for the constraint. Positive values indicate the resource can run on the Node you specify below. Negative values indicate the resource can not run on this node.
  • Page 51 5 Enter a unique ID for the constraint. When modifying existing constraints, the ID is already defined and is displayed in the configuration dialog. 6 Select the Resource which is the collocation source. The list shows the IDs of all resources that have been configured for the cluster. If the constraint cannot be satisfied, the cluster may decide not to allow the resource to run at all.
  • Page 52 4 Select Resource Order and click OK. 5 Enter a unique ID for the constraint. When modifying existing constraints, the ID is already defined and is displayed in the configuration dialog. 6 With First, define the resource that must be started before the Then resource is allowed to.
  • Page 53: Specifying Resource Failover Nodes

    For more information on configuring constraints and detailed background information about the basic concepts of ordering and collocation, refer to the following documents available at http://clusterlabs.org/wiki/Documentation: • Configuration 1.0 Explained , chapter Resource Constraints • Collocation Explained • Ordering Explained 4.5 Specifying Resource Failover Nodes A resource will be automatically restarted if it fails.
  • Page 54 For example, let us assume you have configured a location constraint for resource r1 to preferably run on node1. If it fails there, migration-threshold is checked and compared to the failcount. If failcount >= migration-threshold then the resource is migrated to the node with the next best preference. By default, once the threshold has been reached, the node will no longer be allowed to run the failed resource until the administrator manually resets the resource’s failcount (after fixing the failure cause).
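
    To see how close a resource is to its migration threshold, you can query and reset its failcount with crm_failcount (see the reference part of this guide for the full syntax); r1 and node1 are placeholders:

      # show the current failcount of resource r1 on node1
      crm_failcount -G -U node1 -r r1
      # reset the failcount after fixing the failure cause
      crm_failcount -D -U node1 -r r1
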
  • Page 55: Specifying Resource Failback Nodes (Resource Stickiness)

    4.6 Specifying Resource Failback Nodes (Resource Stickiness) A resource might fail back to its original node when that node is back online and in the cluster. If you want to prevent a resource from failing back to the node it was running on prior to failover, or if you want to specify a different node for the resource to fail back to, you must change its resource stickiness value.
  • Page 56: Configuring Resource Monitoring

    Procedure 4.7 Specifying Resource Stickiness 1 Add the resource-stickiness meta attribute to the resource as described in Procedure 4.2, “Adding or Modifying Meta and Instance Attributes” (page 36). 2 As Value for the resource-stickiness, specify a value between -INFINITY and INFINITY.
  • Page 57 4 To add a new monitor operation, select the respective tab and click Add. To modify an existing operation, select the respective entry and click Edit. 5 Enter a unique ID for the monitor operation. When modifying existing monitor operations, the ID is already defined and is displayed in the configuration di- alog.
  • Page 58: Starting A New Cluster Resource

    If you do not configure resource monitoring, resource failures after a successful start will not be communicated, and the cluster will always show the resource as healthy. If the resource monitor detects a failure, the following takes place: • Log file messages are generated, according to the configuration specified in the logging section of /etc/ais/openais.conf (by default, written to syslog, usually /var/log/messages).
  • Page 59: Removing A Cluster Resource

    4.9 Removing a Cluster Resource To remove a cluster resource with the Linux HA Management Client, switch to the Resources view in the left pane, then select the respective resource and click Remove. NOTE: Removing Referenced Resources Cluster resources cannot be removed if their ID is referenced by any constraint. If you cannot delete a resource, check where the resource ID is referenced and remove the resource from the constraint first.
  • Page 60 Stickiness Stickiness is additive in groups. Every active member of the group will contribute its stickiness value to the group’s total. So if the default resource-stickiness is 100 and a group has seven members (five of which are active), then the group as a whole will prefer its current location with a score of 500.
  • Page 61 The position of the resources in the Primitive tab represents the order in which the resources are started in the cluster. 8 As the order of resources in a group is important, use the Up and Down buttons to sort the Primitives in the group. 9 If all parameters are set according to your wishes, click OK to finish the configu- ration of that group.
  • Page 62 Figure 4.5 Group Resource In Procedure 4.9, “Adding a Resource Group” (page 50), you learned how to create a resource group. Let us assume you already have created a resource group as explained above. Procedure 4.10, “Adding Resources to an Existing Group” (page 52) shows you how to modify the group to match Example 4.1, “Resource Group for a Web Server”...
  • Page 63 4c As Provider of your OCF resource agent, select heartbeat. 4d From the Type list, select IPaddr as resource agent. 4e Click Forward. 4f In the Instance Attribute tab, select the IP entry and click Edit (or double- click the IP entry). 4g As Value, enter the desired IP address, for example, 192.168.1.1.
  • Page 64: Configuring A Clone Resource

    7 In case you need to change the resource order for a group, use the Up and Down buttons to sort the resources on the Primitive tab. 8 To remove a resource from the group, select the resource on the Primitives tab and click Remove.
  • Page 65: Migrating A Cluster Resource

    Procedure 4.11 Adding or Modifying Clones 1 Start the Linux HA Management Client and log in to the cluster as described in Section 4.1, “Linux HA Management Client” (page 32). 2 In the left pane, select Resources and click Add > Clone. 3 Enter a unique ID for the clone.
  • Page 66 3 In the new window, select the node to which to move the resource in To Node. This creates a location constraint with an INFINITY score for the destination node. 4 If you want to migrate the resource only temporarily, activate Duration and enter the time frame for which the resource should migrate to the new node.
  • Page 67: For More Information

    To allow the resource to move back again, switch to the Management view, right-click the resource and select Clear Migrate Constraints. This uses the crm_resource -U command. The resource can move back to its original location or it may stay where it is (depending on resource stickiness).
  • Page 69: Configuring Cluster Resources From Command Line

    Configuring Cluster Resources From Command Line Like in Chapter 4 (page 31), a cluster resource must be created for every resource or application you run on the servers in your cluster. Cluster resources can include Web sites, e-mail servers, databases, file systems, virtual machines, and any other server- based applications or services that you want to make available to users at all times.
  • Page 70: Debugging Your Configuration Changes

    5.2 Debugging Your Configuration Changes Before loading the changes back into the cluster, it is recommended to view your changes with ptest. ptest can show a diagram of actions which would be induced by the changes to be committed. You need the graphviz package to display the diagrams. The following example is a transcript, adding a monitor operation: # crm crm(live)# configure
  • Page 71 The previous command configures a “primitive” with the name myIP. You need the class (here ocf), provider (heartbeat), and type (IPaddr). Furthermore this primitive expects some parameters like the IP address. You have to change the address to your setup. 4 Display and review the changes you have made: crm(live)configure# show To see the XML structure, use the following:...
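
    The command referred to here is not included in this excerpt; a sketch of what it typically looks like (the IP address is an example and must be adapted to your setup):

      crm(live)configure# primitive myIP ocf:heartbeat:IPaddr \
        params ip=10.10.0.1
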
  • Page 72 5.3.2 OCF Resource Agents All OCF agents are located in /usr/lib/ocf/resource.d/heartbeat/. These are small programs that have a functionality similar to that of LSB scripts. However, the configuration is always done with environment variables. All OCF Resource Agents are required to have at least the actions start, stop, status, monitor, and meta-data.
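
    To see which OCF agents are available, you can either list the directory or, assuming the list command of the crm ra level behaves as in the shipped crm shell, query it interactively:

      ls /usr/lib/ocf/resource.d/heartbeat/
      crm(live)ra# list ocf heartbeat
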
  • Page 73 More information about a resource agent can be viewed with meta: crm(live)ra# meta Filesystem ocf heartbeat Filesystem resource agent (ocf:heartbeat:Filesystem) Resource script for Filesystem. It manages a Filesystem on a shared storage medium. Parameters (* denotes required, [] the default): You can leave the viewer by pressing Q .
  • Page 74 istration Guide. For now, assume that you configured a resource r0 that may be accessed at the device /dev/drbd0 on both of your cluster nodes. The drbd resource is an OCF master slave resource. This can be found in the description of the metadata of the drbd RA.
  • Page 75: Creating A Stonith Resource

    crm(live)# configure crm(live)configure# primitive nfs_resource lsb:nfsserver crm(live)configure# primitive ip_resource ocf:heartbeat:IPaddr \ params ip=10.10.0.1 crm(live)configure# group nfs_group nfs_resource ip_resource crm(live)configure# commit crm(live)configure# end crm(live)# quit 5.4 Creating a STONITH Resource From the crm perspective, a STONITH device is just another resource. To create a STONITH resource, proceed as follows: 1 Run the crm command as system administrator.
  • Page 76: Configuring Resource Constraints

    meta target-role=Stopped \ operations my_stonith-operations \ op monitor start-delay=15 timeout=15 hostlist='' \ pduip='' community='' 5.5 Configuring Resource Constraints Having all the resources configured is only one part of the job. Even if the cluster knows all needed resources, it might still not be able to handle them correctly. For example, it would not make sense to try to mount the file system on the slave node of drbd (in fact, this would fail with drbd).
  • Page 77 crm(live)configure# order rsc1 rsc2 crm(live)configure# colocation rsc2 rsc1 It is only possible to set a score of either +INFINITY or -INFINITY, defining resources that must always or must never run on the same node. For example, to run the two re- sources with the IDs filesystem_resource and nfs_group always on the same host, use the following constraint: crm(live)configure# colocation nfs_on_filesystem inf: nfs_group...
  • Page 78: Specifying Resource Failover Nodes

    • The NFS server as well as the IP address must be on the same node as the file sys- tem. crm(live)configure# colocation nfs_with_fs inf: \ nfs_group filesystem_resource • The NFS server as well as the IP address start after the file system is mounted: crm(live)configure# order nfs_second mandatory: \ filesystem_resource nfs_group •...
  • Page 79: Specifying Resource Failback Nodes (Resource Stickiness)

    5.7 Specifying Resource Failback Nodes (Resource Stickiness) A resource may fail back after it has been migrated due to the number of failures only when the administrator resets the failcount or the failures have expired (see the failure-timeout meta attribute). crm resource failcount RSC delete NODE 5.8 Configuring Resource Monitoring To monitor a resource, there are two possibilities: either define monitor operation with...
  • Page 80: Removing A Cluster Resource

    5.10 Removing a Cluster Resource To remove a cluster resource you need the relevant identifier. Proceed as follows: 1 Run the crm command as system administrator. The prompt changes to crm(live). 2 Run the following command to get a list of your resources: crm(live)# resource status For example, the output can look like this (where myIP is the relevant identifier of your resource):...
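
    Once you know the identifier, a sketch of the remaining steps (using myIP from the example output, and assuming the resource is not referenced by any constraint) is to stop the resource and delete its definition:

      crm(live)# resource stop myIP
      crm(live)# configure
      crm(live)configure# delete myIP
      crm(live)configure# commit
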
  • Page 81: Configuring A Clone Resource

    params ip=1.2.3.4 crm(live)configure# primitive Email lsb:exim 3 Group the primitives with their relevant identifiers in the correct order: crm(live)configure# group shortcut Public-IP Email 5.12 Configuring a Clone Resource Clones were initially conceived as a convenient way to start N instances of an IP resource and have them distributed throughout the cluster for load balancing.
  • Page 82: Migrating A Cluster Resource

    crm(live)# configure crm(live)configure# primitive Apache lsb:apache 3 Clone the primitive: crm(live)configure# clone apache-clone Apache \ meta globally-unique=false 5.12.2 Creating Stateful/Multi-State Clone Resources To create a stateful clone resource, first create a primitive resource and then the master-slave resource. 1 Run the crm command as system administrator. The prompt changes to crm(live).
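
    A minimal sketch of both steps for a DRBD-backed resource (resource and device names are placeholders; the meta attribute values reflect a common two-node master/slave setup):

      crm(live)configure# primitive drbd_r0 ocf:heartbeat:drbd \
        params drbd_resource=r0 \
        op monitor interval=60s
      crm(live)configure# ms ms_drbd_r0 drbd_r0 \
        meta master-max=1 master-node-max=1 \
        clone-max=2 clone-node-max=1 notify=true
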
  • Page 83: Testing With Shadow Configuration

    2 To migrate a resource named ipaddress1 to a cluster node named node2, enter: crm(live)# resource crm(live)resource# migrate ipaddress1 node2 5.14 Testing with Shadow Configuration NOTE: For Experienced Administrators Only Although the concept is easy, it is nevertheless recommended to use shadow configurations only when you really need them, and if you are experienced with High Availability.
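
    The basic shadow workflow looks roughly like this (myNewConfig is the name used in the excerpt on the next page; the exact sequence is a sketch):

      crm(live)# configure
      crm(live)configure# cib new myNewConfig
      crm(myNewConfig)configure# # ... make and verify your changes here ...
      crm(myNewConfig)configure# cib diff
      crm(myNewConfig)configure# cib commit myNewConfig
      crm(myNewConfig)configure# cib use live
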
  • Page 84: For More Information

    crm(myNewConfig)configure# cib use crm(live)configure# 5.15 For More Information http://linux-ha.org Homepage of High Availability Linux http://www.clusterlabs.org/mediawiki/images/8/8d/Crm_cli .pdf Gives you an introduction to the CRM CLI tool http://www.clusterlabs.org/mediawiki/images/f/fb/ Configuration_Explained.pdf Explains the Pacemaker configuration High Availability Guide...
  • Page 85: Setting Up A Simple Testing Resource

    Setting Up a Simple Testing Resource After your cluster is installed and set up as described in Chapter 3, Installation and Basic Setup with YaST (page 23) and you have learned how to configure resources either with the GUI or from command line, this chapter provides a basic example for the configuration of a simple resource: an IP address.
  • Page 86 3 Click the Primitives tab and click Add. 4 In the next dialog, set the following parameters to add an IP address as sub-re- source of the group: 4a Enter a unique ID. For example, myIP. 4b From the Class list, select ocf as resource agent class. 4c As Provider of your OCF resource agent, select heartbeat.
  • Page 87: Manual Configuration Of A Resource

    Procedure 6.2 Migrating Resources to Another Node 1 Switch to the Management view in the left pane, then right-click the IP address resource in the right pane and select Migrate Resource. 2 In the new window, select saturn from the To Node drop-down list to move the selected resource to the node saturn.
  • Page 88 NOTE When configuring a resource with High Availability, the same resource should not be initialized by init. High Availability is responsible for all service start or stop actions. If the configuration was successful, a new resource appears in crm_mon that is started on a random node of your cluster.
  • Page 89: Adding Or Modifying Resource Agents

    Adding or Modifying Resource Agents All tasks that need to be managed by a cluster must be available as a resource. There are two major groups here to consider: resource agents and STONITH agents. For both categories, you can add your own agents, extending the abilities of the cluster to your own needs.
  • Page 90: Writing Ocf Resource Agents

    7.2 Writing OCF Resource Agents All OCF RAs are available in /usr/lib/ocf/resource.d/, see Section 17.1, “Supported Resource Agent Classes” (page 193) for more information. To avoid naming contradictions, create a new subdirectory for each new resource agent. For example, if you have a resource group kitchen with the resource coffee_machine, add this resource to the directory /usr/lib/ocf/resource.d/kitchen/.
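
    A very small skeleton of such an agent, only to illustrate the required actions (start, stop, status, monitor, meta-data); the coffee_machine example, the paths, and the PID file handling are purely illustrative, and a real agent needs proper parameter handling and XML meta-data:

      #!/bin/sh
      # Hypothetical skeleton for /usr/lib/ocf/resource.d/kitchen/coffee_machine
      # Exit codes follow the OCF convention: 0 = success, 7 = not running.

      PIDFILE=/var/run/coffee_machine.pid

      case "$1" in
        start)
          # start the (hypothetical) service and remember its PID
          /usr/local/bin/coffee_machine & echo $! > "$PIDFILE"
          exit 0 ;;
        stop)
          [ -f "$PIDFILE" ] && kill "$(cat "$PIDFILE")" && rm -f "$PIDFILE"
          exit 0 ;;
        status|monitor)
          if [ -f "$PIDFILE" ] && kill -0 "$(cat "$PIDFILE")" 2>/dev/null; then
            exit 0      # OCF_SUCCESS: running
          else
            exit 7      # OCF_NOT_RUNNING
          fi ;;
        meta-data)
          # a real agent prints its XML meta-data on stdout here
          exit 0 ;;
        *)
          exit 3 ;;     # OCF_ERR_UNIMPLEMENTED
      esac
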
  • Page 91: Fencing And Stonith

    Fencing and STONITH Fencing is a very important concept in computer clusters for HA (High Availability). A cluster sometimes detects that one of the nodes is behaving strangely and needs to remove it. This is called fencing and is commonly done with a STONITH resource. Fencing may be defined as a method to bring an HA cluster to a known state.
  • Page 92: Node Level Fencing

    The resource level fencing may be achieved using normal resources on which the resource you want to protect depends. Such a resource would simply refuse to start on this node and therefore resources which depend on it will not run on the same node.
  • Page 93 Blade Power Control Devices If you are running a cluster on a set of blades, then the power control device in the blade enclosure is the only candidate for fencing. Of course, this device must be capable of managing single blade computers. Lights-out Devices Lights-out devices (IBM RSA, HP iLO, Dell DRAC) are becoming increasingly popular, and in the future they may even become standard on off-the-shelf comput-...
  • Page 94: Stonith Configuration

    All STONITH plug-ins reside in /usr/lib/stonith/plugins on each node. All STONITH plug-ins look the same to stonithd, but are quite different on the other side reflecting the nature of the fencing device. Some plug-ins support more than one device. A typical example is ipmilan (or external/ipmi) which implements the IPMI protocol and can control any device which supports this protocol.
  • Page 95 8.3.1 Example STONITH Resource Configurations In the following, find some example configurations written in the syntax of the crm command line tool. To apply them, put the sample in a text file (for example, sample.txt) and run: crm < sample.txt For more information about configuring resources with the crm command line tool, refer to Chapter 5, Configuring Cluster Resources From Command Line (page 59).
  • Page 96 Example 8.3 Testing Configuration A more realistic example (but still only for testing) is the following external/ssh confi- guration: configure primitive st-ssh stonith:external/ssh \ params hostlist="node1 node2" clone fencing st-ssh commit This one can also reset nodes. The configuration is remarkably similar to the first one which features the null STONITH device.
  • Page 97 Example 8.5 Configuration of a UPS Fencing Device The configuration of a UPS type of fencing device is similar to the examples above. The details are left (as an exercise) to the reader. All UPS devices employ the same mechanics for fencing, but how the device itself is accessed varies. Old UPS devices used to have just a serial port, in most cases connected at 1200 baud using a special serial cable.
  • Page 98: Monitoring Fencing Devices

    8.3.2 Constraints Versus Clones In Section 8.3.1, “Example STONITH Resource Configurations” (page 85) you learned that there are several ways to configure a STONITH resource: using constraints, clones, or both. The choice of which construct to use for configuration depends on several factors (nature of the fencing device, number of hosts managed by the device, number of cluster nodes, or personal preference).
  • Page 99: Special Fencing Devices

    8.5 Special Fencing Devices Apart from plug-ins which handle real devices, some STONITH plug-ins require addi- tional explanation. external/kdumpcheck Sometimes, it is important to get a kernel core dump. This plug-in can be used to check if a dump is in progress. If that is the case, it will return true, as if the node has been fenced (it cannot run any resources at that time).
  • Page 100: For More Information

    suicide and null are the only exceptions to the “do not shoot my host” rule. 8.6 For More Information /usr/share/doc/packages/heartbeat/stonith/ In your installed system, this directory holds README files for many STONITH plug-ins and devices. http://linux-ha.org/STONITH Information about STONITH on the home page of the The High Availability Linux Project.
  • Page 101: Load Balancing With Linux Virtual Server

    Load Balancing with Linux Virtual Server The goal of Linux Virtual Server (LVS) is to provide a basic framework that directs network connections to multiple servers that share their workload. Linux Virtual Server is a cluster of servers (one or more load balancers and several real servers for running services) which appears to be one large, fast server to an outside client.
  • Page 102 Clients connect to the director that forwards packets to the real servers. The director is a layer 4 router with a modified set of routing rules (for example, connections do not originate or terminate on the director, it does not send acknowledgments) that make the LVS work.
  • Page 103: High Availability

    9.2 High Availability To construct a highly available Linux Virtual Server cluster, you can use several built- in features of the software. In general, there are service monitor daemons running on the load balancer to check server health periodically. If there is no response for a service access request or ICMP ECHO_REQUEST from a server within a specified time, the service monitor will consider the server dead and remove it from the available server list at the load balancer.
  • Page 104: For More Information

    9.3 For More Information To learn more about Linux Virtual Server, refer to the project home page available at http://www.linuxvirtualserver.org/. High Availability Guide...
  • Page 105: 0 Network Device Bonding

    Network Device Bonding For many systems, it is desirable to implement network connections that comply to more than the standard data security or availability requirements of a typical ethernet device. In these cases, several ethernet devices can be aggregated to a single bonding device.
  • Page 106 3 Select how to assign the IP address to the bonding device. Three methods are at your disposal: • No IP Address • Dynamic Address (with DHCP or Zeroconf) • Statically assigned IP Address Use the method that is appropriate for your environment. If OpenAIS manages virtual IP addresses, select Statically assigned IP Address and assign a basic IP address on the interface.
  • Page 107 broadcast Provides fault tolerance 802.3ad Provides dynamic link aggregation if supported by the connected switch. balance-tlb Provides load balancing for outgoing traffic. balance-alb Provides load balancing for incoming and outgoing traffic, if the network devices used allow the modifying of the network device's hardware address while in use.
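
    On SUSE Linux Enterprise, the result of this YaST dialog is an ifcfg file for the bonding device. A hand-written equivalent might look roughly like the following (device names, address, and bonding mode are assumptions for a simple active-backup setup):

      # /etc/sysconfig/network/ifcfg-bond0 (illustrative values)
      STARTMODE='auto'
      BOOTPROTO='static'
      IPADDR='192.168.1.10/24'
      BONDING_MASTER='yes'
      BONDING_SLAVE0='eth0'
      BONDING_SLAVE1='eth1'
      BONDING_MODULE_OPTS='mode=active-backup miimon=100'
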
  • Page 109: 1 Updating Your Cluster To Suse Linux Enterprise

    Updating Your Cluster to SUSE Linux Enterprise 11 If you have an existing cluster based on SUSE® Linux Enterprise Server 10 SP2, you can update your cluster to run with the High Availability Extension on SUSE® Linux Enterprise Server 11. For migration purposes, all cluster nodes must be offline and the cluster must be migrated as a whole—mixed SUSE Linux Enterprise Server 10/SUSE Linux Enterprise Server 11 clusters are not supported.
  • Page 110: Preparation And Backup

    NOTE: Reverting after Update After the update process to SUSE Linux Enterprise Server 11, reverting back to SUSE Linux Enterprise Server 10 is not supported. 11.1 Preparation and Backup Before updating your cluster to the latest product version and converting the data ac- cordingly, you need to prepare your current cluster.
  • Page 111: Update/Installation

    11.2 Update/Installation After preparing the cluster and backing up the files, you can start updating the cluster nodes to the latest product version. Instead of running an update, you can also do a fresh SUSE Linux Enterprise 11 installation on your cluster nodes. Procedure 11.2 Updating to SUSE Linux Enterprise 11 1 On all cluster nodes, perform an update from SUSE Linux Enterprise Server 10 SP2 to SUSE Linux Enterprise Server 11.
  • Page 112 Procedure 11.3 Testing the Conversion 1 On one of the nodes, create a test directory and copy the backup files to the test directory: $ mkdir /tmp/hb2openais-testdir $ cp /etc/ha.d/ha.cf /tmp/hb2openais-testdir $ cp /var/lib/heartbeat/hostcache /tmp/hb2openais-testdir $ cp /etc/logd.cf /tmp/hb2openais-testdir $ sudo cp /var/lib/heartbeat/crm/cib.xml /tmp/hb2openais-testdir 2 Start the test run with $ /usr/lib/heartbeat/hb2openais.sh -T /tmp/hb2openais-testdir -U or with the following command, if you are using a 64-bit system:...
  • Page 113: For More Information

    3 Start the conversion script as root. If using sudo, specify the privileged user using the -u option: $ /usr/lib/heartbeat/hb2openais.sh -u root Based on the configuration stored in /etc/ha.d/ha.cf, the script will generate a new configuration file for the OpenAIS cluster stack, /etc/ais/openais.conf.
  • Page 115: Part Iii Storage And Data Replication

    Part III. Storage and Data Replication...
  • Page 117: 2 Oracle Cluster File System

    Oracle Cluster File System 2 Oracle Cluster File System 2 (OCFS2) is a general-purpose journaling file system that has been fully integrated since the Linux 2.6 kernel. OCFS2 allows you to store appli- cation binary files, data files, and databases on devices on shared storage. All nodes in a cluster have concurrent read and write access to the file system.
  • Page 118: Management Utilities And Commands

    • An application’s files are available to all nodes in the cluster. Users simply install it once on an OCFS2 volume in the cluster. • All nodes can concurrently read and write directly to storage via the standard file system interface, enabling easy management of applications that run across the cluster.
  • Page 119: Ocfs2 Packages

    Table 12.1 OCFS2 Utilities OCFS2 Utility Description debugfs.ocfs2 Examines the state of the OCFS file system for the purpose of debugging. fsck.ocfs2 Checks the file system for errors and optionally repairs errors. mkfs.ocfs2 Creates an OCFS2 file system on a device, usually a partition on a shared physical or logical disk.
  • Page 120: Creating An Ocfs2 Volume

    6 Click Accept and follow the on-screen instructions. 12.4 Creating an OCFS2 Volume Follow the procedures in this section to configure your system to use OCFS2 and to create OCFS2 volumes. 12.4.1 Prerequisites Before you begin, do the following: • Prepare the block devices you plan to use for your OCFS2 volumes. Leave the de- vices as free space.
  • Page 121 2b Create the DLM service and have it run on all machines in the cluster: configure primitive dlm ocf:pacemaker:controld op monitor interval=120s clone dlm-clone dlm meta globally-unique=false interleave=true 2c Verify the changes you made to the cluster before committing them: cib diff configure verify 2d Upload the configuration to the cluster and exit the shell:
  • Page 122 cib commit oracle-glue quit 12.4.3 Creating an OCFS2 Volume Creating an OCFS2 file system and adding new nodes to the cluster should be performed on only one of the nodes in the cluster. 1 Open a terminal window and log in as root. 2 Check if the cluster is online with the command crm_mon.
  • Page 123: Mounting An Ocfs2 Volume

    OCFS2 Parameter Description and Recommendation Oracle recommends a cluster size of 128 KB or larger for database volumes. Oracle also recommends a cluster size of 32 or 64 KB for Oracle Home. Number of node slots The maximum number of nodes that can concurrently mount a volume.
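
    Once these parameters are decided, the volume is created with mkfs.ocfs2. The device, label, and values below are only an example (four node slots and the 128 KB cluster size recommended for database volumes):

      # create an OCFS2 file system with 4 node slots and a 128 KB cluster size
      mkfs.ocfs2 -N 4 -C 128K -L ocfs2_data /dev/sdb1
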
  • Page 124 3 Use one of the following methods to mount the volume. WARNING: Manually Mounted OCFS2 Devices If you mount the ocfs2 file system manually for testing purposes, you are required to umount the file system again before starting to use it by means of OpenAIS.
  • Page 125: Additional Information

    2 Configure Pacemaker to mount the file system on every node in the cluster. configure primitive fs ocf:heartbeat:Filesystem \ params device="/dev/sdb1" directory="/mnt/shared" fstype="ocfs2" \ op monitor interval=120s clone fs-clone fs meta interleave="true" ordered="true" 3 Make sure that Pacemaker only starts the fs clone resource on nodes that also have a clone of the o2cb resource already running: colocation fs-with-o2cb INFINITY: fs-clone o2cb-clone order start-fs-after-o2cb mandatory: o2cb-clone fs-clone...
  • Page 127: 3 Cluster Lvm

    Cluster LVM When managing shared storage on a cluster, every node must be informed about changes that are done to the storage subsystem. The Linux Volume Manager 2 (LVM2), which is widely used to manage local storage, has been extended to support transparent man- agement of volume groups across the whole cluster.
  • Page 128 Change the locking type to 3, and write the configuration to disk. Copy this configuration to all nodes. 2 Include the clvmd resource as a clone in the pacemaker configuration, and make it depend on the DLM clone resource. A typical snippet from the crm configura- tion shell would look like this: primitive dlm ocf:pacemaker:controld primitive clvm ocf:lvm2:clvmd \...
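
    The locking type is changed in /etc/lvm/lvm.conf; the relevant line looks like this:

      # /etc/lvm/lvm.conf (excerpt)
      locking_type = 3
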
  • Page 129: Configuring Eligible Lvm2 Devices Explicitly

    7 If you want the volume group to only be activated exclusively on one node, use the following example; in this case, cLVM will protect all logical volumes within the VG from being activated on multiple nodes, as an additional measure of protection for non-clustered applications: primitive vg1 ocf:heartbeat:LVM \ params volgrpname="<volume group name>"...
  • Page 130: For More Information

    3 To remove a device named /dev/sdb1, add the following expression to the filter rule: "r|^/dev/sdb1$|" The complete filter line will look like the following: filter = [ "r|^/dev/sdb1$|", "r|/dev/.*/by-path/.*|", "r|/dev/.*/by-id/.*|", "a/.*/" ] A filter line, that accepts DRBD and MPIO devices but rejects all other devices would look like this: filter = [ "a|/dev/drbd.*|", "a|/dev/.*/by-id/dm-uuid-mpath-.*|", "r/.*/"...
  • Page 131: 4 Distributed Replicated Block Device (Drbd)

    Distributed Replicated Block Device (DRBD) DRBD allows you to create a mirror of two block devices that are located at two different sites across an IP network. When used with OpenAIS, DRBD supports distributed high- availability Linux clusters. IMPORTANT The data traffic between mirrors is not encrypted. For secure data exchange, you should deploy a virtual private network (VPN) solution for the connection.
  • Page 132: Installing Drbd Services

    14.1 Installing DRBD Services To install the needed packages for drbd, install the High Availability Extension Add- On product on both SUSE Linux Enterprise Server machines in your networked cluster as described in Part I, “Installation and Setup” (page 1). Installing High Availability Extension also installs the drbd program files.
  • Page 133 Device The device that holds the replicated data on the respective node. Use this device to create file systems and mount oper- ations. Disk The device that is replicated between both nodes. Meta-disk The Meta-disk is either set to the value internal or specifies an explicit device extended by an index to hold the meta data needed by drbd.
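
    For orientation, a minimal drbd.conf resource section using these keywords might look like the following sketch; host names, devices, and addresses are placeholders and must match your nodes:

      resource r0 {
        protocol C;
        on node1 {
          device    /dev/drbd0;
          disk      /dev/sdb1;
          address   192.168.1.10:7788;
          meta-disk internal;
        }
        on node2 {
          device    /dev/drbd0;
          disk      /dev/sdb1;
          address   192.168.1.11:7788;
          meta-disk internal;
        }
      }
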
  • Page 134: Testing The Drbd Service

    rcdrbd status Before proceeding, wait until the block devices on both nodes are fully synchro- nized. Repeat the rcdrbd status command to follow the synchronization progress. 9 After the block devices on both nodes are fully synchronized, format the DRBD device on the primary with a file system such as reiserfs.
  • Page 135 touch /srv/r0mount/from_node1 2 Test the DRBD service on node 2. 2a Open a terminal console, then log in as root. 2b Dismount the disk on node 1. umount /srv/r0mount 2c Downgrade the DRBD service on node 1 by typing the following command on node 1: drbdadm secondary r0 2d On node 2, promote the DRBD service to primary.
  • Page 136: Troubleshooting Drbd

    3 If the service is working on both nodes, the DRBD setup is complete. 4 Set up node 1 as the primary again. 4a Dismount the disk on node 2 by typing the following command on node 2: umount /srv/r0mount 4b Downgrade the DRBD service on node 2 by typing the following command on node 2: drbdadm secondary r0...
  • Page 137 14.4.1 Configuration If the initial drbd setup does not work as expected, there is probably something wrong with your configuration. To get information about the configuration: 1 Open a terminal console, then log in as root. 2 Test the configuration file by running drbdadm with the -d option. Enter drbdadm -d adjust r0 In a dry run of the adjust option, drbdadm compares the actual configuration of the DRBD resource with your DRBD configuration file, but it does not execute...
  • Page 138: Additional Information

    14.4.3 TCP Port 7788 If your system is unable to connect to the peer, this might be a problem with your local firewall. By default, DRBD uses the TCP port 7788 to access the other node. Make sure that this port is accessible on both nodes. 14.4.4 DRBD Devices Broken after Reboot In cases when DRBD does not know which of the real devices holds the latest data, it changes to a split brain condition.
  • Page 139 • The project home page http://www.drbd.org. • http://clusterlabs.org/wiki/DRBD_HowTo_1.0 by the Linux Pacemaker Cluster Stack Project. Distributed Replicated Block Device (DRBD)
  • Page 141: Part Iv Troubleshooting And Reference

    Part IV. Troubleshooting and Reference...
  • Page 143: 5 Troubleshooting

    Troubleshooting Often, strange problems may occur that are not easy to understand (especially when starting to experiment with Heartbeat). However, there are several utilities that may be used to take a closer look at the Heartbeat internal processes. This chapter recommends various solutions.
  • Page 144: "Debugging" A Ha Cluster

    Check if the communication channels and options configured in /etc/ais/openais.conf are the same for all cluster nodes. In case you use encrypted communication, check if the /etc/ais/authkey file is available on all cluster nodes. Does the firewall allow communication via the mcastport? If the mcastport used for communication between the cluster nodes is blocked by the firewall, the nodes cannot see each other.
  • Page 145 Example 15.1 Stopped Resources Refresh in 10s... ============ Last updated: Mon Jan 19 08:56:14 2009 Current DC: d42 (d42) 3 Nodes configured. 3 Resources configured. ============ Online: [ d230 d42 ] OFFLINE: [ clusternode-1 ] Full list of resources: Clone Set: o2cb-clone Stopped: [ o2cb:0 o2cb:1 o2cb:2 ] Clone Set: dlm-clone
  • Page 146: Faqs

    15.3 FAQs What is the state of my cluster? To check the current state of your cluster, use the program crm_mon. This displays the current DC as well as all of the nodes and resources that are known to the current node.
  • Page 147: For More Information

    I just get a failed message. Is it possible to get more information? You may always add the -V parameter to your commands. If you do that multiple times, the debug output becomes very verbose. How can I clean up my resources? If you know the IDs of your resources (which you can get with crm_resource -L), remove a specific one with crm_resource -C -r resource id -H HOST.
  • Page 149: 6 Cluster Management Tools

    Cluster Management Tools High Availability Extension ships with a comprehensive set of tools to assist you in managing your cluster from the command line. This chapter introduces the tools needed for managing the cluster configuration in the CIB and the cluster resources. Other command line tools for managing resource agents or tools used for debugging (and troubleshooting) your setup are covered in Chapter 15, Troubleshooting (page 133).
  • Page 150 Managing Configuration Changes The crm_diff command assists you in creating and applying XML patches. This can be useful for visualizing the changes between two versions of the cluster con- figuration or saving changes so they can be applied at a later time using cibadmin(8) (page 142).
  • Page 151 become a fully active member of the cluster again. See crm_standby(8) (page 187) for a detailed introduction to this tool's usage and command syntax. Cluster Management Tools...
  • Page 152 cibadmin (8) cibadmin — Provides direct access to the cluster configuration Synopsis Allows the configuration, or sections of it, to be queried, modified, replaced and/or deleted. cibadmin (--query|-Q) -[Vrwlsmfbp] [-i xml-object-id|-o xml-object-type] [-t t-flag-whatever] [-h hostname] cibadmin (--create|-C) -[Vrwlsmfbp] [-X xml-string] [-x xml-filename] [-t t-flag-whatever] [-h hostname] cibadmin (--replace|-R) -[Vrwlsmfbp]
  • Page 153 Description The cibadmin command is the low-level administrative command for manipulating the Heartbeat CIB. Use it to dump all or part of the CIB, update all or part of it, modify all or part of it, delete the entire CIB, or perform miscellaneous CIB administrative operations.
  • Page 154 Commands --bump, -B Increase the epoch version counter in the CIB. Normally this value is increased automatically by the cluster when a new leader is elected. Manually increasing it can be useful if you want to make an older configuration obsolete (such as one stored on inactive cluster nodes).
  • Page 155 XML Data --xml-text string, -X string Specify an XML tag or fragment on which cibadmin should operate. It must be a complete tag or XML fragment. --xml-file filename, -x filename Specify the XML from a file on which cibadmin should operate. It must be a complete tag or an XML fragment.
  • Page 156 To add an IPaddr2 resource to the resources section, first create a file foo with the following contents: <primitive id="R_10.10.10.101" class="ocf" type="IPaddr2" provider="heartbeat"> <instance_attributes id="RA_R_10.10.10.101"> <attributes> <nvpair id="R_ip_P_ip" name="ip" value="10.10.10.101"/> <nvpair id="R_ip_P_nic" name="nic" value="eth0"/> </attributes> </instance_attributes> </primitive> Then issue the following command: cibadmin --obj_type resources -U -x foo To change the IP address of the IPaddr2 resource previously added, issue the command below:...
  • Page 157 cibadmin -D -X '<primitive id="R_10.10.10.101"/>' To replace the CIB with a new manually-edited version of the CIB, use the following command: cibadmin -R -x $HOME/cib.xml Files /var/lib/heartbeat/crm/cib.xml—the CIB (minus status section) on disk. See Also crm_resource(8) (page 166), crmadmin(8) (page 148), lrmadmin(8), heartbeat(8) Caveats Avoid working on the automatically maintained copy of the CIB on the local disk.
  • Page 158 crmadmin (8) crmadmin — controls the Cluster Resource Manager Synopsis crmadmin [-V|-q] [-i|-d|-K|-S|-E] node crmadmin [-V|-q] -N -B crmadmin [-V|-q] -D crmadmin -v crmadmin -? Description crmadmin was originally designed to control most of the actions of the CRM daemon. However, the largest part of its functionality has been made obsolete by other tools, such as crm_attribute and crm_resource.
  • Page 159 NOTE Increase the level of verbosity by providing additional instances. --quiet, -q Do not provide any debug information at all and reduce the output to a minimum. --bash-export, -B Create bash export entries of the form export uname=uuid. This applies only to the crmadmin -N node command.
  • Page 160 --election node, -E node Initiate an election from the specified node. WARNING Use this with extreme caution. This action is normally initiated internally and may have unintended side effects. --dc_lookup, -D Query the uname of the current DC. The location of the DC is only of significance to the crmd internally and is rarely useful to administrators except when deciding on which node to examine the logs.
  • Page 161 crm_attribute (8) crm_attribute — Allows node attributes and cluster options to be queried, modified and deleted Synopsis crm_attribute [options] Description The crm_attribute command queries and manipulates node attributes and cluster configuration options that are used in the CIB. Options --help, -? Print a help message.
  • Page 162 --attr-id string, -i string For advanced users only. Identifies the id attribute. --attr-value string, -v string Value to set. This is ignored when used with -G. --node node_name, -N node_name The uname of the node to change --set-name string, -s string Specify the set of attributes in which to read or write the attribute.
  • Page 163 Add a new attribute called location with the value of office to the set subsection of the nodes section in the CIB (settings applied to the host myhost): crm_attribute -t nodes -U myhost -s set -n location -v office Change the value of the location attribute in the nodes section for the myhost host: crm_attribute -t nodes -U myhost -n location -v backoffice Files...
  • Page 164 crm_diff (8) crm_diff — identify changes to the cluster configuration and apply patches to the con- figuration files Synopsis crm_diff [-?|-V] [-o filename] [-O string] [-p filename] [-n filename] [-N string] Description The crm_diff command assists in creating and applying XML patches. This can be useful for visualizing the changes between two versions of the cluster configuration or saving changes so they can be applied at a later time using cibadmin.
  • Page 165 --cib, -c Compare or patch the inputs as a CIB. Always specify the base version with -o and provide either the patch file or the second version with -p or -n, respectively. --stdin, -s Read the inputs from stdin. Examples Use crm_diff to determine the differences between various CIB configuration files and to create patches.
  • Page 166 See Also cibadmin(8) (page 142) High Availability Guide...
  • Page 167 crm_failcount (8) crm_failcount — Manage the counter recording each resource's failures Synopsis crm_failcount [-?|-V] -D -u|-U node -r resource crm_failcount [-?|-V] -G -u|-U node -r resource crm_failcount [-?|-V] -v string -u|-U node -r resource Description Heartbeat implements a sophisticated method to compute and force failover of a resource to another node in case that resource tends to fail on the current node.
  • Page 168 NOTE Increase the level of verbosity by providing additional instances. --quiet, -Q When doing an attribute query using -G, print just the value to stdout. Use this option with -G. --get-value, -G Retrieve rather than set the preference. --delete-attr, -D Specify the attribute to delete.
  • Page 169 Files /var/lib/heartbeat/crm/cib.xml—the CIB (minus status section) on disk. Editing this file directly is strongly discouraged. See Also crm_attribute(8) (page 151), cibadmin(8) (page 142), and the Linux High Availability FAQ Web site [http://www.linux-ha.org/v2/faq/forced_failover] Cluster Management Tools...
• Page 170 crm_master (8) crm_master — Manage a master/slave resource's preference for being promoted on a given node Synopsis crm_master [-V|-Q] -D [-l lifetime] crm_master [-V|-Q] -G [-l lifetime] crm_master [-V|-Q] -v string [-l string] Description crm_master is called from inside the resource agent scripts to determine which resource instance should be promoted to master mode.
  • Page 171 NOTE Increase the level of verbosity by providing additional instances. --quiet, -Q When doing an attribute query using -G, print just the value to stdout. Use this option with -G. --get-value, -G Retrieve rather than set the preference to be promoted. --delete-attr, -D Delete rather than set the attribute.
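As a sketch of how a resource agent script might call it (the value 100 is arbitrary; crm_master picks up the resource and node from the agent's environment): Set this node's promotion preference until the next reboot:
crm_master -v 100 -l reboot
Query the currently set preference:
crm_master -G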
• Page 172 crm_mon (8) crm_mon — monitor the cluster's status Synopsis crm_mon [-V] -d -p filename -h filename crm_mon [-V] [-1|-n|-r] -h filename crm_mon [-V] [-n|-r] -X filename crm_mon [-V] [-n|-r] -c|-1 crm_mon [-V] -i interval crm_mon -? Description The crm_mon command allows you to monitor your cluster's status and configuration. Its output includes the number of nodes, uname, uuid, status, the resources configured in your cluster, and the current status of each.
  • Page 173 --group-by-node, -n Group resources by node. --inactive, -r Display inactive resources. --simple-status, -s Display the cluster status once as a simple one line output (suitable for nagios). --one-shot, -1 Display the cluster status once on the console then exit (does not use ncurses). --as-html filename, -h filename Write the cluster's status to the specified file.
  • Page 174 Display your cluster's status and group resources by node: crm_mon -n Display your cluster's status, group resources by node, and include inactive resources in the list: crm_mon -n -r Write your cluster's status to an HTML file: crm_mon -h filename Run crm_mon as a daemon in the background, specify the daemon's pid file for easier control of the daemon process, and create HTML output.
• Page 175 crm_node (8) crm_node — Lists the members of a cluster Synopsis crm_node [-V] [-p|-e|-q] Description Lists the members of a cluster. Options --verbose, -V be verbose --partition, -p print the members of this partition --epoch, -e print the epoch this node joined the partition --quorum, -q print a 1 if our partition has quorum Cluster Management Tools...
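For illustration: Print the members of the current partition:
crm_node -p
Check whether this partition has quorum (prints 1 if it does):
crm_node -q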
  • Page 176 crm_resource (8) crm_resource — Perform tasks related to cluster resources Synopsis crm_resource [-?|-V|-S] -L|-Q|-W|-D|-C|-P|-p [options] Description The crm_resource command performs various resource-related actions on the cluster. It can modify the definition of configured resources, start and stop resources, and delete and migrate resources between nodes. --help, -? Print the help message.
  • Page 177 --locate, -W Locate a resource. Requires: -r --migrate, -M Migrate a resource from its current location. Use -N to specify a destination. If -N is not specified, the resource is forced to move by creating a rule for the current location and a score of -INFINITY. NOTE This prevents the resource from running on this node until the constraint is removed with -U.
  • Page 178 Optional: -H --set-parameter string, -p string Set the named parameter for a resource. Requires: -r, -v. Optional: -i, -s, and --meta --get-parameter string, -g string Get the named parameter for a resource. Requires: -r. Optional: -i, -s, and --meta --delete-parameter string, -d string Delete the named parameter for a resource.
• Page 179 --meta Modify a resource's configuration option rather than one which is passed to the resource agent script. For use with -p, -g and -d. --lifetime string, -u string Lifespan of migration constraints. --force, -f Force the resource to move by creating a rule for the current location and a score of -INFINITY. This should be used if the resource's stickiness and constraint scores total more than INFINITY (currently 100,000).
  • Page 180 Start or stop a resource: crm_resource -r my_first_ip -p target_role -v started crm_resource -r my_first_ip -p target_role -v stopped Query the definition of a resource: crm_resource -Q -r my_first_ip Migrate a resource away from its current location: crm_resource -M -r my_first_ip Migrate a resource to a specific location: crm_resource -M -r my_first_ip -H c001n02 Allow a resource to return to its normal location:...
  • Page 181 Recheck all nodes for resources started outside the CRM: crm_resource -P Recheck one node for resources started outside the CRM: crm_resource -P -H c001n02 Files /var/lib/heartbeat/crm/cib.xml—the CIB (minus status section) on disk. Editing this file directly is strongly discouraged. See Also cibadmin(8) (page 142), crmadmin(8) (page 148), lrmadmin(8), heartbeat(8) Cluster Management Tools...
  • Page 182 crm_shadow (8) crm_shadow — Perform Configuration Changes in a Sandbox Before Updating The Live Cluster Synopsis crm_shadow [-V] [-p|-e|-q] Description Sets up an environment in which configuration tools (cibadmin, crm_resource, etc) work offline instead of against a live cluster, allowing changes to be previewed and tested for side-effects.
  • Page 183 --reset, -rNAME recreate the named shadow copy from the active cluster configuration --commit, -cNAME upload the contents of the named shadow copy to the cluster --delete, -dNAME delete the contents of the named shadow copy --edit, -eNAME Edit the contents of the named shadow copy with your favorite editor --batch, -b do not spawn a new shell --force, -f...
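A typical sandbox workflow might look like the following sketch (test is an arbitrary shadow name and my_ip a placeholder resource; the --create option is assumed here, as it does not appear in the option excerpt above):
crm_shadow --create test
crm_resource -r my_ip -p target_role -v stopped
crm_shadow --commit test
After --create, the configuration tools operate on the shadow copy, so the crm_resource change above only reaches the live cluster when the copy is committed.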
  • Page 184 Command Syntax/Description tution when the alias is expanded. Alias returns true unless a NAME is given for which no alias has been defined. bg [JOB_SPEC ...] Place each JOB_SPEC in the background, as if it had been started with &. If JOB_SPEC is not present, the shell's notion of the current job is used.
  • Page 185 Command Syntax/Description Selectively execute COMMANDS based upon WORD matching PATTERN. The `|' is used to separate multiple patterns. cd [-L|-P] [dir] Change the current directory to DIR. command [-pVv] command command [arg ...] Runs COMMAND with ARGS ignoring shell functions. If you have a shell function called `ls', and you wish to call the command `ls', you can say "command ls".
  • Page 186 Command Syntax/Description Resume the next iteration of the enclosing FOR, WHILE or UNTIL loop. If N is specified, resume at the N-th enclosing loop. declare [-afFirtx] [-p] [name[=value] ...] declare Declare variables and/or give them attributes. If no NAMEs are given, then display the values of variables instead.
  • Page 187 Command Syntax/Description \r (carriage return) \t (horizontal tab) \v (vertical tab) \\ (backslash) \0nnn (the character whose ASCII code is NNN (octal). NNN can be 0 to 3 octal digits) You can turn off the interpretation of the above characters with the -E option.
  • Page 188 Command Syntax/Description process to NAME. If the file cannot be executed and the shell is not interactive, then the shell exits, unless the shell option execfail is set. exit [N] exit Exit the shell with a status of N. If N is omitted, the exit status is that of the last command executed.
  • Page 189 Command Syntax/Description The for loop executes a sequence of commands for each member in a list of items. If in WORDS ...; is not present, then in "$@" is assumed. For each element in WORDS, NAME is set to that element, and the COMMANDS are executed.
  • Page 190 Command Syntax/Description the entries. The -d option deletes the history entry at offset OFFSET. The -w option writes out the current history to the history file; -r means to read the file and append the contents to the history list instead. -a means to append history lines from this session to the history file.
  • Page 191 Command Syntax/Description local NAME[=VALUE] ... local Create a local variable called NAME, and give it VALUE. local can only be used within a function; it makes the variable NAME have a visible scope restricted to that function and its children. logout logout Logout of a login shell.
  • Page 192 Command Syntax/Description Print the current working directory. With the -P option, pwd prints the physical directory, without any symbolic links; the -L option makes pwd follow symbolic links. read [-ers] [-u fd] [-t timeout] [-p prompt] [-a array] [-n read nchars] [-d delim] [NAME ...] The given NAMEs are marked readonly and the values of these NAMEs may not be changed by subsequent assignment.
• Page 193 Command Syntax/Description the prompt are redisplayed. If EOF is read, the command completes. Any other value read causes NAME to be set to null. The line read is saved in the variable REPLY. COMMANDS are executed after each selection until a break command is executed. set [--abefhkmnptuvxBCHP] [-o OPTION] [ARG...] Sets internal shell options.
  • Page 194 Command Syntax/Description test [expr] test Exits with a status of 0 (true) or 1 (false) depending on the evaluation of EXPR. Expressions may be unary or binary. Unary expressions are often used to examine the status of a file. There are string operators as well, and numeric comparison operators.
  • Page 195 Command Syntax/Description true true Return a successful result. type [-afptP] NAME [NAME ...] type Obsolete, see declare. typeset [-afFirtx] [-p] name[=value] typeset Obsolete, see declare. ulimit [-SHacdfilmnpqstuvx] [limit ulimit Ulimit provides control over the resources available to processes started by the shell, on systems that allow such control. umask [-p] [-S] [MODE] umask The user file-creation mask is set to MODE.
  • Page 196 Command Syntax/Description until COMMANDS; do COMMANDS; done until Expand and execute COMMANDS as long as the final command in the until COMMANDS has an exit status which is not zero. wait [N] wait Wait for the specified process and report its termination status. If N is not given, all currently active child processes are waited for, and the return code is zero.
  • Page 197 crm_standby (8) crm_standby — manipulate a node's standby attribute to determine whether resources can be run on this node Synopsis crm_standby [-?|-V] -D -u|-U node -r resource crm_standby [-?|-V] -G -u|-U node -r resource crm_standby [-?|-V] -v string -u|-U node -r resource [-l string] Description The crm_standby command manipulates a node's standby attribute.
  • Page 198 --quiet, -Q When doing an attribute query using -G, print just the value to stdout. Use this option with -G. --get-value, -G Retrieve rather than set the preference. --delete-attr, -D Specify the attribute to delete. --attr-value string, -v string Specify the value to use. This option is ignored when used with -G. --attr-id string, -i string For advanced users only.
  • Page 199 Remove the standby property from a node: crm_standby -D -U node1 Have a node go to standby for an indefinite period of time: crm_standby -v true -l forever -U node1 Have a node go to standby until the next reboot of this node: crm_standby -v true -l reboot -U node1 Files /var/lib/heartbeat/crm/cib.xml—the CIB (minus status section) on disk.
• Page 200 crm_verify (8) crm_verify — check the CIB for consistency Synopsis crm_verify [-V] -x file crm_verify [-V] -X string crm_verify [-V] -L|-p crm_verify [-?] Description crm_verify checks the configuration database (CIB) for consistency and other problems. It can be used to check a file containing the configuration, or it can connect to a running cluster.
  • Page 201 --live-check, -L Connect to the running cluster and check the CIB. --crm_xml string, -X string Check the configuration in the supplied string. Pass complete CIBs only. --xml-file file, -x file Check the configuration in the named file. --xml-pipe, -p Use the configuration piped in via stdin. Pass complete CIBs only. Examples Check the consistency of the configuration in the running cluster and produce verbose output:...
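Given the options above, that command would presumably be:
crm_verify -L -V
To check a saved configuration file instead (the path is a placeholder):
crm_verify -V -x /tmp/cib-backup.xml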
  • Page 203: 7 Cluster Resources

    Cluster Resources This chapter summarizes the most important facts and figures related to cluster resources: the resource agent classes the High Availability Extension supports, the error codes for OCF resource agents and how the cluster reacts to them, the available resource options, resource operations, and instance attributes.
  • Page 204: Ocf Return Codes

    Availability OCF RAs if possible. For more information, see http://wiki.linux-ha.org/HeartbeatResourceAgent. Linux Standards Base (LSB) Scripts LSB resource agents are generally provided by the operating system/distribution and are found in /etc/init.d. To be used with the cluster, they must conform to the LSB specification.
  • Page 205 the result does not match the expected value, then the operation is considered to have failed and a recovery action is initiated. There are three types of failure recovery: Table 17.1 Failure Recovery Types Recovery Description Action Taken by the Cluster Type soft A transient error occurred.
• Page 206 OCF Return Code / Alias / Description / Recovery Type: OCF_ERR_ARGS: The resource’s configuration is not valid on this machine (for example, it refers to a location/tool not found on the node). Recovery type: hard. OCF_ERR_UNIMPLEMENTED: The requested action is not implemented. Recovery type: hard. OCF_ERR_PERM: The resource agent does not have sufficient privileges to complete the task. Recovery type: hard. High Availability Guide...
  • Page 207: Resource Options

    17.3 Resource Options For each resource you add, you can define options. Options are used by the cluster to decide how your resource should behave—they tell the CRM how to treat a specific resource. Resource options can be set with the crm_resource --meta command or with the GUI as described in .
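As a minimal sketch (my_ip is a placeholder resource, and target_role is the option also used in the crm_resource examples earlier in this guide):
crm_resource -r my_ip --meta -p target_role -v stopped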
  • Page 208: Resource Operations

    17.4 Resource Operations By default, the cluster will not ensure that your resources are still healthy. To instruct the cluster to do this, you need to add a monitor operation to the resource’s definition. Monitor operations can be added for all classes of resource agents. Table 17.4 Resource Operations Operation...
  • Page 209: Instance Attributes

    Operation Description • standby: Move all resources away from the node on which the resource failed. enabled If false, the operation is treated as if it does not exist. Allowed values: true, false. 17.5 Instance Attributes The scripts of all resource classes can be given parameters which determine how they behave and which instance of a service they control.
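To see how instance attributes and a monitor operation fit together, the following crm shell sketch defines a hypothetical IP address resource (the address, netmask, and intervals are illustrative):
crm configure primitive my_ip ocf:heartbeat:IPaddr2 \
  params ip=192.168.1.100 cidr_netmask=24 \
  op monitor interval=10s timeout=20s
Here ip and cidr_netmask are instance attributes passed to the IPaddr2 agent, while the op monitor line adds the monitor operation that tells the cluster to check the resource's health every 10 seconds.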
  • Page 211: 8 Ha Ocf Agents

    HA OCF Agents All OCF agents require several parameters to be set when they are started. The following overview shows how to manually operate these agents. The data that is available in this appendix is directly taken from the meta-data invocation of the respective RA. Find all these agents in /usr/lib/ocf/resource.d/heartbeat/.
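As a sketch of such a manual invocation, run as root (the parameter and the OCF_ROOT default shown here are assumptions; adjust them to the agent you are testing):
OCF_ROOT=/usr/lib/ocf OCF_RESKEY_ip=192.168.1.100 \
  /usr/lib/ocf/resource.d/heartbeat/IPaddr2 monitor
echo $?
The exit code printed by echo $? is one of the OCF return codes described in the previous chapter.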
  • Page 212 ocf:anything_ra (7) ocf:anything_ra — anything Synopsis OCF_RESKEY_binfile=string [OCF_RESKEY_cmdline_options=string] [OCF_RESKEY_pidfile=string] [OCF_RESKEY_logfile=string] [OCF_RESKEY_errlogfile=string] [OCF_RESKEY_user=string] [OCF_RESKEY_monitor_hook=string] anything_ra [start | stop | monitor | meta-data | validate-all] Description This is a generic OCF RA to manage almost anything. Supported Parameters OCF_RESKEY_binfile=Full path name of the binary to be executed The full name of the binary to be executed.
  • Page 213 OCF_RESKEY_user=User to run the command as User to run the command as OCF_RESKEY_monitor_hook=Command to run in monitor operation Command to run in monitor operation HA OCF Agents...
• Page 214 ocf:apache (7) ocf:apache — Apache web server Synopsis OCF_RESKEY_configfile=string [OCF_RESKEY_httpd=string] [OCF_RESKEY_port=integer] [OCF_RESKEY_statusurl=string] [OCF_RESKEY_testregex=string] [OCF_RESKEY_options=string] [OCF_RESKEY_envfiles=string] apache [start | stop | status | monitor | meta-data | validate-all] Description This is the resource agent for the Apache web server. This resource agent works with both version 1.x and version 2.x Apache servers.
  • Page 215 OCF_RESKEY_port=httpd port A port number that we can probe for status information using the statusurl. This will default to the port number found in the configuration file (or 80, if none can be found in the configuration file). OCF_RESKEY_statusurl=url name The URL to monitor (the apache server status page by default).
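A possible cluster configuration for this agent, using the crm shell (the resource name and configuration file path are placeholders):
crm configure primitive web_server ocf:heartbeat:apache \
  params configfile=/etc/apache2/httpd.conf \
  op monitor interval=30s timeout=40s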
  • Page 216 ocf:AudibleAlarm (7) ocf:AudibleAlarm — AudibleAlarm resource agent Synopsis [OCF_RESKEY_nodelist=string] AudibleAlarm [start | stop | restart | status | monitor | meta-data | validate-all] Description Resource script for AudibleAlarm. It sets an audible alarm running by beeping at a set interval. Supported Parameters OCF_RESKEY_nodelist=Node list The node list that should never sound the alarm.
• Page 217 ocf:ClusterMon (7) ocf:ClusterMon — ClusterMon resource agent Synopsis [OCF_RESKEY_user=string] [OCF_RESKEY_update=integer] [OCF_RESKEY_extra_options=string] OCF_RESKEY_pidfile=string OCF_RESKEY_htmlfile=string ClusterMon [start | stop | monitor | meta-data | validate-all] Description This is a ClusterMon Resource Agent. It writes the current cluster status to an HTML file. Supported Parameters OCF_RESKEY_user=The user we want to run crm_mon as The user we want to run crm_mon as OCF_RESKEY_update=Update interval...
• Page 218 ocf:db2 (7) ocf:db2 — db2 resource agent Synopsis [OCF_RESKEY_instance=string] [OCF_RESKEY_admin=string] db2 [start | stop | status | monitor | validate-all | meta-data | methods] Description Resource script for db2. It manages a DB2 Universal Database instance as an HA resource.
  • Page 219 ocf:Delay (7) ocf:Delay — Delay resource agent Synopsis [OCF_RESKEY_startdelay=integer] [OCF_RESKEY_stopdelay=integer] [OCF_RESKEY_mondelay=integer] Delay [start | stop | status | monitor | meta-data | validate-all] Description This script is a test resource for introducing delay. Supported Parameters OCF_RESKEY_startdelay=Start delay How long in seconds to delay on start operation. OCF_RESKEY_stopdelay=Stop delay How long in seconds to delay on stop operation.
• Page 220 ocf:drbd (7) ocf:drbd — This resource agent manages a Distributed Replicated Block Device (DRBD) object as a master/slave resource. DRBD is a mechanism for replicating storage; please see the documentation for setup details. Synopsis OCF_RESKEY_drbd_resource=string [OCF_RESKEY_drbdconf=string] [OCF_RESKEY_clone_overrides_hostname=boolean] [OCF_RESKEY_clone_max=integer] [OCF_RESKEY_clone_node_max=integer] [OCF_RESKEY_master_max=integer] [OCF_RESKEY_master_node_max=integer] drbd [start | promote | demote | notify | stop | monitor | monitor | meta-data | validate-all] Description...
  • Page 221 OCF_RESKEY_clone_node_max=Number of nodes Clones per node. Do not modify the default. OCF_RESKEY_master_max=Number of primaries Maximum number of active primaries. Do not modify the default. OCF_RESKEY_master_node_max=Number of primaries per node Maximum number of primaries per node. Do not modify the default. HA OCF Agents...
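Because this agent is a master/slave resource, it is normally wrapped in an ms resource. A rough crm shell sketch (r0 is a placeholder DRBD resource name):
crm configure primitive drbd_r0 ocf:heartbeat:drbd \
  params drbd_resource=r0 \
  op monitor interval=15s
crm configure ms ms_drbd_r0 drbd_r0 \
  meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true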
• Page 222 ocf:Dummy (7) ocf:Dummy — Dummy resource agent Synopsis OCF_RESKEY_state=string Dummy [start | stop | monitor | reload | migrate_to | migrate_from | meta-data | validate-all] Description This is a Dummy Resource Agent. It has no purpose other than to keep track of whether it is running or not.
• Page 223 ocf:eDir88 (7) ocf:eDir88 — eDirectory resource agent Synopsis OCF_RESKEY_eDir_config_file=string [OCF_RESKEY_eDir_monitor_ldap=boolean] [OCF_RESKEY_eDir_monitor_idm=boolean] [OCF_RESKEY_eDir_jvm_initial_heap=integer] [OCF_RESKEY_eDir_jvm_max_heap=integer] [OCF_RESKEY_eDir_jvm_options=string] eDir88 [start | stop | monitor | meta-data | validate-all] Description Resource script for managing an eDirectory instance. Manages a single instance of eDirectory as an HA resource. The "multiple instances" feature of eDirectory was added in version 8.8.
  • Page 224 Supported Parameters OCF_RESKEY_eDir_config_file=eDir config file Path to the configuration file for an eDirectory instance. OCF_RESKEY_eDir_monitor_ldap=eDir monitor ldap Should we monitor if LDAP is running for the eDirectory instance? OCF_RESKEY_eDir_monitor_idm=eDir monitor IDM Should we monitor if IDM is running for the eDirectory instance? OCF_RESKEY_eDir_jvm_initial_heap=DHOST_INITIAL_HEAP value Value for the DHOST_INITIAL_HEAP java environment variable.
  • Page 225 ocf:Filesystem (7) ocf:Filesystem — Filesystem resource agent Synopsis [OCF_RESKEY_device=string] [OCF_RESKEY_directory=string] [OCF_RESKEY_fstype=string] [OCF_RESKEY_options=string] Filesystem [start | stop | notify | monitor | validate-all | meta-data] Description Resource script for Filesystem. It manages a Filesystem on a shared storage medium. Supported Parameters OCF_RESKEY_device=block device The name of block device for the filesystem, or -U, -L options for mount, or NFS mount specification.
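A hedged configuration sketch for this agent (device, mount point, and filesystem type are placeholders):
crm configure primitive my_fs ocf:heartbeat:Filesystem \
  params device=/dev/sdb1 directory=/mnt/shared fstype=ext3 \
  op monitor interval=20s timeout=40s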
• Page 226 ocf:ICP (7) ocf:ICP — ICP resource agent Synopsis [OCF_RESKEY_driveid=string] [OCF_RESKEY_device=string] ICP [start | stop | status | monitor | validate-all | meta-data] Description Resource script for ICP. It manages an ICP Vortex clustered host drive as an HA resource. Supported Parameters OCF_RESKEY_driveid=ICP cluster drive ID The ICP cluster drive ID.
• Page 227 ocf:ids (7) ocf:ids — OCF resource agent for IBM's database server called Informix Dynamic Server (IDS) Synopsis [OCF_RESKEY_informixdir=string] [OCF_RESKEY_informixserver=string] [OCF_RESKEY_onconfig=string] [OCF_RESKEY_dbname=string] [OCF_RESKEY_sqltestquery=string] ids [start | stop | status | monitor | validate-all | meta-data | methods | usage] Description OCF resource agent to manage an IBM Informix Dynamic Server (IDS) instance as a High-Availability resource.
• Page 228 at '/etc/'. If this parameter is unspecified the script will try to get the value from the shell environment. OCF_RESKEY_dbname= database to use for monitoring, defaults to 'sysmaster' This parameter defines which database to use in order to monitor the IDS instance. If this parameter is unspecified the script will use the 'sysmaster' database as a default.
  • Page 229 ocf:IPaddr2 (7) ocf:IPaddr2 — Manages virtual IPv4 addresses Synopsis OCF_RESKEY_ip=string [OCF_RESKEY_nic=string] [OCF_RESKEY_cidr_netmask=string] [OCF_RESKEY_broadcast=string] [OCF_RESKEY_iflabel=string] [OCF_RESKEY_lvs_support=boolean] [OCF_RESKEY_mac=string] [OCF_RESKEY_clusterip_hash=string] [OCF_RESKEY_unique_clone_address=boolean] [OCF_RESKEY_arp_interval=integer] [OCF_RESKEY_arp_count=integer] [OCF_RESKEY_arp_bg=string] [OCF_RESKEY_arp_mac=string] IPaddr2 [start | stop | status | monitor | meta-data | validate-all] Description This Linux-specific resource manages IP alias IP addresses. It can add or remove an IP alias.
  • Page 230 OCF_RESKEY_cidr_netmask=CIDR netmask The netmask for the interface in CIDR format (e.g., 24 and not 255.255.255.0) If unspecified, the script will try to determine this from the routing table. OCF_RESKEY_broadcast=Broadcast address Broadcast address associated with the IP. If left empty, the script will determine this from the netmask.
  • Page 231 OCF_RESKEY_arp_mac=ARP MAC MAC address to send the ARP packets to. For advanced users only. HA OCF Agents...
  • Page 232 ocf:IPaddr (7) ocf:IPaddr — Manages virtual IPv4 addresses Synopsis OCF_RESKEY_ip=string [OCF_RESKEY_nic=string] [OCF_RESKEY_cidr_netmask=string] [OCF_RESKEY_broadcast=string] [OCF_RESKEY_iflabel=string] [OCF_RESKEY_lvs_support=boolean] [OCF_RESKEY_local_stop_script=string] [OCF_RESKEY_local_start_script=string] [OCF_RESKEY_ARP_INTERVAL_MS=integer] [OCF_RESKEY_ARP_REPEAT=in- teger] [OCF_RESKEY_ARP_BACKGROUND=boolean] [OCF_RESKEY_ARP_NETMASK=string] IPaddr [start | stop | monitor | validate-all | meta-data] Description This script manages IP alias IP addresses. It can add an IP alias, or remove one. Supported Parameters OCF_RESKEY_ip=IPv4 address The IPv4 address to be configured in dotted quad notation, for example...
• Page 233 OCF_RESKEY_cidr_netmask=Netmask The netmask for the interface, either in CIDR format (e.g., 24) or in dotted quad notation (e.g., 255.255.255.0). If unspecified, the script will try to determine this from the routing table. OCF_RESKEY_broadcast=Broadcast address Broadcast address associated with the IP. If left empty, the script will determine this from the netmask.
  • Page 234 ocf:IPsrcaddr (7) ocf:IPsrcaddr — IPsrcaddr resource agent Synopsis [OCF_RESKEY_ipaddress=string] IPsrcaddr [start | stop | stop | monitor | vali- date-all | meta-data] Description Resource script for IPsrcaddr. It manages the preferred source address modification. Supported Parameters OCF_RESKEY_ipaddress=IP address The IP address. High Availability Guide...
  • Page 235 ocf:IPv6addr (7) ocf:IPv6addr — manages IPv6 alias Synopsis [OCF_RESKEY_ipv6addr=string] IPv6addr [start | stop | status | monitor | validate- all | meta-data] Description This script manages IPv6 alias IPv6 addresses. It can add or remove an IP6 alias. Supported Parameters OCF_RESKEY_ipv6addr=IPv6 address The IPv6 address this RA will manage HA OCF Agents...
  • Page 236 ocf:iscsi (7) ocf:iscsi — iscsi resource agent Synopsis [OCF_RESKEY_portal=string] OCF_RESKEY_target=string [OCF_RESKEY_discovery_type=string] [OCF_RESKEY_iscsiadm=string] [OCF_RESKEY_udev=string] iscsi [start | stop | status | monitor | validate-all | methods | meta-data] Description OCF Resource Agent for iSCSI. Add (start) or remove (stop) iSCSI targets. Supported Parameters OCF_RESKEY_portal=portal The iSCSI portal address in the form: {ip_address|hostname}[":"port]...
• Page 237 ocf:Ldirectord (7) ocf:Ldirectord — Wrapper OCF Resource Agent for ldirectord Synopsis OCF_RESKEY_configfile=string [OCF_RESKEY_ldirectord=string] Ldirectord [start | stop | monitor | meta-data | validate-all] Description It is a simple OCF RA wrapper for ldirectord and uses the ldirectord interface to provide the OCF-compliant interface. You gain monitoring of ldirectord. Be warned: querying the ldirectord status is an expensive action.
• Page 238 ocf:LinuxSCSI (7) ocf:LinuxSCSI — LinuxSCSI resource agent Synopsis [OCF_RESKEY_scsi=string] LinuxSCSI [start | stop | methods | status | monitor | meta-data | validate-all] Description This is a resource agent for LinuxSCSI. It manages the availability of a SCSI device from the point of view of the Linux kernel. It makes Linux believe the device is absent, and it can make it come back again.
  • Page 239 ocf:LVM (7) ocf:LVM — LVM resource agent Synopsis [OCF_RESKEY_volgrpname=string] LVM [start | stop | status | monitor | methods | meta-data | validate-all] Description Resource script for LVM. It manages a Linux Volume Manager volume (LVM) as an HA resource. Supported Parameters OCF_RESKEY_volgrpname=Volume group name The name of a volume group.
  • Page 240 ocf:MailTo (7) ocf:MailTo — MailTo resource agent Synopsis [OCF_RESKEY_email=string] [OCF_RESKEY_subject=string] MailTo [start | stop | status | monitor | meta-data | validate-all] Description This is a resource agent for MailTo. It sends email to a sysadmin whenever a takeover occurs. Supported Parameters OCF_RESKEY_email=Email address The email address of sysadmin.
  • Page 241 ocf:ManageRAID (7) ocf:ManageRAID — Manages RAID devices Synopsis [OCF_RESKEY_raidname=string] ManageRAID [start | stop | status | monitor | validate-all | meta-data] Description Manages starting, stopping and monitoring of RAID devices which are preconfigured in /etc/conf.d/HB-ManageRAID. Supported Parameters OCF_RESKEY_raidname=RAID name Name (case sensitive) of RAID to manage. (preconfigured in /etc/conf.d/HB- ManageRAID) HA OCF Agents...
• Page 242 ocf:ManageVE (7) ocf:ManageVE — OpenVZ VE resource agent Synopsis [OCF_RESKEY_veid=integer] ManageVE [start | stop | status | monitor | validate-all | meta-data] Description This OCF-compliant resource agent manages OpenVZ VEs and thus requires a proper OpenVZ installation, including a recent vzctl util. Supported Parameters OCF_RESKEY_veid=OpenVZ ID of VE OpenVZ ID of virtual environment (see output of vzlist -a for all assigned IDs)
• Page 243 ocf:mysql (7) ocf:mysql — MySQL resource agent Synopsis [OCF_RESKEY_binary=string] [OCF_RESKEY_config=string] [OCF_RESKEY_datadir=string] [OCF_RESKEY_user=string] [OCF_RESKEY_group=string] [OCF_RESKEY_log=string] [OCF_RESKEY_pid=string] [OCF_RESKEY_socket=string] [OCF_RESKEY_test_table=string] [OCF_RESKEY_test_user=string] [OCF_RESKEY_test_passwd=string] [OCF_RESKEY_enable_creation=integer] [OCF_RESKEY_additional_parameters=integer] mysql [start | stop | status | monitor | validate-all | meta-data] Description Resource script for MySQL. It manages a MySQL Database instance as an HA resource. Supported Parameters OCF_RESKEY_binary=MySQL binary Location of the MySQL binary...
  • Page 244 OCF_RESKEY_log=MySQL log file The logfile to be used for mysqld. OCF_RESKEY_pid=MySQL pid file The pidfile to be used for mysqld. OCF_RESKEY_socket=MySQL socket The socket to be used for mysqld. OCF_RESKEY_test_table=MySQL test table Table to be tested in monitor statement (in database.table notation) OCF_RESKEY_test_user=MySQL test user MySQL test user OCF_RESKEY_test_passwd=MySQL test user password...
• Page 245 ocf:nfsserver (7) ocf:nfsserver — nfsserver Synopsis [OCF_RESKEY_nfs_init_script=string] [OCF_RESKEY_nfs_notify_cmd=string] [OCF_RESKEY_nfs_shared_infodir=string] [OCF_RESKEY_nfs_ip=string] nfsserver [start | stop | monitor | meta-data | validate-all] Description Nfsserver helps to manage the Linux NFS server as a failover-able resource in Linux-HA. It is dependent on Linux-specific NFS implementation details, so is considered not portable to other platforms yet.
• Page 246 OCF_RESKEY_nfs_ip= IP address. The floating IP address used to access the NFS service High Availability Guide...
  • Page 247 ocf:oracle (7) ocf:oracle — oracle resource agent Synopsis OCF_RESKEY_sid=string [OCF_RESKEY_home=string] [OCF_RESKEY_user=string] [OCF_RESKEY_ipcrm=string] [OCF_RESKEY_clear_backupmode=boolean] [OCF_RESKEY_shutdown_method=string] oracle [start | stop | status | monitor | validate-all | methods | meta-data] Description Resource script for oracle. Manages an Oracle Database instance as an HA resource. Supported Parameters OCF_RESKEY_sid=sid The Oracle SID (aka ORACLE_SID).
  • Page 248 What we use here is the "oradebug" feature and its "ipc" trace utility. It is not optimal to parse the debugging information, but this might be the only way to find out about the IPC information. In case the format or wording of the trace report changes, parsing might fail.
  • Page 249 ocf:oralsnr (7) ocf:oralsnr — oralsnr resource agent Synopsis OCF_RESKEY_sid=string [OCF_RESKEY_home=string] [OCF_RESKEY_user=string] OCF_RESKEY_listener=string oralsnr [start | stop | status | monitor | validate-all | meta-data | methods] Description Resource script for Oracle Listener. It manages an Oracle Listener instance as an HA resource.
  • Page 250 ocf:pgsql (7) ocf:pgsql — pgsql resource agent Synopsis [OCF_RESKEY_pgctl=string] [OCF_RESKEY_start_opt=string] [OCF_RESKEY_ctl_opt=string] [OCF_RESKEY_psql=string] [OCF_RESKEY_pgdata=string] [OCF_RESKEY_pgdba=string] [OCF_RESKEY_pghost=string] [OCF_RESKEY_pgport=string] [OCF_RESKEY_pgdb=string] [OCF_RESKEY_logfile=string] [OCF_RESKEY_stop_escalate=string] pgsql [start | stop | status | monitor | meta-data | validate-all | methods] Description Resource script for PostgreSQL. It manages a PostgreSQL as an HA resource. Supported Parameters OCF_RESKEY_pgctl=pgctl Path to pg_ctl command.
• Page 251 OCF_RESKEY_pgdba=pgdba User that owns PostgreSQL. OCF_RESKEY_pghost=pghost Hostname/IP Address where PostgreSQL is listening OCF_RESKEY_pgport=pgport Port where PostgreSQL is listening OCF_RESKEY_pgdb=pgdb Database that will be used for monitoring. OCF_RESKEY_logfile=logfile Path to PostgreSQL server log output file. OCF_RESKEY_stop_escalate=stop escalation Number of retries (using -m fast) before resorting to -m immediate HA OCF Agents...
  • Page 252 ocf:pingd (7) ocf:pingd — pingd resource agent Synopsis [OCF_RESKEY_pidfile=string] [OCF_RESKEY_user=string] [OCF_RESKEY_dampen=integer] [OCF_RESKEY_set=integer] [OCF_RESKEY_name=integer] [OCF_RESKEY_section=integer] [OCF_RESKEY_multiplier=integer] [OCF_RESKEY_host_list=integer] pingd [start | stop | monitor | meta-data | validate-all] Description This is a pingd Resource Agent. It records (in the CIB) the current number of ping nodes a node can connect to.
  • Page 253 OCF_RESKEY_section=Section name The section to place the value in. Rarely needs to be specified. OCF_RESKEY_multiplier=Value multiplier The number by which to multiply the number of connected ping nodes. OCF_RESKEY_host_list=Host list The list of ping nodes to count. Defaults to all configured ping nodes. Rarely needs to be specified.
  • Page 254 ocf:portblock (7) ocf:portblock — portblock resource agent Synopsis [OCF_RESKEY_protocol=string] [OCF_RESKEY_portno=integer] [OCF_RESKEY_action=string] portblock [start | stop | status | monitor | meta- data | validate-all] Description Resource script for portblock. It is used to temporarily block ports using iptables. Supported Parameters OCF_RESKEY_protocol=protocol The used protocol to be blocked/unblocked.
  • Page 255 ocf:Pure-FTPd (7) ocf:Pure-FTPd — OCF Resource Agent compliant FTP script. Synopsis OCF_RESKEY_script=string OCF_RESKEY_conffile=string OCF_RESKEY_daemon_type=string [OCF_RESKEY_pidfile=string] Pure-FTPd [start | stop | monitor | validate-all | meta-data] Description This script manages Pure-FTPd in an Active-Passive setup Supported Parameters OCF_RESKEY_script=Script name with full path The full path to the Pure-FTPd startup script.
• Page 256 ocf:Raid1 (7) ocf:Raid1 — RAID1 resource agent Synopsis [OCF_RESKEY_raidconf=string] [OCF_RESKEY_raiddev=string] [OCF_RESKEY_homehost=string] Raid1 [start | stop | status | monitor | validate-all | meta-data] Description Resource script for RAID1. It manages a software Raid1 device on a shared storage medium. Supported Parameters OCF_RESKEY_raidconf=RAID config file The RAID configuration file.
  • Page 257 ocf:Route (7) ocf:Route — Manages network routes Synopsis OCF_RESKEY_destination=string OCF_RESKEY_device=string OCF_RESKEY_gateway=string OCF_RESKEY_source=string Route [start | stop | monitor | reload | meta-data | validate-all] Description Enables and disables network routes. Supports host and net routes, routes via a gateway address, and routes using specific source addresses. This resource agent is useful if a node's routing table needs to be manipulated based on node role assignment.
  • Page 258 OCF_RESKEY_gateway=Gateway IP address The gateway IP address to use for this route. OCF_RESKEY_source=Source IP address The source IP address to be configured for the route. High Availability Guide...
  • Page 259 ocf:rsyncd (7) ocf:rsyncd — OCF Resource Agent compliant rsync daemon script. Synopsis [OCF_RESKEY_binpath=string] [OCF_RESKEY_conffile=string] [OCF_RESKEY_bwlimit=string] rsyncd [start | stop | monitor | validate-all | meta- data] Description This script manages rsync daemon Supported Parameters OCF_RESKEY_binpath=Full path to the rsync binary The rsync binary path.
  • Page 260 ocf:SAPDatabase (7) ocf:SAPDatabase — SAP database resource agent Synopsis OCF_RESKEY_SID=string OCF_RESKEY_DIR_EXECUTABLE=string OCF_RESKEY_DBTYPE=string OCF_RESKEY_NETSERVICENAME=string OCF_RESKEY_DBJ2EE_ONLY=boolean OCF_RESKEY_JAVA_HOME=string OCF_RESKEY_STRICT_MONITORING=boolean OCF_RESKEY_AUTOMATIC_RECOVER=boolean OCF_RESKEY_DIR_BOOTSTRAP=string OCF_RESKEY_DIR_SECSTORE=string OCF_RESKEY_DB_JARS=string OCF_RESKEY_PRE_START_USEREXIT=string OCF_RESKEY_POST_START_USEREXIT=string OCF_RESKEY_PRE_STOP_USEREXIT=string OCF_RESKEY_POST_STOP_USEREXIT=string SAPDatabase [start | stop | status | monitor | validate-all | meta-data | methods] Description Resource script for SAP databases. It manages a SAP database of any type as an HA resource.
• Page 261 OCF_RESKEY_NETSERVICENAME=listener name The Oracle TNS listener name. OCF_RESKEY_DBJ2EE_ONLY=only JAVA stack installed If you do not have an ABAP stack installed in the SAP database, set this to TRUE OCF_RESKEY_JAVA_HOME=Path to Java SDK This is only needed if the DBJ2EE_ONLY parameter is set to true. Enter the path to the Java SDK which is used by the SAP WebAS Java OCF_RESKEY_STRICT_MONITORING=Activates application level monitoring This controls how the resource agent monitors the database.
  • Page 262 OCF_RESKEY_PRE_STOP_USEREXIT=path to a pre-start script The fully qualified path where to find a script or program which should be executed before this resource gets stopped. OCF_RESKEY_POST_STOP_USEREXIT=path to a post-start script The fully qualified path where to find a script or program which should be executed after this resource got stopped.
  • Page 263 ocf:SAPInstance (7) ocf:SAPInstance — SAP instance resource agent Synopsis OCF_RESKEY_InstanceName=string OCF_RESKEY_DIR_EXECUTABLE=string OCF_RESKEY_DIR_PROFILE=string OCF_RESKEY_START_PROFILE=string OCF_RESKEY_START_WAITTIME=string OCF_RESKEY_AUTOMATIC_RECOVER=boolean OCF_RESKEY_PRE_START_USEREXIT=string OCF_RESKEY_POST_START_USEREXIT=string OCF_RESKEY_PRE_STOP_USEREXIT=string OCF_RESKEY_POST_STOP_USEREXIT=string SAPInstance [start | recover | stop | status | monitor | validate-all | meta-data | methods] Description Resource script for SAP. It manages a SAP Instance as an HA resource. Supported Parameters OCF_RESKEY_InstanceName=instance name: SID_INSTANCE_VIR-HOSTNAME The fully qualified SAP instance name.
• Page 264 OCF_RESKEY_START_WAITTIME=Check the successful start after that time (do not wait for J2EE-Addin) After that time in seconds, a monitor operation is executed by the resource agent. If the monitor returns SUCCESS, the start is treated as successful. This is useful to resolve timing problems with e.g. the J2EE-Addin instance. OCF_RESKEY_AUTOMATIC_RECOVER=Enable or disable automatic startup recovery The SAPInstance resource agent tries to recover a failed start attempt automatically one time.
• Page 265 ocf:scsi2reserve (7) ocf:scsi2reserve — scsi-2 reservation Synopsis [OCF_RESKEY_scsi_reserve=string] [OCF_RESKEY_sharedisk=string] [OCF_RESKEY_start_loop=string] scsi2reserve [start | stop | monitor | meta-data | validate-all] Description The scsi-2-reserve resource agent is a place holder for SCSI-2 reservation. A healthy instance of the scsi-2-reserve resource indicates ownership of the specified SCSI device. This resource agent depends on scsi_reserve from the scsires package, which is Linux-specific.
• Page 266 ocf:SendArp (7) ocf:SendArp — SendArp resource agent Synopsis [OCF_RESKEY_ip=string] [OCF_RESKEY_nic=string] SendArp [start | stop | monitor | meta-data | validate-all] Description This script sends out gratuitous ARP packets for an IP address. Supported Parameters OCF_RESKEY_ip=IP address The IP address for which to send ARP packets. OCF_RESKEY_nic=NIC The NIC through which to send ARP packets. High Availability Guide...
  • Page 267 ocf:ServeRAID (7) ocf:ServeRAID — ServeRAID resource agent Synopsis [OCF_RESKEY_serveraid=integer] [OCF_RESKEY_mergegroup=integer] ServeRAID [start | stop | status | monitor | validate-all | meta-data | methods] Description Resource script for ServeRAID. It enables/disables shared ServeRAID merge groups. Supported Parameters OCF_RESKEY_serveraid=serveraid The adapter number of the ServeRAID adapter. OCF_RESKEY_mergegroup=mergegroup The logical drive under consideration.
  • Page 268 ocf:sfex (7) ocf:sfex — SF-EX resource agent Synopsis [OCF_RESKEY_device=string] [OCF_RESKEY_index=integer] [OCF_RESKEY_collision_timeout=integer] [OCF_RESKEY_monitor_interval=integer] [OCF_RESKEY_lock_timeout=integer] sfex [start | stop | monitor | meta-data] Description Resource script for SF-EX. It manages a shared storage medium exclusively . Supported Parameters OCF_RESKEY_device=block device Block device path that stores exclusive control data. OCF_RESKEY_index=index Location in block device where exclusive control data is stored.
  • Page 269 ocf:SphinxSearchDaemon (7) ocf:SphinxSearchDaemon — searchd resource agent Synopsis OCF_RESKEY_config=string [OCF_RESKEY_searchd=string] [OCF_RESKEY_search=string] [OCF_RESKEY_testQuery=string] SphinxSearchDaemon [start | stop | monitor | meta-data | validate-all] Description This is a searchd Resource Agent. It manages the Sphinx Search Daemon. Supported Parameters OCF_RESKEY_config=Configuration file searchd configuration file OCF_RESKEY_searchd=searchd binary searchd binary OCF_RESKEY_search=search binary...
  • Page 270 ocf:Squid (7) ocf:Squid — The RA of Squid Synopsis [OCF_RESKEY_squid_exe=string] OCF_RESKEY_squid_conf=string OCF_RESKEY_squid_pidfile=string OCF_RESKEY_squid_port=integer [OCF_RESKEY_squid_stop_timeout=integer] [OCF_RESKEY_debug_mode=string] [OCF_RESKEY_debug_log=string] Squid [start | stop | status | monitor | meta-data | validate-all] Description The resource agent of Squid. This manages a Squid instance as an HA resource. Supported Parameters OCF_RESKEY_squid_exe=Executable file This is a required parameter.
• Page 271 OCF_RESKEY_squid_stop_timeout=Number of seconds to await to confirm a normal stop method This is an optional parameter. On a stop action, a normal stop method is tried first, and the confirmation of its completion is then awaited for the number of seconds specified by this parameter.
  • Page 272 ocf:Stateful (7) ocf:Stateful — Example stateful resource agent Synopsis OCF_RESKEY_state=string Stateful [start | stop | monitor | meta-data | validate- all] Description This is an example resource agent that implements two states Supported Parameters OCF_RESKEY_state=State file Location to store the resource state in High Availability Guide...
  • Page 273 ocf:SysInfo (7) ocf:SysInfo — SysInfo resource agent Synopsis [OCF_RESKEY_pidfile=string] [OCF_RESKEY_delay=string] SysInfo [start | stop | monitor | meta-data | validate-all] Description This is a SysInfo Resource Agent. It records (in the CIB) various attributes of a node Sample Linux output: arch: i686 os: Linux-2.4.26-gentoo-r14 free_swap: 1999 cpu_info: Intel(R) Celeron(R) CPU 2.40GHz cpu_speed: 4771.02 cpu_cores: 1 cpu_load: 0.00 ram_total: 513 ram_free: 117 root_free: 2.4 Sample Darwin output: arch: i386 os: Darwin-8.6.2 cpu_info: Intel Core Duo cpu_speed: 2.16 cpu_cores: 2 cpu_load: 0.18...
  • Page 274 ocf:tomcat (7) ocf:tomcat — tomcat resource agent Synopsis OCF_RESKEY_tomcat_name=string OCF_RESKEY_script_log=string [OCF_RESKEY_tomcat_stop_timeout=integer] [OCF_RESKEY_tomcat_suspend_trialcount=integer] [OCF_RESKEY_tomcat_user=string] [OCF_RESKEY_statusurl=string] [OCF_RESKEY_java_home=string] OCF_RESKEY_catalina_home=string OCF_RESKEY_catalina_pid=string [OCF_RESKEY_tomcat_start_opts=string] [OCF_RESKEY_catalina_opts=string] [OCF_RESKEY_catalina_rotate_log=string] [OCF_RESKEY_catalina_rotatetime=integer] tomcat [start | stop | status | monitor | meta-data | validate-all] Description Resource script for tomcat. It manages a Tomcat instance as an HA resource. Supported Parameters OCF_RESKEY_tomcat_name=The name of the resource The name of the resource...
  • Page 275 OCF_RESKEY_tomcat_suspend_trialcount=The re-try number of times awaiting a stop The re-try number of times awaiting a stop OCF_RESKEY_tomcat_user=A user name to start a resource A user name to start a resource OCF_RESKEY_statusurl=URL for state confirmation URL for state confirmation OCF_RESKEY_java_home=Home directory of the Java Home directory of the Java OCF_RESKEY_catalina_home=Home directory of Tomcat Home directory of Tomcat...
  • Page 276 ocf:VIPArip (7) ocf:VIPArip — Virtual IP Address by RIP2 protocol Synopsis OCF_RESKEY_ip=string [OCF_RESKEY_nic=string] VIPArip [start | stop | monitor | validate-all | meta-data] Description Virtual IP Address by RIP2 protocol. This script manages an IP alias in a different subnet with quagga/ripd. It can add or remove an IP alias. Supported Parameters OCF_RESKEY_ip=The IP address in different subnet The IPv4 address in different subnet, for example "192.168.1.1".
  • Page 277 ocf:VirtualDomain (7) ocf:VirtualDomain — Manages virtual domains Synopsis OCF_RESKEY_config=string [OCF_RESKEY_hypervisor=string] [OCF_RESKEY_force_stop=boolean] [OCF_RESKEY_migration_transport=string] [OCF_RESKEY_monitor_scripts=string] VirtualDomain [start | stop | status | monitor | migrate_from | migrate_to | meta-data | validate-all] Description Resource agent for a virtual domain (a.k.a. domU, virtual machine, virtual environment etc., depending on context) managed by libvirtd.
• Page 278 OCF_RESKEY_monitor_scripts=space-separated list of monitor scripts To additionally monitor services within the virtual domain, add this parameter with a list of scripts to monitor. Note: when monitor scripts are used, the start and migrate_from operations will complete only when all monitor scripts have completed successfully.
  • Page 279 ocf:WAS6 (7) ocf:WAS6 — WAS6 resource agent Synopsis [OCF_RESKEY_profile=string] WAS6 [start | stop | status | monitor | validate-all | meta-data | methods] Description Resource script for WAS6. It manages a Websphere Application Server (WAS6) as an HA resource. Supported Parameters OCF_RESKEY_profile=profile name The WAS profile name.
  • Page 280 ocf:WAS (7) ocf:WAS — WAS resource agent Synopsis [OCF_RESKEY_config=string] [OCF_RESKEY_port=integer] WAS [start | stop | status | monitor | validate-all | meta-data | methods] Description Resource script for WAS. It manages a Websphere Application Server (WAS) as an HA resource. Supported Parameters OCF_RESKEY_config=configration file The WAS-configuration file.
• Page 281 ocf:WinPopup (7) ocf:WinPopup — WinPopup resource agent Synopsis [OCF_RESKEY_hostfile=string] WinPopup [start | stop | status | monitor | validate-all | meta-data] Description Resource script for WinPopup. It sends a WinPopup message to a sysadmin's workstation whenever a takeover occurs. Supported Parameters OCF_RESKEY_hostfile=Host file The file containing the hosts to send WinPopup messages to.
• Page 282 ocf:Xen (7) ocf:Xen — Manages Xen DomUs Synopsis [OCF_RESKEY_xmfile=string] [OCF_RESKEY_name=string] [OCF_RESKEY_allow_migrate=boolean] [OCF_RESKEY_shutdown_timeout=boolean] [OCF_RESKEY_allow_mem_management=boolean] [OCF_RESKEY_reserved_Dom0_memory=string] [OCF_RESKEY_monitor_scripts=string] Xen [start | stop | migrate_from | migrate_to | monitor | meta-data | validate-all] Description Resource Agent for the Xen Hypervisor. Manages Xen virtual machine instances by mapping cluster resource start and stop to Xen create and shutdown, respectively.
  • Page 283 OCF_RESKEY_allow_migrate=Use live migration This bool parameter allows the use of live migration for paravirtual machines. OCF_RESKEY_shutdown_timeout=Shutdown escalation timeout The Xen agent will first try an orderly shutdown, using xm shutdown. Should this not succeed within this timeout, the agent will escalate to xm destroy, forcibly killing the node.
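A rough crm shell sketch for a Xen domain managed by this agent (the domain configuration file path is a placeholder):
crm configure primitive my_domU ocf:heartbeat:Xen \
  params xmfile=/etc/xen/vm/my_domU \
  op monitor interval=30s timeout=30s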
  • Page 284 ocf:Xinetd (7) ocf:Xinetd — Xinetd resource agent Synopsis [OCF_RESKEY_service=string] Xinetd [start | stop | restart | status | monitor | validate-all | meta-data] Description Resource script for Xinetd. It starts/stops services managed by xinetd. Note that the xinetd daemon itself must be running: the system will not start it or stop it. Important: in case the services managed by the cluster are the only ones enabled, you should specify the -stayalive option for xinetd or it will exit on Heartbeat stop.
  • Page 285: Part V Appendix

    Part V. Appendix...
  • Page 287: A Gnu Licenses

    GNU Licenses This appendix contains the GNU General Public License and the GNU Free Documentation License. GNU General Public License Version 2, June 1991 Copyright (C) 1989, 1991 Free Software Foundation, Inc. 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA Everyone is permitted to copy and distribute verbatim copies of this license document, but changing it is not allowed.
  • Page 288 GNU GENERAL PUBLIC LICENSE TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION This License applies to any program or other work which contains a notice placed by the copyright holder saying it may be distributed under the terms of this General Public License. The “Program”, below, refers to any such program or work, and a “work based on the Program” means either the Program or any derivative work under copyright law: that is to say, a work containing the Program or a portion of it, either verbatim or with modifications and/or translated into another language.
  • Page 289 You are not required to accept this License, since you have not signed it. However, nothing else grants you permission to modify or distribute the Program or its derivative works. These actions are prohibited by law if you do not accept this License. Therefore, by modifying or distributing the Program (or any work based on the Program), you indicate your acceptance of this License to do so, and all its terms and conditions for copying, dis- tributing or modifying the Program or works based on it.
  • Page 290: Gnu Free Documentation License

    This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY;...
  • Page 291 We have designed this License in order to use it for manuals for free software, because free software needs free documentation: a free program should come with manuals providing the same freedoms that the software does. But this License is not limited to software manuals; it can be used for any textual work, regardless of subject matter or whether it is published as a printed book.
  • Page 292 If the required texts for either cover are too voluminous to fit legibly, you should put the first ones listed (as many as fit reasonably) on the actual cover, and continue the rest onto adjacent pages. If you publish or distribute Opaque copies of the Document numbering more than 100, you must either include a machine-readable Transparent copy along with each Opaque copy, or state in or with each Opaque copy a computer-network location from which the general network-using public has access to download using public-standard network protocols a complete Transparent copy of the Document, free of added material.
  • Page 293 The author(s) and publisher(s) of the Document do not by this License give permission to use their names for publicity for or to assert or imply endorsement of any Modified Version. COMBINING DOCUMENTS You may combine the Document with other documents released under this License, under the terms defined in section 4 above for modified versions, provided that you include in the combination all of the Invariant Sections of all of the original documents, unmodified, and list them all as Invariant Sections of your combined work in its license notice, and that you preserve all their Warranty Disclaimers.
  • Page 294 Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 only as published by the Free Software Foundation; with the Invariant Section being this copyright notice and license. A copy of the license is included in the section entitled “GNU Free Documentation License”.
  • Page 295: Terminology

    Terminology active/active, active/passive A concept about how services are running on nodes. An active-passive scenario means that one or more services are running on the active node and the passive node waits for the active node to fail. Active-active means that each node is active and passive at the same time.
  • Page 296 designated coordinator (DC) The “master” node. This node is where the master copy of the CIB is kept. All other nodes get their configuration and resource allocation information from the current DC. The DC is elected from all nodes in the cluster after a membership change.
• Page 297 try-restart and reload as well. LSB resource agents are located in /etc/init.d. Find more information about LSB resource agents and the actual specification at http://www.linux-ha.org/LSBResourceAgent and http://www.linux-foundation.org/spec/refspecs/LSB_3.0.0/LSB-Core-generic/LSB-Core-generic/iniscrptact.html node Any computer (real or virtual) that is a member of a cluster and invisible to the user.
• Page 298 resource Any type of service or application that is known to Heartbeat. Examples include an IP address, a file system, or a database. resource agent (RA) A resource agent (RA) is a script acting as a proxy to manage a resource. There are three different kinds of resource agents: OCF (Open Cluster Framework) resource agents, LSB resource agents (Standard LSB init scripts), and Heartbeat resource agents (Heartbeat v1 resources).
