Sunday, May 31, 2009

Sun Cluster 3.2 Topics for Certification

Section 1: Product Introduction

Describe various aspects of Sun Cluster 3.2 hardware including nodes, storage, and private and public networks
Describe the various layers of cluster software, including the cluster framework, agents, and applications
Describe Solaris 10 cluster-specific features such as zones, ZFS and SMF

Section 2: Plan and Install Sun Cluster 3.2 Software Framework

Configure console connectivity and the cluster console software and describe requirements, installation, and configuration of cluster servers and storage connections
Plan and install software, configure quorum devices, and describe quorum votes and cluster amnesia prevention
Describe the necessary tools, resources, and steps required to install Sun Cluster
Differentiate between alternate methods of configuring Sun Cluster software

Section 3: Manage Sun Cluster Framework

Perform basic cluster operations, including cluster node startup and modification of basic cluster properties
Perform more advanced cluster operations, including administering quorum devices, disk paths, and interconnect components
Show how to utilize Sun Cluster Manager, control clusters, and modify private network addresses and netmasks

Section 4: Perform Volume Management in Sun Cluster Software

Explain and implement Solaris Volume Manager, including describing SVM disk space management and disksets
Manage SVM database replicas, and create local metadbs and shared disksets
Build volumes and disksets, and create local and global filesystems
Explain and implement VERITAS Volume Manager including describing VxVM disk groups, disk organization, and volume requirements in the Sun Cluster software environment
Install and configure VxVM in the Sun Cluster environment, and create shared storage disk groups and volumes with VxVM
Manage VxVM device groups, register disk groups, and build global and local filesystems on VxVM devices
Configure the ZFS volume management and filesystem layers

Section 5: Configure Applications in Sun Cluster Software

Describe how to utilize IPMP within Sun Cluster software including operation, uses, configuration, and failover/failback
Explain how to define and configure resources and resource groups for failover applications in the cluster
Explain how to configure scalable services and advanced resource group relationships, including the use of SharedAddress resources and resource group affinities
Configure Oracle HA and Oracle RAC in Sun Cluster software


Exam type: Multiple choice
Number of questions: 62
Pass score: 66%
Time limit: 105 minutes

Monday, May 4, 2009

Quorum Devices in Sun Cluster

A quorum device is a disk shared by two or more nodes in the cluster that helps establish a quorum for the cluster to run. The cluster operates only when a quorum of votes is available. Quorum devices are necessary to protect the cluster from split-brain and amnesia situations. Each quorum device must be connected to at least two nodes.

Adding a quorum device automatically configures node-to-device paths for the nodes attached to the device. Later, if we add more nodes to the cluster, we might need to update these paths by removing and then adding back the quorum device.
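
A minimal sketch of that remove-and-re-add cycle, assuming the DID device d101 that appears in the output below and the 3.1-style scconf syntax used throughout these posts (note that the last remaining quorum device of a two-node cluster cannot be removed unless the cluster is in install mode):

# scconf -r -q globaldev=d101
# scconf -a -q globaldev=d101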

A SCSI quorum device is any Sun Cluster supported attached storage that is connected to two or more nodes of the cluster. Dual-ported SCSI-2 disks may be used as quorum devices in two-node clusters. However, clusters with more than two nodes require SCSI-3 PGR disks for any disk with more than two node-to-disk paths. We can use a disk containing user data, or one that is a member of a device group, as a quorum device.

Quorum Arithmetic

The quorum equation states that the cluster must have a majority of the configured votes: the total number of configured votes divided by two (the remainder is discarded), plus one.

Q = (TQV / 2) + 1

TQV = total configured quorum votes, with the division truncated to an integer

The key to understanding quorum is learning how votes are assigned and counted. Each node in a configured cluster has one (1) quorum vote. Each shared storage device configured as a quorum device has votes equal to the number of nodes connected to it minus one: QD = (N - 1). For folks who, like me, need a plainer statement: it is the number of servers the quorum device is connected to, minus one. If the quorum device is connected to three servers, then the quorum device's vote will be two (2).

Ownership of a quorum device is decided by SCSI reservations, which will be dealt with in a later post. For better understanding, I am posting a couple of outputs, one from a two-node and one from a three-node cluster. One can do the math to verify the quorum arithmetic discussed above (it is also worked out after the outputs).

Two-node cluster

phys-host1 # scstat -q

-- Quorum Summary --
  Quorum votes possible:      3
  Quorum votes needed:        2
  Quorum votes present:       3

-- Quorum Votes by Node --
                    Node Name            Present  Possible  Status
                    ---------            -------  --------  ------
  Node votes:       phys-host1           1        1         Online
  Node votes:       phys-host2           1        1         Online

-- Quorum Votes by Device --
                    Device Name          Present  Possible  Status
                    -----------          -------  --------  ------
  Device votes:     /dev/did/rdsk/d101s2 1        1         Online

Three-node cluster

phys-host1:> /usr/cluster/bin/scstat -q

-- Quorum Summary --
  Quorum votes possible:      7
  Quorum votes needed:        4
  Quorum votes present:       7

-- Quorum Votes by Node --
                    Node Name            Present  Possible  Status
                    ---------            -------  --------  ------
  Node votes:       phys-host1           1        1         Online
  Node votes:       phys-host2           1        1         Online
  Node votes:       phys-host3           1        1         Online

-- Quorum Votes by Device --
                    Device Name          Present  Possible  Status
                    -----------          -------  --------  ------
  Device votes:     /dev/did/rdsk/d200s2 2        2         Online
  Device votes:     /dev/did/rdsk/d199s2 2        2         Online
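
Working the math on these outputs: in the two-node cluster, two node votes plus one quorum device vote (2 - 1 = 1) give TQV = 3, so Q = (3 / 2) + 1 = 2 votes needed. In the three-node cluster, each quorum device is connected to all three nodes and so carries 3 - 1 = 2 votes; three node votes plus 2 + 2 device votes give TQV = 7, so Q = (7 / 2) + 1 = 4 votes needed. Both match the summaries above.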

Removing and unregistering a Disk group in Sun Cluster

Let's see how to remove a disk group from a Sun Cluster running VERITAS Volume Manager. It was a bit surprising to see very little posted on this topic on the net, though I did come across a couple of posts on removing device groups with SVM.

The environment is a Sun Cluster 3.1 running VxVM. Normally, one would see this type of request associated with the removal of a resource group or the migration of applications from the server.

To remove the disk group from cluster control, you need to bring the device group offline and then unregister the disk group from the cluster:

1. Make sure no file systems from the disk group are mounted.

2. Make sure no entries are present in /etc/vfstab.

3. Make sure the volumes are removed ; # vxedit -g testdg -rf rm testvol

4. Sync the cluster ; # scconf -c -D name=testdg,sync

5. Offline the device group ; # scswitch -F -D testdg

6. Un-Register the disk group ; # scconf -r -D name=testdg

7. Once the device group is taken offline and unregistered, Sun Cluster deports the VxVM disk group. You need to import the disk group again and destroy it:

# vxdg import testdg ; # vxdg destroy testdg
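
As a hedged verification step, not part of the original procedure: # scstat -D should no longer list the device group, and # vxdg list should no longer show testdg once it has been destroyed.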

Clearing the STOP_FAILED flag in Sun Cluster

Yesterday, I ran into an issue while trying to switch a resource group in Sun Cluster. The switch took a long time to complete, eventually failing.

When I had a look at the resource status, the resource was in the STOP_FAILED state and the resource group was in the ERROR_STOP_FAILED state.

I needed to clear the status of the resource in order to proceed further with the activity.

1. Get the status of the resource ; # scstat -g

2. Clear the resource ; # scswitch -c -h phys-cluster1 -j webrs -f STOP_FAILED

Once the resource flag was cleared, the RG went offline. Sometimes that may not be the case; one has to manually take the RG offline and then switch it back.

# scswitch -F -g webrg

I was then able to fail over the resource group.

# scswitch -z -h phys-cluster1 -g webrg
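
As a quick check, # scstat -g should now show the resource group online on phys-cluster1.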


Adding HAStorage Plus resource to Sun Cluster

There is a task to add a couple of file systems to the cluster. When I had a look at the config, I realised that each file system is configured as an HAStoragePlus resource. This makes my job somewhat easier: I just have to create the volume, make it cluster aware, and enable it in Sun Cluster. Let's see how we can make this happen.

1. Create volume ; # vxassist -g testdg make testvol 25g

2. Make entries in /etc/vfstab

3. Update the cluster ; # scconf -c -D name=testdg,sync

4. Create the filesystem ; # newfs /dev/vx/rdsk/testdg/testvol

5. Register it in sun cluster ; # scrgadm -a -j testvol-res -t SUNW.HAStoragePlus -g test-rg -x FileSystemMountPoints=/global/oracle/testvol -x AffinityOn=True

6. Enable the resource ; # scswitch -e -j testvol-res

7. Verify the file systems are mounted.
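
Two hedged notes on this sequence, covering details not shown above: if the SUNW.HAStoragePlus resource type has never been registered in the cluster, that must be done once before step 5 with # scrgadm -a -t SUNW.HAStoragePlus, and a simple check for step 7 is # df -k /global/oracle/testvol.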

Business Continuity Volumes

A business continuity volume (BCV) is a Symmetrix device with special attributes. It can function either as an additional mirror or as a separate host-addressable volume. Establishing BCV devices as mirror images of active production volumes allows you to run multiple business continuity tasks in parallel.

The principal device, known as the standard device, remains online for regular Symmetrix operation from the original production server. Each BCV has a unique host address, making it accessible to a separate backup/recovery server. When you establish a BCV as a mirror of the standard device, the relationship is known as a BCV pair. One can think of the BCV device as a third component of the mirror.

BCV/standard mirror pairs make it practical to access the copied data on a BCV at any point in time without interfering with business operations. Any time you split a BCV from its standard device, the BCV holds data that is available for backup, testing, or analysis.

BCV PAIRS

A business continuance sequence first involves establishing the BCV device as a mirror of the standard Symmetrix device. While a BCV is mirrored with the standard device, the BCV is inaccessible to its host. After the BCV becomes synchronized with its paired standard device, we can split the BCV from the standard device at any time, making the BCV accessible to its host again. After the split, the BCV contains valid data and is available for the following (a command sketch appears after this list):

* Business continuance tasks through its original device address

* Restoring data to the standard device if there is a loss of data on the standard device
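
A minimal sketch of the split step, reusing the BCVGRP device group that is set up later in this post: monitor synchronization with # symmir -g BCVGRP query, and once the pair reports Synchronized, split it with # symmir -g BCVGRP split. After the split, the BCV is again addressable by its host.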

DEVICE GROUPS

The device group is the entity used to manage and control BCV pairs. The host SYMAPI database file stores information about the device group and the standard and BCV devices it contains. Information about device pairs is stored in the Symmetrix global memory and can be updated by subsequent establish or restore operations.

When device groups are formed initially, the standard and BCV devices may have previous pairing relationships with devices that do not belong to the device group. In that scenario, we can use the following commands to check the associations:

1. Discover the Symmetrix ; # symcfg discover

2. List the Symmetrix devices ; # symdev list -sid 0123

3. Get the relationships between standard and BCV devices ; # symbcv list -sid 0123

Setting up Device Groups

A device group can hold any of three types of standard devices:

a) Regular (non-RDF)

b) RDF1 (source)

c) RDF2 (target)

Although a BCV cannot be associated with more than one device group at a time, a BCV can be moved from one group to another without regard to the device group type. However, the movement of standard devices between device groups is possible only if the source and destination groups are of the same type.

The following sequence creates a device group and adds devices to the group:

1. Create a device group named BCVGRP ; # symdg create BCVGRP -type regular

2. Add a standard device (for example, Symmetrix device 080) to the device group BCVGRP on Symmetrix number 0123. A default logical name of the form DEV001 is assigned.

# symld -g BCVGRP -sid 0123 add dev 080

3. Associate a BCV (Symmetrix device 090) with the standard device. A default logical name of BCV001 is assigned.

# symbcv -g BCVGRP -sid 0123 associate dev 090
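
As a hedged sanity check, not part of the original sequence: # symdg show BCVGRP lists the standard and BCV devices now contained in the group.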

With this, we have created a device group of type Regular and added standard and BCV devices to the group. We shall now go ahead and create the pairs.

Creating BCV pairs

To assign a BCV as a mirror of a standard Symmetrix device, we can use the following steps.

a) Establish a BCV pair explicitly:

# symmir -g BCVGRP -full establish DEV001 bcv ld BCV001

Here we are explicitly pairing DEV001 with BCV001. Use the -full option to copy the contents completely the very first time.

Once a pairing relationship has been created, the Symmetrix keeps a record of that relationship. One can alter that record only by performing a subsequent establish, restore, or cancel operation.
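
A short hedged sketch of those subsequent operations on the same pair: after a split, an incremental establish (without -full) copies back only the tracks that changed, while a restore copies data from the BCV back to the standard device.

# symmir -g BCVGRP establish DEV001 bcv ld BCV001
# symmir -g BCVGRP restore DEV001 bcv ld BCV001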