Josh-D. S. Davis

GPFS cheatsheet (General Parallel File System)
-----------------------------------------
Top Level GPFS goodies and notes:
-----------------------------------------
   http://techsupport.services.ibm.com/server/gpfs/related
Pick Library:
   http://publib.boulder.ibm.com/clresctr/library/
Pick GA22-7968-02 GPFS V2.3 Concepts, Planning, and Installation Guide:
   http://publib.boulder.ibm.com/epubs/pdf/bl1ins10.pdf
   Chapter 5, Migration, coexistence and compatibility.

NOTE: mmexportfs came about in GPFS 2.2.1

NOTE: Sometimes, especially on larger clusters, the mount will have
to wait for mmstartup to propagate.  If automount is set, just give it
a few minutes.  If you try to mount it manually too early, you may get:
   mount: 0506-324 Cannot mount /dev/gpfs on /gpfs: There is an input
   or output error.
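If you'd rather not sit there retrying the mount by hand, a rough polling
loop like this works (it assumes the same /dev/gpfs on /gpfs as above, and
the timings are arbitrary):
   i=0
   while [ $i -lt 10 ]; do                 # roughly five minutes total
      mount /gpfs 2>/dev/null && break     # succeeds once mmfsd is up
      sleep 30
      i=$(( i + 1 ))
   done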
-----------------------------------------
Upgrading from GPFS 2.2 to 2.2.1
-----------------------------------------
Make sure all disks are ready and up
   # mmlsdisk
Shut down apps
Make sure data is backed up
Unmount any gpfs filesystems
Shut down GPFS:
   # mmshutdown -a
Install the new code
Reboot the nodes
Make sure everything works.
After you're 100% sure you won't be reverting:
   # mmchfs (fsname) -V
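Strung together, the whole sequence looks roughly like this (the /dev/gpfs
device name and the install directory are just examples; substitute your
own filesystems and media):
   # mmlsdisk /dev/gpfs               (all disks ready and up?)
   # umount /gpfs                     (on every node)
   # mmshutdown -a
   # installp -agXd /your/install/dir all
   # shutdown -Fr                     (reboot each node)
   # mmstartup -a ; mount /gpfs       (make sure everything works)
   # mmchfs /dev/gpfs -V              (only once you're sure you won't revert)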
-----------------------------------------
Upgrading from 2.2.1 to 2.3
-----------------------------------------
Make sure all disks are ready and up
   # mmlsdisk
Shut down apps
Make sure data is backed up
Unmount any gpfs filesystems
Shut down GPFS:
   # mmshutdown -a
Export your nodeset definitions:
   # mmexportfs all -o /tmp/outfile
Delete all nodes from each nodeset
   # mmdelnode -a (nodesetId)
For non SP clusters, delete the MMFS Cluster:
   # mmdelcluster -a
If there are any new nodes, make sure they are properly attached
If this is an RPD cluster and you're not using VSDs:
   # rmrpdomain gpfsRPD
Uninstall GPFS and install the new version
Create the new cluster:
   # mmcrcluster -C clustername -n NodeFile -p primary -s secondary -A
Import the cluster config
   # mmimportfs all -i /tmp/outfile
Start GPFS
   # mmstartup -a
After you're 100% sure you won't be reverting:
   # mmchfs (fsname) -V
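Once mmimportfs finishes, it's worth a quick sanity check before calling it
done (same commands used elsewhere in this sheet; /dev/gpfs and /gpfs are
just example names):
   # mmlscluster
   # mmlsconfig
   # mmlsnsd
   # mmlsdisk /dev/gpfs
   # mount /gpfs ; df -k /gpfs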
-----------------------------
Uninstalling GPFS 2.2
-----------------------------
Unmount all GPFS filesystems from all nodes
mmdelfs each GPFS filesystem
mmshutdown -a
installp -u
rm -r /var/mmfs /usr/lpp/mmfs /var/adm/ras/mm*
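Spelled out a little more (this assumes /gpfs is the only GPFS filesystem;
check lslpp for the exact fileset names before removing anything):
   # umount /gpfs                     (on every node)
   # mmdelfs gpfs                     (repeat for each filesystem device)
   # mmshutdown -a
   # lslpp -l | grep -i mmfs          (note the fileset names)
   # installp -u mmfs                 (or list the filesets lslpp showed)
   # rm -r /var/mmfs /usr/lpp/mmfs /var/adm/ras/mm*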
--------------------------------------------------
Creating GPFS 2.2 and HA cluster on a disk
--------------------------------------------------
# smitty clstart
(info on making the HA cluster not included)
# mmcrcluster -t hacmp -n nodelist -p gandalf -s frodo
# mmlscluster shows it there
# mmconfig -n nodelist -A -C set1 -D /tmp/mmfs -p 80 -V no -U yes
# mmlsconfig
   all looks good
-
# echo hdisk21:::: | mmcrlv -F-
If you try to use a preexisting LV, you'll get:
   mmcrfs: 6027-1909 There are no available free disks.
    Disks must be prepared prior to invoking mmcrfs.
    Define the disks using the mmcrnsd command.
If mmcrlv doesn't like what's already there, you'll see:
   mkvg: 0516-1254   changing the pvid in the odm
   0516-1207 mkvg An invalid physical volume identifier has been
     detected on hdisk21
   0516-862 mkvg unable to create volume group
   6027-1306 /usr/lpp/mmfs/bin/mmvsdhelper -c -C -b -n -d hdisk21 -g
      gpfs10vg -l gpfs10lv     failed with return code 1
If you get this:
   The volume group cannot be varied on because there are no good
   copies of the descriptor area.
Then run this:
   # chdev -l hdisk21 -a pv=clear
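If you're prepping a batch of disks, the same chdev fits in a loop (the
hdisk numbers are just an example; same pattern as the RPD section below):
   # for i in 21 22 23; do
      chdev -l hdisk$i -a pv=clear
   done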
-
# mmstartup -a
-
# mount /gpfs
If you are not using disks for which GPFS supports persistent reserve,
the mount will fail with:
      GPFS: 6027-701 Some file system data are inaccessible at this time
   Set disk leasing with:
      # mmchconfig useDiskLease=yes
      NOTE: This will disable single node quorum.
There are some SSA fence ID issues I left out since you're on Fibre Channel
--------------------------------------------------
Creating GPFS 2.2 on RPD or LC cluster
--------------------------------------------------
For the manual pages:
   # export MANPATH=$MANPATH:/usr/lpp/mmfs/gpfsdocs/man/aix
   # catman -w
-
From every node in the cluster:
   # preprpnode (all nodes, space separated)
-
From one node:
   # mkrpdomain gpfsRPD node1 node2
   # startrpdomain gpfsRPD
-
Wait a few minutes for it to come online
   lsrpdomain on each node will show its status.
   lsrpnode should show all nodes online when it's done.
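Or poll for it instead of guessing (a rough sketch using the same two
commands):
   # until lsrpdomain | grep -q Online; do
      sleep 10
   done
   # lsrpnode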
-
If the RSCT version doesn't match what's installed:
   # export CT_MANAGEMENT_SCOPE=2
   # runact -c IBM.PeerDomain CompleteMigration Options=0
-
Set up the actual GPFS portion
   # vi nodelist
   node1:manager
   node2:manager
   node3:manager,nonquorum
   node4:client
   node5:client
   # mmcrcluster -t rpd -n nodelist -p primary -s secondary
   # mmconfig -n nodelist -A -C set1 -D /tmp/mmfs -p 80M -V no -U yes
   # for i in 2 3 4 5 6; do
      chdev -l hdisk$i -a pv=clear
      echo hdisk$i:::: | mmcrlv -F-
   done
-
# mmlsgpfsdisk -F
   should show free disks
-
# mmcrfs /gpfs /dev/gpfs -F /var/mmfs/etc/diskdsc -B 512K -C set1 -A yes
-
# mmstartup -a
If you ever get this:
   The current RSCT peer domain node number is 7. GPFS expects 1.
   Thu  9 Dec 16:03:27 2004 runmmfs: 6027-1242 GPFS is waiting for the
   RSCT peer domain node number mismatch to be corrected
   runmmfs: 6027-1127 There is a discrepancy in the RSCT node numbers
   for lc. (or rpd)
Then run these commands:
   # /usr/lpp/mmfs/bin/mmshutdown -a
   # /usr/lpp/mmfs/bin/mmcommon recoverPeerDomain
      I'm not sure, but you might need to stoprpdomain first.
   # /usr/lpp/mmfs/bin/mmstartup -a
-
# mount /gpfs
--------------------------------------------------
Creating GPFS 2.3 cluster on hdisks
--------------------------------------------------
NOTE: cluster.es.server.cfs doesn't necessarily support this
NOTE: node names should be what comes back from "hostname", and these
   names should have proper name resolution
-
NOTE: GPFS will use rsh/rcp, so make sure rsh/rcp work between all nodes;
   see the GPFS manual for this.
-
Create the nodelist file, one hostname per line
   # vi /tmp/nodelist
-
Make the cluster itself
   # /usr/lpp/mmfs/bin/mmcrcluster -n /tmp/nodelist \
      -p primary_node -s secondary_node
-
Set a larger block size if needed
   # mmchconfig maxblocksize=512K
-
Set to autostart, single node quorum, 80M cache, don't verify disks
   # mmconfig -n nodelist -A -p 80M -v no -U yes
   NOTE:  There's a change in single-node quorum called tie-breaker
   disks which I haven't looked into yet.
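From what I've read so far (untested here, so treat this as a sketch), you
hand mmchconfig one or three NSD names while GPFS is down, something like:
   # mmshutdown -a
   # mmchconfig tiebreakerDisks="gpfs1nsd;gpfs2nsd;gpfs3nsd"
   # mmstartup -a
The NSD names above are just examples; mmlsnsd will show yours.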
-
You should see
   mmconfig: Command successfully completed
   mmconfig: 6027-1371 Propagating the changes to all affected nodes.
   This is an asynchronous process
-
At this point, you should be able to start gpfs
   # mmstartup -a
-
-
Now, create your disk list to be used for NSDs.
   # echo hdiskpower10::::1 > /tmp/disklist
NOTE: you can specify primary and secondary NSD server nodes if you don't
have fibre to all of the nodes that will access these disks.
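For example, a descriptor with the NSD server nodes filled in (made-up
server names; the fields are
DiskName:PrimaryServer:BackupServer:DiskUsage:FailureGroup):
   # echo hdiskpower10:nsdserv1:nsdserv2:dataAndMetadata:1 > /tmp/disklist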
-
Create the Network Shared Disk
   # mmcrnsd -F /tmp/disklist
-
List the NSDs to make sure it's there
# mmlsnsd
You should see:
   File system          NSD name       Primary Node         Backup Node
   --------------------------------------------------------------------
   (free disk)          gpfs01nsd
-
Create the file system now
   We're choosing a 256K block size, replicated metadata, one LV
   NOTE: If you want this to automount on boot, use a -A flag as well.
   If this fails complaining about a missing nodeset, add -C gpfs1
-
   # mmcrfs $mountpt $fsname -F /tmp/disklist  -B 256K -M 2 -m 1 \
     -n 1 -s balancedRandom
-
List the NSDs again to make sure it shows in use
   # mmlsnsd
You should see:
   File system          NSD name       Primary Node        Backup Node
   -------------------------------------------------------------------
   fsname               gpfs
-
List the disk to make sure it shows ready and available
   # mmlsdisk $fsname
You should see:
   disk   driver  sector  failure  holds      holds
   name   type      size    group  metadata    data   status  avail
   -----  ------ -------  -------  ---------  -----  -------  -----
   gpfs   disk       512        1  yes          yes    ready     up
-
List the filesystem to make sure it exists
   # mmlsfs $fsname
-
Mount the filesystem for the first time
   # mount $mountpt
-
Reboot to make sure everything autostarts
-
To add a new node to GPFS
   Create the node file
      # echo nodename:manager:nonquorum > /tmp/nodefile
      where nodename is the hostname; the manager and nonquorum
      designations are optional
   Add the nodes
      # mmaddnode  -n /tmp/nodefile
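For example, bringing in one new client node (hypothetical hostname; check
your level's mmaddnode man page for the exact designation syntax):
   # echo newnode1:client > /tmp/nodefile
   # mmaddnode -n /tmp/nodefile
   # mmlscluster                      (the new node should show up)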
-
Other useful commands:
   Debugging:       export DEBUG=1
   Stop/Start:      mmshutdown -a ; mmstartup -a
   Remove a node:   mmdelnode ; mmdelcluster
   Removing a disk: mmdeldisk
   Removing an fs:  mmdelfs
   Removing an NSD: mmdelnsd   (only for free NSDs)
   Ex/Import:       mmexportfs all -o (filename)
                    mmimportfs all -i (filename)
   Logs:            /var/adm/ras/mmfs.log.LATEST
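When something looks wedged, my usual first pass is (mmgetstate may not
exist at every level; the log and errpt always do):
   # mmgetstate -a                    (daemon state on each node, if available)
   # tail /var/adm/ras/mmfs.log.LATEST
   # errpt | more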
--------------------------------------------------
Creating GPFS 2.3 on Logical Volumes
--------------------------------------------------
This is what I tested in the lab:
-
Make the VG and LV:
   node1# chdev -l hdisk# -a pv=yes
   othernodes# rmdev -l hdisk#; mkdev -l hdisk#
   node1# mkvg -n -f -s (ppsize) -c -x -y (vgname) hdisk###
   node1# varyonvg -c (vgname)
   node1# mklv -y (lvname) -t raw (vgname) (numLPs) hdisk###
   othernodes# importvg -y (vgname) -c -x hdisk##
   othernodes# varyonvg -c (vgname)
   NOTE: I named my LVs jdtest050418a and b on only one system.
   NOTE: It is questionable whether it's OK to have more than 1 LV per
   PV.  It's been made to work, but the documentation seems to indicate
   that you shouldn't do this.
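As a concrete example with made-up names (jdtestvg/jdtestlv; the PP size
and LP count are arbitrary, and I've left out the concurrent-related
mkvg/importvg flags from the lines above since they vary by AIX level):
   node1# chdev -l hdisk21 -a pv=yes
   othernodes# rmdev -l hdisk21; mkdev -l hdisk21
   node1# mkvg -n -f -s 64 -y jdtestvg hdisk21
   node1# varyonvg -c jdtestvg
   node1# mklv -y jdtestlv -t raw jdtestvg 10
   othernodes# importvg -y jdtestvg hdisk21
   othernodes# varyonvg -c jdtestvg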
Make the disk descriptor file:
   # vi /tmp/nodelist
   jdtest050418a:spserv.dfw.ibm.com:::1
   jdtest050418b:spserv.dfw.ibm.com:::1
Make the NSDs
   # mmcrnsd -F /tmp/nodelist
      mmcrnsd: Processing disk jdtest050418a
      mmcrnsd: Processing disk jdtest050418b
      mmcrnsd: 6027-1371 Propagating the changes to all affected nodes.
      This is an asynchronous process.
View the disk descriptor files:
   # cat 1
      # jdtest050418a:spserv.dfw.ibm.com:::1
      gpfs4nsd:::dataAndMetadata:1
      # jdtest050418b:spserv.dfw.ibm.com:::1
      gpfs5nsd:::dataAndMetadata:1
View the NSDs
   # mmlsnsd -a
   File system   Disk name    Primary node             Backup node
   ------------------------------------------------------------------
   homie         gpfs1nsd     spserv.dfw.ibm.com
   (free disk)   gpfs4nsd     spserv.dfw.ibm.com
   (free disk)   gpfs5nsd     spserv.dfw.ibm.com
Create the filesystem:
   # mmcrfs /jdtest050418 jdtest050418 -F 1 -B 256K -M2 -m1 -n 1
   GPFS: 6027-531 The following disks of jdtest050418 will be formatted
   on node spserv.dfw.ibm.com:
      gpfs4nsd: size 81920 KB
      gpfs5nsd: size 81920 KB
   GPFS: 6027-540 Formatting file system ...
   Creating Inode File
   Creating Allocation Maps
   Clearing Inode Allocation Map
   Clearing Block Allocation Map
   Flushing Allocation Maps
   GPFS: 6027-535 Disks up to size 207 MB can be added to this file system.
   GPFS: 6027-572 Completed creation of file system /dev/jdtest050418.
   mmcrfs: 6027-1371 Propagating the changes to all affected nodes.
   This is an asynchronous process.
Mount the filesystem:
   # mount /jdtest050418
Make sure it's there:
   # df -k /jdtest050418
   /dev/jdtest050418   163328    148224   10%    9     1% /jdtest050418
---------------------------