Josh-D. S. Davis

Xaminmo / Omnimax / Max Omni / Mad Scientist / Midnight Shadow / Radiation Master

DLPAR without RMC
joshdavis

DLPAR Without RMC - Unsupported Laboratory Method

It is possible to run drmgr by hand for DLPAR when RMC is not available.

NOTE: This may fail. If it does, the device tree will be inconsistent, and the resource may stay lost until the entire managed system can be power cycled.

NOTE: AIX Support won't help you if you blow your system up with this. They will tell you to power cycle your whole frame. drmgr is not supported for manual use.

Basic syntax for drmgr:

   # drmgr
   -r  remove
   -a  add
   -c  phb|cpu|mem|slot
   -s  name of the phb or slot (for example phb2 or U1.9-P1-I10)
   -q  quantity for cpu or memory
   -w  wait time (5 minutes is common)
   -d  debug level (1 is default, 4 is verbose)

Examples include:

   source# drmgr -r -c phb -s phb2 -w 5 -d 1
   source# drmgr -r -c cpu -q 1 -w 5 -d 1
   dest# drmgr -a -c slot -s U1.9-P1-I10 -d 4
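
The flags above combine mechanically, so a small wrapper can assemble the command line and print it for review before anything is run. This is only a sketch: the function name drmgr_cmdline and its argument order are made up for illustration, it never calls drmgr itself, and it assumes the flag usage shown in the examples above (-s for phb and slot names, -q for cpu and mem quantities).

```shell
# Hypothetical helper: print (never run) a drmgr command line.
# usage: drmgr_cmdline add|remove phb|cpu|mem|slot NAME_OR_QTY [WAIT] [DEBUG]
drmgr_cmdline() {
    case "$1" in
        add)    op="-a" ;;
        remove) op="-r" ;;
        *)      echo "unknown operation: $1" >&2; return 1 ;;
    esac
    [ -n "$3" ] || { echo "missing name or quantity" >&2; return 1; }
    case "$2" in
        cpu|mem)  target="-q $3" ;;   # quantity for cpu or memory
        phb|slot) target="-s $3" ;;   # name, as in the examples in this post
        *)        echo "unknown resource type: $2" >&2; return 1 ;;
    esac
    # Wait time defaults to 5 and debug level to 1, matching the examples.
    echo "drmgr $op -c $2 $target -w ${4:-5} -d ${5:-1}"
}

drmgr_cmdline remove cpu 1              # prints: drmgr -r -c cpu -q 1 -w 5 -d 1
drmgr_cmdline add slot U1.9-P1-I10 5 4  # prints: drmgr -a -c slot -s U1.9-P1-I10 -w 5 -d 4
```

Printing instead of executing is deliberate: given how badly a failed manual drmgr can end, it is worth reading the line before pasting it.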

Device Tree Correction

Once the remove is done, the device tree must be updated.
If DLPAR is not greyed out for an LPAR:
  • Choose Add for the type of resource you moved
  • Choose "Additional Information" button
  • Correct the problems listed there
In general, you'll probably use this instead:
      hmc$ rsthwres -m "CECNAME" -r slot -l U1.9-P1-I10

Example DLPAR, no output, no RMC:

Find the parent of your device
   source# lsdev -Cl cd0 -F parent
   source# lsdev -Cl scsi1 -F parent
Repeat this until you find the PCI slot
   Note: some 4-port cards have a PCI bridge on the card.
   You'd go one up from that.  If you go past "pci###", you have gone too far.
Make the devices go unavailable:
   source# rmdev -Rl pci13
Release the device back to PHYP
   source# drmgr -r -c slot -s U1.9-P1-I10 -d 5
Clean up your mess in the device tree
   hmc$ rsthwres -m "CECNAME" -r slot -l U1.9-P1-I10
Verify the device shows no LPAR as owner
   hmc$ lshwres -m "CECNAME" -r slot
Tell PHYP to move the device
   hmc$ chhwres -m "CECNAME" -o a -p lparname -r slot -l U1.9-P1-I10 \
      -d 5 -w 5
If it hangs because RMC is down entirely,
   hit ^C to break out.
Tell the target to import the slot
   target# drmgr -a -c slot -s U1.9-P1-I10 -d 4
Tell the target to configure the device drivers
   target# cfgmgr
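
The "find the parent" step at the top of this procedure can be sketched as a loop. The names get_parent and find_pci_parent are hypothetical; get_parent just wraps the lsdev -Cl ... -F parent query shown above so it can be stubbed out when testing away from AIX. The loop stops at the first pci### ancestor, so for a card with its own PCI bridge you would still have to go one level up by hand.

```shell
# Wraps the lsdev query used in the procedure; override it to test off-box.
get_parent() {
    lsdev -Cl "$1" -F parent
}

# Hypothetical helper: walk up from a device to its first pci### ancestor.
find_pci_parent() {
    dev="$1"
    while [ -n "$dev" ]; do
        case "$dev" in
            pci[0-9]*) echo "$dev"; return 0 ;;
        esac
        # Move one level up; give up if the lookup itself fails.
        dev=$(get_parent "$dev") || return 1
    done
    echo "no pci parent found for $1" >&2
    return 1
}

# Example (on the source LPAR):
#    find_pci_parent cd0
```

Treat the result as a candidate for rmdev -Rl, not a decision: check it against the slot location code before making anything unavailable.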

Here is the full debug output from testing, with examples

REMINDER: AIX Support won't help you if you blow your system up with this. They will tell you to power cycle your whole frame. drmgr is not supported for manual use.
MAKE SURE NO ONE IS USING THE SYSTEM!!!!

dlpar3# w
06:49PM   up 10 days,   6:52,  2 users,  load average: 0.29, 0.08, 0.03
User     tty          login@       idle      JCPU      PCPU what
admin    vty0        06:48PM          0        15        15 -ksh
root     pts/0       17Mar05      7days         0         0 -ksh


dlpar3# lsdev -Cl cd0 -F parent
scsi1


dlpar3# lsdev -Cl scsi1 -F parent
pci13


dlpar3# rmdev -Rl pci13
scsi0 Defined
cd0 Defined
rmt0 Defined
scsi1 Defined
pci13 Defined


dlpar3# drmgr -r -c slot -s U1.9-P1-I10 -d 4
validate_input: operation: 2 (1:add,2:remove,4:query)
validate_input: drc_name: U1.9-P1-I10
drslot_chrp_slot: slot_type=1 all slots=0x20042b48

slot name = U1.5-P1-I1 index=0x2018
slot name = U1.5-P1-I2 index=0x2019
slot name = U1.5-P1-I3 index=0x201a
slot name = U1.5-P1-I4 index=0x201b
slot name = U1.5-P1-I5 index=0x201c
slot name = U1.5-P1-I6 index=0x201d
        odm = pci18
        odm = pci19
        odm = ent2
        odm = ent3
        odm = ent4
        odm = ent5
slot name = U1.5-P1-I7 index=0x201f
slot name = U1.5-P1-I8 index=0x2020
slot name = U1.5-P1-I9 index=0x2021
slot name = U1.5-P1-I10 index=0x2023
slot name = U1.5-P1/Z1 index=0x201e
slot name = U1.5-P1/Z2 index=0x2022
slot name = U1.5-P2-I1 index=0x2024
slot name = U1.5-P2-I2 index=0x2025
slot name = U1.5-P2-I3 index=0x2026
slot name = U1.5-P2-I4 index=0x2027
slot name = U1.5-P2-I5 index=0x2028
slot name = U1.5-P2-I6 index=0x2029
slot name = U1.5-P2-I7 index=0x202b
        odm = pci16
        odm = ent1
slot name = U1.5-P2-I8 index=0x202c
slot name = U1.5-P2-I9 index=0x202d
slot name = U1.5-P2-I10 index=0x202f
slot name = U1.5-P2/Z1 index=0x202a
        odm = pci15
        odm = scsi2
slot name = U1.5-P2/Z2 index=0x202e
        odm = pci17
        odm = scsi3
slot name = U1.9-P1-I1 index=0x2000
slot name = U1.9-P1-I2 index=0x2001
slot name = U1.9-P1-I3 index=0x2002
slot name = U1.9-P1-I4 index=0x2003
slot name = U1.9-P1-I5 index=0x2004
slot name = U1.9-P1-I6 index=0x2005
slot name = U1.9-P1-I7 index=0x2007
slot name = U1.9-P1-I8 index=0x2008
slot name = U1.9-P1-I9 index=0x2009
slot name = U1.9-P1-I10 index=0x200b
        odm = pci13
        odm = scsi0
        odm = scsi1
slot name = U1.9-P1/Z1 index=0x2006
slot name = U1.9-P1/Z2 index=0x200a
slot name = U1.9-P2-I1 index=0x200c
slot name = U1.9-P2-I2 index=0x200d
slot name = U1.9-P2-I3 index=0x200e
slot name = U1.9-P2-I4 index=0x200f
slot name = U1.9-P2-I5 index=0x2010
slot name = U1.9-P2-I6 index=0x2011
slot name = U1.9-P2-I7 index=0x2013
slot name = U1.9-P2-I8 index=0x2014
slot name = U1.9-P2-I9 index=0x2015
slot name = U1.9-P2-I10 index=0x2017
slot name = U1.9-P2/Z1 index=0x2012
slot name = U1.9-P2/Z2 index=0x2016


Entered remove_a_slot()
valid_name: Entered! checking for U1.9-P1-I10 all=0x20042b48

Entered remove_check_for_HP
Looking for HP with index: 0x200b
Found pci slot with index: 0x201d
Found pci slot with index: 0x202b
Found pci slot with index: 0x200b
Processing any interrupt nodes drc-index 0x200b
process_intr_nodes_rem: entered
Isolating drc-index 0x200b
Unallocating drc-index 0x200b
U1.9-P1-I10
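
The slot listing in the debug output above pairs each slot with the ODM devices under it, which gives a way to cross-check the rmdev target against the slot being released. A minimal sketch, assuming the debug output has been saved to a file and keeps the "slot name = ... index=..." and "odm = ..." layout shown here; the function name odm_for_slot and the file name drmgr_debug.txt are hypothetical.

```shell
# Hypothetical helper: list the ODM devices drmgr reports under one slot.
# Reads saved drmgr debug output on stdin; $1 is the slot location code.
odm_for_slot() {
    awk -v want="$1" '
        # "slot name = U1.9-P1-I10 index=0x200b": field 4 is the slot name
        /^slot name = / { current = $4 }
        # "        odm = pci13": field 3 is the device name
        /^[ \t]*odm = / { if (current == want) print $3 }
    '
}

# Example:
#    odm_for_slot U1.9-P1-I10 < drmgr_debug.txt
```

Run against the listing above for U1.9-P1-I10, this yields pci13, scsi0 and scsi1, which lines up with what rmdev -Rl pci13 put into Defined.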


hscroot$ rsthwres -m "CECNAME" -r slot -l U1.9-P1-I10

   no output

dlpar2# drmgr -a -c slot -s U1.9-P1-I10 -d 4
Entered add_a_slot
valid_name: Entered! checking for U1.9-P1-I10 all=0x2003ff88
sense state of slot (index:0x200b) is 2
Allocating drc-index 0x200b
Allocation failed with -1

0931-011 Unable to allocate the resource to the partition.


hscroot$ rsthwres -m "CECNAME" -r slot -l U1.9-P1-I10
The I/O device  with the physical location code  entered is not a
recoverable resource.


dlpar3# drmgr -a -c slot -s U1.9-P1-I10 -d 4
Entered add_a_slot
valid_name: Entered! checking for U1.9-P1-I10 all=0x2003ff88
sense state of slot (index:0x200b) is 2
Allocating drc-index 0x200b
Allocation failed with -1

0931-011 Unable to allocate the resource to the partition.


hscroot$ lshwres -m "CECNAME" -r slot
...
7040-61D*#######-P1  10       SCSI bus controller



This was with RMC up but failing negotiation on dlpar2:
hscroot$  chhwres -m "CECNAME" -o a -p dlpar2 -r slot -l U1.9-P1-I10 \
-d 5 -w 5
! Debug messages are enabled
! Showing messages for:  MIN
timeout = 5
detailLevel = 5
Start CSP for addSlot
DrawerId[0]: 7040-61D*#######-P1
SlotId[0]: 10
PCIBus[0]: 2
Start calling Slot CIM method
Finish calling Slot CIM method
Finish CSP for addMem
Start RMC for addSlot
rmc command: drmgr -a -c slot -s U1.9-P1-I10 -w 5 -d 5
Enter getLocalSession for RM=IBM.LparCmd
Loading libstdc for the rmcjni layer.
return elm count: 6
succ_num is: 0
endLocalSession() _rcl=com.ibm.rsct.rmc.McResourceClass@4b7b9923
Enter RMCSession.sessionEnded
RMC error encountered mc_errnum0x=4000c

The operating system drmgr command failed.  Please consult the
appropriate operating system log files for further information and
retry the operation if desired.
RMC_CMD_RETURN_CODE: 1023
rtn: 4
command failed without exception
aixErr: true
newExp: com.ibm.hsc.client.rmc.HSCRMCException: HMCERRV3DLPAR020: a
operation for slot has completed, but only 0 out of 1 were successful.
The AIX command is:
drmgr -a -c slot -s U1.9-P1-I10 -w 5 -d 5

The AIX standard output is:
 RMC error encountered mc_errnum0x=4000c
The AIX standard error is:

The return code is 0. The AIX return code is 1023.
Start rollback for addSlot
DrawerId[0]: 7040-61D*#######-P1
SlotId[0]: 10
PCIBus[0]: 2
Start calling Slot CIM method
Finish calling Slot CIM method
Finish rollback for addSlot
com.ibm.hsc.common.exceptions.HSCException: HMC adding I/O slot ......
HMC add I/O slot operation finished successfully.
AIX adding I/O slot .....
.HMCERRV3DLPAR020: a operation for slot has completed, but only 0 out of
1 were successful.
The AIX command is:
drmgr -a -c slot -s U1.9-P1-I10 -w 5 -d 5

The AIX standard output is:
 RMC error encountered mc_errnum0x=4000c
The AIX standard error is:

The return code is 0. The AIX return code is 1023.HMC rollback after AIX
failed while adding I/O slot ......
HMCERRV3DLPAR021: The I/O slot dynamic logical partitioning operation
failed.  Here are the I/O DrawerIDs/SlotIDs that failed and the reasons
for failure:
7040-61D*#######-P1/10 The I/O slot has an invalid state or status.
The dynamic logical partitioning operation failure may cause the data
in HMC and AIX to be out of sync.

        at com.ibm.hsc.cim.client.HSCLPARCIMClient.addSlots(
        HSCLPARCIMClient.java:2239)
        at com.ibm.hsc.command.HardwareConfigurationChange.
        doAddOperation(HardwareConfigurationChange.java:401)
        at com.ibm.hsc.command.HardwareConfigurationChange.
        performCommand(HardwareConfigurationChange.java:1230)
        at com.ibm.hsc.command.HardwareConfigurationChange.
        main(HardwareConfigurationChange.java:1473)
HMC adding I/O slot ......
HMC add I/O slot operation finished successfully.
AIX adding I/O slot .....
.HMCERRV3DLPAR020: a operation for slot has completed, but only 0 out of
1 were successful.
The AIX command is:
drmgr -a -c slot -s U1.9-P1-I10 -w 5 -d 5

The AIX standard output is:
 RMC error encountered mc_errnum0x=4000c
The AIX standard error is:

The return code is 0. The AIX return code is 1023.HMC rollback after AIX
failed while adding I/O slot ......
HMCERRV3DLPAR021: The I/O slot dynamic logical partitioning operation
failed.  Here are the I/O DrawerIDs/SlotIDs that failed and the reasons
for failure:
7040-61D*#######-P1/10 The I/O slot has an invalid state or status.
The dynamic logical partitioning operation failure may cause the data in
HMC and AIX to be out of sync.




Then, cfgmgr had it.

With RMC down entirely, chhwres hung at:
chhwres -m "CECNAME" -o a -p dlpar3 -r slot -l U1.9-P1-I10 -d 5 -w 5
! Debug messages are enabled
! Showing messages for:  MIN
timeout = 5
detailLevel = 5
Start CSP for addSlot
DrawerId[0]: 7040-61D*#######-P1
SlotId[0]: 10
PCIBus[0]: 2
Start calling Slot CIM method
Finish calling Slot CIM method
Finish CSP for addMem
Start RMC for addSlot
rmc command: drmgr -a -c slot -s U1.9-P1-I10 -w 5 -d 5
Enter getLocalSession for RM=IBM.LparCmd
Loading libstdc for the rmcjni layer.



And after ^C:

com.ibm.rsct.rmcjni.McERmc: 2610-610 The command group has been sent,
but the session ended before all responses could be received.

        at com.ibm.rsct.rmcjni.McApi.JNIinvokeClassAction(Native Method)
        at com.ibm.rsct.rmcjni.McApi.invokeClassAction_BP(
        McApi.java:5003)
        at com.ibm.rsct.rmc.McResourceClass.invokeAction(
        McResourceClass.java:2715)
        at com.ibm.hsc.client.rmc.HSCRMCClientImpl.
        rmcDLPARCommand(HSCRMCClientImpl.java:351)
        at com.ibm.hsc.client.rmc.HSCRMCClientImpl.
        runRMCCommand(HSCRMCClientImpl.java:259)
        at com.ibm.hsc.client.rmc.HSCRMCClientImpl.
        rmcDLPAROp(HSCRMCClientImpl.java:154)
        at com.ibm.hsc.cim.client.HSCLPARCIMClient.
        addSlots(HSCLPARCIMClient.java:2181)
        at com.ibm.hsc.command.HardwareConfigurationChange.
        doAddOperation(HardwareConfigurationChange.java:401)
        at com.ibm.hsc.command.HardwareConfigurationChange.
        performCommand(HardwareConfigurationChange.java:1230)
        at com.ibm.hsc.command.HardwareConfigurationChange.
        main(HardwareConfigurationChange.java:1473)
endLocalSession() _rcl=com.ibm.rsct.rmc.McResourceClass@21305ef9
Get exception in endLocalSession _rcl=com.ibm.rsct.rmc.
McResourceClass@21305ef9
Start rollback for addSlot
DrawerId[0]: 7040-61D*#######-P1
SlotId[0]: 10
PCIBus[0]: 2
Start calling Slot CIM method
Enter RMCSession.sessionEnded


After that, drmgr add worked:
Entered add_a_slot
valid_name: Entered! checking for U1.9-P1-I10 all=0x2003ff88
sense state of slot (index:0x200b) is 2
Allocating drc-index 0x200b
Unisolating drc-index 0x200b
Attempting configure-connector with drc-index 0x200b
add_new_nodes: entered
calling process_intr_nodes_add
process_intr_nodes_add: entered
get_cc_property: could not find property interrupt-ranges

find_llist_node: cmp prop:name with ibm,my-drc-index
find_llist_node: cmp prop:device_type with ibm,my-drc-index
find_llist_node: cmp prop:reg with ibm,my-drc-index
find_llist_node: cmp prop:#address-cells with ibm,my-drc-index
find_llist_node: cmp prop:#size-cells with ibm,my-drc-index
find_llist_node: cmp prop:ibm,my-drc-index with ibm,my-drc-index
find_llist_node: cmp value:8203 with 8203
find_llist_node: ret_ll_node=0x200477d8
cfg_HP: return from add_HP_nodes = 0
U1.9-P1-I10
NOTE: AIX Support won't help you if you blow your system up with this. They will tell you to power cycle your whole frame. drmgr is not supported for manual use.