However, today, Shelby had a customer whose nightly backup took around 4 hours, but the restore was taking over 24. In 4 hours, it had restored 5.6 GB.
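For a sense of scale, here is a back-of-the-envelope sketch of that restore rate. It assumes the "g" above means GiB, and the function name is made up for illustration.

```python
def throughput_mib_per_s(gib_moved: float, hours: float) -> float:
    """Average transfer rate in MiB/s for gib_moved GiB moved in the given hours."""
    return gib_moved * 1024 / (hours * 3600)

# Restore numbers from the session above (assumes "g" means GiB).
restore_rate = throughput_mib_per_s(5.6, 4)
print(f"restore: {restore_rate:.2f} MiB/s")
```

That comes out to roughly 0.4 MiB/s, which is nowhere near what a streaming tape drive should deliver.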
The drives are Ultrium, and all 4 are online. The session was always in sendW or recvW.
Lots of little stuff, 28% idle.
# lsdev -Ccproc
# time dd if=/dev/zero of=/c00/TEST bs=256k count=4k
OK, no problem with the Shark or CPU load.
So, even though backup and restore are to the same host, it's still TCP.
# no -a
# cat dsmserv.opt
# cat dsm.sys
Neither tcpwin nor tcpbuf was set.
So, the client's send and receive buffers were both 16k, and the server's send was 16k but receive was 32k.
In data transfer, you generally want your receive buffer to be a whole multiple of, and at least 2x, the other side's send buffer.
What was happening here was they would send one packet, stop, wait, ack, prepare, send again. If I'd been jacked into the tape drive, I'd have seen it stop, rewind, reposition, play again. Tiny bits.
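As a quick sanity check of that rule of thumb against the buffer sizes found here, a minimal sketch follows. The function name is made up for illustration. The sizes are the ones listed above.

```python
def receive_buffer_ok(send_kb: int, recv_kb: int) -> bool:
    """Rule of thumb: receive buffer is a whole multiple of, and >= 2x, the send buffer."""
    return recv_kb >= 2 * send_kb and recv_kb % send_kb == 0

# Backup direction: client sends (16k), server receives (32k).
backup_ok = receive_buffer_ok(send_kb=16, recv_kb=32)
# Restore direction: server sends (16k), client receives (16k).
restore_ok = receive_buffer_ok(send_kb=16, recv_kb=16)
print(backup_ok, restore_ok)
```

By this rule the backup direction passes and the restore direction does not, which at least lines up with a backup that ran fine and a restore that crawled.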
So, this is AIX. Windowsize can be 640k. The Magstar and Ultrium drives stream best at 256k blocks, and that SHOULD be what TSM is putting onto the tapes.
He has NO Windows clients, only AIX.
So, we set TCPWIN to 512 (2x tape block size) and TCPBUF to 256 (half of TCPWIN). We left a note in the dsmserv.opt stating that if they get Windows clients, the tcpbuf will generally need to come down to 31. (The reason is that Windows TCP send and receive buffers can't be over 63k without special registry mojo, last I recall.)
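One way to read where the 31 comes from, as a sketch: if the receiver should hold at least 2x the sender's buffer, and a Windows receiver tops out at 63k, the sender's buffer can be at most half of that, rounded down. The function name is hypothetical, and the link to the 2x rule is an assumption rather than anything stated in the TSM docs.

```python
def largest_send_buffer_kb(receiver_cap_kb: int) -> int:
    """Largest send buffer (KB) whose 2x still fits within the receiver's cap."""
    return receiver_cap_kb // 2

# 63k is the Windows limit recalled above; the 2x relationship is an assumption.
windows_tcpbuf = largest_send_buffer_kb(63)
print(windows_tcpbuf)
```

The same arithmetic applied to the 512 window used here gives the 256 we set for TCPBUF.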
So, in some small number of minutes, we'd restored the directories, mounted the tape, and restored over 6 GB of data.