Josh-Daniel S. Davis (joshdavis) wrote,


OK, so I'm feeling a bit dumb. Trying to pull up stuff from 3.5 years ago is proving difficult.

However, today Shelby had a customer whose nightly backup took around 4 hours, but whose restore was taking over 24. In 4 hours, it had restored only 5.6g.

The drives are Ultrium, and all 4 are online. The session was always in SendW or RecvW, meaning it was waiting on the network in one direction or the other.

# topas
lots of little stuff, 28% idle.
# lsdev -Ccproc
4 processors
# time dd if=/dev/zero of=/c00/TEST bs=256k count=4k
real 18.36s
OK, so no problem with the Shark or the CPU load.
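
To put the dd result next to the restore rate, here is the arithmetic as a small Python sketch. It assumes `count=4k` means 4,096 blocks and reads "5.6g" as 5.6 GiB; those are my readings of the numbers, not something the output states.

```python
# Back-of-the-envelope rates from the numbers in this post.
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB


def mib_per_sec(nbytes, seconds):
    """Convert a byte count moved over a duration into MiB/s."""
    return nbytes / MIB / seconds


# dd if=/dev/zero bs=256k count=4k -> assumed 4096 blocks of 256 KiB (1 GiB)
dd_rate = mib_per_sec(4096 * 256 * KIB, 18.36)

# Restore: 5.6 GiB in 4 hours (assuming "5.6g" means GiB)
restore_rate = mib_per_sec(5.6 * GIB, 4 * 3600)

print(f"dd to the Shark: {dd_rate:.1f} MiB/s")
print(f"restore:         {restore_rate:.2f} MiB/s")
print(f"dd is roughly {dd_rate / restore_rate:.0f}x faster")
```

That works out to roughly 56 MiB/s for dd against roughly 0.4 MiB/s for the restore, so the disk had plenty of headroom.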

So, even though backup and restore run on the same host, it's still TCP.

# no -a
tcp_sendspace 16384
tcp_recvspace 16384
# cat dsmserv.opt
# cat dsm.sys
neither file sets tcpwin or tcpbuf

So, the client's send and receive buffers were both 16k, and the server's send was 16k but receive was 32k.

In data transfer, you generally want your receive buffer to be a whole multiple of, and at least 2x, the other side's send buffer.
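
The rule of thumb above can be written as a one-line check. This is a minimal Python sketch; `receive_buffer_ok` is a hypothetical helper for illustration, not part of TSM or AIX, and the sizes are the ones found on this box (in KB).

```python
def receive_buffer_ok(recv_kb, peer_send_kb):
    """Rule of thumb: the receive buffer should be a whole multiple of the
    peer's send buffer and at least twice its size."""
    return recv_kb % peer_send_kb == 0 and recv_kb >= 2 * peer_send_kb


# Buffer sizes as found on this system, in KB.
client = {"send": 16, "recv": 16}
server = {"send": 16, "recv": 32}

# Restore direction: server sends, client receives.
print("restore (client recv vs server send):",
      receive_buffer_ok(client["recv"], server["send"]))
# Backup direction: client sends, server receives.
print("backup (server recv vs client send):",
      receive_buffer_ok(server["recv"], client["send"]))
```

With these numbers the backup direction passes (32k is 2x 16k) and the restore direction fails (16k is only 1x 16k), which is at least consistent with backups being the fast direction.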

What was happening here was that they would send one packet, stop, wait for the ack, prepare, and send again. If I'd been jacked into the tape drive, I'd have seen it stop, rewind, reposition, and play again. Tiny bits at a time.
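
One rough way to put a number on those "tiny bits" is to ask how long each buffer's worth took on average. This Python sketch assumes the whole 4 hours went to exchanges of one 16 KiB buffer each and reads "5.6g" as GiB; that is a simplification for illustration, not a measurement.

```python
# Assumption: every exchange moved exactly one 16 KiB buffer.
KIB = 1024
GIB = 1024 ** 3

restored_bytes = 5.6 * GIB   # "5.6g", read as GiB
elapsed_s = 4 * 3600         # 4 hours
buffer_bytes = 16 * KIB      # 16k send/receive buffer

exchanges = restored_bytes / buffer_bytes
ms_per_exchange = elapsed_s / exchanges * 1000

print(f"{exchanges:,.0f} exchanges, about {ms_per_exchange:.0f} ms each")
```

That comes to roughly 39 ms per 16 KiB on a connection that never leaves the box, which fits the send, stop, wait picture better than any disk or CPU limit.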

So, this is AIX, where the window size can go up to 640k. The Magstar and Ultrium drives stream best with 256k blocks, and that SHOULD be what TSM is writing to the tapes.

He has NO Windows clients, only AIX.
So, we set TCPWIN to 512 (2x the tape block size) and TCPBUF to 256 (half of TCPWIN). We left a note in dsmserv.opt saying that if they ever get Windows clients, TCPBUF will generally need to come down to 31. (The reason is that Windows TCP send and receive buffers can't go over 63k without special registry mojo, last I recall.)
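
The sizing logic above fits in a few lines. This is a minimal Python sketch; `suggest_tcp_settings` is a hypothetical helper, the 640k and 63k limits are the figures quoted in this post, and capping TCPWIN at the Windows limit is my assumption (the note in dsmserv.opt only talks about TCPBUF).

```python
# OS limits as stated in this post, in KB (the Windows one is "last I recall").
AIX_MAX_WINDOW_KB = 640
WINDOWS_MAX_WINDOW_KB = 63


def suggest_tcp_settings(tape_block_kb, max_window_kb):
    """Hypothetical helper: TCPWIN = 2x the tape block size, capped by the
    OS limit; TCPBUF = half of TCPWIN, rounded down."""
    tcpwin = min(2 * tape_block_kb, max_window_kb)
    tcpbuf = tcpwin // 2
    return tcpwin, tcpbuf


print("AIX-only:", suggest_tcp_settings(256, AIX_MAX_WINDOW_KB))
print("with Windows clients:", suggest_tcp_settings(256, WINDOWS_MAX_WINDOW_KB))
```

For 256k tape blocks this gives (512, 256) for the AIX-only setup and (63, 31) once Windows clients are in the mix, matching the 31 in the note.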

So, in some small number of minutes, we'd restored the directories, mounted the tape, and restored over 6g of data.

Tags: performance, storage