Josh-D. S. Davis

Xaminmo / Omnimax / Max Omni / Mad Scientist / Midnight Shadow / Radiation Master

OK, so I'm feeling a bit dumb. Trying to pull up stuff from 3.5 years ago is proving difficult.

However, today, Shelby had a customer whose nightly backup took around 4 hours, but the restore was taking over 24. In 4 hours, it had restored 5.6g.

The drives are Ultrium, and all 4 are online. The session was always in sendW or recvW.

# topas
lots of little stuff, 28% idle.
# lsdev -Ccproc
4 processors
# time dd if=/dev/zero of=/c00/TEST bs=256k count=4k
real 18.36s
Ok, no problem with the shark or CPU load
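To put numbers on that comparison, here is a quick back-of-the-envelope sketch in Python. It assumes `bs=256k count=4k` means 4096 blocks of 256 KiB (1 GiB total), and it reads "5.6g" as about 5.6 GiB; the helper name is made up for illustration.

```python
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB

def mib_per_second(total_bytes, seconds):
    """Average throughput in MiB/s."""
    return total_bytes / seconds / MIB

# dd test: 4096 blocks of 256 KiB in 18.36 s (assumed from bs=256k count=4k)
dd_rate = mib_per_second(4096 * 256 * KIB, 18.36)

# Restore: about 5.6 GiB in 4 hours (assuming "5.6g" means GiB)
restore_rate = mib_per_second(5.6 * GIB, 4 * 3600)

print(f"dd write: {dd_rate:.1f} MiB/s")
print(f"restore:  {restore_rate:.2f} MiB/s")
print(f"ratio:    {dd_rate / restore_rate:.0f}x")
```

Under those assumptions the disk took writes at roughly 56 MiB/s while the restore was crawling at about 0.4 MiB/s, so the disk and CPU were nowhere near being the limit.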

So, even though backup and restore are to the same host, it's still TCP.

# no -a
tcp_sendspace 16384
tcp_recvspace 16384
# cat dsmserv.opt
# cat dsm.sys
neither tcpwin nor tcpbuf

So, the client's send and receive buffers were both 16k, and the server's send was 16k but receive was 32k.

In data transfer, you generally want your receive buffer to be a whole multiple of, and at least 2x, the other side's send buffer.

What was happening here was they would send one packet, stop, wait, ack, prepare, send again. If I'd been jacked into the tape drive, I'd have seen it stop, rewind, reposition, play again, all in tiny bits.
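A rough way to see why that hurts: if only one chunk moves per send/wait/ack cycle, throughput is capped at chunk size divided by the time per cycle. The Python sketch below is a deliberately simplified model, not real TCP behavior (it ignores delayed ACKs, window scaling and so on). The assumption that each cycle moved exactly one 16 KiB buffer is mine, and "5.6g" in 4 hours is read as 5.6 GiB.

```python
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB

def seconds_per_round(bytes_per_round, bytes_per_second):
    """Time each send/wait/ack cycle must take to yield the given throughput."""
    return bytes_per_round / bytes_per_second

def capped_throughput(bytes_per_round, round_seconds):
    """Upper bound on throughput when one chunk moves per cycle."""
    return bytes_per_round / round_seconds

observed = 5.6 * GIB / (4 * 3600)   # bytes/s during the slow restore
stall = seconds_per_round(16 * KIB, observed)

# Same cycle time, but with a hypothetical 512 KiB moved per cycle
faster = capped_throughput(512 * KIB, stall)

print(f"implied time per 16 KiB cycle: {stall * 1000:.0f} ms")
print(f"same cycle time at 512 KiB:    {faster / MIB:.1f} MiB/s")
```

In this toy model the slow restore works out to something like 39 ms per 16 KiB cycle, and simply moving 32 times more data per cycle scales throughput by the same factor. Real cycle times will not stay constant as the chunk grows, so treat the second number as an illustration only.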

So, this is AIX. Window size can be 640k. The Magstar and Ultrium drives stream best at 256k blocks, and that SHOULD be what TSM is putting onto the tapes.

He has NO Windows clients, only AIX.
So, we set TCPWIN to 512 (2x the tape block size) and TCPBUF to 256 (half of TCPWIN). We left a note in the dsmserv.opt stating that if they get Windows clients, the TCPBUF will generally need to come down to 31. (The reason is that Windows TCP send and receive buffers can't be over 63k without special registry mojo, last I recall.)

So, in some small number of minutes, we'd restored the directories, mounted the tape, and restored over 6g of data.


(Deleted comment)
One plus one is.....

damn... four?

(Deleted comment)
Hrm, on AIX I don't remember anything related to 31. Windows had a limit somewhere in the registry that had to be set to go over 63.

Gary told someone to use 32 for TCPBUF since Windows can't go over 63, which seemed like a typo, but I haven't talked to him yet.

TCPWIN and TCPBUF directly correspond to tcp_sendspace and tcp_recvspace. If you set in TSM, it overrides the network options values.

The 1323 I thought was the same as the tcpnodelay. It's mostly big files, so he'd be nageliciously OK without tcpnodelay, but it was set anyway.

(Deleted comment)

TCPWINDOWSIZE overrides no -a | grep tcp_recvspace
TCPBUFFERSIZE overrides no -a | grep tcp_sendspace

For best streaming perf, TCPWIN on the receiving side should be a whole multiple of TCPBUF of the sending side. Also, tape block size should be a whole multiple of, or even-sized fragments of the data being dumped to it.

AIX is limited at 640k, and the RMSS drives get peak perf at 256K (due to native block size of 256K). They'll handle up to 384k without barfing, and I think they reblock to 256, but you don't get a perf gain with the larger transfers. DLT is native 32K, which is still OK, since 8*32=256 (8 is integer).
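Those rules of thumb are easy to turn into a sanity check. This is only a sketch of the guidelines as stated in this post (whole multiples, the 2x rule, the 640k limit mentioned for AIX), not anything TSM itself enforces; the function name and messages are made up.

```python
def check_tuning(window_kb, buffer_kb, tape_block_kb, max_window_kb=640):
    """Return a list of rule-of-thumb violations (empty list means it passes)."""
    problems = []
    if window_kb > max_window_kb:
        problems.append("window exceeds the platform limit")
    if window_kb % buffer_kb != 0:
        problems.append("window is not a whole multiple of the send buffer")
    if window_kb < 2 * buffer_kb:
        problems.append("window is less than 2x the send buffer")
    # The tape block should divide the transfer size evenly, or the other way around
    big = max(buffer_kb, tape_block_kb)
    small = min(buffer_kb, tape_block_kb)
    if big % small != 0:
        problems.append("buffer and tape block size do not divide evenly")
    return problems

print(check_tuning(512, 256, 256))  # the new settings, 256K native block
print(check_tuning(512, 256, 32))   # same settings against a 32K DLT block
print(check_tuning(16, 16, 256))    # the original 16k/16k setup
```

The 512/256 pair passes for both the 256K and 32K block sizes, while the original 16k/16k configuration trips the 2x rule.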

I remember doing this stuff a lot. I still need to dig into policy domain crud again though.

I have no idea what any of that means, but I am happy to see it ended with a "yAy!"

YAY!!! :))


Imagine you have a leak in the bathroom.
You have to move 100 gallons of water.
You have one bucket to take outside, and one bucket to bail water with.

So, the bucket to take outside, if it's smaller than your bailing bucket, you'll spill a lot of water and spend time mopping it up.

If the bailing bucket is too small, you spend all of your time bailing and no real time moving a lot of water.

If your bailing bucket is the same size as your one to take outside, it works out well, but you have to wait for Matt to come back from the door.

So, what you REALLY want is multiple buckets to take outside, all the same size as your bailing bucket. That way, you can be filling one that I'm carrying, while Matt is trekking to the door and back with the other one. You might even want 3 or 4 carrying buckets.

Well, data is EXACTLY like that. You have a huge sea of bits, and the "TCPBUF" is the bailing bucket, and the "TCPWIN" is the combined sum of all of the carrying buckets. "outside" is defined as "a big, fast, tape drive", sort of like an audio cassette, but it's about a half inch thick, and only one spool.

Basically, these people had one dixie cup to bail with, and one dixie cup to carry with.

So, I reconfigured it such that they had a giant bucket that was exactly as big as the doorway to outside for bailing, and two of those for carrying. Except it wasn't as heavy, and it takes the same time to make a round trip to the door as it does to fill a carrying bucket.
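For anyone who wants to play with the bucket picture, here is a tiny Python simulation of it. It is just a toy model of the analogy: one person bailing, one carrier per carrying bucket, and 100 loads standing in for the 100 gallons (one gallon per load is my assumption). The function name is made up.

```python
def time_to_move(loads, carrying_buckets, fill_time, round_trip):
    """Time until the last load is dumped outside and its bucket is back."""
    bucket_ready = [0.0] * carrying_buckets  # when each bucket is empty and back
    bailer_free = 0.0                        # when the bailer can start the next fill
    for _ in range(loads):
        i = bucket_ready.index(min(bucket_ready))  # next bucket to come back
        start = max(bailer_free, bucket_ready[i])  # need both bailer and bucket
        bailer_free = start + fill_time
        bucket_ready[i] = bailer_free + round_trip
    return max(bucket_ready)

# Fill time equals round-trip time, as in the setup described above
for buckets in (1, 2, 3):
    print(buckets, "carrying bucket(s):", time_to_move(100, buckets, 1.0, 1.0))
```

With one carrying bucket the bailer sits idle half the time (200 time units for 100 loads). With two, the bailer never waits (101 units), and a third bucket adds nothing in this model because the bailer is already the bottleneck.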


Awesome. I can totally understand that. It reminds me of when I was reading about internet security one day (just for the hell of it) and they talked about mailing boxes with locks and keys. It all came together.

Nice job! Sounds like you really pulled through for them.
