Optimizing ClearCase Performance on HP-UX (continued)

Hewlett-Packard

Tuning for NFS Performance

   * Tuning the ClearCase Client
   * Tuning the VOB Server

Tuning the ClearCase Client

Client BIODs

When an application performs a write to an NFS file, the kernel invokes the
NFS "biod" (Block I/O daemon). The biod blocks up a number of NFS write
requests equivalent to the block size specified when the disk was mounted.
If, for example, the block size is 8192 (the normal value), which is
specified as wsize=8192 in /etc/mnttab, the write is delayed by the biod
until a full 8 KB block can be written to the server. This is known as a
"write-behind".

Similarly, when an application performs a read from an NFS file, the kernel
invokes the NFS biod, which reads the required block (normally 8 KB, which
is specified in /etc/disktab when the filesystem is made), and does a
read-ahead for the next 8 KB block as well.

However, if an attempt is made to read from or write to an NFS file and the
kernel finds that all biods are busy, then the application's read or write
will cause an immediate NFS RPC to the server to service that read or write.
Thus too few biods will reduce client performance. The default is 4, and
this is too few. Try starting 12 - 16 biods at either HP-UX 9.X or 10.X and
monitor biod usage using GlancePlus.

Setting wsize and rsize on the Client

For best performance, client read/write block size (rsize and wsize,
respectively) should equal the server's disk block size.

To see why, suppose an NFS fileserver filesystem is configured for 8 KB
blocks and 1 KB fragments. If the client mounts the server with rsize=8192
and wsize=8192, a client write may cause 2 to 3 disk writes:

  1. A write of 8 KB of data to the server's 8KB disk block
  2. A write of the file's inode to disk
  3. For larger files, the potential write on an indirect block to disk.

If a client mounts this same server with rsize=1024 and wsize=1024 instead,
a client write may cause over 30 server disk reads and writes (up to 3 for
the first KB of data, and up to 4 reads and writes for each of the next 7 KB
of data).

To write the first 1 KB of data:

  1. A write of 1 KB of data to a 1 KB fragment on the server's disk
  2. A write of the file's inode to disk
  3. For larger files, the potential write on an indirect block to disk.

To write the nth 1 KB of data (2 <= n <= 8):

  1. A read of the "(n-1)th" KB fragment to combine it with the nth 1 KB
     fragment
  2. A write of the combined n KB of data to n adjacent 1 KB fragments
  3. A write of the file's inode to disk.
  4. For larger files, the potential write on an indirect block to disk.

Note. NT clients follow the second scenario and thus place a heavy load on
the server.
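
The arithmetic behind the two scenarios above can be sketched as follows. This
is only a worst-case count built from the per-write costs listed in the text
(up to 3 disk operations for the first write, up to 4 for each later one); the
function name is hypothetical, and applying it to transfer sizes other than
1 KB and 8 KB is an extrapolation the text does not make.

```python
def max_server_disk_ops(block_kb: int, wsize_kb: int) -> int:
    """Worst-case server disk reads and writes needed to fill one block."""
    if wsize_kb <= 0 or block_kb % wsize_kb != 0:
        raise ValueError("block size must be a whole multiple of wsize")
    client_writes = block_kb // wsize_kb
    first_write_ops = 3   # data write, inode write, possible indirect block
    later_write_ops = 4   # read of earlier fragments plus the three writes
    return first_write_ops + (client_writes - 1) * later_write_ops


print(max_server_disk_ops(8, 8))  # 3
print(max_server_disk_ops(8, 1))  # 31
```

With wsize=1024 the count comes to 31, which is the "over 30" figure quoted
above.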

Tuning the VOB Server

Export VOBs with -async

This improves disk performance but requires a UPS on the VOB server. (See
discussion in section X.X below for more details.)

How Many NFSDs?

To discuss this question, we first review how data is received and processed
by an NFS file server.

In the case of a write, the incoming NFS data (RPC) is taken from the
network and placed onto the system's UDP receive queue. An nfsd removes the
NFS data from the queue and writes it to the server's filesystem I/O buffer
RAM. Finally, it is taken from the buffer RAM and written to disk.

For a read, outgoing data is taken from disk and placed in the server's
filesystem I/O buffer RAM. An nfsd then transfers the data from buffer RAM
to the UDP send queue. Finally, data is taken from the UDP send queue and
transferred to the network.

From this picture, it is apparent that the server needs enough nfsds to keep
the UDP socket empty. On the other hand, too many nfsds may cause the cpu to
context switch excessively.

So, how many NFSDs do I need at HP-UX 9.X?

Restrict the number of nfsds. For example, if 32 nfsds are started and a
single NFS rpc comes into the server, then all 32 nfsds will be placed on
the server's run queue. HP-UX will context switch in the first nfsd, which
will service the incoming rpc, and then the other 31 nfsds will be context
switched in, find no rpc to service, and then block on I/O.

At the low end, a good first order approximation is to have 2 NFSDs per disk
spindle at HP-UX 9.X, which is also likely to be the minimum number of NFSDs
required.

At the high end, NFS benchmarks performed in HP labs indicate that there is
a very poor cost/benefit ratio for starting more than 30 or so NFSDs at
HP-UX 9.X due to excessive context-switching.

So pick some number in between, and use the techniques below to tell if you
have too many or too few. You can start with 32 NFSDs, the maximum
recommended, and reduce this number if analysis shows this is more than you
need.
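
As a rough sketch of the range described above for HP-UX 9.X, the low end is 2
NFSDs per disk spindle and the high end is the recommended maximum of 32. The
function name and its parameters are hypothetical; the result is only a
starting point to be refined with the checks that follow.

```python
from typing import Tuple


def nfsd_range_9x(spindles: int, per_spindle: int = 2,
                  ceiling: int = 32) -> Tuple[int, int]:
    """Return (low, high) starting bounds for the number of NFSDs at 9.X."""
    if spindles < 1:
        raise ValueError("a file server needs at least one disk spindle")
    low = min(per_spindle * spindles, ceiling)  # never start above the ceiling
    return low, ceiling


print(nfsd_range_9x(4))   # (8, 32)
print(nfsd_range_9x(20))  # (32, 32)
```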

How can I tell if I have started too few NFSDs? If there are not enough
NFSDs to keep up with incoming rpc's, the UDP socket will overflow, causing
NFS client requests to be dropped. You can check the number of times the
UDP socket has overflowed by executing the netstat command, e.g.:

$ netstat -s  | grep overflow

           1120 socket overflows

A count in the tens or hundreds is probably okay. A count in the thousands
is too many.
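
If this check is run regularly, it can be scripted. The sketch below assumes
the netstat output has already been captured as text and that the line of
interest has the form shown in the sample above; the function names and the
exact cut-off of 1000 are illustrative readings of the rule of thumb, not
fixed limits.

```python
import re


def socket_overflows(netstat_output: str) -> int:
    """Extract the 'socket overflows' counter from 'netstat -s' output."""
    match = re.search(r"(\d+)\s+socket overflows", netstat_output)
    if match is None:
        raise ValueError("no 'socket overflows' line found")
    return int(match.group(1))


def overflow_verdict(count: int) -> str:
    """Tens or hundreds are probably okay; thousands are too many."""
    return "too many" if count >= 1000 else "probably okay"


sample = "           1120 socket overflows"
count = socket_overflows(sample)
print(count, overflow_verdict(count))  # 1120 too many
```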

How can I tell if too many NFSDs have been started? Look at the cpu
utilization of the NFSDs with the following command:

ps -ef | grep nfsd | grep -v pcnfsd | grep -v grep

If some of the NFSDs show a lot of accumulated cpu time, while cpu time for
other NFSDs is significantly reduced and levels off, you may have more NFSDs
than you need because the latter are doing little, if any, work on behalf of
incoming rpc requests.

Another way to look at this is to examine the output from "nfsstat -s" (the
following output is from an HP-UX 9.X server):

$ nfsstat -s

Server rpc:

calls       badcalls  nullrecv  badlen   xdrcall   nfsdrun

2335351        0         0         0        0      53333280

"calls" indicates the server received 2.3M rpc calls, and ideally should
have awakened 2.3M nfsds to service these incoming rpc calls. "nfsdrun"
indicates the server actually awakened 5.3M NFSDs to service the 2.3M rpc
calls. It may appear that the number of nfsds being run should be cut in
half: if 24 NFSDs have been started, it may be more appropriate to start
only 12 NFSDs to get "calls" and "nfsdrun" to match.

Bear in mind, however, that although 12 NFSDs may be the average needed over
the period of a day, there are busy times when rpc calls peak, and the
server may need 2 times as many NFSDs as the daily average. For a ratio of
2:1 between peak and average NFS rpc activity, a corresponding ratio of 2:1
for "nfsdrun" to "calls" may actually be good. So adjust the number of NFSDs
accordingly.
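
One way to turn this reasoning into a number is sketched below. It is an
interpretation of the rule of thumb rather than an HP formula: the started
NFSD count is scaled so that the ratio of "nfsdrun" to "calls" would match
the peak-to-average ratio. The function name and the counter values used in
the example are hypothetical round numbers, not measurements.

```python
def suggested_nfsds(started: int, calls: int, nfsdrun: int,
                    peak_to_average: float = 1.0) -> int:
    """Scale the NFSD count so nfsdrun/calls approaches peak_to_average."""
    if calls <= 0 or nfsdrun <= 0:
        raise ValueError("counters must be positive")
    wakeups_per_call = nfsdrun / calls
    return max(1, round(started * peak_to_average / wakeups_per_call))


# Hypothetical counters: 24 NFSDs started, two wakeups per rpc call.
print(suggested_nfsds(24, 1000000, 2000000))                       # 12
print(suggested_nfsds(24, 1000000, 2000000, peak_to_average=2.0))  # 24
```

With a 2:1 peak-to-average load the sketch leaves the 24 NFSDs in place,
which is the conclusion reached in the paragraph above.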

How many NFSDs at HP-UX 10.X?

Use as many nfsds as you need to keep the UDP socket empty. Regardless of
how many nfsds are started, for each rpc arriving at the server, only a
single nfsd is awakened at HP-UX 10.X. For example, to achieve 4363 NFS IOPS
(4363 NFS rpc calls per second) with the HP 9000 series K400, the LADDIS
benchmark was run with 132 NFSDs.

So, try at least 32 NFSD processes, and you may want as many as 64 - 128: at
HP-UX 10.X, the only cost is the entry in the process table along with some
RAM and swap.

Disk Buffer Cache (aka Disk Buffer RAM) - a Kernel Parameter

Overview

The performance bottleneck in most fileservers is usually the disk
subsystem: disk heads move slowly and NFS requests stack up in the disk
queues waiting in line for access to disk heads. To minimize the number of
such physical disk accesses, the operating system makes use of an area of
RAM referred to as disk buffer cache, as follows.

The first time an NFS read occurs for a file block, a slow physical disk
access occurs, the file block is read into buffer cache, and it is then
passed to the client. Subsequent reads on that file block are made by a fast
logical disk access to buffer cache, thus avoiding the slower physical disk
access, until the buffer cache location is flushed. Similarly, NFS writes
are accumulated in disk buffer cache until their total size reaches some
threshold, whereupon the buffer cache location is flushed with a slow
physical write to disk.

The effective use of disk buffer cache is possibly the single most important
factor affecting ClearCase performance. It can cause the majority of such
NFS requests to bypass physical disk accesses and markedly speed up NFS read
and write response time to clients. The two most important ways to achieve
this are 1) to have enough buffer cache, and 2) to keep from flushing its
contents, that is, as far as possible avoid running anything other than NFS
on a VOB server.

How much disk buffer cache do I need?

For optimum performance, the disk buffer cache should be able to hold the
data accessed by 85% - 95% of the read requests, which is known as the
working set size. (To assess whether your estimated disk buffer cache is
sufficient for the estimated working set, you can measure the buffer cache
hit rate with GlancePlus.) However, this is only a recommendation and
may not be achievable at some sites. For example, sites with users reading
in 100 MB ME-CAD files will flush a large amount of the current buffer cache
to read in a 100 MB file. It is likely that little, if any, of this file
would have previously been in buffer cache. In this type of situation, it
may not be possible to achieve more than a 20% - 30% buffer cache hit rate.

For sites that have no existing server and cannot measure the working set
size, a useful first order approximation is 128 KB of buffer cache per NFS
IOP the server is providing. Thus, a K400 with 4 cpus, capable of 4363 NFS
IOPS, might start with (4363 * 128 KB) = about 545 MB, or roughly 512 MB, of
buffer cache.
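
The first order approximation can be written out as a small calculation. This
is a sketch of the 128 KB per IOP rule only; the function name is
hypothetical. For the K400 figure of 4363 NFS IOPS the product is about
545 MB, which the text rounds to about 512 MB.

```python
KB_PER_NFS_IOP = 128  # rule of thumb from the text


def buffer_cache_mb(nfs_iops: int, kb_per_iop: int = KB_PER_NFS_IOP) -> float:
    """Estimate the buffer cache size in MB for a given NFS IOPS rating."""
    return nfs_iops * kb_per_iop / 1024


print(round(buffer_cache_mb(4363)))  # 545
```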

Or you can first estimate how much RAM is required for everything else (see
next section).

Buffer Cache at HP-UX 9.X

On the 800 series, 10% of RAM is allocated for disk buffer cache by default,
which is much too low. On the 700 series, disk buffer cache is dynamic by
default (buffer cache is set to 0), which means that it should float as high
or as low as is needed. However, it doesn't work, and over time it will slow
down your server to the point where you need to reboot it. So for either 700
or 800 series, first determine a good size for buffer cache, then set the
parameter bufpages to this value.

Leave 24 MB for the kernel (32 MB for a large MVFS configuration), and add
whatever you need for anything else that is running (avoid running any other
applications if possible). The remaining memory should be dedicated to
buffer cache.

For example, for 128 MB of buffer cache, bufpages = (128 MB / 4096) =
32,768.

Upper limit on buffer cache size. Due to the segmented address space (4
quadrants of 1 GB each) of HP-UX, the practical limit of buffer cache that
should be configured on a server is 800 MB.
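
The sizing steps for HP-UX 9.X can be sketched as below: reserve memory for
the kernel and anything else running, give the remainder to buffer cache up
to the 800 MB practical limit, and convert the result to 4096-byte pages for
bufpages. The function names and the example RAM sizes are hypothetical; the
constants are the ones given in the text.

```python
PAGE_BYTES = 4096    # page size used in the bufpages example
MAX_CACHE_MB = 800   # practical upper limit on buffer cache


def cache_mb_for(ram_mb: int, other_mb: int = 0,
                 large_mvfs: bool = False) -> int:
    """Memory left for buffer cache after the kernel and other programs."""
    kernel_mb = 32 if large_mvfs else 24
    remaining = ram_mb - kernel_mb - other_mb
    return max(0, min(remaining, MAX_CACHE_MB))


def bufpages_for(cache_mb: int) -> int:
    """Convert a buffer cache size in MB to a bufpages value."""
    return cache_mb * 1024 * 1024 // PAGE_BYTES


print(bufpages_for(128))                 # 32768
print(cache_mb_for(256))                 # 232
print(bufpages_for(cache_mb_for(1024)))  # 204800
```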

Once bufpages has been set, monitor the buffer cache hit rate with
GlancePlus on the "m" (memory detail) screen, and adjust the amount of
buffer cache on the server as you need to.

Buffer Cache at HP-UX 10.X

In this case, the default for disk buffer cache is dynamic (bufpages = 0).
(SPEC_SFS_1 benchmark data indicates the system performs better using
dynamic buffer cache than it does using fixed buffer cache.) The default
maximum for buffer cache is 50% of RAM. Rather than accept this default, set
the minimum and maximum for buffer cache as follows:

dbc_max_pct = 90% (approximately)
dbc_min_pct = as large as you can (see discussion for HP-UX 9.X above),
              typically 70%.

Once these limits have been set, monitor the buffer cache hit rate with
GlancePlus on the "m" (memory detail) screen, and adjust the minimum and
maximum amount of buffer cache on the server as you need to.

Locate VOBs and their server processes on dedicated server machines

Since the performance of an NFS fileserver is very sensitive to the buffer
cache hit rate (every cache miss causes a disk seek and maybe a wait in the
disk queue), avoid running any significant applications on it other than
ClearCase (with server processes vob_server, db_server (one or more), and
vobrpc_server (one or more)). For example, avoid compiling on a fileserver,
because this will cause source and header files to be read into buffer cache
from disk and flush the NFS client files that were there already. The
exceptions to this rule are well-contained server-based application programs
such as NIS, DNS, and the X11 Font Server.

Other Kernel Parameters

nfile (maximum number of open files)

This should be set to the maximum number of files that will be opened by all
clients on the server at any given time. This figure may vary widely,
depending on whether the clients are diskless and open all their files via
the server, or the clients open only data files on the server. The actual
number of files opened may be monitored with GlancePlus (on the "tables"
screen).

ninode (the number of inodes)

A general rule of thumb is that this should be equal to 1 to 2 times the
number of 8K blocks of buffer cache. Another way to state this is that there
should be at least one inode entry for each 8K file block of buffer cache,
and possibly a second inode entry to retain the inodes in memory for the
most recently accessed files.

Following this rule of thumb, for every 128 MB of buffer cache, the value of
ninode should minimally be incremented by 128 MB / 8 KB = 16,384 (about
16,000) entries.

For the SPEC_SFS_1 benchmark, the K400/4 at 4363 NFS IOPS was configured for
ninode = 80,000.

Since disk head movement is usually the limiting factor in fileserver
performance, and because an inode entry is only 240 bytes, the extra memory
required for the inode table is a good investment. So at HP-UX 9.X or 10.X,
you can set ninode way up, i.e., in the range 80,000 to 100,000.
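
To see why the extra memory is a good investment, the rule of thumb and the
240-byte entry size can be combined in a short calculation. The function
names are hypothetical, and the memory figure counts only the 240 bytes per
entry quoted above.

```python
from typing import Tuple

INODE_ENTRY_BYTES = 240  # size of one inode table entry, from the text
FILE_BLOCK_KB = 8        # 8K file blocks


def ninode_range(cache_mb: int) -> Tuple[int, int]:
    """Return (1x, 2x) the number of 8K blocks in the buffer cache."""
    blocks = cache_mb * 1024 // FILE_BLOCK_KB
    return blocks, 2 * blocks


def inode_table_mb(ninode: int) -> float:
    """Memory used by an inode table with ninode entries, in MB."""
    return ninode * INODE_ENTRY_BYTES / (1024 * 1024)


print(ninode_range(128))                  # (16384, 32768)
print(round(inode_table_mb(80000), 1))    # 18.3
print(round(inode_table_mb(100000), 1))   # 22.9
```

Even at ninode = 100,000 the table takes under 23 MB, which is small next to
a buffer cache of several hundred MB.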

nflocks (the number of file locks)

This should be set to the maximum number of file locks that will be
requested by clients and the server at any time. This figure may vary widely
depending on whether the applications being run actually use NFS locks (few
applications do). Set to about (30*number_of_clients). This may be monitored
with GlancePlus on the "tables" screen.

nproc (maximum number of processes)

This should be set to the maximum number of processes that will be executed
on the server at any given time. This should not be a large number, given
that fileservers should not run applications other than a few like NIS, DNS,
etc. Nevertheless, the default value of 256 is too low. Set to at least 512.
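
The nflocks and nproc guidelines above reduce to two one-line rules, sketched
here. The function names and the client and process counts in the example are
hypothetical; actual usage should still be checked on the GlancePlus "tables"
screen.

```python
def nflocks_for(clients: int, locks_per_client: int = 30) -> int:
    """About 30 file locks per client, per the rule of thumb."""
    return locks_per_client * clients


def nproc_for(expected_processes: int, floor: int = 512) -> int:
    """Never set nproc below 512, even on a lightly loaded file server."""
    return max(expected_processes, floor)


print(nflocks_for(50))  # 1500
print(nproc_for(300))   # 512
```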

Tuning the Disks

The primary mechanism used to reduce the number of disk accesses is the
effective use of disk buffer cache, as discussed above. However, there are
other technologies that can help alleviate the slow performance of the disk
writes themselves by affecting when they happen. Historically, this has not
been possible because of a requirement imposed by NFS to avoid accidental
loss of data written by NFS to the server, namely that the server must write
the data to disk prior to acknowledging write completion to the client. Two
such technologies are disk caching and asynchronous NFS writes.

Disk Caching on HP Model 10 or 20 (aka Nike) Disk Arrays (series 800 systems
only)

HP-UX supports HP model 10 and model 20 disk arrays with immediate reporting
mode: when NFS data is written from buffer cache to disk, the disk
immediately reports (to the server, which then reports to the NFS client)
that the data has been written, even though it has only been written to disk
cache, RAM local to the disk or disk array. The potential for the loss of
data, should the disk experience a power failure, is reduced by using
battery-backed (disk cache) RAM.

This disk caching has two benefits. The first is that the NFS client is
released to continue processing prior to actual disk head movement, speeding
up client processing by many milliseconds. The second is that the disk drive
may be able to do more I/O's per second because it is being allowed to use
its internal, more intelligent write algorithms.

This capability is available on HP model 10 and model 20 disk arrays, but
not on series 700 disks.

Caveat. We recommend not using this capability on servers that use Cascade
arrays or third-party disks.

Exporting VOB partitions with -async, and Uninterruptible Power Supplies
(UPS)

Data written using disk caching, described above, must go through buffer
cache, and usually a disk queue, to make it to the disk cache before the
client can be released. HP-UX supports asynchronous NFS writes to release
the client even sooner: prior to placing client data in buffer cache, the
HP-UX NFSD immediately reports the data as written. This capability is
achieved by exporting the filesystem in which a VOB resides with the -async
option.

The benefits of asynchronous writes are as follows. First, the NFS client is
released to continue processing as soon as the nfsd has data. Second, the
NFS fileserver is able to do many more NFS IOPS because it is now allowed to
use its internal, more intelligent disk write algorithms. Lastly, server
response time may be dramatically improved.

When using the -async option, place your VOB server on an integrated UPS to
protect against the potential loss of data and scrambled VOBs should the
server experience a power failure or a disk crash.


(c) Copyright 1996 Hewlett-Packard Company.

August 26, 1996