Optimizing ClearCase Performance on HP-UX (continued)

Hewlett-Packard

Tuning for NFS Performance

* Tuning the ClearCase Client
* Tuning the VOB Server

Tuning the ClearCase Client

Client BIODs

When an application performs a write to an NFS file, the kernel invokes the NFS "biod" (block I/O daemon). The biod accumulates NFS write requests until it has a full block of the size specified when the filesystem was mounted. If, for example, the block size is 8192 (the normal value), which appears as wsize=8192 in /etc/mnttab, the biod delays the write until a full 8 KB block can be written to the server. This is known as "write-behind".

Similarly, when an application performs a read from an NFS file, the kernel invokes the NFS biod, which reads the required block (normally 8 KB, which is specified in /etc/disktab when the filesystem is made) and does a read-ahead for the next 8 KB block as well.

However, if an application tries to read from or write to an NFS file and the kernel finds that all biods are busy, the application's read or write causes an immediate NFS RPC to the server to service that request. Thus too few biods will reduce client performance. The default is 4, which is too few. Try starting 12 - 16 biods at either HP-UX 9.X or 10.X, and monitor biod usage with GlancePlus.

Setting wsize and rsize on the Client

For best performance, the client read/write block size (rsize and wsize, respectively) should equal the server's disk block size. To see why, suppose an NFS fileserver filesystem is configured for 8 KB blocks and 1 KB fragments. If the client mounts the server with rsize=8192 and wsize=8192, a client write may cause 2 to 3 disk writes:

1. A write of 8 KB of data to the server's 8 KB disk block.
2. A write of the file's inode to disk.
3. For larger files, a potential write of an indirect block to disk.
If a client mounts this same server with rsize=1024 and wsize=1024 instead, a client write of the same 8 KB may cause over 30 server disk reads and writes (up to 3 for the first KB of data, and up to 4 reads and writes for each of the next 7 KB of data).

To write the first 1 KB of data:

1. A write of 1 KB of data to a 1 KB fragment of the server's 8 KB disk block.
2. A write of the file's inode to disk.
3. For larger files, a potential write of an indirect block to disk.

To write the nth 1 KB of data (2 <= n <= 8):

1. A read of the previous (n-1) KB of fragments to combine them with the nth 1 KB fragment.
2. A write of the combined n KB of data to n adjacent 1 KB fragments.
3. A write of the file's inode to disk.
4. For larger files, a potential write of an indirect block to disk.

Note. NT clients follow the second scenario and thus place a heavy load on the server.

Tuning the VOB Server

Export VOBs with -async

This improves disk performance and requires a UPS on the VOB server. (See discussion in section X.X below for more details.)

How Many NFSDs?

To discuss this question, we first review how data is received and processed by an NFS file server.

In the case of a write, the incoming NFS data (RPC) is taken from the network and placed onto the system's UDP receive queue. An nfsd removes the NFS data from the queue and writes it to the server's filesystem I/O buffer RAM. Finally, it is taken from the buffer RAM and written to disk.

For a read, outgoing data is taken from disk and placed in the server's filesystem I/O buffer RAM. An nfsd then transfers the data from buffer RAM to the UDP send queue. Finally, data is taken from the UDP send queue and transferred to the network.

From this picture, it is apparent that the server needs enough nfsds to keep the UDP socket empty. On the other hand, too many nfsds may cause the CPU to context switch excessively.

So, how many NFSDs do I need at HP-UX 9.X?

Restrict the number of nfsds.
For example, if 32 nfsds are started and a single NFS RPC comes into the server, then all 32 nfsds will be placed on the server's run queue. HP-UX will context switch in the first nfsd, which will service the incoming RPC, and then the other 31 nfsds will be context switched in, find no RPC to service, and block on I/O.

At the low end, a good first order approximation is to have 2 NFSDs per disk spindle at HP-UX 9.X, which is also likely to be the minimum number of NFSDs required. At the high end, NFS benchmarks performed in HP labs indicate that there is a very poor cost/benefit ratio for starting more than 30 or so NFSDs at HP-UX 9.X, due to excessive context switching.

So pick some number in between, and use the techniques below to tell if you have too many or too few. You can start with 32 NFSDs, the maximum recommended, and reduce this number if analysis shows it is more than you need.

How can I tell if I have started too few NFSDs?

If there are not enough NFSDs to keep up with incoming RPCs, the UDP socket will overflow, causing NFS client requests to be dropped. You can check the number of times the UDP socket has overflowed by executing the netstat command, e.g.:

$ netstat -s | grep overflow
1120 socket overflows

A number in the 10's or 100's is probably okay. A number in the 1000's is too many.

How can I tell if too many NFSDs have been started?

Look at the CPU utilization of the NFSDs with the following command:

$ ps -ef | grep nfsd | grep -v pcnfsd | grep -v grep

If some of the NFSDs show a lot of accumulated CPU time, while CPU time for the other NFSDs is significantly lower and has leveled off, you may have more NFSDs than you need, because the latter are doing little, if any, work on behalf of incoming RPC requests.
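The socket overflow check for too few NFSDs is easy to script. The following is only a minimal sketch: the function names are made up for illustration, and the cut-off of exactly 1000 is an assumption that turns the "10's or 100's is okay, 1000's is too many" guidance into a single threshold.

```shell
#!/bin/sh
# Sketch: classify the UDP socket overflow count reported by "netstat -s".
# Function names and the exact cut-off of 1000 are illustrative assumptions.

# Read "netstat -s" style text on stdin and print the overflow count
# (prints 0 if no "socket overflows" line is present).
overflow_count() {
    awk '/socket overflows/ { print $1; found = 1; exit }
         END { if (!found) print 0 }'
}

# Apply the rule of thumb: counts in the 1000's mean too few nfsds.
check_nfsd_overflows() {
    count=$1
    if [ "$count" -ge 1000 ]; then
        echo "too few nfsds: $count socket overflows"
    else
        echo "probably okay: $count socket overflows"
    fi
}

# Example using the sample line from the text. On a live server the
# input would instead come from:  netstat -s | overflow_count
sample='        1120 socket overflows'
check_nfsd_overflows "$(printf '%s\n' "$sample" | overflow_count)"
```

With the sample value of 1120 the script reports too few nfsds. Because the counter is cumulative, comparing two readings taken some time apart is more informative than a single reading.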
Another way to tell whether too many NFSDs have been started is to examine the output from "nfsstat -s" (the following output is from an HP-UX 9.X server):

$ nfsstat -s

Server rpc:
calls      badcalls   nullrecv   badlen     xdrcall    nfsdrun
2335351    0          0          0          0          5333280

"calls" indicates the server received 2.3M RPC calls, and ideally it should have awakened 2.3M nfsds to service them. "nfsdrun" indicates the server actually awakened 5.3M NFSDs to service the 2.3M RPC calls.

It may appear that the number of nfsds being run should be cut in half: if 24 NFSDs have been started, it may be more appropriate to start only 12 NFSDs to get "calls" and "nfsdrun" to match. Bear in mind, however, that although 12 NFSDs may be the average needed over the period of a day, there are busy times when RPC calls peak, and the server may need 2 times as many NFSDs as the daily average. For a ratio of 2 : 1 between peak and average NFS RPC activity, a corresponding ratio of 2 : 1 for "nfsdrun" to "calls" may actually be good. So adjust the number of NFSDs accordingly.

How many NFSDs at HP-UX 10.X?

Use as many nfsds as you need to keep the UDP socket empty.

Regardless of how many nfsds are started, only a single nfsd is awakened for each RPC arriving at the server at HP-UX 10.X. For example, to achieve 4363 NFS IOPS (4363 NFS RPC calls per second) with the HP 9000 series K400, the LADDIS benchmark was run with 132 NFSDs. So, try at least 32 NFSD processes, and you may want as many as 64 - 128: at HP-UX 10.X, the only cost is the entry in the process table along with some RAM and swap.

Disk Buffer Cache (aka Disk Buffer RAM) - a Kernel Parameter

Overview

The performance bottleneck in most fileservers is the disk subsystem: disk heads move slowly, and NFS requests stack up in the disk queues waiting in line for access to the disk heads. To minimize the number of such physical disk accesses, the operating system makes use of an area of RAM referred to as disk buffer cache, as follows.
The first time an NFS read occurs for a file block, a slow physical disk access occurs, the file block is read into buffer cache, and it is then passed to the client. Subsequent reads of that file block are made by a fast logical disk access to buffer cache, thus avoiding the slower physical disk access, until the buffer cache location is flushed. Similarly, NFS writes are accumulated in disk buffer cache until their total size reaches some threshold, whereupon the buffer cache location is flushed with a slow physical write to disk.

The effective use of disk buffer cache is possibly the single most important factor affecting ClearCase performance. It can cause the majority of NFS requests to bypass physical disk accesses and markedly speed up NFS read and write response time to clients. The two most important ways to achieve this are 1) to have enough buffer cache, and 2) to keep from flushing its contents, that is, as far as possible to avoid running anything other than NFS on a VOB server.

How much disk buffer cache do I need?

For optimum performance, the disk buffer cache should be able to hold the data accessed by 85% - 95% of the read requests, which is known as the working set size. (To assess whether your estimated disk buffer cache is sufficient for the estimated working set, you can measure the buffer cache hit rate with GlancePlus.)

However, this is only a recommendation and may not be achievable at some sites. For example, sites with users reading in 100 MB ME-CAD files will flush a large amount of the current buffer cache to read in a 100 MB file. It is likely that little, if any, of this file would have previously been in buffer cache. In this type of situation, it may not be possible to achieve more than a 20% - 30% buffer cache hit rate.

For sites that have no existing server and cannot measure the working set size, a useful first order approximation is 128 KB of buffer cache per NFS IOP the server is providing.
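As a minimal sketch of this rule of thumb, the arithmetic can be written as a one-line shell function. The function name is invented for illustration, and the 1000 and 2000 IOPS figures are hypothetical servers, not measurements.

```shell
#!/bin/sh
# Sketch: first order buffer cache estimate at 128 KB per NFS IOP.
# The function name is an illustrative assumption.

# Print the estimated buffer cache size in MB for a given NFS IOPS rate.
cache_mb_for_iops() {
    echo $(( $1 * 128 / 1024 ))    # KB needed, converted to whole MB
}

cache_mb_for_iops 1000    # hypothetical 1000 IOPS server: 125 MB
cache_mb_for_iops 2000    # hypothetical 2000 IOPS server: 250 MB
```

The result is a starting point only; the measured buffer cache hit rate should decide the final size.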
Thus, a K400 with 4 CPUs, capable of 4363 NFS IOPS, might start with (4363 * 128 KB) = about 545 MB of buffer cache, or roughly 512 MB in round numbers. Or you can first estimate how much RAM is required for everything else (see next section).

Buffer Cache at HP-UX 9.X

On the 800 series, 10% of RAM is allocated for disk buffer cache by default, which is much too low.

On the 700 series, disk buffer cache is dynamic by default (buffer cache is set to 0), which means that it should float as high or as low as is needed. In practice this does not work, and over time it will slow down your server to the point where you need to reboot it.

So for either the 700 or the 800 series, first determine a good size for buffer cache, then set the parameter bufpages to this value. Leave 24 MB for the kernel, increase this to 32 MB for a large MVFS configuration, and add whatever you need for anything else that is running (avoid running any other applications if possible). The remaining memory should be dedicated to buffer cache. For example, for 128 MB of buffer cache, bufpages = (128 MB / 4096) = 32,768.

Upper limit on buffer cache size. Due to the segmented address space of HP-UX (4 quadrants of 1 GB each), the practical limit of buffer cache that should be configured on a server is 800 MB.

Once bufpages has been set, monitor the buffer cache hit rate with GlancePlus on the "m" (memory detail) screen, and adjust the amount of buffer cache on the server as you need to.

Buffer Cache at HP-UX 10.X

In this case, the default for disk buffer cache is dynamic (bufpages = 0). (SPEC_SFS_1 benchmark data indicates the system performs better using dynamic buffer cache than it does using fixed buffer cache.)

The default maximum for buffer cache is 50% of RAM. Instead of relying on the defaults, set the minimum and maximum for buffer cache as follows:

dbc_max_pct = 90% (approximately)
dbc_min_pct = as large as you can (see discussion for HP-UX 9.X above), typically 70%.
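The sizing arithmetic for both releases can be sketched in a few shell functions. This is an illustration under stated assumptions: the function names are invented here, and the 256 MB and 512 MB machines are hypothetical examples. Only the 24 MB kernel reserve, the 4096-byte page size, and the 70% and 90% figures come from the text.

```shell
#!/bin/sh
# Sketch of the buffer cache arithmetic. Function names are illustrative.

# MB left for buffer cache after reserving memory for the kernel (24 MB,
# or 32 MB for a large MVFS configuration) and for any other software.
cache_mb() {            # args: ram_mb kernel_mb other_mb
    echo $(( $1 - $2 - $3 ))
}

# HP-UX 9.X: bufpages is the cache size expressed in 4096-byte pages.
bufpages_for_mb() {     # arg: cache size in MB
    echo $(( $1 * 1024 * 1024 / 4096 ))
}

# HP-UX 10.X: MB of RAM that a percentage bound on the dynamic cache
# corresponds to (rounded down to a whole MB).
pct_of_ram_mb() {       # args: ram_mb percent
    echo $(( $1 * $2 / 100 ))
}

bufpages_for_mb 128                       # 32768, the example in the text
bufpages_for_mb "$(cache_mb 256 24 0)"    # 59392 on a hypothetical 256 MB server
pct_of_ram_mb 512 70                      # 358 MB minimum on a hypothetical 512 MB server
pct_of_ram_mb 512 90                      # 460 MB maximum on the same server
```

On a 9.X server the result of cache_mb should also be checked against the 800 MB practical limit before it is converted to bufpages.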
Once these parameters have been set, monitor the buffer cache hit rate with GlancePlus on the "m" (memory detail) screen, and adjust the minimum and maximum amount of buffer cache on the server as you need to.

Locate VOBs and their server processes on dedicated server machines.

Since the performance of an NFS fileserver is very sensitive to the buffer cache hit rate (every cache miss causes a disk seek and maybe a wait in the disk queue), avoid running any significant applications on it other than ClearCase (with server processes vob_server, db_server (one or more), and vobrpc_server (one or more)). For example, avoid compiling on a fileserver, because this will cause source and header files to be read into buffer cache from disk and flush the NFS client files that were there already. The exceptions to this rule are well-contained server-based application programs such as NIS, DNS, and the X11 Font Server.

Other Kernel Parameters

nfile (maximum number of open files)

This should be set to the maximum number of files that will be opened by all clients on the server at any given time. This figure may vary widely, depending on whether the clients are diskless and open all their files via the server, or the clients open only data files on the server. The actual number of files opened may be monitored with GlancePlus (on the "tables" screen).

ninode (the number of inodes)

A general rule of thumb is that this should be equal to 1 to 2 times the number of 8 KB blocks of buffer cache. Another way to state this is that there should be at least one inode entry for each 8 KB file block of buffer cache, and possibly a second inode entry to retain the inodes in memory for the most recently accessed files.

Following this rule of thumb, for every 128 MB of buffer cache, the value of ninode should be incremented by at least 128 MB / 8 KB = 16,384 (about 16,000) entries. For the SPEC_SFS_1 benchmark, the K400/4 at 4363 NFS IOPS was configured for ninode = 80,000.
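The ninode rule of thumb reduces to a small calculation. The sketch below uses an invented function name and prints the 1x and 2x ends of the range; note that 128 MB / 8 KB is exactly 16,384, which the text rounds to 16,000.

```shell
#!/bin/sh
# Sketch: ninode range from the "1 to 2 inodes per 8 KB block of buffer
# cache" rule of thumb. The function name is an illustrative assumption.

# Print "minimum maximum" ninode values for a buffer cache size in MB.
ninode_range() {
    blocks=$(( $1 * 1024 / 8 ))        # number of 8 KB blocks in the cache
    echo "$blocks $(( blocks * 2 ))"
}

ninode_range 128    # 16384 32768
ninode_range 512    # 65536 131072
```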
Since disk head movement is usually the limiting factor in fileserver performance, and because an inode entry is only 240 bytes, the extra memory required for the inode table is a good investment. So at HP-UX 9.X or 10.X, you can set ninode quite high, i.e., in the range 80,000 to 100,000.

nflocks (the number of file locks)

This should be set to the maximum number of file locks that will be requested by clients and the server at any time. This figure may vary widely depending on whether the applications being run actually use NFS locks (few applications do). Set it to about (30 * number_of_clients). This may be monitored with GlancePlus on the "tables" screen.

nproc (maximum number of processes)

This should be set to the maximum number of processes that will be executed on the server at any given time. This should not be a large number, given that fileservers should not run applications other than a few like NIS, DNS, etc. Nevertheless, the default value of 256 is too low. Set it to at least 512.

Tuning the Disks

The primary mechanism used to reduce the number of disk accesses is the effective use of disk buffer cache, as discussed above. However, there are other technologies that can help alleviate the slow performance of the disk writes themselves by affecting when they happen. Historically, this has not been possible because of a requirement imposed by NFS to avoid accidental loss of data written by NFS to the server, namely that the server must write the data to disk prior to acknowledging write completion to the client. Two such technologies are disk caching and asynchronous NFS writes.
Disk Caching on HP Model 10 or 20 (aka Nike) Disk Arrays (series 800 systems only)

HP-UX supports HP model 10 and model 20 disk arrays with immediate reporting mode: when NFS data is written from buffer cache to disk, the disk immediately reports (to the server, which then reports to the NFS client) that the data has been written, even though it has only been written to disk cache, that is, RAM local to the disk or disk array. The potential for the loss of data, should the disk experience a power failure, is reduced by using battery-backed (disk cache) RAM.

This disk caching has two benefits. The first is that the NFS client is released to continue processing prior to actual disk head movement, speeding up client processing by many milliseconds. The second is that the disk drive may be able to do more I/Os per second because it is being allowed to use its internal, more intelligent write algorithms.

This capability is available on HP model 10 and model 20 disk arrays, but not on series 700 disks.

Caveat. We recommend that customers do not use this capability for servers that use Cascade arrays or third party disks.

Exporting VOB partitions with -async, and Uninterruptible Power Supplies (UPS)

Data written using disk caching, described above, must go through buffer cache, and usually a disk queue, to make it to the disk cache before the client can be released. HP-UX supports asynchronous NFS writes to release the client even sooner: prior to placing client data in buffer cache, the HP-UX NFSD immediately reports the data as written. This capability is achieved by exporting the filesystem in which a VOB resides with the -async option.

The benefits of asynchronous writes are as follows. First, the NFS client is released to continue processing as soon as the nfsd has the data. Secondly, the NFS fileserver is able to do many more NFS IOPS because it is now allowed to use its internal, more intelligent disk write algorithms.
Lastly, server response time may be dramatically improved.

When using the -async option, place your VOB server on an integrated UPS to protect against the potential loss of data and the corruption of VOBs should the server experience a power failure or a disk crash.

(c) Copyright 1996 Hewlett-Packard Company.
August 26, 1996