Wednesday, January 2, 2008

Using NFS partitions on AIX

Unless you are running an E1 enterprise server on an NFS partition on the AIX platform, you can probably skip this posting.

Still here? Ok. This post outlines a potential problem with changing the tools release of an enterprise server when it is running on an NFS partition on AIX. It applies to that combination only.

The AIX operating system has a feature that keeps shared libraries in memory even after the program that loaded them terminates. Subsequent loads of that program, or of any other program using the same library, are faster because the library is already in memory.

This behavior can cause problems when the shared library is located on an NFS partition. Consider the case where Server Manager is performing a tools release change for an enterprise server. The management agent will 1) stop the enterprise server, 2) delete the existing tools release, and 3) extract the new tools release in its place.

So where's the problem? After the enterprise server is stopped, AIX may still be caching the E1 shared libraries even though no active process is using them, and it keeps an open file handle to each cached library. On UNIX-based platforms you can delete a file that another process has open; the file immediately disappears from directory listings, but it is not actually removed until the last handle to it is closed. This behavior is implemented by the file system.
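The delete-while-open behavior described above can be seen on any local POSIX file system with a few lines of Python. This is a minimal sketch, assuming a UNIX-like system (Windows refuses to delete open files); the file name libexample.so is hypothetical and the file is ordinary data, not a real shared library.

```python
from __future__ import annotations

import os
import tempfile


def unlink_while_open_demo() -> tuple[bool, bytes]:
    """Delete a file that is still open and show that its data survives.

    Returns (still_listed_under_original_name, data_read_after_delete).
    """
    with tempfile.TemporaryDirectory() as workdir:
        # Hypothetical file standing in for a cached shared library.
        path = os.path.join(workdir, "libexample.so")
        with open(path, "wb") as f:
            f.write(b"library contents")

        handle = open(path, "rb")  # plays the role of AIX's cached handle
        try:
            os.unlink(path)  # removes only the directory entry
            still_listed = "libexample.so" in os.listdir(workdir)
            data = handle.read()  # the open handle still reaches the data
        finally:
            # On a local file system the storage is released here, when
            # the last handle is closed.
            handle.close()
        return still_listed, data
```

The name vanishes from the directory listing immediately, yet the read through the open handle still returns the original bytes.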

The remote nature of NFS requires a special implementation of this behavior. When an open file is deleted on an NFS partition, it appears as a .nfs#### file in the same directory, where #### is an arbitrary number. This file cannot be removed directly; it disappears as soon as the last process holding the deleted file open closes its handle.

So what does this have to do with E1 and Server Manager? The second step of a tools release change is deleting the existing tools release. Because the shared libraries are still cached, .nfs#### files remain in the $EVRHOME/system/lib directories, and they prevent the removal of the system directory. The tools release change then fails, and the previous tools release is restored. Even root cannot delete these .nfs files directly.
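Before starting a tools release change, it may help to check whether any of these leftover files exist. The Python sketch below walks a directory tree and reports names beginning with .nfs. It is a hypothetical helper, not part of Server Manager, and the prefix match is a heuristic that could also catch an unrelated file that happens to start with .nfs.

```python
import os
from typing import List


def find_nfs_placeholders(root: str) -> List[str]:
    """Return the sorted paths of .nfs* files anywhere under root.

    On an NFS mount these are the renamed copies of files that were
    deleted while some process still held them open.
    """
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.startswith(".nfs"):
                hits.append(os.path.join(dirpath, name))
    return sorted(hits)
```

For example, calling find_nfs_placeholders on os.path.join(os.environ["EVRHOME"], "system", "lib") and getting a non-empty list back would suggest the tools release change is likely to fail until the cached libraries are released.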

The workaround is to stop the enterprise server using Server Manager, then sign on as root and run the command 'slibclean'. This instructs AIX to unload any shared libraries that are no longer being used by an active process. You can then change the tools release using Server Manager without any issue.
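If this step is scripted, a small wrapper can guard against running it in the wrong place. The Python sketch below assumes that slibclean is on root's PATH and that the enterprise server has already been stopped through Server Manager. On any platform other than AIX it does nothing and returns False.

```python
import os
import subprocess
import sys


def run_slibclean() -> bool:
    """Ask AIX to unload shared libraries that no process is using.

    Returns True if slibclean was run and False on non-AIX platforms.
    Raises PermissionError when not running as root, and
    subprocess.CalledProcessError if slibclean exits with an error.
    """
    if not sys.platform.startswith("aix"):
        return False  # slibclean is specific to AIX
    if os.geteuid() != 0:
        raise PermissionError("slibclean must be run as root")
    # Assumption: slibclean is found on root's PATH.
    subprocess.run(["slibclean"], check=True)
    return True
```

After it returns True, the tools release change can be started from Server Manager as usual.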