Monitoring AFS Connectivity
The increasingly widespread use of AFS at CERN naturally makes the reliability of the service a matter of prime concern to many users. In a dual effort to speed up the fixing of problems and to identify points of weakness that require attention, we are introducing a new service to monitor the connectivity of key AFS client machines.
The possible reasons for the apparent instability of the AFS service are varied. In some cases the problems originate in misunderstandings of how AFS, and in particular AFS security, works; in others the cause is a badly configured AFS cache manager on the local workstation. In certain cases network connectivity problems cause applications to fail.
We have therefore decided to monitor critical AFS client machines all over the site. The data thus collected will serve primarily as a log for tracing problems back to their origin. In future they may also be used to compile statistics on the reliability of the AFS service as perceived from workstations in different areas of the site.
Data will initially be collected every 10 minutes, which should not perceptibly affect an AFS client machine. The load on the initiating AFS server is, however, heavy enough that we cannot monitor all machines at CERN. This service is therefore aimed primarily at system administrators, who should designate one or two machines representative of their cluster of workstations. The node names of these machines should be sent to AFS.Support@cern.ch. The log files will be made available on request.
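For administrators who want a feel for what such a log might look like, the following is a minimal sketch of a periodic poll, not the actual implementation of the service. The probe itself is not described in this article, so it is passed in as a hypothetical `probe` callable; the function names, the log line format and the `OK`/`FAIL` labels are likewise assumptions made for illustration only.

```python
import time
from datetime import datetime, timezone
from typing import Callable, Iterable, List

# The article states an initial collection interval of 10 minutes.
POLL_INTERVAL_SECONDS = 10 * 60


def poll_once(hosts: Iterable[str],
              probe: Callable[[str], bool],
              now: datetime) -> List[str]:
    """Probe each host once and return one log line per host.

    `probe` is a hypothetical stand-in for whatever check the real
    service performs; it should return True if the client responds.
    """
    stamp = now.strftime("%Y-%m-%d %H:%M:%S")
    lines = []
    for host in hosts:
        try:
            reachable = probe(host)
        except Exception:
            # A probe that raises is recorded as a failure, not a crash.
            reachable = False
        lines.append(f"{stamp} {host} {'OK' if reachable else 'FAIL'}")
    return lines


def run(hosts: List[str],
        probe: Callable[[str], bool],
        log_path: str,
        rounds: int) -> None:
    """Append the results of `rounds` polls to the log file (assumed layout)."""
    for i in range(rounds):
        with open(log_path, "a") as log:
            for line in poll_once(hosts, probe, datetime.now(timezone.utc)):
                log.write(line + "\n")
        if i + 1 < rounds:
            time.sleep(POLL_INTERVAL_SECONDS)
```

Keeping the probe as a parameter means the same loop can be exercised with a dummy check before any real client is contacted, and one timestamped line per machine per poll makes it straightforward to trace when a given client stopped responding.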