T. Cass, P. Martucci, H. Renshall IT/PDP
The CERNSP complex consists of three main parts:
The SP2 part of this complex is now three years old and is definitely overloaded, giving poor response for interactive users and long turnround times for the longer batch jobs. In addition, the maintenance contract under which it was bought ends this year and, as is often the case with today's fast changing technologies, more new capacity can now be bought for less than the cost of maintaining the old. We have therefore consulted the committee which advises the director for research on two to three year computing planning, the Forum on Computing: Users and Services (FOCUS), on the strategy to adopt towards the CERNSP services, and as part of this a user survey and pricing enquiry were made.
The conclusions were that there was a strong demand for the public interactive and batch services provided by CERNSP and that an interactive service based on AIX should continue, but that batch services could be provided on a different UNIX architecture should this have a significant advantage. The committee also recommended a strategy whereby public batch capacity is reviewed every three years in order to buy the current best value, with three years of maintenance built in, while keeping the previous system running unmaintained for a three-year overlap. The pricing enquiry showed that two UNIX vendors, one of which was IBM, were offering batch-type capacity at almost the same relatively low price, while the other vendors were more expensive. Given this situation, the committee agreed that public batch should also stay on AIX to avoid the manpower costs of converting applications to another UNIX flavour.
During the coming Christmas shutdown we will install the new machines that will soon replace the 24 SP2 nodes providing the interactive service, namely 15 twin-CPU PowerPC 43P workstations, each CPU being some two to three times faster than an SP2 node (depending on the application), with twice the memory and more local scratch space. We will also install the replacements for the 18 SP2 public batch nodes, namely 15 single-CPU PowerPC 43P workstations with a faster chip, about four times faster than the current SP2 batch nodes, and also with more scratch space. The PaRC engineering batch nodes will stay unchanged for the moment as these are funded via the Computer Aided Engineering Committee (CAEC). The disk and tape server nodes will be concentrated into one frame of the SP2 to save on maintenance costs, and the 18 existing PowerPC batch nodes will stay unchanged.
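As a rough illustration of what these figures imply for total capacity, the arithmetic can be sketched as below. This is only an estimate under stated assumptions: that each SP2 node counts as a single CPU, and that throughput scales linearly with the number of CPUs and their relative speed. Neither is claimed by the figures above, and real gains depend on the application. The function and variable names are purely illustrative.

```python
# Rough aggregate-capacity estimate, in units of "one SP2 node".
# Assumptions (not from the article): each SP2 node is one CPU, and
# throughput scales linearly with CPU count and per-CPU speed.

def capacity(nodes, cpus_per_node, speed_vs_sp2):
    """Return aggregate capacity in SP2-node equivalents."""
    return nodes * cpus_per_node * speed_vs_sp2

# Interactive service: 24 SP2 nodes replaced by 15 twin-CPU 43P machines,
# each CPU two to three times faster than an SP2 node.
old_interactive = capacity(24, 1, 1.0)
new_interactive_low = capacity(15, 2, 2.0)
new_interactive_high = capacity(15, 2, 3.0)

# Public batch: 18 SP2 nodes replaced by 15 single-CPU 43P machines,
# each about four times faster than an SP2 batch node.
old_batch = capacity(18, 1, 1.0)
new_batch = capacity(15, 1, 4.0)

interactive_gain_low = new_interactive_low / old_interactive
interactive_gain_high = new_interactive_high / old_interactive
batch_gain = new_batch / old_batch

print(f"interactive: {interactive_gain_low:.2f}x to {interactive_gain_high:.2f}x")
print(f"batch: {batch_gain:.2f}x")
```

Under these assumptions the interactive capacity would rise by a factor of roughly 2.5 to 3.75 and the public batch capacity by a factor of about 3.3, even though the number of machines goes down in both cases.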
We plan to restart the CERNSP service on January 5th, based on the current nodes, and to start the new services as soon as possible, running them in parallel for some time. Users will be able to telnet rsplus (instead of telnet cernsp) to reach the new interactive nodes and, when we are happy with their functioning, probably towards the end of January, we will change cernsp to point to the new nodes. For the public batch service, the FOCUS committee has also approved that we standardise on one batch system at CERN, and the only serious candidate is LSF, the Load Sharing Facility, which has the right functionality and portability. We will hence bring up the new public batch nodes running LSF only and publicise how to use LSF. As users migrate, we will gradually close down the LoadLeveler batch service on the old SP2 nodes and then migrate the PowerPC nodes running LoadLeveler to LSF as well.
Last year we were able to keep the CERNSP interactive service running during the shutdown with minimal attention and good success. This year, we will try to provide a continuous interactive service, but given the amount of work required to reconfigure a reduced SP2 system, with some 10 nodes to move and all of them receiving new software, interruptions are likely. The most difficult period will be the few days before the January 5th startup, but we will attempt to work so that there are always some interactive nodes available to provide the CERNSP service. The work that needs to be done means, however, that there will be no SP2 batch or PaRC services during the shutdown.