Charles Curran, IT/PDP
Some time ago the ALICE collaboration decided to start a joint project with IT to try out a commercial HSM (hierarchical storage manager) as a means of storing experimental data at rates approaching those that will be required when LHC starts up. While most LHC experiments demand only a 'modest' data rate of ~100 MBytes/s, ALICE will need to sustain a rate of ~1000 MBytes/s during heavy ion runs. At that time, the 'commercial' candidate was the High Performance Storage System, HPSS, which was (and still is) developed by an IBM-led consortium. You can see more about HPSS here. HPSS is a very large system, initially limited to IBM hardware, but it is being extended to COMPAQ (DEC) hosts and Redwood tape drives in a joint project between COMPAQ and CERN, and is now also being ported to Sun platforms. The aim of HPSS is to be scalable, up to the size needed for LHC experiments, and to make maximum use of the speed of the devices integrated into it. This is, as you might imagine, very important if the cost of the data handling systems is to be acceptable within CERN's budget.
The initial aim of the project was to sustain ~100 MBytes/s for a period of at least a week, using the entire chain from ALICE's DAQ system down to the central robotic tape library. A first attempt was made last year to store data into HPSS at up to 40 MBytes/s. This soon revealed that we had insufficient disk to sustain such a transfer rate (normally, data sent to HPSS is first written to HPSS-owned disk, and subsequently migrated from disk to tape). It also revealed unexpected hardware problems (since resolved) in the IBM hosts when attempting to sustain a high data rate to tape for several days. Difficulties were also seen with the COMPAQ Alpha 4100 SMP hosts, where performance was lower than expected. These problems have also been corrected. Nevertheless, a fairly respectable 25 MBytes/s was sustained when the tape writing step was suppressed. This year's tests are expected to improve greatly on this rate.
Since the start of this project, a development of the existing CERN stager software has begun: CASTOR. You can see more about CASTOR at URL: http://wwwinfo.cern.ch/pdp/castor. CASTOR will provide some of the facilities of HPSS, but is a smaller system and can be more readily adapted to CERN's requirements. It is also designed to extract the maximum performance from attached devices, and to scale to the requirements of LHC. Its development has gone ahead very fast, and it has performed very well in extensive pre-production tests. Therefore, it will also be tried out in the ALICE Data Challenge.
The test of CASTOR and a second test using HPSS will start shortly, probably on March 23rd. All the tests will end before CERN's accelerators start up for 2000. It is hoped to reach ~100 MBytes/s for a one-week period with CASTOR, and if this goes well we may try to go above 100 MBytes/s for a short period. CASTOR can rather easily use equipment normally in 'public' use; in this case we will 'borrow' 12 Redwood tape units and several hundred Redwood cartridges, which will be rewritten as required. The HPSS test is a little more difficult to arrange, as tape and disk 'movers' need to be configured into a DCE cell, and any cartridges to be used need to be imported into HPSS. All of this takes some time. With HPSS we will also try to reach ~100 MBytes/s for a one-week period, as with CASTOR. We will again be using 12 borrowed Redwood drives, borrowed disk, and ~1000 borrowed and as yet unused Redwood cartridges. We are most grateful to those who have lent these resources, without which this test would not be possible.
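To give a feel for what the ~100 MBytes/s, one-week target implies, the short sketch below works out the total volume written and the average rate each of the 12 drives would have to sustain. It is only a back-of-envelope illustration: it assumes 1 MByte = 10^6 bytes, a perfectly even load across the drives, and no downtime, none of which is guaranteed in a real test. The function names are purely illustrative.

```python
SECONDS_PER_WEEK = 7 * 24 * 3600  # 604800 seconds


def total_volume_tb(rate_mb_s, seconds=SECONDS_PER_WEEK):
    """Total data written, in decimal terabytes (assumes 1 TB = 10**6 MB)."""
    return rate_mb_s * seconds / 1e6


def per_drive_rate_mb_s(rate_mb_s, drives):
    """Average rate per tape drive, assuming the load is spread evenly."""
    return rate_mb_s / drives


if __name__ == "__main__":
    target_rate = 100  # MBytes/s, the aggregate target quoted in the article
    drives = 12        # borrowed Redwood tape units

    print(f"Volume in one week: {total_volume_tb(target_rate):.2f} TB")
    print(f"Average per drive:  {per_drive_rate_mb_s(target_rate, drives):.2f} MBytes/s")
```

Under these assumptions a week at the target rate amounts to roughly 60 TB, with each drive averaging a little over 8 MBytes/s.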
Despite our efforts to reduce the impact of these tests on 'normal use' as far as possible, users may see longer waits for data access due to the temporary reduction in the number of tape units available to them. This will be true for both the CASTOR test period and the HPSS test period.
Nevertheless, we hope that you will bear with us during these tests, and that you will not experience too much inconvenience. We feel that they are really necessary, as such sustained data rates may well reveal unsuspected problems which take time to solve. The knowledge gained will also help us to plan, on as realistic a foundation as possible, the systems that will be required when LHC starts.
Another article at a later date will describe the results of this Data Challenge. If all goes well, further tests will be carried out. It is hoped that the data rate achieved might be doubled every year, thus rapidly approaching the final target of ~1000 MBytes/s. This will definitely require more equipment than we have available now, or different equipment!
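The 'doubling every year' hope can be turned into a rough timescale with a few lines of arithmetic. The sketch below counts how many yearly doublings are needed to go from a given starting rate to the ~1000 MBytes/s target. The starting rates used are assumptions for illustration (the ~100 MBytes/s hoped for in this year's tests, and the 25 MBytes/s sustained last year), not measured results, and the function name is hypothetical.

```python
def years_to_reach(start_mb_s, target_mb_s, factor=2.0):
    """Number of yearly steps needed for the rate to reach the target,
    if the rate is multiplied by `factor` each year."""
    if start_mb_s <= 0 or factor <= 1:
        raise ValueError("start must be positive and factor must exceed 1")
    years = 0
    rate = start_mb_s
    while rate < target_mb_s:
        rate *= factor
        years += 1
    return years


if __name__ == "__main__":
    target = 1000  # MBytes/s, the ALICE heavy ion target quoted in the article
    for start in (100, 25):  # assumed starting points, for illustration only
        print(f"From {start} MBytes/s: {years_to_reach(start, target)} doublings")
```

Starting from 100 MBytes/s, four doublings (200, 400, 800, 1600) are enough to pass the target; starting from 25 MBytes/s, six would be needed.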
For matters related to this article please contact the author. Cnl.Editor@cern.ch