HEPiX CASPUR Meeting Report - Site Reports presented at the Meeting held October 23rd - 25th, 1996
CASPUR has a site licence for AFS and provides AFS services for several AFS cells across Italy. There are 5 data servers, some in Rome, some in Bari, with a total of 100GB of disc space and some 200 client nodes. They provide an ASIS mirror site for Italy which is used by 10 other sites and AFS is also used to distribute some site-licensed commercial software packages.
They plan to implement the SHIFT software for their STK silo, augment the data conversion tools available in the Centre and produce work group tools making use of the arc package from R.Toebbicke of CERN. CASPUR is making a major investment in LINUX, including some training tools using LINUX.
Their AFS cell now consists of 7 AIX servers and 350GB of disc space. There are also 4 AIX tape servers with 400GB of RAID disc spool attached. All the major clusters and servers are interlinked with ATM and a new experimental DLT service has begun to be offered on 2 HP 735 nodes; it consists of 2 model 2000 drives and 2 model 4000. Their local version of CERNLIB has been modified for specific tape access but is still RFIO-compatible.
WDSF for file backup has been replaced by ADSM and a graphical user interface for archiving (ELLIOT) has been written; it will be covered later in the proceedings. There have been improvements to BQS and PIAF has been implemented on their SP2 using IBM's parallel file access method instead of RFIO. CCIN2P3 is now using AFS mail via Pine and Netscape; they are also examining IMAP. Other CCIN2P3 topics will be covered later.
The AFS cell is also steadily expanding but AFS 3.4 has given many problems and CERN has a close collaboration with Transarc to resolve these as they occur. One result of this however has been a postponement of any real advance work on DFS.
There are many PC initiatives at CERN and although the main administration and occasional users are switching to a Windows 95 version of CERN's NICE architecture, there are also many independent Windows NT projects. A few people have expressed interest in LINUX and at least one experiment uses it seriously but there is (so far) no official support for it.
On the network side, the move to structured cabling and routed networks is progressing well and the result is a much more stable network. The CORE batch services are also expanding to take care of user demand and the first HP PA 8000 chip systems were recently successfully installed.
Other things to note include
The current major activity is the forthcoming fixed target run which will use UNIX as the online system rather than VMS as in the past. Fermilab are also testing PC farms, initially with LINUX then moving to NT. Fermilab's PC group is migrating to NT and are investigating Wincenter. For the future, working groups have been set up to discuss how to deal with the so-called RUN II cycle of experiments.
Fermilab batch is migrating from Loadleveler to LSF and this will be used by both CDF and D0 as well as on the central UNIX farm, which itself is being upgraded with 45 more SGI and 22 more RS/6000 nodes and both CDF and D0 have expanded their batch capacity in disc and/or CPU power. PIAF has been installed on SGI Challenges. On the interactive side, FNALU will receive more AFS space. The Sloan Digital Sky Survey, supported by Fermilab, has added 2 Alpha 8200 systems.
They have a mail server based on IMAP (see later) and they are setting up a Wincenter service following the presentations at HEPiX at TRIUMF.
For public interactive services, there are two multi-node SUN servers with the home directories spread across several file servers. NIKHEF is participating in the Internet 96 World Exposition.
Central services have been established for - a mail service based on IMAP but with some POP support; a central registry service based on DCE (DESY's first steps towards a DCE/DFS cell), see later talk; and a rudimentary equipment database for all workstations at DESY.
At the high performance end, the 10 SGI Challenges installed have a total of 186 CPUs of which 32 are R10000s which therefore require IRIX 6. There are 2TB of disc space and the OSM (Lachman) mass storage scheme is used; access to this is possible from all DESY platforms. A new Grau-Abba tape robot is on order for the Hamburg campus with 2100 cartridge slots and 6 Ampex D2 drives. A smaller similar robot is installed at Zeuthen. RFIO is being adapted to trap calls for remote disc access and transfer this to an OSM call.
DESY is starting seriously to investigate PC farms. The Hermes experiment has 10 Pentium Pro systems running LINUX which has been in production for just a few weeks. On the other hand, ZEUS has just ordered 20 similar PCs but will run Windows NT.
The current problem areas at DESY include --
Current work includes investigating the use of OSM as a data server for medium-sized files and making improvements to the mass storage system.
At RAL itself, the CSF farm is now 30 nodes including 4 new HP C Series systems which may later be upgraded with PA8000 CPUs. The service delivers over 5000 CPU hours per week and is not only used for Monte Carlo work but increasingly for physics analysis. It is much accessed from across the UK via SuperJANET.
On the Digital side, there is a 6 node Alphaserver 8400 with 2GB of memory running Digital-UNIX and there is still a VMS-based mainframe which is now becoming expensive to maintain, so discussions have started with a target of stopping it in October 1997.
RAL are planning a trial NT farm service and this will be discussed at the end of this meeting. Much work has been put into tape service reliability and the efficiency of data throughput. An extra 100 GB of disc space has been added to permit holding more data online and RAL is experimenting with SHIFT software for staging. They are also evaluating various options for file migration and there is the possibility of an update to the ATLAS Datastore (see previous HEPiX, for example Prague).
A new challenge at RAL may be a plan by the H1 experiment from DESY to install an analysis farm at RAL which would have high output rates as well as the more traditional high input only.
There is an IBM 3494 cartridge robot with 6 drives and 12TB capacity. ADSM is in use for backup along with an in-house archive/retrieval tool which has been ported to many platforms. They are in the process of migrating their file base (5TB) from MVS to AIX.
They have defined a recommended UNIX desktop environment consisting of fvwm, gnu utilities and Z-mail; they may move to CDE when they upgrade to AIX 4. There are plans for an NT service and they find their Wincenter service rather popular; it runs on 2 dedicated PC servers with a total of 50 client licences and makes use of Samba to access UNIX files.
They have PIAF running on their SP2 but it uses NFS for disc access and gives poor performance so they are looking at the parallel file system on the SP2 to speed this up. Other plans include a site-wide migration to AIX 4.1 or 4.2; upgrading their login service; and acquiring more nodes for their SP2.
First of all, we're now The Thomas Jefferson National Accelerator Facility, or Jefferson Lab for short, and soon jlab.org on the Internet. This is our first year of production physics research after completing our construction phase, thus the name change to denote our status change. Of our three experimental halls, we have achieved beam to two (Halls A and C), with the third (Hall B) expected near the first of December.
We are getting ready to use our new Internet domain -- what a project! We've found that our community is concerned about ".org" status, the Department of Energy site office wonders about moving out of the ".gov" domain, and we're scurrying in the Computer Center to make it all work. (Right now jlab.org is simply an alias to cebaf.gov.)
We have implemented our Common UNIX Environment (CUE) across the UNIX platforms. We are implementing it for every single JLAB system managed by the Computer Center -- to the desktop, central computing JLAB systems, IFARM interactive compute farm, FARM batch compute systems, DAQ data acquisition machines, and SWC specialized workgroup clusters (minimum number of special purpose systems). This gives users an identical environment no matter which platform or class of machines they're on. We are still considering providing elements of the HEPiX environment.
We've purchased a Network Appliance Model 540 NFS fileserver to provide initial file storage and will probably go ahead with the purchase of a second while we investigate high-end RAID solutions. It looks like fibre channel arbitrated loop is 6 months away, so we're planning to delay a little to find what's available in the market then. Our testbed DCE cell is up and running, with cross cell communication between two offsite DCE cells. This is promising as we look to providing wide-area DFS network access to our data files.
In the batch farming arena, we are awaiting the approval of our Jefferson Lab Off-Line Batch System (JOBS) requirements document. The initial implementation of JOBS is scheduled for 3/97. (To refresh your memory, our system uses LSF to manage the farm, and OSM to manage the data.)
We are well into the process of implementing a Fast Ethernet network for both our general backbone LAN as well as for the Experimental Physics segment. We are still planning to dual-attach fibre channel RAID to move the Hall B data directly to our StorageTek silo -- remember it's approximately 10 times the amount of data from Halls A or C. ATM is still under investigation; no production use.
To support this complex new physics environment, we're searching for a Helpdesk Administrator.
The interface is by means of library calls, shell commands or via a MOTIF interface; the last of these is used exclusively by the operators because it has many features designed specifically for them, especially in the area of problem handling.
A system administrator can define access rights to particular drives by group; a user can specify criteria to select drives, whether or not to queue a request, read-only or read-write access, etc. The speaker showed a number of examples of OCS commands, followed by the structure of the various programs and the command message flows.
Plans for the next version, version 3, include linking it to FTT (see later talk) underneath OCS and beyond that to add robot support and to integrate the currently-separate OCS databases.
Nodes must be registered to use FMSS and it is most commonly found on central systems. Use can also be restricted to members of a given user table.
The speaker showed a list of the various FMSS calls such as copy, remove, query, etc. The authors had maintained traditional UNIX formats for the commands where possible.
The product was not yet in production status at that time; people were still migrating off the STK robot and Unitree scheme. It was hoped to complete beta testing and begin production use within 6 months.
Future versions should permit direct import/export to tape. More staging options would be added.
The objective has been defined as - no more manual tape mounts outside CERN working hours by the end of 1997 and a fully automatic scheme by the time LHC begins. Users should be encouraged to make more use of robotics and in the meantime much effort goes into tuning the tape library such that the busiest tapes are placed in the robots. There is also close cooperation with robotics vendors to improve their hardware and software.
As a result of all these actions, the automatic mount rate does show signs of increasing at the expense of manual mounts and this has permitted the desired reduction of overnight operational support, thus promoting savings for investment in the next stage of automation.
Renshall presented some details of the units under consideration, including performance figures. Tests performed at CERN in general supported the vendors' claims. All tapes gave satisfactory results and differences were in the area of purchase and/or operational costs. For the acquisition exercise currently underway, specific criteria were established such as mounts per hour, the need to demonstrate working sites with the equipment to be offered, the cost of ownership over several years, and so on.
At the time of the meeting, a Call for Tender had been issued and it was hoped to take the result for adjudication to the December Finance Committee meeting and, if confirmed, to have the equipment installed for the 1997 LEP/SPS run.
Zarah is implemented via a fake NFS demon using RPC between clients and the servers. On the server side, there is a stub file system which is identical in structure to the real file system on the mass storage and a user's file tree structure is thus mapped.
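The path mapping described above can be sketched in a few lines; note that the directory names and roots here are invented placeholders, not Zarah's actual layout, and the real mapping happens inside the fake NFS demon rather than in client code.

```python
import os

# Hypothetical sketch of the stub-tree idea: the stub file system mirrors
# the structure of the real mass-storage file system, so a client-visible
# path under the stub root maps one-to-one onto a mass-storage path.
STUB_ROOT = "/zarah/stub"   # assumed mount point of the stub tree
MSS_ROOT = "/mss"           # assumed root on the mass-storage side

def stub_to_mss(stub_path):
    """Translate a client-visible stub path into its mass-storage path."""
    rel = os.path.relpath(stub_path, STUB_ROOT)
    if rel.startswith(".."):
        raise ValueError("path is outside the stub tree: " + stub_path)
    return os.path.join(MSS_ROOT, rel)
```

The point of the stub tree is that ordinary file-system navigation (ls, cd) works on the cheap stub while actual data transfer is deferred until a file is staged.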
The speaker explained in detail the mechanism of staging a file. He stated that NFS was very slow and they used instead the RFIO protocol which offered other advantages over NFS as well as better speeds. They used a local implementation of RFIO and he explained how this affected and improved access to files. Speeds achieved had reached 40MB/second from an SGI server via HiPPI.
FTT provides the basis for other data access packages such as OCS (see earlier), where FTT gives the basic tape access routines. It contains modules of library calls, a setuid binary and some test code. Use of the package provides system- and device-independence, statistics gathering and consistent error returns. It is easy to port to new platforms but there is a need to further simplify the procedures for building the required tables of tape characteristics and operating system features and to automate the necessary checking procedures. Another open point is a mechanism to determine which drives are available on a system, made more difficult since each architecture has its own commands. Work is also needed to improve the Porting Guide and it is then planned to include FTT in the Fermi Tools package for public distribution.
DLT 2000 units are still very popular, can this be the default? Or should we wait for DLT 4000 to become more widespread? Are the various DLT models (7000, 4000, 2000) at least downwards compatible in both read (claimed) and write (less certain)?
H.Renshall stated the CERN position - they are user-driven and try to guarantee support for (almost) any reasonable media requested by CERN users. However, he agreed with the suggestion that if there was a HEP-wide recommendation, it might help to reduce diversity.
It was reported by one delegate that Quantum had stopped manufacture of DLT 2000 models.
After a long discussion, it was agreed to declare a recommended default for any site that sought advice, while of course allowing any site that wished to support additional types. On this basis the default should be DLT 2000 media format, although any new drive should be DLT 4000 or DLT 7000 (by this time the meeting was convinced that both these units could write DLT 2000 formats). This recommendation would not only provide the lowest common denominator for data interchange between the labs but also offer small sites the possibility to use the same drive (e.g. the DLT 4000) with sufficient local capacity for file backup.
The cluster is used for ALEPH Monte Carlo and data analysis work. The standard ALEPH software modules are mirrored nightly from CERN and the results of the jobs on DANTE can be stored locally or written back via the network to SHIFT at CERN. One node of the cluster is used for interactive work and the others for NQS batch queues, the CERN flavour of NQS with the gettoken scheme for AFS tokens. The HEPiX login and X scripts are in use and there is an ASIS mirror.
Among the problems seen have been occasional Ethernet congestion, for which a network upgrade using HP's AnyLAN is planned, and a major problem with NFS, for which a patch was found. In the discussion, it was noted from the audience that NFS under the latest HP-UX releases showed significant performance improvements.
Parallelism at CERN began with the use of the late Seymour Cray's CDC 6600 and its various functional units. From there we moved to vector processors, to RISC computers, MPP systems, the GP-MIMD2 (a European Commission-funded project) and currently to the Meiko CS-2. This also is EU-funded with 3 industrial partners and 3 user sites including CERN. Meiko also sold two large systems to Lawrence Livermore Laboratory, which had the unfortunate side-effect of diverting vital support personnel away from our project although this sale did assist in the debugging of the overall system. Recently, Quadrics has taken over Meiko, and its best experts, and more developments of the architecture are planned. So far, Quadrics have given excellent support and Eric also acknowledged here the support and help of Bernd Panzer of CERN in the work around the CS-2.
The CS-2 is a SPARC-based machine with a proprietary internal network rated up to 50 MBps. It has a vector option but this is very expensive and only really useful with long vectors (not our situation). The initial SPARC chips used suffered poor floating point performance and recently Ross 100 MHz HyperSPARC chips have been installed. Memory is physically distributed but logically shared among the 64 nodes of 2 processors each. Each node also has a disc for local swapping. For data space, there is some 100GB of internal disc and 400GB externally. The Meiko parallel file system uses striping but the user sees only improved performance as more nodes are brought into use to access the discs.
The systems are connected to the CERN network by FDDI but HiPPI will be tried with NA48. The operating system is Solaris 2.3 with extensions from Meiko, including an attractive system management tool permitting control of remote nodes. As set up, four nodes are designated control nodes for the whole system. Users' home directories are local, NFS-exported to all nodes in the system. Eight nodes are dedicated to PIAF, 5 for interactive logins and other nodes are used for batch or parallel tasks. AFS access is via the NFS exporter.
There are three programming models available:
Eric gave some examples of applications in operation
After a slow start, the so-called User Migration Task Force (UMTF) was re-organised in October 1994 after the HEPiX meeting in Saclay/DAPNIA to include more input from other labs, notably DESY. The result was CUTE (Common UNIX and X Terminal Environment, see for example HEPiX Rio at URL http://wwwcn.cern.ch/hepix/meetings/rio/Rio_cute.ps). The first incarnation of CUTE was the interactive service opened on CERNSP, CERN's SP2.
Most major batch work on CERNVM was stopped at the end of 1995 and transferred to CORE (see for example HEPiX Fermilab at URL http://dcdsv0.fnal.gov:8000/hepix/1094/talks/batch/shift.ps). At that time, the CERNVM configuration was halved in size. That left an interactive user base of some 11,000 accounts.
The planned stop of CERNVM was originally end-96 but this was advanced by 6 months to end-June '96. Cleanup of the unused interactive accounts started early in 1996 and this removed some one-quarter of the remaining accounts. Gradually the CERNVM load was reduced until it was formally closed at the end of June. Migration of the VMarchive database (300 GB, of which some 105 GB was "active"; this was compressed to 35 GB) and users' minidisks (50 GB) to the RS/6000-based ADSM service was completed by the end of August. Tools were written to permit users to retrieve their own files from these archives but such occurrences have proved very rare indeed. Also by this time the client side of the ADSM backup service had been changed on user nodes to point to the RS/6000 version of this service.
To cope with the expected problems, staff on duty in the Computer User Consultancy Office (UCO) was doubled throughout June and July and there was a great deal of individual "hand-holding" of users to help with their migration. During the last month of use, some 2500 users were mailed individually to encourage them to migrate either to UNIX/CUTE or to PC/NICE.
After the official shutdown at midnight on June 30th, the system was run throughout July with no users except for 2 particular applications. After this, and once all the archive and minidisc files had been archived on ADSM, the system was finally shutdown, except for a brief restart in mid-August to retrieve via ADSM a disc for a UNIX client.
Harry drew only a few lessons from the process, the main one being the amount of work needed to help users migrate and feel happy with their new environment. He was concerned that this might not always be possible in the future as support services were gradually scaled down. Also he stated that users often felt happier with fewer choices being presented to them, especially in the UNIX environment where there are always several ways of tackling any problem.
Parallel processing is typically effected by CPS, a locally-produced socket-based scheme which distributes events to worker nodes. The base CPS package has been extended to provide a batch queuing scheme.
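The farming model above can be illustrated with a toy, in-process sketch of the pull scheme: a master fills a queue of events and workers take the next event as soon as they finish the previous one, so faster workers naturally process more. The real CPS package distributes events over sockets between nodes; its actual API is not reproduced here.

```python
import queue
import threading

def farm_events(events, n_workers, process):
    """Distribute events to worker threads on a first-free-worker basis."""
    q = queue.Queue()
    for e in events:
        q.put(e)

    results = []
    lock = threading.Lock()

    def worker():
        while True:
            try:
                e = q.get_nowait()   # pull the next unprocessed event
            except queue.Empty:
                return               # no events left: worker retires
            r = process(e)
            with lock:
                results.append(r)

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Because workers pull rather than being assigned fixed shares, a slow node never holds up the rest of the farm, which matches the motivation for event-level parallelism in HEP production.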
Problems with the above arrangement include :-
For the future there is known to be a need for much more capacity, both data and CPU, better inter-node communication links and more memory for physics analysis jobs. To this end, Fermilab issued a tender to the suppliers of the 4 platforms already present on site and as a result they are currently purchasing a further 45 SGI Challenge S nodes, 22 IBM 45P nodes, 2 more I/O nodes, one each from SGI and IBM, plus more disc and tape capacity.
On the new farm, the configuration will be modified to use an Ethernet switch to permit a much more flexible allocation of worker nodes to experiments, including node sharing where appropriate. The new farm will have a single file system, possibly AFS, and this should eliminate the need for bulk file or data moves. All the equipment had arrived by the time of the meeting and the target was to have the new farm in production by January 1997.
Speculating on possible futures, Lisa mentioned the suggestion of PC/Linux farms, multi-CPU systems, perhaps a Windows-NT farm. For the moment however, Fermilab intends to stay with Exabyte drives, largely for compatibility with the running experiments.
A script was written to be run locally on target systems to gather information about their configuration; this information is then posted on a web page for each system. Another useful tool was found to be a Suggestions Document where one can easily find a list of Do's and Don't's.
So far, there has been a good level of acceptance by the user community although concerns have been expressed about too-frequent SGI patch releases. Another currently-open question is system level backup: when should it happen, how often, and what should be done if there is no local tape drive?
Among the plans of the support group is the idea of creating a more automated method of installing the operating system on client nodes.
The systems were installed with the appropriate certified version of the operating system (see HEPiX Vancouver , URL http://wwwcn.cern.ch/hepix/meetings/triumf/cern-certify.ps), then SUE tailoring (see for example HEPiX Prague, URL http://wwwcn.cern.ch/hepix/meetings/prague/CERN_sue_update.ps), then the WGS/PLUS customisation scripts and then tailored for each cluster according to use. Of course, AFS was vital for all of this. Several tools had been developed for performance monitoring, to identify trends, predict problems and so on. Users had requested some remote monitoring which would then include any networking effects. This will be added shortly.
For the future ...
Other topics under more or less active consideration are CDE, DFS and NT.
On the negative side of AFS, its performance was found to be poor for large files and this has led to the installation of 2 x 60 GB of local, non-AFS, non-backed-up disc space on the cluster nodes, dedicated to certain applications. Also, AFS proves to be less than optimal on multi-CPU systems, for example 20-node machines, giving system cache problems.
There is now a move to merge CLUBS (the heavily I/O batch-oriented cluster) with FNALU in order to reduce the central support load and to provide a unique user interface. One implication would be a migration from Loadleveler to LSF. This would unify the user interface to services and offer a single queuing scheme, attractive to both users and system administrators.
Claudia explained the methodology she used: accounting data produces a huge amount of information and this needs to be compressed by splitting it into groups and categories. Users are grouped into "clusters", typifying their use of the systems and different combinations of these clusters are used to produce different benchmarks. By varying the resources used by a member of each cluster type and the total number of users, a Monte Carlo method can be used to generate system loading and hence statistics. She presented in graphical form some of the results.
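The modelling approach can be sketched as follows. The cluster names, population shares and per-user CPU figures below are invented for illustration; the real model was derived from the site's accounting data.

```python
import random

# Hypothetical user clusters: each has a typical CPU demand (fraction of
# a CPU per user) and a share of the user population.
CLUSTERS = {
    "light":  {"cpu": 0.1, "share": 0.6},
    "medium": {"cpu": 0.5, "share": 0.3},
    "heavy":  {"cpu": 2.0, "share": 0.1},
}

def simulate_load(n_users, trials=1000, rng=None):
    """Monte Carlo estimate of mean total CPU load for n_users users."""
    rng = rng or random.Random(0)          # fixed seed for reproducibility
    names = list(CLUSTERS)
    weights = [CLUSTERS[c]["share"] for c in names]
    totals = []
    for _ in range(trials):
        # draw a random population from the cluster mix...
        users = rng.choices(names, weights=weights, k=n_users)
        # ...and sum its resource demand for this trial
        totals.append(sum(CLUSTERS[c]["cpu"] for c in users))
    return sum(totals) / trials
```

With these illustrative figures the expected load per user is 0.1*0.6 + 0.5*0.3 + 2.0*0.1 = 0.41 CPUs, so 100 users should average roughly 41 CPUs of demand; the spread across trials is what gives the loading statistics.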
So far, she had modelled the ATLAS WGS load and she will now try this for other services, refining the command definitions and comparing different platforms.
SIOUX has 2 HP 735s and 4 RS/6000s, again from the SP2 configuration, and is intended for users with heavier CPU tasks. In fact BAHIA uses nodes of SIOUX for large CPU jobs. Like BAHIA it uses load sharing but based on a simple round-robin sharing algorithm where the user gives a generic name (BAHIA or SIOUX) and the name server allocates a particular node name to the connection request.
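The round-robin allocation behind the generic names can be sketched as below; the node names are placeholders and the actual name server implementation was not described in the talk.

```python
from itertools import cycle

class RoundRobinResolver:
    """Map a generic cluster name (e.g. BAHIA, SIOUX) to member nodes in turn."""

    def __init__(self, clusters):
        # one independent round-robin iterator per generic name
        self._iters = {name: cycle(nodes) for name, nodes in clusters.items()}

    def resolve(self, generic_name):
        """Return the node to which the next connection request is directed."""
        return next(self._iters[generic_name])

# Hypothetical cluster membership for illustration only
resolver = RoundRobinResolver({"BAHIA": ["bahia1", "bahia2", "bahia3"]})
```

Unlike load-based schemes, simple round-robin needs no monitoring of the member nodes, at the cost of occasionally directing a user to a busy node.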
PIAF has been ported to run on the SP2 using IBM's parallel file system for data access. They needed to develop a group quota scheme to use this parallel file system. It is in use by 6 user groups. The conclusion so far is that the PIAF/SP2 arrangement is well suited to data mining tasks and that the parallel file system offers an interesting speed-up in a completely transparent way.
All IP traffic is monitored and this helps not only to produce statistics but also provides a means of tracking back when security incidents occur. Since the CRACK service started there has been a dramatic fall in the number of weak passwords (passwords failing the tests have fallen from 25% to 4%). Nevertheless, there is about one major security incident on site per month on average and it is estimated to cost some 4 person-months per year to clean up and repair the effects. There is also a small three-person team working on security, one of whom is full time on it.
From the above it was clear that things are indeed changing for the better vis-a-vis security with much more awareness now of the risks and possibilities, at least in the larger labs. AFS in particular makes security that much more important: for example, should sensitive files normally stored in the home directory be partially open to HEP sites or completely protected?
During the discussion it was proposed that each site appoint a named security contact whose name could be included on a mailing list. The question was raised if HEPiX should or even could define a recommendation for a default security policy and if it did how that could be implemented. Could one achieve a secure HEP collaboration over the Internet?
Other subjects raised included
Products "appear" to clients as if they sit in /usr/local/... although in fact many are actually links and there is a regular update procedure (ASISwsm) which normally runs nightly on client nodes to update these links. This procedure is tailorable by a local system administrator. Another utility, ASISsetup, declares which version of which products should be used on a particular node, similar in function to FNAL's UPS/UPD procedures. However, enabling access to multiple versions at a time requires extra effort from the maintainer and this is not possible for all packages.
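The nightly link-update idea can be sketched as follows. The paths and layout here are assumptions for illustration, not the actual ASIS repository structure, and the real ASISwsm procedure handles versioning and local tailoring beyond this.

```python
import os

def update_links(repo_dir, target_dir):
    """Ensure target_dir holds a symlink into repo_dir for each product.

    Returns the list of product names whose links were (re)created.
    """
    changed = []
    for name in sorted(os.listdir(repo_dir)):
        src = os.path.join(repo_dir, name)
        dst = os.path.join(target_dir, name)
        if os.path.islink(dst) and os.readlink(dst) == src:
            continue                       # link already up to date
        if os.path.islink(dst) or os.path.exists(dst):
            os.remove(dst)                 # replace stale link or file
        os.symlink(src, dst)
        changed.append(name)
    return changed
```

Because the procedure is idempotent (a second run changes nothing), it is safe to schedule it nightly on every client node.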
Product maintainers have a certain number of tools at their disposal: to build a new version across all supported architectures with one command; for testing; and for public release of new versions. Products passed through various states - under test, under certification, release - and Philippe illustrated the flow of modules and the directory hierarchy used.
To offer ASIS at remote sites, a replication scheme has been developed which was usually run daily. It checked for the presence on the CERN master repository of new products or new versions of products, thus minimising file transfers. This extended to checking when the change was only a state change (for example, only moving from certified to release state) in which case the files on the remote site do not need to be updated, only their state and/or directory placement. The ASIS local copy manager tool was currently being updated to take account of the latest ASIS structure changes. At the time of the meeting some 6 remote sites were mirroring ASIS locally.
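The transfer-minimising check can be sketched like this. The catalogue structure (product name mapped to a version and a state) is invented for illustration; the real scheme works against the repository's directory hierarchy.

```python
def plan_sync(master, mirror):
    """Compare master and mirror catalogues of {product: (version, state)}.

    Returns (fetch, restate): products whose files must be transferred,
    and products needing only a metadata/state update.
    """
    fetch, restate = [], []
    for product, (version, state) in master.items():
        if product not in mirror or mirror[product][0] != version:
            fetch.append(product)      # new product or new version: copy files
        elif mirror[product][1] != state:
            restate.append(product)    # state change only: no file transfer
    return fetch, restate
```

Separating the two cases is what keeps daily replication cheap: a product promoted from certified to release state moves on the mirror without any files crossing the network.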
Product state changes and releases are defined as "transactions" such that if a transaction does not complete fully (removal of an old version for example, followed by release of a new one), then some recovery is attempted and warning messages issued.
The chosen solution was to add a client part "above" ADSM using ORACLE to store the required extra meta-data about the files and to build a MOTIF interface to this. XDesigner was used to build this interface. The client code on each node communicates to an ELLIOT server demon which itself communicates to ADSM and ORACLE. The ELLIOT server must reside on a node with access to the files it is to archive, normally the file server itself.
ELLIOT offers powerful query facilities and if only meta-data is requested then only ORACLE queries are issued, ADSM is only called if the file itself must be accessed. Further, the query calls are contained in a library such that eventually another database could be substituted, even another backup product to replace ADSM.
The speaker showed examples of both the graphical and line mode interface to ELLIOT. Man pages and online help from both the GUI and line mode versions already exist. The program was then in beta-test. In answer to a question, Herve stated that the files could be fully accessible even without the ORACLE meta-data since sufficient archive data was stored in ADSM as well.
The normal registry administrators and the DESY User Consultancy Office can register and modify users' information and group administrators can perform certain maintenance tasks, for example modify a password or space quota. The user can modify his/her own password, choice of shell, mail address and default printer.
The speaker described the 2 databases in some detail and how the information flows when a registry entry is made or altered. He also showed the graphical interface.
This is seen as a first step in the migration to DCE when the plan is currently seen as every user having a DCE account from which accounts would be generated on particular clusters or services.
X.400 was considered and rejected - largely because the address format does not fit with the sendmail protocol, because X.400 is difficult to manage and because there are too few clients.
POP was rejected, despite its nice user agents, because
LAL finally selected IMAP 4 (based on RFC 1730) as the basic mail protocol. Messages are stored on a server and accessed by clients. The so-called IMAP consortium drives the standard and its extension ICAP (the Internet Calendar Access Protocol). For IMAP, more and more clients and servers are becoming available, both public domain and commercial. Among the promised servers are Netscape mail server, Microsoft Exchange (both due end '96) and SUN Solstice.
IMAP supports MIME, it has an efficient header-only search scheme and it is optimised for low speed links. Many mailbox management schemes are possible including ACLs and quotas. The companion protocol (IMSP) will evolve to ACAP (Application Configuration Access Protocol), which is expected to be ready at the end of 1996. This will help with the configuration of address books and should permit common and similar access for address books as for mailboxes. ACAP should also extend IMAP to multiple mail servers with load sharing and replication.
The CMU/Cyrus project suite (also known as the Andrew II Project) is effectively IMAP+IMSP. LAL runs this on a pair of high-availability Alphaservers with automatic failover. The Cyrus suite supports Kerberos login authentication and has a TCL-based management tool.
As clients they offer one or more on every platform, even on VMS, PC and MAC, although on the PC the chosen client is commercial (Netscape 4). At that time only one tool, Simeon (from a small Canadian company), was available on all platforms. It happens to be the only one today supporting ACAP, although Pine is expected to do so shortly and others later.
The talk closed with a review of the features of Pine and Simeon and a list of URL references.
X11 is resource-hungry in CPU, memory and network traffic, especially if it is badly configured or runs "unfriendly" X11 applications (Xeyes is a trivial example, or a pretty-pattern screen saver not run on the X server itself). Lastly, X11 has several inherent security concerns.
From all of this we can say that X11 needs
This was the driving force for the development of the HEPiX X11 scripts. The project started as a joint effort by DESY, CERN and RAL, with most of the actual development being done at CERN. The scripts are now widely deployed at CERN and gradually elsewhere (e.g. DESY, SLAC).
For the future, Lionel and his X11 team are looking at
He posed the question - should these PCs run NT, or Linux or Solaris - and does it matter? HEP needs good Fortran, C and C++ and these are all already there with more coming. For NT there are several options for NFS access (including Samba), AFS and DFS are announced, there are several NQS ports and a beta of LSF. RAL is porting their tape software (VTP) and Sysreq to NT. And remote access to NT is possible with telnet, WinDD, Wincenter, Ntrigue, etc.
WOMBAT was a trial NT farm for HEP. It consisted of 5 Pentium Pro systems with a local file server, a connection to the ATLAS file store and Ntrigue for remote access. It was used to demonstrate several ports and to provide a platform for experiments to try, including benchmarking. It offered an alternative to UNIX for groups then on VMS, which was being run down. It was hoped to have the first results by the end of 1996.
Areas which John felt need to be addressed include
RAL would be interested in involving other labs, perhaps a HEP-wide effort? They believed that portability was more important than uniformity; and sharing experiences, and mistakes, was very important.
CERN had an approved project (RD47) evaluating the use of PC/NT. The LHC experiments were interested in NT and there were a certain number of individual initiatives happening around the lab.
DESY has established an NT Interest Group including representatives of the Computer Centre, major user groups, and Zeuthen. The largest effort was that of the ZEUS experiment which plans to base itself on NT. They have ordered 20 Pentium Pro systems. On the other hand, there are no current plans for an NT-based desktop.
CASPUR is heavily committed to PCs, but under Linux rather than Windows, as has been reported at this conference and elsewhere; Windows is used only for secretarial work. CASPUR will investigate the NT/AFS client (as will CERN).
FNAL, in the process of migrating away from VMS, is recommending that administrative staff consider the PC and MAC, while UNIX remains the recommended platform for physicists. Some NT servers are beginning to appear on the site.
NIKHEF have bought some PCs for evaluation. LAL have one NT server, dedicated to a Wincenter service, and GSI have several of these. Prague University have one PC for work on Objectivity.
The session closed with a discussion on how to keep up-to-date with this area and the suggestions included