HEPiX TRIUMF Meeting Report

Minutes of the Meeting held April 10th - 12th, 1996
___________________________________________________________________________

1. Introduction

+ Logistics

The meeting took place at the TRIUMF Laboratory in Vancouver, British Columbia, Canada from 10th to 12th April, 1996. It attracted some 36 people, nearly half from Europe; 20 different HEP sites were represented, 7 of them European. Most of the overheads presented are now on the web and are pointed to in the web version of these minutes below. The meeting was organised by Corrie Kost for local arrangements and by Alan Silverman for the programme. Large parts of it were broadcast over the MBONE.

+ TRIUMF Introduction by Prof. Alan Astbury

Prof. Astbury welcomed HEPiX to TRIUMF and described the work done on the site and its relations to other HEP sites. TRIUMF, Canada's national laboratory for high energy physics, is run by a consortium of 4 universities. Its principal HEP tool was a 500 MeV cyclotron for subatomic physics. The lab had a basic science programme including nuclear physics; particle physics, which was mostly offsite support for experiments taking place at other labs; and a condensed matter programme. There were also substantial Life Sciences and Medical Science programmes. There were some 300 staff at the lab plus students and research assistants. Several technology transfer collaborations with industrial partners helped in funding some of the programme. The future goals of the lab were to maintain much of the basic science programme, to build a radioactive beam facility, and to provide Canada's collaboration with the LHC "in kind".

2. Site Reports

+ TRIUMF by C.Kost

Since the Rio HEPiX meeting, the number of VMS nodes on site had dropped and the installed UNIX power had increased, as had the number of X terminals. Also, an AlphaServer 600 model 5/333 had been installed. Internet connectivity had been improved, along with more dial-in capability. The lab had migrated to an FDDI backbone and would move to a switched desktop network infrastructure. Their DAT robot had suffered two major failures, one of which had destroyed very many tapes; they would now be investigating DLT devices. There had been an explosion in the PC presence and they were actively investing in integrating UNIX and Windows with the NCD WinCenter tool (see below).

Trends for the future included
o integrating UNIX and Windows
o C++ replacing Fortran
o home-based computing, helped by Canadian PTT rates
o a plan to migrate away from VMS over several years and concentrate more on UNIX and PCs.

Nevertheless, some visions were still blurred --
o Should the future be PCs, workstations or X terminals? For the moment they had chosen X terminals.
o Should tapes be 8mm or DLT? DLT looked more promising.
o Was C++ the future? It seemed likely.
o What was the future of wide-area audio and video?
o What was the file system of the future - NFS, AFS, DFS? For the near term it seemed that AFS was the preferred choice.

+ Yale University Physics Dept. by R.Lauer

There were 6 distinct groups in the department, largely funded by DoE Research Grants; they performed or supported experiments at various different laboratories. Although funds were pooled, this led to diverse requirements. Since the Rio meeting, the network had become based on DEChub devices and included some 65 nodes of different architectures.
The main goals for their UNIX cluster remained the same, however: a consistent environment, ease of maintenance and reliability. Usage of the cluster had increased greatly, thus proving the principle of the scheme adopted, as described during the Rio meeting. Future plans for the cluster included making it fault-tolerant using Digital software, but this had not yet been achieved. The support people had still not decided on the preferred file system and there was a short debate with the audience on the relative merits of AFS and DFS today; AFS was clearly preferred at the present time. The speaker also felt that Digital UNIX lacked ports of many of the applications found on other platforms which her users wanted, but this was not echoed in the audience apart from one or two specific programs. She stated that there was a diversity in the user environment found at many HEP sites, which inevitably led to a discussion about the merits or otherwise of at least making the DESY and CERN-developed HEPiX scripts available, not necessarily as the default, at all HEP sites. In closing, the speaker made a plea that HEPiX become committed to active advocacy on behalf of HEP sites, especially the smaller ones. As an example, she felt that Digital should be encouraged to boost the level of support they provide for Digital UNIX.

+ CCIN2P3 by W.Wojcik

The IN2P3 Computer Centre in Lyon provided computing services to all 17 IN2P3 sites across France; there were more than 3000 IP addresses, all connected by PhyNet. CCIN2P3 itself had an ATM switch interconnecting its various UNIX clusters. VM had recently been downgraded by a factor of 3 but was still used to stage tapes to the Basta farm and was still serving the STK robot. The speaker then covered the various farms and clusters:
o Basta: 18 HP 735s and 11 IBM RS/6000 390s used for CPU-oriented batch; 90% used.
o Anastasie: 14 HP 735s and 9 IBM 390s used for I/O-oriented batch; 55% used.
o Bahia: 2 HP J200s and 4 IBM 390s used for interactive work and job preparation. The 390s were actually nodes in the SP2 in the Centre.
o Sioux: 2 HP 735s and 4 IBM 390s for interactive work; in particular this cluster made use of load sharing for heavy interactive loads.
o Dataserv: some 550GB of disc space (including 213GB of RAID) on 6 IBM servers used for file access; the data was accessed via ATM from the other clusters. It made heavy use of AFS plus some NFS automounted discs.
o Tape server: 4 IBM 370s connected via parallel and ESCON channels to robots, and 2 HP 735s connected to DLTs. Tapes were accessed via XTAGE (see previous HEPiX meetings) and the RFIO package.

The SP2 was used for a mixture of interactive work (Sioux and Bahia) and as a PIAF server. It was also used for remote execution of tasks requested by other clusters; security was implemented by AFS/Kerberos. This scheme was found to save licence costs on commercial software and to offer better load sharing.

+ BNL by T.Schlagel

BNL ran a very mixed architecture site. They were gearing up for RHIC, which was due to start operation in Spring 1999. There would be 4 major experiments with a total of some 840 physicists from 80 institutes in 15 countries. RHIC planned to have an offline computer centre with an 8 node SMP SGI system and a 16 node IBM SP1. For their batch farm they would use DNQS (already described at previous HEPiX meetings, see for example HEPiX Fermi in 1994). They were also investigating the use of Pentium Pro systems running either Solaris or Windows NT.
Of their 3 AFS cells, one was currently not active, one ran AFS 3.3A on AIX 3 servers and the third ran AFS 3.4A on Solaris 2.4 servers. There was a total of some 100GB of file space. Various tape media were planned or already present on site and an ATM test bed was being set up. BNL had developed SQIRT, an installation tool for installing public domain software; SQIRT stands for Software Query Installation and Removal Tool. They had evaluated WinDD and WABI but had finally settled on WinCenter, with which they were very impressed.

Plans for the future included investigating --
o Kerberos ticket services
o Windows NT versus UNIX for physics
o PGP public key services
o AFS to DFS migration

+ RAL by J.Gordon

The IBM cartridge robot now had 5 3590 drives and with these they were seeing more than 9 MBps for tape to memory transfers and 6 MBps to disc. There were about 900 tapes in the robot today and the maximum capacity would be 21TB at 10 GB per tape. Due to the lack of direct access to tape I/O, they were considering implementing CERN's RFIO package on their (non-UNIX) data servers; other alternatives included stage software or dedicating disc to certain users. Use of the CERN tape stager was likely, at least for some users. Their CSF installation, used for more than just Monte Carlo jobs, had been upgraded with the addition of 4 HP C110 nodes. It was used by several LEP and LHC collaborations as well as by DESY and SLAC (BaBar) experiments. The HEPiX profiles and ASIS were installed. The RAL production AFS cell was built round a single server today, with clients spread across JANET.

+ CERN by A.Silverman

The main areas of expansion at CERN continued to be X stations on desktops and storage capacity, both disc and tape, in the CORE batch services. There were now some 5TB of disc space, and tape cartridge drives and robots of almost every type. Central data recording had become very popular and this year's run of NA48 should set new speed records. Outside of UNIX, a Windows 95 service was about to be announced and released (scheduled for May 1st) and great interest was being shown in Windows NT. The switch-off date for CERNVM (the IBM mainframe) was set for June 30th and a major effort was being put into user training, user environments and so on, but at the date of the meeting over 2300 users still remained firmly active on CERNVM. AFS 3.4 had finally arrived and been installed but many AFS problems resulted, which unfortunately coincided with a number of server hardware incidents, one of which the vendor had so far been unable to trace. As a result of these various problems, a second server architecture had been introduced (SUN/Solaris) and some AFS client-based services had been downgraded back to AFS 3.3 while the newer version was fully debugged and understood. CERN had proposed to the HEPiX AFS Working Group that Transarc be told not to work on AFS 3.5 but to concentrate on DFS instead, but they seemed to be in a minority.

CERN buildings were being steadily recabled with twisted pair cables and the network migrated to being routed instead of bridged. This involved a lot of work by a lot of people, including those in UNIX support, since all the vendors' remote installation procedures broke when the protocols (often proprietary) each had chosen to use during remote installation, usually for remote discless bootstraps, were refused by the routers between the individual systems distributed across the site and the central installation servers. Some solutions had been found, but not for every platform.
Other areas of work included a new central twin-node mail server (to be covered at the Rome meeting), implementation of the Gnats problem tracking scheme (talk later this meeting) and much work around X11 and the HEPiX environment, both login scripts and X11 look and feel.

+ DAPNIA by P.Micout

For some time now, DAPNIA had shared the central computing resources of CCIN2P3, and discussions were currently under way about the arrangements for 1997 onwards. The SP2 used by DAPNIA had had an additional 6 nodes installed, as well as 2 Magstar units in the IBM 3494 tape cartridge robot. There was also a DLT robot present and RAID arrays from IBM and Digital. AIX 4.1 and Loadleveler were now installed on the SP2. DAPNIA had now established its own AFS cell. There was a "mini" TMS service, which would be reported on later in the meeting, and a Pine IMAP-based mail service. The main building had been recabled with twisted pair cable rated at 100 Mbit/s and an FDDI ring was planned. Lastly, there was a new print server and several new printers.

+ CEBAF by S.Philpott

CEBAF had selected SUN/Solaris and IBM RS/6000 model 43Ps running AIX as their preferred UNIX platforms after a formal procurement procedure, as mentioned at HEPiX in Rio. The principal reasons for the choice had been integer performance, price/performance and support for high-end tapes. Other factors considered important but not crucial included good floating point performance, 64 bit support, maximum file and file system sizes, and a few other things. Their existing HP-UX systems would be maintained for now but their ULTRIX systems would be decommissioned by October 1st. Introducing two new UNIX platforms gave the opportunity for a new environment, a new start. A list of open questions had been drawn up:
o What file system to choose: should they wait for DFS?
o What about clustering?
o How to configure X?
o Work group servers and/or dataless clients?
o Whether to put software in /usr/local/bin
o How to access non-standard versions of /bin programs
They had established draft documentation on the web at URL http://www.cebaf.gov/~chambers/unixdraft.html and invited comments.

+ FNAL by S.Hanson

Use of FNALU had increased, in both CPU and disc quotas, as the VMS migration advanced. There were now some 330 active users; the speaker showed some interesting graphs to illustrate the usage trends. More AFS server disc space was due to be added, plus more CPUs on the SGI Challenges, a four-CPU Alpha 2100, a new IBM PowerPC system and a new HP system. CDF was adding more CPUs to its Challenge and D0 were now running on Challenges for both PIAF and general computing. FNAL was beginning to use arc for the delegation of system authority. Mail users were switching from VMS to UNIX and Mac for mail, and tests were under way of both IMAP and POP servers for UNIX and PCs. Use of AFS was increasing steadily, as was the use of licensed software packages and products. Other evaluations in progress were a dedicated WABI server and the WinCenter product to provide X access to Windows applications (see later). A study group was being set up on the potential uses of Intel CPUs in farms; software under consideration included Linux, FreeBSD and Solaris as well as Windows. Meanwhile, FNAL intended to perform some pilot tests on DCE/DFS later this summer and work had begun on an update to the Fermi Tape Tools Project.

+ SLAC by R.Melen

The central SLAC compute farm now consisted of 41 RS/6000 nodes for batch and 12 for interactive work.
SLAC had already gathered considerable experience with the latest 43P models. In addition, there were 4 file servers with 60GB of disc each, including 9GB Seagate drives, 4 tape servers connected to 8 STK silos with 3490E drives, and a final RS/6000 server driving Exabyte and DAT devices. The software in the farm included LSF and CERN's tape staging package. They expected further expansion of the compute farm, possibly using a different architecture based on suggestions from the BaBar Collaboration. They were adding 3 new servers for NFS and news and were looking at STK's Redwood tape product. The SLAC AFS cell now had 3 servers with 48GB of disc each and over 350 active users, of whom 110 used AFS for their home directories. Plans were well advanced for the VM shutdown - end December 1997 - and migration had started. One large open question concerned what to do about the SPIRES HEP database:
o discontinue it and replace it by a commercial product, perhaps based on ORACLE?
o convert it to ORACLE themselves?
o move to USPIRES, currently under test?
SLAC was under severe pressure on staff, their system administration team having been cut from 7 persons to 3, for example. PC and Mac support was staffed by only 2 FTEs.

+ GSI by J.Heilman

GSI was a heavy RS/6000 user, still running AIX 3.2.5 as the production version. They had a 16 node SP2 and an 11 node RS/6000 cluster; there was a total of 40-50 workstations plus more than 100 X terminals. File backup was still based on MVS but they were almost ready to switch to using RS/6000s and ADSM. Other hardware present included 2 HP nodes which ran, and would continue to run, HP-UX 9.01 for historical reasons, plus an ULTRIX-based print server connected to 50 printers around the site. There were some 900 user accounts, of which about half were currently active, and there was about 100GB of disc space, all NFS. Loadleveler was in use. Near-term plans included moving to AIX 4.1, customising CDE and migrating backup to ADSM on RS/6000 servers.

3. Experience with a Sendmail Substitute by R.Lauer/Yale

The speaker had investigated integrating various mail systems in a simple, easy and secure way. She started from a site where VMS acted today as the mail host, making use of the VMS port of PMDF for Internet mail; even UNIX users used VMS to send and receive mail. She wanted
+ users to have only a single mail address
+ mail to be readable from any workstation, whether VMS or UNIX
+ delivery to a reliable host
+ simple, low-overhead administration, including no MX records
+ the possibility of PC integration later.

There appeared to be two options - PMDF on UNIX or sendmail. While sendmail was a single process running setuid root, PMDF was a set of cooperating processes running setuid pmdf, except for one process. PMDF had many programs queuing and de-queuing messages; a daemon-like control job scheduled the tasks to be done. The server was SMTP-compatible, multi-threaded and had both POP and IMAP support. In the speaker's opinion, having these multiple processes made it easier to debug problems such as lost mail. Another advantage she felt PMDF had over sendmail was that its configuration files were more modular and easier to understand than the single monolithic sendmail configuration file and the equally complicated alias file used by sendmail. For a full list of the differences, see the overheads at //osf.physics.yale.edu/www/hepix/yale_pmdf.ps. Her conclusions were that sendmail lacked some features found in PMDF and that the latter offered a better and easier solution.
4. Certification of UNIX Operating System Releases by A.Silverman/CERN

The UNIX support team at CERN had decided to implement a scheme to formally certify particular releases of the various UNIX operating systems which they supported. Releases would be "approved" in the sense that both they and a list of CERN-recommended products were considered to be in a properly-working state. This should help users decide which version of an operating system to choose and when to think about updating an existing installation. Users would be given a level of expectation regarding support if they chose a particular release, and it would also help introduce some methodology into the way software was released to users. It was accepted that some users did not need, or did not want, to be restricted to the versions the support team recommended, but full support would be concentrated on those users who followed the recommendations. It was also accepted that system updates were disturbing and that users must be given some flexibility on when they chose to upgrade.

Operating system releases would move through the stages of being NEW (under test), PRO (the current recommendation), OLD (the previous recommendation, still supported) and DORMANT (no longer recommended or supported). Certification of a release included building and testing the internal tools used by the support team, for example SUE, described in previous HEPiX meetings (Saclay and Prague), plus a certain number of the public domain packages in ASIS (see also previous HEPiX meetings at SLAC and NIKHEF) considered necessary for the CERN UNIX environment, and of course the current release of the CERN Program Library. See URL http://wwwcn.cern.ch/hepix/www/meetings.html for an index to previous meetings.

A major concern, still partly unresolved, was what to do about vendor-supplied patches. Should they be added "on the fly" or should they constitute a new release? Given the overhead of new releases on the CERN staff performing the certification and on the users "encouraged" to update to the latest release, it had been decided not to make a new release for patches unless they were deemed critical to system running. Of course, patches which could be of use to some users would be made available for individual system managers to install locally. Taken together, the target was to make no more than 2 certified operating system releases per year per platform.

5. LINUX by Bob Bruen/MIT

LINUX was described as a superset of POSIX running on PCs as well as Alpha and other chips. It had one main author but there were many people contributing their own drivers and applications, and it was available with full source code. Various benchmarks and comparisons were presented, including comparisons of LINUX with both FreeBSD and Solaris running on Intel. The speaker gave many references and URLs for further information, as well as pointers to useful software for LINUX. There were already many books and magazine articles on the topic and various companies now offered commercial services for it. At the present time, LINUX was undergoing several evaluations, such as its use in high performance systems, in firewalls, etc. Its advantages, which included its zero cost, community support, etc., were weighed against its perceived drawbacks - no central authority for control or support, and too many versions.
In the discussion that followed, members of the audience discussed the merits and dangers of having the source code easily available and the risks and possibilities offered by being able to modify the source. It was stated that support for MP systems was coming, and some people voiced the opinion that a PC plus LINUX was a good competitor to a RISC system running commercial UNIX.

6. Report of the HEPiX AFS Working Group by S.Hanson/FNAL

The HEPiX AFS Working Group met for the third time at the recent DECORUM (AFS Users) Meeting. A review was given of the topics treated to date (information gathering, establishing a repository of useful software, information exchange, lobbying of Transarc). The group felt it had achieved its objectives, although it accepted that the last of these activities had probably been less successful than had been hoped. In return, Transarc had informed the Working Group of some of its future plans for AFS and for migration towards DFS:
+ All plans for HSM were based on DFS and it was hoped to release something by the end of 1996, although real products from DFS vendors would not appear until 1997 at the earliest.
+ Fixing AFS 3.4 problems was a higher priority than DFS migration plans.
+ Short-term AFS 3.4 release plans.

The Working Group heard a report from a representative of ESnet. It also decided to increase its lobbying of Transarc, pleading for more stability (especially), faster support for new operating system releases coming from vendors, better customer support and continuing support of older client versions. The original Working Group chairman, Matt Wicks, had moved to a new position and had therefore resigned from the group; Rainer Toebbicke of CERN was appointed as acting chairman. The group agreed to spend the next 6 months looking at DFS migration planning and to meet at the next HEPiX meeting (Rome) to decide on its future status and directions. A list of DFS tasks had been drawn up (see the speaker's overheads at http://www-oss.fnal.gov:8000/hanson/docs/HEPIX/afswg.ps). The full minutes of the Working Group can be found at URL http://www.in2p3.fr/hepix-afs/0296/.

7. Planning for Data Storage and Batch Farming at CEBAF by R.Chambers

CEBAF were in a planning phase for future data storage and batch facilities. The design goals, for the most prolific experiment, were a 10KB event size, a 10 MBps sustained rate and 1TB per day. They wished to store one year's worth of data online, as well as the corresponding Monte Carlo data, some 300TB in all. They wished to automate where possible and forgo the use of central operators. A central data silo was planned; they had selected an STK 4410 system with 2 Redwood drives and OSM as the storage software. The reasons for this choice were explained in some detail and they noted that they had benefited from the work at DESY on OSM. Networking was part of the study and Cisco routers were being installed to permit Fast Ethernet for the backbone, since they had decided after tests that it was too early for ATM; however, they would keep watching ATM for the future. Meanwhile, to accommodate the data rates from the Hall B experiment (see above) they were planning to use fibre-connected RAID discs from SUN over a 700 metre fibre, with local buffering in the Hall to cover interruptions. Tests of this were currently under way and they had seen 7.5MBps without OSM in use and 7.2MBps with it. Transfers from memory to Redwood drives had been measured at 11.1MBps.
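As a rough cross-check of the design goals quoted above (10KB events at 10 MBps sustained, of the order of 1TB per day), the following small sketch works through the arithmetic. It is an editor's illustration, not part of the talk; in particular the 300 days of running assumed here is an invented figure.

    # Sanity check of the CEBAF storage design goals quoted above.
    # The number of days of running is an illustrative assumption,
    # not a figure given in the talk.
    EVENT_SIZE = 10e3            # bytes (10 KB per event)
    RATE       = 10e6            # bytes/s sustained (10 MB/s)
    SECONDS_PER_DAY = 86400

    events_per_second  = RATE / EVENT_SIZE               # ~1000 events/s
    volume_per_day_tb  = RATE * SECONDS_PER_DAY / 1e12   # ~0.86 TB/day

    DAYS_OF_RUNNING = 300                                # assumption
    raw_data_tb = volume_per_day_tb * DAYS_OF_RUNNING    # ~260 TB/year

    print("events/s            : %.0f" % events_per_second)
    print("data volume per day : %.2f TB" % volume_per_day_tb)
    print("raw data per year   : %.0f TB (before Monte Carlo)" % raw_data_tb)

With those assumptions the raw data alone comes to roughly 260TB per year, which is consistent with the quoted total of some 300TB once Monte Carlo data is added.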
LSF had been selected as the batch system after an evaluation of both it and Loadleveler; again, the reasons for the choice were explained. Estimates of the required batch capacity were in the vicinity of 10K SPECint92 units already this year. Finally, the speaker drew some lessons she and her colleagues had learned from the whole planning exercise and stressed the importance of finding common solutions.

8. Roundtable discussion on disk-space management led by John O'Neall/IN2P3

John introduced the subject by pointing out that the more users converted to UNIX, the more disc space and the more servers would need to be managed, and that more client systems implied more distributed access. There were many options, including
+ do nothing
+ use UNIX quotas
+ use a staging system
+ use AFS quotas and ACLs
+ delegate authority to group managers or to users themselves.
He showed a few slides to illustrate what happens at IN2P3: a combination of AFS with its ACLs and delegation of group management.

Corrie Kost reported that TRIUMF performed minimal support in this area, considering that the cost of buying more disc space as it was required was less than the cost of space management; however, this method was not without problems - file backup, searching for specific data, etc. P.Micout noted that DAPNIA used UNIX quotas in some places to control file space growth. CEBAF users had scratch areas as well as user quotas, and CEBAF groups could buy their own group disc space. Yale today used quotas for accounting, and SLAC had a number of accounting tools. FNAL used AFS quotas, as did CERN, which had continued the VM policy of a group administrator and project space; the CERN batch scheme (CORE) made use of a very large stage pool. RAL reported that they used staging but had implemented a fair share staging scheme.

The discussion then turned to space management. IN2P3 said that they normally overbooked file space by some 200%; CERN, RAL and SLAC offered file archiving facilities. It was noted that users often did not understand the difference between file backup and archiving, nor how long saved files were guaranteed for. A number of labs had looked at, or were looking at, HSM techniques - for example DESY had experimented with OSM, CEBAF had chosen it (see above) and SLAC were looking at it while currently using ADSM. ADSM was also being evaluated by CERN and IN2P3 for its HSM-type facilities. After an evaluation, CERN had rejected OSM because of a number of problems, including concern about its future directions. Some of these problems had also been seen by CEBAF, but the latter were optimistic that a fix would be provided by the vendor. FNAL used Unitree in one project. Several labs had looked at HPSS (e.g. CEBAF) but judged it not ready yet, as well as being very expensive.

9. The Use of WinCenter from UNIX workstations by Pierrick Micout/DAPNIA

The French CEA organisation made heavy use of Windows 3.11, especially in administration, although this was gradually migrating to Windows NT. On the other hand, scientists used almost exclusively UNIX, often via an X terminal. The speaker would like to bridge both worlds, for example to prepare overhead foils with a good tool from his UNIX system. From the many products available (see the next talk), DAPNIA had chosen WinCenter and had installed a 15 user licence on a Pentium 133 MHz configuration. They had gone through several beta releases of the software and were now running a production version.
Dedicated access to a number of Windows applications was available but usage remained low. Could their configuration really support 15 users? For this reason, tests were progressing slowly. TRIUMF were using the same product; their configuration was larger, especially in memory, and much more popular - there were already some 40 accounts, mostly accessed from X terminals. TRIUMF understood that the vendor, NCD, was adding support for NT clusters as servers. TRIUMF also reported virtually no special network load, and current releases included support for Novell Netware and for AppleTalk. RAL were running a WinDD service on a twin-CPU NT server and had seen up to 12 users without problems; it supported both Netware and Windows 95.

10. Comparing various methods to access Windows tools by A.Silverman/CERN

As the previous speaker had explained, there existed on the market a number of tools to permit the users of UNIX workstations and, in some cases, X terminals to obtain occasional access to Windows applications. CERN had performed a comparative test of some of these and collected information on others. The oldest tool was SoftPC, but it was very slow due to its purely emulation-based method and seemed to be rapidly falling out of favour. WABI and SoftWindows were similar to each other: they ran locally on a UNIX station, trapped Windows calls and translated them into X11 calls, and emulated the rest of the application. Thus they were CPU-limited and not much faster than SoftPC. WABI suffered also from the fact that it did not contain any Microsoft licence and was consequently limited in the number of applications it could support (fewer than 30 today). For individual users, however, both offered possibilities.

The last 3 tools considered all ran on a PC server. Tektronix WinDD was the oldest by some months but used a proprietary protocol best understood by Tektronix devices. The HP Application Server 500 was based on SCO UNIX on the server, and WinCenterPro was based on Intel/NT. For any kind of general service, these all offered their own advantages and drawbacks. The summary, expressed in a table (URL http://wwwcn.cern.ch/hepix/meetings/triumf/cern-winunix-table.ps), was that a new user wishing to access both Windows and UNIX should buy a PC and a PC X interface tool, and that existing UNIX users should consider one of the tools listed depending on their situation, for example single user, general service, particular application needed, etc.

11. Report of the CERN Connectivity Working Group by A.Miotto

This working group, set up in the framework of the migration of CERN users off CERNVM, had studied, among many other possibilities, access to UNIX systems from other UNIX systems and from X terminals, both in the presence and absence of AFS at each end, and with access both from on and from off the CERN site. Not all accesses were possible, but the target was to produce a simple recipe for users to be able to start a remote interactive session or establish an X11 session. In fact, they had found that telnet worked in all cases, since there was no variable to pass (e.g. DISPLAY, magic cookie), but passwords passed over the network. The tools rsh and rlogin removed this drawback but were blocked by the CERN Internet firewall; the non-AFS client to AFS server case did not work; and even between AFS clients, rlogin failed in some cases. xrsh and xlogin worked better, the only restrictions found being from a non-AFS client to an AFS server, and some traffic stopped by the CERN Internet firewall.
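For readers unfamiliar with what tools like xrsh automate, the following sketch shows the bare mechanism: copy the local X11 magic cookie to the remote host with xauth and then start a client there with DISPLAY pointing back at the local screen. It is an editor's illustration only, not the CERN recipe; the remote host name is invented, and it does nothing about AFS tokens.

    # Sketch of the cookie-and-DISPLAY passing that xrsh-style tools automate.
    # "remote.example.org" is a placeholder, not a real host.
    import os
    import subprocess

    remote  = "remote.example.org"
    display = os.environ["DISPLAY"]          # e.g. "mydesk.example.org:0"

    # Export the local magic cookie and merge it into the remote .Xauthority.
    cookie = subprocess.run(["xauth", "extract", "-", display],
                            capture_output=True, check=True).stdout
    subprocess.run(["rsh", remote, "xauth", "merge", "-"],
                   input=cookie, check=True)

    # Start an X client remotely, displaying back on the local screen.
    subprocess.run(["rsh", remote, "env", "DISPLAY=" + display, "xterm"],
                   check=True)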
In addition, the CERN security team recommended the use of mxconns for added security; it created a virtual X server where, for example, the owner was informed of new connections. The ideal requirement was to pass across the network the DISPLAY variable, the X11 magic cookie and the AFS token, but not the password unless encrypted. For non-AFS clients, this meant distributing and installing locally the token granting scheme used by AFS. Currently, investigations were being carried out on xrsh plus arc (the token scheme from R.Toebbicke) and on the ssh (secure shell) product. ssh encrypted all network traffic, including X11, but required extra work for AFS support. arc used Kerberos, which would need to be installed locally on a non-AFS node, non-trivial in many cases (for example, the local login and other images needed to be replaced to grant the token). The current status was that arc was good for internal use only and that more work should be done on ssh; it should be modified to pass the three items listed above (DISPLAY, magic cookie and AFS token). Encryption was not thought important, at least inside CERN where mxconns gave enough security; outside CERN, use of encryption was not legal in some countries (e.g. France) without permission. [More information about the activities of this working group can be found on the web at URL http://wwwcn.cern.ch/umtf/meetings/net/]

12. Experience with MBONE videoconferencing by R.Lauer/Yale

It was generally agreed that MBONE videoconferencing coordination needs to be centralised within a site, but the speaker had found by hard experience that installing the tools locally and making them work was not easy. Videoconferencing could be done with special, usually expensive, equipment in a central facility (e.g. CODEC) or on a private workstation without special add-ons (e.g. MBONE). MBONE stood for Multicast Backbone; it was a virtual network running on multicast routers inside the Internet. Groups of receivers could enter or exit a running conference at will. Although not much was required to start accessing such videoconferencing, there were many options which could be tuned, and a high speed link was essential. Among the problems met by the speaker were the lack of good documentation and the difficulty of chasing down the tools themselves, at least on the hardware platform with which she was concerned. In particular, there was no package of such tools, only the individual constituents; thus, although the tools were popular, making them work together on a given platform was not easy. Ms. Lauer gave a list of useful references she had found and a review of her experiences with the individual tools used. S.Hanson from FNAL stated that they had written a procedure to start the main videoconferencing tools automatically, and other labs were encouraged to start from there.

The current problems included:
+ fears about scaling performance
+ some of the tools themselves
+ no guaranteed network bandwidth

In the discussion that followed, the point was made that MBONE should not be attempted on an X terminal, as it provoked very high X traffic, and good sound support, sometimes any sound support, was generally missing on these devices (modern NCD terminals were an exception).

13. Use of GNATS by A.Lovell/CERN

Gnats is a problem report management scheme in the public domain. The speaker described its main features and the various utilities in the package.
He illustrated the talk by showing actual pages from the CERN Gnats database: the problem report fields filled in by the user (or the person submitting the report), those generated by Gnats itself, and those which the maintainer or person answering the problem should complete. However, as delivered, Gnats was not considered user-friendly for the average user. CERN had taken the web interface developed at Caltech and made significant alterations for use at CERN. For example, although Gnats only supports a single flat file structure for storing problem reports, to make it usable by multiple CERN support teams a scheme of artificial domains was created, each with a standard web interface. These domains were in fact a single large domain, but by creating different web entry points CERN could disguise this fact and show only the problems relating to a specific field to the corresponding support team or user community. Gnats had entered production last September and was currently receiving some 30 to 40 reports per week; it had amassed 37 categories of problems across its 7 domains. Studies had begun into the next version, which used a client/server architecture and also promised support for a database to store the problems. The speaker was urged to publicise his web interface to interested labs and sites.

14. IMAP at Fermilab by S.Hanson/FNAL

The goal was to implement distributed mail access - accessibility of mail from multiple clients, as opposed to delivery of mail to personal workstations - thereby avoiding the problems of local backup of mail, of keeping the stations running during vacations, etc. Access to mail from offsite was also desirable. Today, FNAL recommended the use of exmh as the user mail agent. The options looked at were
+ an AFS mail spool - thought not to be scalable and involving lots of AFS callbacks;
+ NFS-mounted mailboxes - error-prone, with known NFS locking problems;
+ POP (Post Office Protocol) - establish a common server and POP clients; download mail from the server via POP and then work on it locally ("offline"); collect all outstanding mail at one time. But the POP protocol was inflexible and provoked a lot of network traffic;
+ IMAP (Internet Mail Access Protocol) - a currently proposed Internet standard (IMAP V4); the incoming mail spool stays on the server, with the option of storing other folders on the mail server or locally. It offers three modes of working:
o offline, like POP
o disconnected - collect selected mails on the client, then disconnect and work locally on them
o online - access mails on the server on demand.

IMAP was a powerful protocol with many interesting features but fewer user agents than POP today, although that situation promised to improve in the future. The most popular user agent was pine, although this was much too simple for a sizeable fraction of experienced UNIX users. IMAP had some interoperability problems but these too were expected to improve over time as the IMAP V4 standard firmed up. There existed a so-called Cyrus IMAP project at CMU which was adding Kerberos user authentication to pine, support for ACLs, bulletin boards, Zephyr notification of new mail received, and file quotas. Fermilab's plan was to use IMAP to access FNAL newsgroups and mailboxes, to enable Zephyr notification of new mail, and to explore Kerberised clients and MIME and HTTP integration. A major obstacle in all this was the exmh recommendation, since the mh mail agent family only supported POP.
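To give a flavour of the online mode described above, the following sketch uses Python's standard imaplib module to open a mailbox on an IMAP server and list the headers of unread messages, leaving everything on the server. The host name and account are invented; this is an editor's illustration, not part of the talk or of the Fermilab setup.

    # Minimal online-mode IMAP session (sketch only; host and login invented).
    import imaplib
    import getpass

    conn = imaplib.IMAP4("imap.example.org")         # plain IMAP, port 143
    conn.login("hepix_user", getpass.getpass("IMAP password: "))

    conn.select("INBOX")                             # mail stays on the server
    status, data = conn.search(None, "UNSEEN")       # find unread messages
    for num in data[0].split():
        status, msg = conn.fetch(num, "(BODY[HEADER.FIELDS (FROM SUBJECT)])")
        print(msg[0][1].decode(errors="replace"))

    conn.logout()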
The central server currently used for the tests was a small SUN with a RAID 5 disc subsystem with failsafe disc mirroring; it would certainly be underpowered for an eventual production service.

15. Tape Management System (TMS) at CERN by A.Maver/CERN

The speaker described the evolution of TMS from its beginnings in 1989 to today, including its recent conversion to UNIX. It currently handled over 800K active tape volumes and made the handling and tracking of tapes and statistics far easier. It was organised by VID (virtual tape ID), with series of one or more libraries, some of which might be archived, in different laboratories. Experiments could define their own libraries and move tapes as required, but tapes could only be mounted if they were in the VAULT or robot libraries, and all tape moves had to be registered. Various utilities and TMS commands for use by administrators and by users were described. Tapes belonged to a group or a user, although the "user" was frequently a user ID created specially to manage the tapes and shared by several group administrators rather than belonging to one individual. In the CERN implementation there was no protection against world read, and the default was to permit all writes by group members, although some users could be excluded and tapes could be "locked" by a user with write access. Access control lists similar to the UNIX protection scheme were used and group sharing was allowed. CERN extensions included defining generic names for media types and adding extra commands. The speaker closed with a summary of the advantages (better control of the tapes, regular statistics of tape use, easy addition of new tapes) and the main limitations (all commands must pass through SYSREQ, and it is volume-oriented rather than file-oriented). The queue handling "missing" from the UNIX SYSREQ implementation was compensated for by running 10 TMS processes. Today SYSREQ and TMS ran on separate nodes. There were plans to consider merging the TMS keyword sets as used by CERN and by IN2P3.

16. SHIFT Tape Software in Saclay by P.Micout/Saclay

The speaker described how he had installed CERN's SHIFT tape software at Saclay and why he had made some different choices. For example, with an automatic cartridge loader at his disposal, he preferred to add a cartridge rather than another disc when the need arose for more space. He had an IBM 3494 device with 210 slots and 2 Magstar drives. The candidates for control software for this device had included ADSM and EPOCH, but they had finally selected the SHIFT package, although he said that he had sometimes found some specific SHIFT choices hard to understand. The SHIFT tape server module had compiled without error, but initially its interface to TMS had caused problems and this had eventually been turned off. Their Magstar (3590) drives had not been supported by the particular SHIFT release installed, although they had been supported correctly in an earlier release. They had recently added Digital DLT robots, but SHIFT did not support that model (a TL810), only the later version (the TL820). The current situation was that they believed they needed to install a tape management scheme, and the options included -
+ use a TMS based on ORACLE
+ consider a mini-TMS based on the work done at CASPUR
+ acquire and install an HSM platform.

17. Roundtable discussion on Tape Interchange led by John Gordon/RAL

The discussion was split into three sub-topics: an attempt to define a common interchange medium, tape formats, and tape labels.
+ Interchange Medium

Lots of DST and data tapes and cartridges were being shipped around the labs; could we define a common medium, as HEPVM had succeeded in doing? [They had agreed on a 3480 tape, uncompressed, 200MB maximum file size.] Today the choice was much wider and in the discussion all the suggestions had their advantages and problems. For example, Exabytes seemed to give many errors as reported by some labs but not all; DATs were slow and of smallish capacity, although very reliable; 3590 drives were expensive. DLTs came closest to a consensus, but the frequent changes in density of the newer models were seen as a potential drawback to declaring a standard of any kind. Also, there was little long-term experience yet, and doubts about whether the newer models could read tapes written on the oldest models and vice-versa. In the end, therefore, there was no conclusion except that it was a moving target; it was expected that the larger labs, where most tapes were written, would often have a range of devices, and users should choose carefully the device they used in relation to where those tapes would eventually be read.

+ Format

HEP still by and large maintained the 200 MB file size limit, and this was wasteful on modern high capacity tapes. However, moving to multi-GB files would give problems on the file systems of some computer architectures. Also, the effect of the chosen limit on stage space should be remembered. After a discussion, a figure of 1GB as a new recommended limit gained wide acceptance.

+ Labels

There was little discussion about the choice - standard ANSI labels should be written on all tapes - although there was apparently still some confusion about exactly what the standard defined. Given this unanimity, the question was why unlabelled tapes were still so common. The answer was thought to be simply that people did not consider the issue, and HEP needed to make a stronger point about writing properly-labelled tapes. One suggestion was to produce a standard piece of code, stored on CERN's ASIS server for example, which would write the required labels, and to make this generally available; a sketch of what such code might look like is given below. In general there should be more publicity about tape labels.
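As an illustration of the suggestion above, the following sketch builds the two 80-byte records (VOL1 and HDR1) that begin an ANSI-labelled tape. It is an editor's sketch, not the proposed ASIS code: the field offsets shown reflect the editor's reading of the ANSI label standard and should be checked against the standard text before any real use.

    # Sketch: build minimal ANSI VOL1 and HDR1 label records (80 bytes each).
    # Field offsets are the editor's reading of the ANSI label standard and
    # should be verified against the standard before use.

    def vol1(volser, owner=""):
        """Volume label: 'VOL1', 6-char volume serial, owner identifier."""
        rec = "VOL1" + volser.ljust(6)[:6]          # label id + volume serial
        rec = rec.ljust(37) + owner.ljust(14)[:14]  # owner id (assumed pos 38-51)
        return rec.ljust(79) + "3"                  # label standard version

    def hdr1(fileid, fileseq=1, julian_date=" 96101"):
        """First file header: 'HDR1', 17-char file id, sequence, dates."""
        rec = "HDR1" + fileid.ljust(17)[:17]        # file identifier
        rec += "".ljust(6)                          # file set identifier
        rec += "0001"                               # file section number
        rec += "%04d" % fileseq                     # file sequence number
        rec += "0001" + "00"                        # generation and version
        rec += julian_date + julian_date            # creation and expiry (YYDDD)
        rec += " " + "000000"                       # accessibility, block count
        rec += "HEPIX SKETCH ".ljust(13)            # system code
        return rec.ljust(80)

    for label in (vol1("HEP001", "TRIUMF"), hdr1("RAWDATA.DST")):
        assert len(label) == 80
        print(repr(label))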
18. Batch Systems

+ BNL's Experience with DNQS and Loadleveler by T.Schlagel/BNL

BNL's batch farm consisted of 4 AIX nodes, an 8 node Convex system and 2 SUNs running Solaris, although this configuration was subject to frequent changes. For the CCD experiment they were currently studying the addition of Pentium Pro systems running either Solaris or NT. BNL had had some 7 years of experience with DNQS and had added local modifications from time to time on top of the basic package coming from McGill University (see for example the HEPiX FNAL meeting, URL http://dcdsv0.fnal.gov:8000/hepix/1094/talks/batch/), but the code had been more or less frozen since 1993. The advantages of DNQS were that it was stable, it was free, and it was simple to use and to modify locally. Its most significant missing feature was the lack of priority control. The RHIC Collaboration used Loadleveler on their SP1 (currently 8 nodes, with another 8 due shortly). They had found it difficult to add local modifications (no source), but IBM support helped. And unlike with DNQS, overflow jobs could not be run elsewhere unless the other nodes were licensed (users typically "overflowed" jobs from their local clusters to the central farm).

+ Cluster Computing Using CONDOR by M.Livny/Uni Wisconsin

CONDOR was a batch processing scheme developed some years ago at the University of Wisconsin at Madison. It was described as a resource management scheme since it controlled the scheduling of a 500 node "cluster" of UNIX workstations on the campus; it was also heavily used at a number of university and industrial sites around the world (in the HEP environment these included NIKHEF, INFN Bologna and Dubna). It was the source, in the public domain, of the Loadleveler product and it was available via anonymous ftp. Basically it aimed to make use of idle CPU cycles across a site which were normally never used (users' desktops overnight, for example) while still giving the workstation owner instant control of his or her system on demand. Thus the submitter still saw only a single point of access but had at his or her disposal a large amount of cheap (even free) power. Obstacles to this goal were the physical distribution and distributed ownership of the resources, and CONDOR implemented a scheme of access control in the sense of allowing access to a distributed system only when the owner did not require it. CONDOR included a global "matchmaker" which compared the declared resource requirements of a job against the resources available at that moment and allocated the job accordingly; a conceptual sketch of this idea is given below. The job submitter had to define the needs of the job, and the workstation owner declared what resources might be used on the system and under what circumstances, as well as when a job had to be suspended or even killed on a station and rescheduled elsewhere. CONDOR ran on most UNIX architectures, required no kernel modifications and had powerful graphical management tools. AFS support was included. The CONDOR team was then working on a project - CARMI - to combine CONDOR with PVM for parallel application support. It was noted that a "fair share" model was the basis of this work, but that fair shares did not imply equal shares - some users have higher priority than others.
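The matchmaking idea mentioned above can be illustrated with a deliberately simplified sketch. This is the editor's conceptual rendering only, not CONDOR's actual ClassAd mechanism or syntax; the job and machine descriptions are invented.

    # Conceptual sketch of central matchmaking: pair each job's declared
    # requirements with a machine currently offering enough resources.
    # Illustration only, not CONDOR's real matching mechanism.

    jobs = [
        {"name": "geant_sim",   "arch": "sparc", "min_memory_mb": 64},
        {"name": "ntuple_scan", "arch": "alpha", "min_memory_mb": 32},
    ]

    machines = [
        {"host": "desk01", "arch": "alpha", "memory_mb": 96,  "owner_active": False},
        {"host": "desk02", "arch": "sparc", "memory_mb": 128, "owner_active": True},
        {"host": "farm07", "arch": "sparc", "memory_mb": 256, "owner_active": False},
    ]

    def match(job, machine):
        # A machine is usable only while its owner does not need it.
        return (not machine["owner_active"]
                and machine["arch"] == job["arch"]
                and machine["memory_mb"] >= job["min_memory_mb"])

    for job in jobs:
        candidates = [m for m in machines if match(job, m)]
        if candidates:
            print("%s -> %s" % (job["name"], candidates[0]["host"]))
        else:
            print("%s stays queued" % job["name"])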
+ Interactive Load Sharing Evaluation by A.Miotto/CERN

The speaker had performed an evaluation of LSF and Loadleveler to test the load sharing features of each in an interactive environment. The current default was to use load balancing at login time (see the ISS talk from HEPiX Saclay - URL http://wwwcn.cern.ch/hepix/meetings/saclay94.html). Load sharing at the user level could mean any of the following -
o node selection on login based on available resources
o starting resource-hungry jobs on another node
o starting applications only available on certain architectures
o cross-compiling
o executing commands on all nodes of a cluster.
System load sharing was also useful, for example to perform cluster-wide updates or to monitor the whole cluster. Loadleveler offered only 2 relevant commands - to launch a parallel gnu make and to discover the least-used node of a cluster. Therefore, for Loadleveler, wrappers needed to be written to perform any load sharing. LSF, on the other hand, offered much more, including support for mixed-architecture clusters. Commands available included login, run on a remote node, a load-sharing tcsh, parallel make, and run on all nodes. There were also several monitoring commands, but these were found not easy to use on large (more than 10 node) clusters. Drawbacks to LSF included -
o dealing with setuid binaries in AFS (a local workaround was found)
o the AFS port was not complete for all architectures, although this was thought to be better in newer LSF releases
o some jobs may execute remotely when there is no need, if the remote shell feature is used (this affects AFS caching)
o a lack of support for multiple clusters.

[Since the TRIUMF meeting, the evaluation had been completed and the full report can be accessed via AFS at /afs/cern.ch/user/m/miotto/public/lsf-report.ps.]

+ LAL's Experience with load sharing using LSF by M.Jouvin/LAL

LAL considered itself a small site for physics (7 Alpha systems, 3 HP, 150 X terminals and some VMS, plus a few nodes dedicated to electronics and mechanical CAD). There was an NFS file server and an FDDI backbone. They had chosen LSF to provide a batch scheme for UNIX as well as interactive facilities, including load sharing between hosts. With LSF they could define a logical cluster of CPUs of mixed architecture and varying power. CPU selection was made according to load and available resources. Some commands could be defined to always run locally, or the user could very easily call upon LSF to perform load balancing, simply by selecting the correct command or even just by using the LSF-provided tc shell. LAL had further modified xrsh to use lsrun instead of rsh and had added a PVM interface. Resource sharing commands were hidden behind simple aliases. For batch, particular queues were defined for particular groups to run on their own servers, plus a public queue running on the interactive nodes in the background. LSF permitted the specification of criteria for resource allocation, host order selection and expected resource usage. These resources could be static (such as available software) or dynamic (the load index, free memory, etc). Queues could be associated with particular hosts in a cluster; other queue parameters which could be preset included the nice value, a list of allowed users and so on. Lastly, various scheduling policies could be defined, including fair shares, user quotas, pre-emption, etc. In answer to a question, the speaker stated that he thought available memory was probably the most important criterion and that LSF's pre-emptive queuing needed to be improved.

+ BQS at IN2P3 and POSIX Aspects of Batch by Y.Fouilhe/CCIN2P3

IN2P3 had had some 5 years of experience with BQS, the development and deployment of which had been reported at several HEPiX meetings, generally as part of the CCIN2P3 site report; see for example the Prague meeting (URL http://infodan.in2p3.fr/HEPiX/hepixcs.site.html) or Saclay (URL file://ftp.in2p3.fr/pub/CCIN2P3_doc/SR94b.ps). Much of the driving force behind BQS lay in the migration from VM to UNIX. It was originally deployed on the CCIN2P3 BASTA farm and then migrated to ANASTASIE, with many ideas taken from BMON. It was now deeply ingrained in the CCIN2P3 computing infrastructure. In advance of the afternoon's discussion, the speaker questioned whether the goal was a common batch scheme across HEP or a common interface. In either case he suggested that we examine the POSIX proposal for batch schemes (1003.2d-1994, Batch Extension Amendment to UNIX Shells and Utilities), since it would provide a standard user interface and also a reference against which different batch schemes could be compared. He noted that BQS was not quite POSIX-compliant because some commands used non-POSIX syntax and/or semantics, and some POSIX commands and functions were missing. On the other hand, it had some interesting extensions, for example in the areas of queue administration, accounting and AFS token handling. He estimated that modifying BQS to make it POSIX-compliant would be easy in some areas (for example syntax) but difficult in others (e.g. where the BQS and POSIX architectures differed greatly).
He concluded that batch schemes were in general too heavily integrated into the local informatics infrastructure to move easily from one to another. Instead he proposed that HEPiX set itself a more modest goal, that of a common user and perhaps also administrator interface. He thought that a comparison of the different batch systems might not be fair and would not be worthwhile.

+ LSF for Batch? A user point of view by G.Grosdidier/Orsay

On behalf of the DELPHI collaboration at CERN, the speaker stated that NQS was felt to be insufficient for their needs; the material he was about to present was inspired by Randy Melen's talk on LSF at the Prague meeting, plus a lot of feedback from several experts in batch systems. He had performed an evaluation on 4 nodes of varying architecture using LSF version 2.2. Installation had thrown up 4 minor problems, all repaired. He then presented his results in tabular form (see URL http://wwwcn.cern.ch/~grodid/hepix/LSF_paper.ps). Major points included -
o LSF was compatible with AFS (and DFS) as delivered; Loadleveler and NQS required some modifications, for example those produced at CERN.
o The Digital UNIX port was missing for Loadleveler.
o NQS was not useful for interactive servers.
He compared the management and parameterisation features, user commands and monitoring options and, in going through each, made a number of comments and suggestions, mainly for LSF. His conclusion as a user was that he preferred LSF, although he accepted that for cluster management many administration tasks already developed for NQS would need to be redone for LSF; he believed these should be provided by the producer of LSF, Platform Computing. HEPiX should lead this by agreeing on, and presenting to Platform, a list of the improvements which should be added to the current LSF.

+ A Discussion on Batch Systems, led by R.Melen/SLAC

The original impetus behind HEPVM as an organisation had been to provide a common user interface to HEP computing as it was then (it later expanded its role). Should HEPiX now also look to perform a similar activity in the area of batch by defining a single batch command set, or should we aim more for a single scheme, possibly based on the POSIX standard as suggested by Y.Fouilhe? One disadvantage of POSIX was that it had already been overtaken by some of the batch schemes under consideration, which contained some interesting advanced features (the POSIX standard could be considered as being based largely on NQS). It was noted that there was no current POSIX activity on batch. HEPiX could consider building a set of POSIX-compliant wrappers for the different batch systems in use (see the sketch below), although G.Grosdidier said that we should also address the area of resource management.
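To make the wrapper idea concrete, the following editor's sketch shows a trivial qsub-style front end that translates a few POSIX batch options onto LSF's bsub. The option mapping covers only the most basic cases and is purely illustrative; it is not an agreed HEPiX interface, and a real wrapper would also have to handle resources, dependencies and the other systems (NQS, Loadleveler, BQS, DNQS).

    # Sketch of a POSIX-style "qsub" front end that forwards to LSF's bsub.
    # Only a handful of options are mapped; illustration of the wrapper idea
    # only, not an agreed HEPiX command set.
    import sys
    import subprocess

    # POSIX qsub option -> LSF bsub option (queue, job name, stdout, stderr)
    OPTION_MAP = {"-q": "-q", "-N": "-J", "-o": "-o", "-e": "-e"}

    def qsub(argv):
        bsub_args = ["bsub"]
        args = iter(argv)
        for opt in args:
            if opt in OPTION_MAP:
                bsub_args += [OPTION_MAP[opt], next(args)]
            else:
                bsub_args.append(opt)          # script or command to run
        return subprocess.call(bsub_args)

    if __name__ == "__main__":
        sys.exit(qsub(sys.argv[1:]))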
A more fundamental question was what problem we were trying to solve; it was stated in reply that many physicists ran jobs at multiple labs and were unwilling to rewrite their code and/or production scripts. At the least, the provision of a common user interface and a common syntax would permit remote submission, which was possible on each system today but in a different manner on each. It was noted that in general users should be asked to specify the resources they needed rather than the actual queue in which they thought they wanted to run. At this point it was realised and accepted that there could be no single HEP-wide batch system that could be run at all HEP sites, or even at all the labs. The discussion then moved to the possibility of a HEPiX interface to batch.

One suggestion was to build a general GUI (graphical user interface) which at each site would present the default batch system of that site. However, it was stated that a non-graphical interface would be needed in any case, perhaps an applications programming interface (API); perhaps PVM could be used in this respect. The target of any GUI was to enable users to submit their own jobs, as opposed to a production office running the main analysis production stream. Should this GUI (or API) be composed of those commands common across the systems (the "lowest common denominator") or a superset of all of them, where some entries for some batch schemes would be empty? It was agreed that it would be easier to start with a subset. E.Russell from SLAC was asked to dig out the POSIX command set and publish it on the web. However, at this point the question of who could or would actually do some work on this came to an impasse: everyone agreed it would be useful, but no one admitted to having available time. It was left that an article would be posted in the HEPNET.HEPiX newsgroup asking for volunteers to help on this project, firstly to draw up the list of commands which should be in the subset.

__________________________________________________________________

Alan Silverman, 19 July 1996