Years of Computing in HEP. International Workshop on Large Scale Computing
International Workshop on Large Scale Computing, Kolkata, 8 February 2006. René Brun, CERN
Hardware & Software Evolution
Punched cards
Mainframes, workstations, ...
OS, Desktops & Laptops
The 3 technology laws
Moore's Law: formulated by Gordon Moore of Intel in the early 70s: the processing power of a microchip doubles every 18 months. Corollary: computers become faster, and the price of a given level of computing power halves every 18 months. (Well! Not true anymore, see later.)
Gilder's Law: proposed by George Gilder, prolific author and prophet of the new technology age: the total bandwidth of communication systems triples every twelve months. New developments seem to confirm that bandwidth availability will continue to expand at a rate that supports Gilder's Law.
Metcalfe's Law: attributed to Robert Metcalfe, originator of Ethernet and founder of 3Com: the value of a network is proportional to the square of the number of nodes. So, as a network grows, its value grows quadratically, much faster than the number of users, while the cost per user remains the same or even decreases.
But no laws about software (well! Murphy's law).
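Written as growth formulas, the three laws as stated on the slide compound very differently. The notation is introduced here for illustration: t is time in months, P is processing power, C the price of a fixed amount of computing power, B the total bandwidth and V the value of a network with n nodes.

```latex
P(t) = P_0 \, 2^{t/18}, \qquad
C(t) = C_0 \, 2^{-t/18}, \qquad
B(t) = B_0 \, 3^{t/12}, \qquad
V(n) \propto n^2

\frac{P(36)}{P_0} = 2^{36/18} = 4, \qquad
\frac{B(36)}{B_0} = 3^{36/12} = 27, \qquad
\frac{V(2n)}{V(n)} = 4
```

Over three years, bandwidth at the stated rate grows almost seven times more than processing power. The last ratio shows that Metcalfe's law, as stated, is quadratic in n: doubling the number of nodes multiplies the value by four, and the value per user, V(n)/n, grows only linearly.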
Hardware & Compilers
Multi Core CPUs
Program Size (lines of code)
Program Size (RAM)
Time to compile
Files, Classes
Languages
Fortran to C++
1955: app.exe = (main.o)
1965: app.exe = (main.o, x.o, y.o)
1975: app.exe = (main.o, x.o, lib1.a, lib2.a)
1985: app.exe = (main.o, x.o, lib1.a, lib2.so, lib3.so)
1995: app.exe = (main.o, libs.so) + dyn libs.so
2005: app.exe = (main.o, libs.so) + plug-in manager
2015 (??): BOOT + URLs + local caches (interp + comp)
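The last steps of this evolution rely on a plug-in manager that creates components by name, so the executable no longer needs to know every library at link time. Below is a minimal in-process sketch of such a registry. Plugin, PluginManager and HttpFilePlugin are hypothetical names, and this is not the interface of ROOT's TPluginManager. In a real system the registration would run inside a shared library loaded on demand; here everything lives in one file.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical interface that every plug-in implements.
struct Plugin {
    virtual ~Plugin() = default;
    virtual std::string Name() const = 0;
};

// Minimal plug-in manager: a registry from a plug-in name to a factory.
// Assumption: in a real system each factory is registered by a static
// initializer inside a shared library that the manager loads on demand;
// here everything lives in a single translation unit.
class PluginManager {
public:
    using Factory = std::function<std::unique_ptr<Plugin>()>;

    static PluginManager& Instance() {
        static PluginManager instance;  // constructed on first use
        return instance;
    }

    void Register(const std::string& name, Factory factory) {
        assert(factory && "a plug-in needs a callable factory");
        factories_[name] = std::move(factory);
    }

    // Returns nullptr when no plug-in is registered under this name.
    std::unique_ptr<Plugin> Create(const std::string& name) const {
        auto it = factories_.find(name);
        if (it == factories_.end()) return nullptr;
        return it->second();
    }

private:
    std::map<std::string, Factory> factories_;
};

// Hypothetical plug-in handling files accessed through an "http" URL.
struct HttpFilePlugin : Plugin {
    std::string Name() const override { return "http"; }
};

namespace {
// Self-registration, as the code in a plug-in library would do when loaded.
const bool kHttpRegistered = [] {
    PluginManager::Instance().Register(
        "http", [] { return std::unique_ptr<Plugin>(new HttpFilePlugin()); });
    return true;
}();
}  // namespace
```

The application asks for a component by name, for example `PluginManager::Instance().Create("http")`, and gets nullptr for names nobody registered. Adding a new file type then means shipping one more library, not relinking the application.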
Current ROOT structure & libs
Compiled + interpreted code
1980: zcedex, mini command interpreter
1984: comis, Fortran77 interpreter
1985: kuip/paw, command and macro interpreter
1986: Tk/Tcl includes a GUI
1994: CINT, a C & C++ interpreter
1998: Python, OO on top of C++, Java
2002: Ruby, better than Python?
200x: BOOT (interpreter -> code generation -> compiler)
Basic types and modules
1950: basic operators (trig functions part of the application)
1953: trig functions in a library
1954: Fortran types (integer, real, Hollerith); subroutines communicate only via arguments
1965: subroutines communicate via a blank common, then labeled common blocks
1975: communication via a data structure management system
1980: derived types
1988: object-oriented programming: classes
1995: parameterized types, templates, STL
1996: reflection/RTTI (Java)
Programming models: procedural sequential; parallelism (MPI); vectorisation; shared memory; multi-threading; client-server; stateful; stateless -> web; CORBA; distributed parallel computing (asynchronous); messages; signals/slots.
Problems with Fortran: abuse of common blocks; no data structures; no generic machine-independent I/O. Systems like Hydra (1974), Zbook (1975), BOS (1977) and Zebra (1983) were designed to overcome these problems.
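The "abuse of common blocks" problem can be contrasted with a derived type in a short sketch. C++ is used here only to mimic both styles, and the names (TrackCommon, gTrack, Track, Pt) are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Style 1: Fortran-77-like communication through a shared global block,
// emulating a labeled common such as  COMMON /TRACK/ PX, PY, PZ
// (hypothetical names). Any routine may read or overwrite it, and nothing
// in a function signature says which globals the function depends on.
struct TrackCommon {
    double px, py, pz;
};
TrackCommon gTrack = {0.0, 0.0, 0.0};

double PtFromCommon() {
    return std::sqrt(gTrack.px * gTrack.px + gTrack.py * gTrack.py);
}

// Style 2: a derived type passed explicitly. The const reference states
// in the signature that the function reads the track and cannot modify it.
struct Track {
    double px, py, pz;
};

double Pt(const Track& t) {
    return std::sqrt(t.px * t.px + t.py * t.py);
}
```

Both functions compute the same transverse momentum. With the first style, any routine that silently changes `gTrack.px` changes every later result, and nothing in the code says so.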
The Zebra system (1983): Zebra = Zbook + Hydra. The main data structure management system used by PAW and Geant3, and also by many collaborations. Powerful machine-independent I/O: FZ (sequential) and RZ (direct access, used for PAW ntuples). A nice data structure documentation system, including an interactive browser, DZDOC.
Zebra bank descriptor
Zebra DZDOC
Atlas DZDOC
Zebra pros/cons. Cons: archaic programming style; easy to overwrite data structures; shared global store(s). Pros: self-describing structures; concept of multi-heap (constants, histograms, event, ...); efficient garbage collection (division wipe); built-in, efficient and machine-independent I/O; used by Geant3, PAW and many experiments.
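The "division wipe" entry corresponds to what is now usually called an arena (or region) allocator: objects are allocated sequentially inside a division and released all together. A minimal sketch under simplifying assumptions (fixed capacity, banks are arrays of doubles, no per-bank deletion). The Division class and its Lift/Wipe methods are hypothetical names, not Zebra's routines.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// A "division": a region of memory from which banks are carved sequentially
// and which is released all at once ("wipe"), in the spirit of Zebra
// divisions. Generic arena-allocator sketch; names are hypothetical.
class Division {
public:
    explicit Division(std::size_t capacity) : store_(capacity), used_(0) {}

    // Returns n contiguous words, or nullptr if the division is full.
    double* Lift(std::size_t n) {
        assert(used_ <= store_.size());
        if (n > store_.size() - used_) return nullptr;
        double* bank = store_.data() + used_;
        used_ += n;
        return bank;
    }

    // Releases every bank in O(1); pointers obtained earlier become invalid.
    void Wipe() { used_ = 0; }

    std::size_t Used() const { return used_; }
    std::size_t Capacity() const { return store_.size(); }

private:
    std::vector<double> store_;
    std::size_t used_;
};
```

The multi-heap idea is to keep separate divisions for constants, histograms and the current event: the event division is wiped after each event in constant time, while the constants division is untouched.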
Geant 1, 2, 3, ..., 4
Geant1 (1974): 2000 lines of Fortran IV. No physics, no geometry, only a bare framework.
Geant2 (1975): 20000 lines of Fortran IV. Some physics for multiple scattering, energy loss and decays; framework for geometry and tracking.
Geant3 (1980-81 to 1994, still in use in 2006?): about 120000 lines of Fortran77 + Zebra + PAW. Electromagnetic physics; 4 hadronic packages (Tatina, Gheisha, Fluka, Calor); powerful geometry package including graphics; hits/digits framework; I/O subsystem (Zebra) for all structures including geometry. Used by many experiments. Still a reference!!!
Fluka: originally developed by the safety protection group at CERN (Stevenson) with Aarnio and Ranft, 1985? Re-engineered by A. Ferrari & co. (Rubbia project), 1990. Simple geometry. The reference for radiation/shielding. Written in Fortran77. Interfaced with VMC (TFluka) and G4 (FLUGG).
Geant4: started in 1994. Originally a flagship project for the move to C++. A huge investment in manpower. About 600000 lines of C++. Validation process in ATLAS, CMS and LHCb. Physics processes getting better and better, but still many limitations: poor interpreter (small subset callable from Python); no I/O interface (geometry cannot yet be made persistent); batch-style graphics.
The Virtual MC (1998)
Virtual Monte Carlo and ROOT Geometry
The ROOT geometry package (TGeo) can be used in detector simulation, reconstruction, graphics, etc.
TGeant3: used in production (native GEANT3). New: TGeant3TGeo, an interface to G3 using the TGeo geometry. No modification required in the user code. Validated by ALICE. Same speed as or faster than TGeant3.
TGeant4: used for Geant4 physics validation, with the G4 native geometry built after g3tog4 conversion. No interface yet between G4 and the ROOT geometry, but Andrei Gheata is actively working on it (expected this spring).
TFluka: old geometry interface using the G4 geometry via FLUGG. Currently a fully validated geometry interface based on TGeo, validated by the Fluka team. At least 2 times slower than TGeant3.
The VMC framework is currently used by ALICE, OPERA, MINOS, NA48b, HADES, CBM and maybe STAR.
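The key point of the VMC, that transport engines can be swapped with no modification in the user code, is an instance of programming against an abstract interface. A minimal sketch; TransportEngine, Geant3LikeEngine, FlukaLikeEngine, MakeEngine and RunSimulation are hypothetical names, not the real VMC API, and all physics is omitted.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical sketch of the Virtual Monte Carlo idea: user code is written
// against an abstract transport engine, and the concrete engine is chosen at
// run time. Class and method names are illustrative, not the real VMC API.
class TransportEngine {
public:
    virtual ~TransportEngine() = default;
    virtual std::string Name() const = 0;
    // Transports the primaries of one event; returns how many were transported.
    virtual int ProcessEvent(int nPrimaries) = 0;
};

class Geant3LikeEngine : public TransportEngine {
public:
    std::string Name() const override { return "geant3"; }
    int ProcessEvent(int nPrimaries) override { return nPrimaries; }  // physics omitted
};

class FlukaLikeEngine : public TransportEngine {
public:
    std::string Name() const override { return "fluka"; }
    int ProcessEvent(int nPrimaries) override { return nPrimaries; }  // physics omitted
};

// Engine selection happens in one place, e.g. from a configuration string.
std::unique_ptr<TransportEngine> MakeEngine(const std::string& name) {
    if (name == "geant3") return std::unique_ptr<TransportEngine>(new Geant3LikeEngine());
    if (name == "fluka") return std::unique_ptr<TransportEngine>(new FlukaLikeEngine());
    return nullptr;
}

// User code: identical whichever engine is plugged in.
int RunSimulation(TransportEngine& engine, int nEvents, int primariesPerEvent) {
    int transported = 0;
    for (int i = 0; i < nEvents; ++i) transported += engine.ProcessEvent(primariesPerEvent);
    return transported;
}
```

Only `MakeEngine` knows the concrete classes. Validating one engine against another then amounts to running the same `RunSimulation` call with a different configuration string.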
PAW: a long saga. First version (Jan 1985) designed by a committee: must use GKS; GUI based on VT100 functionality; no ntuples. June 1985: the developers "abolish" the committee. HIGZ: GKS + X11. Row-wise ntuples, then column-wise ntuples (1986). Frozen in 1994, but still maintained by the ROOT team.
Crisis: 1992 1999
Why not F90 after F77? In 1989-91 the assumption was F90. Some work was invested in I/O with F90 (to support derived types). We could not solve this problem, because there was no formal way to parse the F90 module descriptors. In 1992 many forces were pushing towards OO. Crisis in Dec 1992 (at least in the IT software group): 1/3 in favour of F90, 1/3 in favour of commercial solutions, 1/3 in favour of C++.
1993, 1994, 1995: ZOO, NextPaw and Geant3.5 proposals rejected. ZOO: Zebra in the OO world. NextPaw: PAW evolution -> C -> C++. Geant3.5: implement the geometry package in C++. Geant4 proposal (June 1994). RD45/Objectivity project (fall 1994). ROOT project starts (in NA49) (Jan 1995).
1996: ROOT chooses the CINT interpreter. We had been attracted by Java (Object base class, many common ideas). Work on object persistency based on the dictionary information (introspection). LHC++ project starts (against ROOT).
1997 -> 2000: getting experience with OO (professional developers). Most users lost in the F77 -> C++ transition. First signs of problems with Objectivity in BaBar. FNAL Run II chooses ROOT. But C++ was seen as a temporary solution while waiting for an efficient Java on the horizon (2003). ROOT: automatic I/O based on the dictionary, automatic schema evolution.
Problems with commercial systems: licensing; deployment; the vendor is late to follow new compilers and OS versions; difficult to request new functionality; difficult to get good people to do support and maintenance (programmers want to develop code).
Data Analysis Software
1960: do it yourself.
1968: SUMX. Histograms and data blocks described in an input file; SUMX is the master.
1973: HBOOK. Histogram library; the user controls the event loop and the selection.
1985: PAW. Interactive histograms/fitting; ntuples.
1995: ROOT. Same as PAW + persistency for C++ objects; C++ interpreter.
2005: PROOF and GRID. Distributed analysis: client -> master -> workers (parallelism).
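The client -> master -> workers model can be illustrated in miniature. In this sketch the master splits the event range into packets, workers fill partial histograms in parallel, and the master merges them by adding bins. This is a hypothetical in-process analogue using threads via std::async, not PROOF's protocol or API. Histo, EventValue, ProcessRange and RunMaster are illustrative names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <future>
#include <vector>

// Simple fixed-binning histogram (hypothetical, not ROOT's TH1).
struct Histo {
    double lo, hi;
    std::vector<long> bins;

    Histo(std::size_t n, double lo_, double hi_) : lo(lo_), hi(hi_), bins(n, 0) {}

    void Fill(double x) {
        if (x < lo || x >= hi) return;  // under/overflow ignored in this sketch
        std::size_t i = static_cast<std::size_t>((x - lo) / (hi - lo) * bins.size());
        if (i >= bins.size()) i = bins.size() - 1;  // guard against rounding near hi
        ++bins[i];
    }
    // Merging partial results: bin-by-bin addition.
    void Add(const Histo& other) {
        for (std::size_t i = 0; i < bins.size(); ++i) bins[i] += other.bins[i];
    }
    long Entries() const {
        long n = 0;
        for (long b : bins) n += b;
        return n;
    }
};

// Stand-in for reading event i and computing the quantity to histogram.
double EventValue(long i) { return static_cast<double>(i % 10) + 0.5; }

// Worker: processes the half-open event range [first, last).
Histo ProcessRange(long first, long last) {
    Histo h(10, 0.0, 10.0);
    for (long i = first; i < last; ++i) h.Fill(EventValue(i));
    return h;
}

// Master: splits the event range into packets, runs them in parallel and
// merges the partial histograms for the client.
Histo RunMaster(long nEvents, int nWorkers) {
    assert(nWorkers > 0);
    const long chunk = (nEvents + nWorkers - 1) / nWorkers;
    std::vector<std::future<Histo>> parts;
    for (int w = 0; w < nWorkers; ++w) {
        const long first = w * chunk;
        const long last = std::min(nEvents, first + chunk);
        if (first >= last) break;
        parts.push_back(std::async(std::launch::async, ProcessRange, first, last));
    }
    Histo total(10, 0.0, 10.0);
    for (auto& part : parts) total.Add(part.get());
    return total;
}
```

Merging works because histogram filling is additive: the sum of the per-packet histograms equals the histogram of the whole range, whatever the packet sizes.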
PAW
ROOT
Graphics & GUI evolution: plotters (e.g. GD3 on Calcomp); GKS times: the screen is the memory; PHIGS; X11 and GL: the winners. From graphics attributes set in sequence to objects: with PAW, "set color red" and now all primitives are red; with ROOT, attribute values do not depend on the order in which they are set, which makes it easier to write a graphics editor. From callbacks to messages to signals & slots. Signals & slots require an interpreter (see Qt and ROOT). Scriptable GUIs (a MUST).
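The difference between attributes set in sequence (PAW) and attributes owned by each object (ROOT) can be sketched as follows. StateMachineCanvas, LineObject and Paint are hypothetical simplifications, not ROOT's graphics classes.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Style 1 (state-machine graphics, as with "set color red" in PAW):
// a primitive takes the attribute that is current when it is drawn,
// so the result depends on the order of the calls.
struct StateMachineCanvas {
    std::string currentColor = "black";
    std::vector<std::string> drawnColors;  // color recorded for each primitive drawn

    void SetColor(const std::string& c) { currentColor = c; }
    void DrawLine() { drawnColors.push_back(currentColor); }
};

// Style 2 (object graphics, in the spirit of ROOT): each object owns its
// attributes, so they can be set at any time, and an editor can change one
// object without touching the others. Names are hypothetical.
struct LineObject {
    std::string color = "black";
    void SetColor(const std::string& c) { color = c; }
};

// Painting reads each object's own attributes at paint time.
std::vector<std::string> Paint(const std::vector<LineObject>& lines) {
    std::vector<std::string> colors;
    for (const auto& l : lines) colors.push_back(l.color);
    return colors;
}
```

In the first style, the color of a primitive is fixed by whatever call happened before it was drawn. In the second, the color is a property of the object that a graphics editor can later find and change.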
Calcomp plotters: 1955
First graphics packages (CERN GD3): 1970
HPLOT: 1975 ...
HPLOT -> HIGZ, across the underlying systems: GD3 (1978), US Core system (FNAL, 1981), GKS (1983), PHIGS (1985), X11 (1985)
PAW -> VT100, GKS: 1985
PAW -> Motif: 1991
ROOT -> X11: 1995
ROOT -> Win32: 1996, 2002
ROOT -> Qt: 2002, 2006
ROOT -> GL: 2002
Graphics and GUI systems (cont.): Most graphics/GUI systems that we have used have been based on international standards or de facto standards. All these systems had a limited lifetime: the CORE system: 5 years; GKS: 10 years; PHIGS: < 10 years; X11: > 20 years; Motif: < 8 years; Qt: ??. So far, no application built directly on top of these systems was portable to the next generation. A new generation every 8 to 10 years.
ROOT GUI/graphics interoperability
ROOT GUI/graphics interoperability
Interpreters & dictionaries
Interpreter & Compiler integration
Possible progress with interpreters: Eliminate the stub interface used to call C/C++ functions. This is already possible in CINT with C libraries; it will be possible with C++ when a standard ABI is available, otherwise it is compiler- and linker-dependent. If the compiler is fast enough (e.g. C), use the interpreter only to organize the top level. If the next C++ standard provides introspection, one could eliminate the header file parser and 95 per cent of the dictionary structure in memory. A good argument for having the interpreted and compiled code in the same language! But WHEN???
Object Persistency with ROOT Object Persistency with Objectivity
Object Persistency: Object persistency has been a long-running saga for 10 years or more. Today there is general agreement to exploit the HEP feature of having mainly read-only files, and to use RDBMS systems only where concurrent write access is required. A lot of work was spent in ROOT to understand and design an efficient object streaming system (object-wise and member-wise). The I/O system and the query system must know each other.
OODBMS (i.e. Objectivity). Hope: address one single object in a petabyte database; resolve all the object catalog issues. Reality: licensing/installation/portability problems; 64-bit OIDs did not scale above 10 terabytes; the request for 128-bit OIDs was never implemented; locking problems when many users are in read mode; central DB mismatch with the GRID; no automatic schema evolution (big problem); no interactivity.
OODBMS (i.e. Objectivity) (2): The OODBMS evangelists (and later the RDBMS ones) passed on many wrong messages: commercial databases will save manpower; commercial databases can be used for all types of data; performance is OK. Reality: probably more than 100 person-years were invested in this exercise, to be compared to a few person-years for ROOT I/O; performance was not adequate (already spotted by ROOT/Objy comparisons in early 1996); physics analysis requirements were totally ignored; too much weight was given to "experts in bookkeeping".
ROOT I/O principles: two main I/O solutions. (1) A Unix-like file/directory structure with keyed objects, OK for histograms, geometries, magnetic field maps. (2) Special event-data-oriented Trees, with object streaming and splitting modes, optimized for data analysis. Support for network files. Exploit the advantages of read-only files as much as possible. Interface with RDBMS when locking is required.
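Splitting, i.e. storing an object member-wise instead of object-wise, is what lets an analysis read only the members it needs. An in-memory sketch of the two layouts follows. The Event and SplitEvents types are hypothetical and only illustrate the idea, not ROOT's Tree implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical event record. Written object-wise, each event is streamed
// as one unit containing all of its data members.
struct Event {
    int ntracks;
    double energy;
    double px;
};

// Member-wise ("split") layout: one contiguous column per data member,
// similar in spirit to a Tree with one branch per member. The class and
// its layout are illustrative, not ROOT's implementation.
struct SplitEvents {
    std::vector<int> ntracks;
    std::vector<double> energy;
    std::vector<double> px;

    void Append(const Event& e) {
        ntracks.push_back(e.ntracks);
        energy.push_back(e.energy);
        px.push_back(e.px);
    }
    std::size_t Size() const { return energy.size(); }
};

// An analysis needing only the energy scans a single column; with data on
// disk, this corresponds to reading one branch instead of whole events.
double SumEnergy(const SplitEvents& events) {
    double sum = 0.0;
    for (double e : events.energy) sum += e;
    return sum;
}
```

The object-wise layout is simpler when whole events are always needed. The member-wise layout pays off in analysis, where a selection typically touches a few members of many events.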
ROOT Trees
Data Analysis on the GRID(s) (see Fons' talk)
Some observations
Experience with C++: a very powerful but complex language. It is easy to make a complex system with a lot of class dependencies; changing one class forces the recompilation of many other classes. No garbage collector. Only one heap. The ABI (Application Binary Interface) is not yet standardized: a mess on Linux/gcc (C is OK). No introspection: you have to develop your own. Too much coupling between data and code. Templates are defined statically at compilation time, i.e. they are difficult to use in an interactive environment. Slow compilation if templates and the STL are abused.
Missing features in C++: introspection (it is not possible to compile a class from a dictionary); multi-heap, like Zebra divisions (would require a garbage collector and a handle type like in C++/CLI from MS); the possibility to add one or more functions without recompiling the class, although this can easily be done in C; dynamic creation of templated types.
Introspection systems: meta-information describing all types and functions. Not necessary for languages like F77 having only basic types: I/O in F77 was implemented via simple switch statements. Vital for languages supporting derived types, for automatic I/O, inspectors, browsers and interpreters. Examples: CINT, Java, CINT/ROOT/Reflex.
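A dictionary is what turns I/O for derived types into generic code: one streamer walks the member descriptions instead of each class needing hand-written I/O. A minimal sketch follows. It assumes a hand-written dictionary of member names, type codes and byte offsets, with native byte order and no schema evolution. All names (Type, MemberInfo, Particle, WriteObject, ReadObject) are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical hand-written dictionary: one entry per data member with its
// name, a type code and its byte offset. Systems such as CINT or Reflex
// generate this kind of information from class declarations; here it is
// written by hand, in native byte order, with no schema evolution.
enum class Type { Int, Double };

struct MemberInfo {
    std::string name;
    Type type;
    std::size_t offset;
};

// The per-type switch is all that an F77-style I/O layer needs for basic types.
std::size_t SizeOf(Type t) {
    switch (t) {
        case Type::Int: return sizeof(int);
        case Type::Double: return sizeof(double);
    }
    return 0;  // unreachable for valid type codes
}

struct Particle {
    int id;
    double mass;
    double charge;
};

const std::vector<MemberInfo>& ParticleDictionary() {
    static const std::vector<MemberInfo> dict = {
        {"id", Type::Int, offsetof(Particle, id)},
        {"mass", Type::Double, offsetof(Particle, mass)},
        {"charge", Type::Double, offsetof(Particle, charge)},
    };
    return dict;
}

// Generic streamer: walks the dictionary, so no per-class I/O code is written.
void WriteObject(const void* obj, const std::vector<MemberInfo>& dict, std::vector<char>& buf) {
    const char* base = static_cast<const char*>(obj);
    for (const MemberInfo& m : dict)
        buf.insert(buf.end(), base + m.offset, base + m.offset + SizeOf(m.type));
}

// Reads the members back starting at pos; returns the position after the object.
std::size_t ReadObject(void* obj, const std::vector<MemberInfo>& dict,
                       const std::vector<char>& buf, std::size_t pos) {
    char* base = static_cast<char*>(obj);
    for (const MemberInfo& m : dict) {
        const std::size_t n = SizeOf(m.type);
        assert(pos + n <= buf.size());
        std::memcpy(base + m.offset, buf.data() + pos, n);
        pos += n;
    }
    return pos;
}
```

Because the streamer only sees the dictionary, the same two functions would work for any other class described the same way. The member names in the dictionary are what browsers, inspectors and interpreters would use on top of it.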
Why not Java or Python? Java was a strong candidate in 1996 -> 2000. Why did experiments move to C++?
Main software problems seen by large experiments: the move to C++ is completed (well, nearly!); complex experiment frameworks; too many dependencies; difficult to install (SCRAM, CMT); installation time far too long; several unwanted features (e.g. ATLAS StoreGate); coding conventions not followed (a code checker is essential); undocumented classes and modules.
A considerable amount of time is spent installing software (up to one day for an expert). Porting to a new platform is nontrivial. Dependency problems arise when many packages must be installed. Only a small subset of the software is used. The installation may require a huge amount of disk space. Users are scared to download a new version. This does not fit well with the GRID concept: the GRID should be used to simplify this process, not to make it more complex.
Consequences The fact that only a very small fraction of the total code base is used has important consequences. We must turn this apparent problem into a great feature. BOOT: a proposal to solve this problem.
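The proposal itself is not detailed on this slide. One hypothetical reading, consistent with the "BOOT + URLs + local caches" entry of the earlier link-model timeline, is a loader that fetches a module only when it is first used and then serves it from a local cache, so that only the small fraction of code actually used is ever downloaded. The sketch below shows that caching behaviour alone. OnDemandCache and its fetcher are invented names, and the fetch is a plain function standing in for a download.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical on-demand module cache: a module is fetched (in the real
// idea, from a URL) only the first time it is requested, and later requests
// are served from the local cache. Only the modules actually used are fetched.
class OnDemandCache {
public:
    using Fetcher = std::function<std::string(const std::string&)>;

    explicit OnDemandCache(Fetcher fetch) : fetch_(std::move(fetch)) {
        assert(fetch_ && "a fetcher is required");
    }

    const std::string& Get(const std::string& module) {
        auto it = cache_.find(module);
        if (it == cache_.end()) {
            ++fetches_;  // cache miss: go to the remote source
            it = cache_.emplace(module, fetch_(module)).first;
        }
        return it->second;
    }

    int Fetches() const { return fetches_; }
    std::size_t CachedModules() const { return cache_.size(); }

private:
    Fetcher fetch_;
    std::map<std::string, std::string> cache_;
    int fetches_ = 0;
};
```

With this design the installation cost is proportional to what a user actually runs, not to the size of the whole code base.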
Spare slides
Tree Friends
File types & Access in 5.06
Typical trends with experiment frameworks: a few gurus design the framework. In general it is adequate for batch processing (simulation and reconstruction), but too complex for the majority of users. Users find simpler individual solutions. Many users work in several experiments and want to use common software. Fights between groups. A new management structure is put in place.
Experiment Frameworks Starting point
Experiment Frameworks End point
Lines    Module     Lines per language
398742   PDF        fortran=398729, ansic=13
146414   PYTHIA6    fortran=140748, cpp=5413, ansic=153, pascal=100
128337   HLT        cpp=127601, ansic=605, sh=100, csh=31
128103   ITS        cpp=128010, sh=93
105763   MUON       cpp=105673, sh=90
94548    DPMJET     fortran=94267, cpp=281
72400    STEER      cpp=72400
52443    HBTAN      cpp=51260, fortran=1183
51489    TPC        cpp=51479, sh=10
50932    PHOS       cpp=50639, csh=293
46176    TRD        cpp=46176
41998    ISAJET     fortran=40483, cpp=1494, pascal=21
39407    RALICE     cpp=29764, ansic=9355, sh=288
35916    EMCAL      cpp=35410, fortran=383, csh=123
31820    ANALYSIS   cpp=31820
27751    HERWIG     fortran=27246, cpp=477, ansic=28
27025    FMD        cpp=27021, sh=4
26667    TOF        cpp=26667
24258    EVGEN      cpp=24258
21588    HIJING     fortran=21099, cpp=489
20562    JETAN      cpp=19687, fortran=875
18344    RAW        cpp=18344
15232    STRUCT     cpp=15232
13142    PMD        cpp=13142
12945    RICH       cpp=12945
10966    FASTSIM    cpp=10966
10944    MONITOR    cpp=10944
10659    ZDC        cpp=10659
Assumes BOOT is already installed on your machine (user@xxx.yyy.zzz). Nothing else on the machine except the compiler (no ROOT, etc.). Import a ROOT file containing histograms, Trees and other classes (usecase1.root). Browse the contents of the file. Draw a histogram.