

The French ACI GRID* initiative and its latest achievements using Grid'5000

  • Thierry PRIOL

  • Director of the French ACI GRID

  • Thierry.Priol@inria.fr

  • Franck Cappello

  • Director ACI GRID Grid’5000

  • Franck.Cappello@inria.fr


Objectives of the ACI GRID

  • Push the national research effort on grid computing

  • Increase the visibility of French Grid research activities

  • Fund medium- and long-term research activities in Grid using a bottom-up approach (nothing imposed!)

  • Stimulate synergies between research groups

  • Encourage experimentation with the available grid infrastructure being deployed through national projects

  • Develop new software for experimental grid infrastructures

  • New system and programming environments for distributed computing or large data management



Organisation

  • Programme Director : Thierry Priol

    • since January 2004 (previously M. Cosnard)
  • Scientific council : Brigitte Plateau

  • Budget: ~8 M€* (including 8 PhD grants)

    • This is incentive funding (around 98.3 M€ estimated by GridCoord)


Several kinds of projects

  • Multidisciplinary project

  • Software project

  • Young research team 

  • Collaboration 

  • International

  • Testbed



ACI GRID projects

  • Middleware, tools, environments

    • CGP2P (F. Cappello, LRI/CNRS)
    • ASP (F. Desprez, ENS Lyon/INRIA)
    • EPSN (O. Coulaud, INRIA)
    • PADOUE (A. Doucet, LIP6)
    • MEDIAGRID (C. Collet, IMAG)
    • DARTS (S. Frénot, INSA-Lyon)
    • Grid-TLSE (M. Dayde, ENSEEIHT)
    • RMI (C. Pérez, IRISA)
    • CONCERTO (Y. Maheo, VALORIA)
    • CARAML (G. Hains, LIFO)
  • Algorithms

    • TAG (S. Genaud, LSIIT)
    • ANCG (N. Emad, PRISM)
    • DOC-G (V-D. Cung, UVSQ)
  • Compiler techniques

    • Métacompil (G-A. Silbert, ENMP)
  • Networks and communication

    • RESAM (C. Pham, ENS Lyon)
    • ALTA (C. Pérez, IRISA/INRIA)


GRID ASP: Client/Server Approach for Simulation over the Grid

  • Call 1 (2001 - 2003)

  • Project coordinator: F. Desprez

    • E-mail : Frederic.Desprez@inria.fr
    • Web: http://graal.ens-lyon.fr/ASP/
  • Participants

    • ENS-Lyon, INRIA, LORIA, LIFC, IRCOM, LST, SRSMC, Physique Lyon1
  • Objectives

    • Building a portable set of tools for computational servers in an ASP (Application Service Provider) model
      • DIET (Distributed Interactive Engineering Toolbox)
    • Porting several different applications
      • physics, geology, chemistry, electronic device simulation, robotics, …
    • Focus on issues
      • resource localization (hierarchical) scheduling, performance evaluation (both static and dynamic), data persistence, data redistribution between servers
    • Clients


TLSE : Web expert site for sparse matrices based on grid infrastructure

  • Call 2 (2002 - 2004)

  • Project coordinator: Michel Daydé

    • E-mail : Michel.Dayde@enseeiht.fr
    • Web: http://www.enseeiht.fr/lima/tlse/
  • Participants

    • CERFACS, FéRIA-IRIT, LIP-ENSL, LaBRI, CEA, CNES, EADS, EDF, IFP
  • Objectives

    • Design a Web expertise site for sparse matrices
    • Dissemination of our expertise in sparse linear algebra
    • Easy access and experimentation with software and tools: only statistics are provided, not computing resources
    • Exploitation of the computing power of the grid for parametric studies
    • Contents : Sparse matrix software, Bibliography, Collections of sparse matrices


CGP2P: Global P2P Computing “Fusion of Desktop Grid and P2P systems”

  • Call 1 (2001 - 2003)

  • Coordinator: Franck Cappello

    • email: fci@lri.fr
    • Web: www.lri.fr/~fci
  • Participants: LRI, LIFL, ID IMAG, LARIA, LAL, EADS



RMI: Programming the Grid with distributed Objects

  • Call 1 (2001 - 2003)

  • Project coordinator: C. Pérez

    • E-mail : Christian.Perez@irisa.fr
    • Web: http://www.irisa.fr/Grid-RMI/en/
  • Participants

    • IRISA, ENS-Lyon, LIFL, INRIA, EADS
  • Objectives

    • Provide a framework to combine various communication middleware and runtimes
      • For parallel programming:
        • Message based runtimes (MPI, PVM, …)
        • DSM-based runtimes (TreadMarks, …)
      • For distributed programming
        • RPC/RMI based middleware (DCE, CORBA, Java)
        • Middleware for discrete-event based simulation (HLA)
    • Get the maximum performance from the network!
      • Offer zero-copy mechanism to middleware/runtime


HydroGrid: distributed code coupling in hydrogeology, using software components

  • Call 2 (2002 - 2004)

  • Project coordinator: M. Kern

    • E-mail : Michel.Kern@inria.fr
    • Web: http://www-rocq.inria.fr/~kern/HydroGrid/HydroGrid-en.html
    • Participants: INRIA Rocquencourt, INRIA Rennes, IMFS Strasbourg, Geosciences Rennes
  • Objectives

    • Simulate flow and transport of pollutants in the subsurface
    • Take into account couplings between different physical phenomena
    • Couple parallel codes on a grid, software from ACI GRID RMI project
    • Links between numerical and software coupling
    • Example applications: reactive transport (top), density driven flow (bottom), fractured media


Main feedback from Call 1 and Call 2 projects

  • Lack of a large scale testbed available for experiments

    • Several small scale testbeds at the regional level
    • Duplication of effort when setting up testbeds
  • Various types of Grids

  • Need to be able to experiment with various software layers

    • Incompatible with a production Grid


How to proceed…



The Grid’5000 project



Grid’5000 Objective

  • Deploy an experimental large-scale computing infrastructure to allow any kind of experiment

  • Experiments on any kind of grid (Virtual Supercomputer, Desktop Grid, …)

    • Experimental conditions
    • Configuration of the entire software stack
      • from the application to the operating system


The Grid’5000 Project

  • Building a nationwide experimental platform for large-scale Grid & P2P experiments

    • 9 geographically distributed sites
    • Every site hosts a cluster (from 256 CPUs to 1K CPUs)
    • All sites are connected by RENATER (French Res. and Edu. Net.)
    • RENATER hosts probes to trace network load conditions
    • Design and develop a system/middleware environment for safely testing and repeating experiments
  • Use the platform for Grid experiments in real life conditions

    • Port and test applications, develop new algorithms
    • Address critical issues of Grid system/middleware:
      • Programming, Scalability, Fault Tolerance, Scheduling
    • Address critical issues of Grid Networking
      • High performance transport protocols, QoS
    • Investigate original mechanisms


Planning



Grid’5000 foundations: Measurements and condition injection

  • Quantitative metrics :

    • Performance: Execution time, throughput, overhead, QoS (Batch, interactive, soft real time, real time).
    • Scalability: Resource occupation (CPU, memory, disk, network), Applications algorithms, Number of users, Number of resources.
    • Fault-tolerance: Tolerance to very frequent failures (volatility), tolerance to massive failures (a large fraction of the system disconnects), Fault tolerance consistency across the software stack.
  • Experimental Condition injection :

    • Background workloads: CPU, Memory, Disk, network, Traffic injection at the network edges.
    • Stress: high number of clients, servers, tasks, data transfers.
    • Perturbation: artificial faults (crash, intermittent failure, memory corruptions, Byzantine), rapid platform reduction/increase, slowdowns, etc.
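
One way to picture the perturbation conditions listed above is a crash-injection loop around a bag of tasks. The sketch below is illustrative only and is not a Grid'5000 tool: the function name `run_with_crash_injection`, the per-attempt crash probability, and the retry-until-success policy are all assumptions made for the example.

```python
import random


def run_with_crash_injection(n_tasks: int, crash_prob: float, seed: int = 0) -> int:
    """Run n_tasks dummy tasks and return the total number of attempts.

    Each attempt "crashes" with probability crash_prob (hypothetical fault
    model); a crashed task is simply re-submitted until it succeeds.
    """
    if not 0.0 <= crash_prob < 1.0:
        raise ValueError("crash_prob must be in [0, 1)")
    rng = random.Random(seed)  # seeded so that an experiment can be repeated
    attempts = 0
    for _ in range(n_tasks):
        while True:
            attempts += 1
            if rng.random() >= crash_prob:
                break  # this attempt survived, move on to the next task
    return attempts


baseline = run_with_crash_injection(100, crash_prob=0.0)
perturbed = run_with_crash_injection(100, crash_prob=0.3)
# Extra attempts caused by the injected faults, relative to the fault-free run.
extra_work = (perturbed - baseline) / baseline
```

Seeding the generator mirrors the goal of repeatable experiments: the same fault sequence can be replayed against different middleware versions.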


Grid’5000 principle: A highly reconfigurable experimental platform



Experiment workflow



Grid’5000 map





Hardware Configuration



Grid’5000 network provided by RENATER



Grid’5000 as an Instrument

  • High security for Grid’5000 and the Internet, despite the deep reconfiguration feature

    • Grid’5000 is confined: communications between sites are isolated from the Internet and vice versa (level 2 MPLS, dedicated lambda).
  • A software infrastructure allowing users to access Grid’5000 from any Grid’5000 site and have a simple view of the system

    • A user has a single account on Grid’5000, Grid’5000 is seen as a cluster of clusters, 9 (1 per site) unsynchronized home directories
  • Reservation/scheduling tools allowing users to select nodes and schedule experiments

    • a reservation engine + batch scheduler (1 per site) + OAR Grid (a co-reservation scheduling system)
  • A user toolkit to reconfigure the nodes: a software image deployment and node reconfiguration tool
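
The co-reservation idea mentioned above (one batch scheduler per site plus a layer that reserves across sites) can be pictured as an interval intersection problem. The following is a minimal sketch under assumptions, not the actual OAR Grid algorithm or API: the function `earliest_common_slot`, the site names, and the free-slot data are hypothetical, and each site's free intervals are assumed not to overlap.

```python
from typing import Dict, List, Optional, Tuple

Interval = Tuple[int, int]  # (start, end) in arbitrary time units, end exclusive


def earliest_common_slot(free: Dict[str, List[Interval]], duration: int) -> Optional[int]:
    """Return the earliest start at which every site is free for `duration`."""
    # If a common slot exists, it can be shifted earlier until it reaches the
    # start of some site's free interval, so only those starts are candidates.
    candidates = sorted(start for slots in free.values() for start, _ in slots)
    for t in candidates:
        if all(any(s <= t and t + duration <= e for s, e in slots)
               for slots in free.values()):
            return t
    return None  # no slot long enough is shared by all sites


# Hypothetical per-site availability, as each local scheduler might report it.
free_slots = {
    "site_a": [(0, 4), (6, 12)],
    "site_b": [(2, 9)],
    "site_c": [(5, 20)],
}
start = earliest_common_slot(free_slots, duration=3)
```

A real co-reservation system also has to handle node counts per site and races between concurrent requests; the sketch only shows the time-alignment step.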



OS reconfiguration techniques: reboot or virtual machines





Community: Grid’5000 users



230+ Experiments



About 200 Publications



A series of Events



Grid@work (October 10-14, 2005)



Experiment: Geophysics: Seismic Ray Tracing in 3D mesh of the Earth



JXTA DHT scalability



Fully Distributed Batch Scheduler

  • Motivation : evaluation of a fully distributed resource allocation service (batch scheduler)

  • Vigne : Unstructured network, flooding (random walk optimized for scheduling).

  • Experiment: a bag of 944 homogeneous tasks / 944 CPU

    • Synthetic sequential code (Monte Carlo application).
    • Measure of the mean execution time for a task (computation time depends on the resource)
    • Measure the overhead compared with an ideal execution (central coordinator)
    • Objective: 1 task per CPU.
  • Tested configuration:

  • Result :
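
The two measurements named in the experiment description (mean task execution time, and overhead relative to an ideal execution) can be written as a short calculation. This is a sketch under assumptions: the slides do not give the exact overhead formula, so a relative difference of makespans is assumed here, and the timing values are made-up placeholders, not results from the experiment.

```python
from statistics import mean
from typing import Sequence


def mean_task_time(task_times: Sequence[float]) -> float:
    """Mean execution time over all tasks of the bag."""
    return mean(task_times)


def relative_overhead(measured_makespan: float, ideal_makespan: float) -> float:
    """Overhead of the measured run relative to the ideal run (assumed formula)."""
    if ideal_makespan <= 0:
        raise ValueError("ideal_makespan must be positive")
    return (measured_makespan - ideal_makespan) / ideal_makespan


# Placeholder per-task times in seconds; they differ because the
# computation time depends on the resource a task lands on.
placeholder_task_times = [100.0, 110.0, 90.0, 100.0]

# With one task per CPU and all tasks started at once, the ideal makespan
# is the duration of the slowest task.
ideal = max(placeholder_task_times)
measured = ideal + 5.5  # placeholder: allocation delay added by the scheduler

avg = mean_task_time(placeholder_task_times)
overhead = relative_overhead(measured, ideal)
```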



Large Scale experiment of DIET: A GridRPC environment



Solving the Flow-Shop Scheduling Problem



TCP limits over 10Gb/s links

  • Highlighting TCP stream interaction issues in very high bandwidth links (congestion collapse) and poor bandwidth fairness

  • Grid’5000 10Gb/s connections evaluation

  • Evaluation of TCP variants over Grid’5000 10Gb/s links (BIC TCP, H-TCP, Westwood…)
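
Bandwidth fairness between concurrent TCP streams is commonly summarized with Jain's fairness index, (Σx)² / (n · Σx²), which equals 1 when all flows get the same throughput and 1/n when a single flow takes everything. The slides do not say which fairness metric was used in these experiments, so the sketch below is only one plausible choice, and the throughput values are illustrative placeholders.

```python
from typing import Sequence


def jain_fairness_index(throughputs: Sequence[float]) -> float:
    """Jain's fairness index for a set of non-negative per-flow throughputs."""
    n = len(throughputs)
    if n == 0:
        raise ValueError("at least one flow is required")
    total = sum(throughputs)
    sum_sq = sum(x * x for x in throughputs)
    if sum_sq == 0:
        raise ValueError("at least one flow must have non-zero throughput")
    return (total * total) / (n * sum_sq)


# Placeholder throughputs in Gb/s for four flows sharing a 10 Gb/s link.
fair_share = jain_fairness_index([2.5, 2.5, 2.5, 2.5])    # perfectly fair
starved = jain_fairness_index([10.0, 0.0, 0.0, 0.0])      # one flow dominates
```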



Grid’5000 main achievements in 2006

  • A large scale and highly reconfigurable Grid experimental platform

  • Used by Master's students, Ph.D. students, postdocs and researchers (results are presented in their reports, theses, papers, etc.)

  • Grid’5000 offers in 2006:

    • 9 clusters distributed over 9 sites in France,
    • about 10 Gigabit/s (directional) of bandwidth
    • the capability for all users to reconfigure the platform [protocols/OS/Middleware/Runtime/Application]
  • Grid’5000 results in 2006:

  • Grid’5000 has been open to French Grid researchers since July 2005

    • Grid’5000 opened to other communities in 2006 (CoreGRID)
  • Grid’5000 winter school (Philippe d’Anfray, ~January 2007)

  • Connection to other Grid experimental platforms

    • Netherlands (from October 2006), Japan (under discussion)
  • Sustainability ensured by INRIA after 2007



Concluding remarks

  • GRID in its wider definition

    • Computing, data and knowledge Grids, P2P
    • Not only focusing on the use of Supercomputers… neither on Globus…
    • An emphasis on middleware but also on applications/algorithms to make them Grid-aware
  • The French ACI GRID led to many European initiatives

    • Several groups of the ACI GRID projects are involved in EU funded projects (almost absent in FP5, involved in 10 projects in FP6 and leader of 3 projects)
    • The idea to set up a Network of Excellence in Grid Research came from the ACI GRID (M. Cosnard)
    • On-going discussions to have a European dimension of Grid’5000 funded under the 7th Framework Programme
  • Funding for Grid research is still available

    • Through the “Agence Nationale de la Recherche”
  • To get more information about the ACI-GRID

    • http://www-sop.inria.fr/aci/grid
    • Thierry.Priol@inria.fr


Announcement Project consultation Meeting


