
MECAGRID Progress Report

Period: 01/02/2003-01/07/2003

Stephen Wornom




Summary

This report covers the four-month period of the MECAGRID Project beginning February 1, 2003 and ending June 1, 2003. The project is moving in the right direction, but at a slower pace than the author envisioned. In order to determine what steps are needed to advance the project more rapidly, the work to date is separated into different phases and each phase is reviewed separately.




1-Introduction


The development of "Grid Computing" began in the mid-1990s. The idea of Grid Computing is to make available different computing resources, data resources, and other computer-related expertise on a computational Grid in order to carry out scientific projects that would otherwise be impossible or impractical at a single computing site. One could envision, for example, thousands of scientists in many different locations around the world pooling their resources to analyze the data of a major experiment at the European high-energy physics laboratory CERN in Switzerland (the European Data Grid, or EDG). Another scenario would involve the coupling of codes developed at different sites, each with specific expertise: for example, the coupling of the Aero3d two-phase fluid dynamics code developed at INRIA-Sophia with a structural code developed at another institution (the MECAGRID).
Efficient Grid Computing is a difficult challenge because the technology to make the idea possible was not initially available. Progress continues as the technology improves and more developers work on the problem. The widespread acceptance of Grid Computing was evident at the Globus World 2003 Conference, which was attended by more than 500 Grid engineers representing 25 countries, a twofold increase over the same conference held in 2002.
The purpose of the MECAGRID is to provide the Provence-Alpes-Côte d'Azur (PACA) region with a means to perform large parallel computations in fluid mechanics using hundreds and eventually thousands of processors. This possibility can be realized by creating a large computational Grid composed of clusters at different sites in the PACA region. The initial phase of the MECAGRID brings together clusters from INRIA-Sophia, INRIA-Rhone-Alpes, CEMEF, and the IUSTI. Each partner has its own scenarios for how it would use the capability provided by the MECAGRID. The overall goal is that each partner of the MECAGRID would have a means of computation far superior to that which exists at any individual partner site. Eventually the MECAGRID will be expanded to include other clusters, so that in the near future thousands of processors would be available for very large computations that today are, in general, impossible. In the foreseeable future, the MECAGRID will become a member of larger Grids, like the EDG, that contain partners like the MECAGRID, and the world of computing will attain heights that one can only dream of today.
Even though Grid Computing has been under development for almost a decade, the challenges involved in creating "efficient" computational Grids are many, and today near plug-and-play software exists only for special types of Grid Computing, such as Desktop Computing and Enterprise Grids, which are simplified by being located at a single site with local networks and local administrators. Plug-and-play software does not yet exist for what are known as Virtual Organization (VO) Grids like the MECAGRID. VOs are institutions or groups that operate totally independent computing resources under different controlling bodies. However, Grid software like the Globus software is sufficiently developed to permit the creation of VO Grids over a 3-4 month period under proper conditions. The proper conditions are on the human side rather than the software side and can be the most difficult to resolve, even among partners who agree to become part of the same Grid, because they require changes at each VO site. These changes are primarily root installations of the software and public IP addresses.
VO Grids are difficult for many reasons. First, the computing resources are located at different institutions, usually at different geographical locations. Each institution is independent of the others: each has its own priorities, its own system administrators, batch systems, queue priorities, security procedures to protect its computing machines, different hardware, and its own user constituency. Each institution operates independently of the other institutions. Changes at any particular VO site are made in a timely manner because they are under the control of a local system administrator who directs the effort and sets the schedule; the local system administrator leads his local project. The difficulty arises when the VOs become part of a Grid, where local VO decisions need to be coordinated at the Grid level within the same time frame. Immediately, one encounters the situation that local VO policies may be in conflict with Grid policies.
Second, and the most difficult obstacle to creating a Grid, is the human aspect. The MECAGRID project is a voluntary collaboration between partners, and progress depends on the level of motivation of each partner. The most important people in creating a Grid are the system administrators, commonly referred to as root administrators. It is the root administrators who control access to exterior sites, who determine what software is to be installed on their clusters, and who set the time frame. Five different MECAGRID partners means five different, independent system administrators. Creating a Grid requires the acceptance of changes by the different system administrators at each VO site within a common time frame: this is an extremely difficult task!

2-Discussion


We now examine the different phases of the MECAGRID to date. One can identify four distinct phases.
Phase I: Organizational phase (2002-2003)

Phase II: Initial active phase (01/02/2003 – 01/04/2003)

Phase III: Analysis phase (01/04/2003 – 01/06/2003)

Phase IV: Start of realization phase (July 2003)


Phase I- Organizational phase (2002-2003)

1-Michel Cosnard selected as the director for the MECAGRID

2-Herve Guillard selected as the person responsible for MECAGRID

3-Search for full-time engineer for project

4-Search for MECAGRID partners


  • INRIA Sophia Antipolis

  • INRIA Rhone Alpes

  • CEMEF

  • IUSTI

5-Presentation of MECAGRID

  • Herve Guillard, 18/07/2002

  • Official Launch of MECAGRID 11/2002

Phase I saw the selection of Michel Cosnard as the Director and Herve Guillard as the person responsible for the MECAGRID. The five partners for the MECAGRID were contacted and agreed to become part of the MECAGRID Project. Herve Guillard presented the project in July 2002, and the project was officially launched in November 2002. A full-time engineer was recruited.


Phase II-Initial Active phase (01/02/2003 – 01/04/2003)

  • Start of Initial active work

  • 02/2003 Employment of Stephen Wornom, consultant (1/5 time)

  • 03/2003 Employment of Patrick Nivet, Engineer (full time)

  • Selection of Globus software to create the MECAGRID

  • Installation of Globus software as nonroot

  • Creation of simpleCA

  • Obtaining user, host, and LDAP certificates

  • Installation of mpich-1.2.5 as nonroot

  • Configuring the globus2 device

  • Learning to work with the Globus software

  • Testing and validation of Globus software

  • Progress meeting with INRIA Sophia, CEMEF and IUSTI

  • Identification of necessary requirements by partners

  • Public IP addresses

  • Root installation of the Globus software

Phase II can be termed the Initial Active Phase and covers the February-March 2003 period. Stephen Wornom was employed as a 1/5-time Exterior Collaborator starting in February, and Patrick Nivet joined the project on March 6 as the full-time engineer.


Two possible software approaches were examined before deciding to use the Globus software to create the Grid. Phase II saw the Globus software installed on each of the MECAGRID clusters as nonroot. The most advanced software for creating computational Grids is the Globus software, developed by the Globus Project starting around 1995. Funded primarily by the U.S. Department of Energy through grants to Argonne National Laboratory, the University of Chicago, the University of California, and Northern Illinois, the Globus software is free and open source; it has been used to create over 13,000 Grids to date, and developers from 25 different countries are actively developing and using it, as reflected in the attendance of over 500 developers at the Globus World 2003 Conference.
Discussion as to whether to use a Certificate Authority (CA) from the U.S. or from the CNRS concluded with the creation of a simple certificate authority (simpleCA) at INRIA-Sophia. User and host certificates were issued using this CA. Testing and validation of the Globus software, installed as nonroot, was successfully completed on each of the clusters at the end of March 2003.
Phase II identified two requirements for the different clusters to communicate with each other: a root installation of the Globus software, and public IP addresses in order to run MPI programs. None of the MECAGRID members were prepared to install the Globus software as root, and only INRIA-Sophia and INRIA Rhone-Alpes provided public IP addresses. The necessity of these two requirements was communicated to the different MECAGRID system administrators. In addition, MPI computations require the mpich-1.2.5 software installed with the globus2 device configured; in general this only requires an upgrade from earlier installed versions of mpich. All the VOs need to make this installation.
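As a quick check of an mpich-1.2.5/globus2 (MPICH-G2) installation, a minimal MPI program of the kind sketched below can be compiled with the mpicc wrapper supplied by that installation and launched across the clusters through Globus. The program is a generic illustration, not part of the MECAGRID codes; it simply reports which host each MPI process runs on.

/* Minimal MPI check: each process reports its rank and host name.
 * A sketch only -- assumes compilation with the mpicc wrapper from an
 * mpich-1.2.5 installation configured with the globus2 device. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size, len;
    char host[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* rank of this process */
    MPI_Comm_size(MPI_COMM_WORLD, &size);   /* total number of processes */
    MPI_Get_processor_name(host, &len);     /* node running this process */

    printf("process %d of %d running on %s\n", rank, size, host);

    MPI_Finalize();
    return 0;
}

If the output lists nodes from more than one cluster, the cross-site MPI layer is working.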

Problems in Phase II



The source bundles shown in Table I were installed following the installation instructions on the www.globus.org site. At INRIA-Sophia, the installation failed because of the installed version of the bison utility (1.875). The workaround was to install bison-1.32 in the author's home directory and use v1.32 to complete the installation. CEMEF and IUSTI use bison-1.2.8, which installs correctly.

Table I: Source bundles

BUNDLE                         FLAVOR
Data Management Client         gcc32dbg
Data Management SDK            gcc32dbg
Data Management Server         gcc32dbg
Information Services Client    gcc32dbgpthr
Information Services SDK       gcc32dbgpthr
Information Services Server    gcc32dbgpthr
Resource Management Client     gcc32dbg
Resource Management SDK        gcc32dbg
Resource Management Server     gcc32dbg

The Globus user e-mail lists are quite helpful in resolving problems. Basic questions to the list sometimes receive responses the same day or the next day. The installation failure described above was resolved the same day, with a reply indicating the fix (the bison version was at fault). Most of the problems with the MPI test cases that occurred and could not be resolved were a direct result of the nonroot installation. Questions usually have two classes of responders: full-time developers funded by the Globus Project Office in the U.S., and other developers using the Globus software. The full-time Globus developers said that the test cases should work even for a nonroot installation of the Globus software. In hindsight, the starting point for the MECAGRID Project should have been the installation of the Globus software as root, but the project was new and the problems that would result from a nonroot installation were not anticipated.



Phase III-Analysis phase (01/04/2003 to 01/06/2003)

  • Progress limited to issues of:

  • Root installations of the Globus software

  • Implementing LSF and PBS job managers

  • Solving problems introduced by hardware changes

  • Preparation of Aero3d code for the MECAGRID

  • Preparation of Globus documentation for root administrators (Nivet)

  • Globus run scripts for individual clusters

  • Creating RSL scripts

  • Finding exterior sites running Globus willing to work with INRIA

Phase III marked a pause, or analysis stage. It was evident in mid-March that the Grid partners' administrators were not prepared to advance the Grid further. As of 01/04/2003 none of the partners had agreed to implement the changes necessary to advance the Grid: 1) installation of the Globus software as root, and 2) public IP addresses (only CEMEF and IUSTI had not made their IP addresses public). Thus "major" progress was blocked. Working nonroot, the Sophia personnel redirected their efforts to the other Grid-related topics listed above. At the suggestion of Herve Guillard, a search was made for other possible collaborators who had root installations and public IP addresses. To this end, Creatis, a team at INSA-Lyon, agreed to work with INRIA, and the details of this collaboration are in progress. INSA-Lyon has a globus-2.2.4 root installation with public IP addresses. Table II summarizes the status of the MECAGRID as of 01/06/2003. As of June 1, 2003, Table II shows that: 1) no MECAGRID partner has a root installation of the Globus software, which is the minimum requirement for the MECAGRID to be operational, and 2) to run MPI codes on the Grid, partners must have public IP addresses (or a special mechanism such as routing encapsulation) and mpich-1.2.5 installed with the globus2 device configured. Only the two INRIA sites have public IP addresses, and none of the partners have mpich-1.2.5 installed as root with the globus2 device configured.



Table II. Status of the MECAGRID Project as of June 1, 2003

Organization     Cluster name  MECAGRID partner  Public IP address  Root installation  Globus version  mpich-1.2.5 (globus2 device)
IUSTI            Cluster       yes               no                 no                 GT-2.2.4        yes (nonroot)
CEMEF            Cluster       yes               no                 no                 GT-2.2.4        yes (nonroot)
INRIA Sophia     Nina          yes               yes                no                 GT-2.2.4        yes (nonroot)
INRIA Grenoble   Icluster      yes               yes                no                 GT-2.2.4        yes (nonroot)
INSA-Lyon        Cluster       no                yes                yes                GT-2.2.4        yes (root)

Phase IV-Start of the realization phase (June-July 2003)

In this phase real progress in the MECAGRID may be possible. INRIA-Sophia will provide a root installation of the Globus software sometime during the first two weeks of June (06/06/03). The installation of mpich-1.2.5 with the globus2 device is under discussion. After these installations have been completed, a two-member Grid will become a reality (INRIA-Sophia and INSA-Lyon).


Installation of Globus as root was performed at IUSTI by J. Massoni and P. Nivet on July 1, 2003, and at CEMEF by C. Torrin and P. Nivet on July 24, 2003.
On July 23, 2003, benchmark tests with MPI programs were carried out between INSA and INRIA. The connection between the two sites is good, but irregular performance was observed on INSA's nodes (possibly due to the lack of a scheduler).
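The benchmark programs themselves are not reproduced in this report; a representative ping-pong test of the kind commonly used to measure point-to-point latency and bandwidth between two sites might look like the following sketch (illustrative only; the message size and iteration count are arbitrary choices, not those used in the tests).

/* Ping-pong sketch: ranks 0 and 1 exchange a buffer NITER times and
 * rank 0 reports the average round-trip time and the bandwidth.
 * Illustrative only -- not the benchmark actually used in the tests. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define NITER  100
#define NBYTES (1 << 20)   /* 1 MB message (arbitrary) */

int main(int argc, char **argv)
{
    int rank, i;
    char *buf;
    double t0, t1;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    buf = (char *) malloc(NBYTES);

    MPI_Barrier(MPI_COMM_WORLD);            /* start the timing together */
    t0 = MPI_Wtime();
    for (i = 0; i < NITER; i++) {
        if (rank == 0) {
            MPI_Send(buf, NBYTES, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, NBYTES, MPI_CHAR, 1, 0, MPI_COMM_WORLD, &status);
        } else if (rank == 1) {
            MPI_Recv(buf, NBYTES, MPI_CHAR, 0, 0, MPI_COMM_WORLD, &status);
            MPI_Send(buf, NBYTES, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }
    }
    t1 = MPI_Wtime();

    if (rank == 0) {
        double rtt = (t1 - t0) / NITER;              /* seconds per round trip */
        double mbs = 2.0 * NBYTES / rtt / 1.0e6;     /* MB/s, counting both directions */
        printf("average round trip: %g s, bandwidth: %g MB/s\n", rtt, mbs);
    }

    free(buf);
    MPI_Finalize();
    return 0;
}

Irregular timings from run to run, of the kind observed on the INSA nodes, would show up directly in the reported averages.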

3-Next objectives

Two main objectives are to be pursued:



  • Extension of the Grid to the other partners (CEMEF and IUSTI): this means enabling a tunnel (encapsulated routing) so that nodes on private networks at CEMEF and IUSTI can be used to run MPI programs through Globus.

  • Compilation and execution of scientific programs such as AERO3D and STOKES with Globus/MPICH-G2.
