Work project report
Assia Mermouri
CERN
Summer Student Program ‘16
Table of Contents
I. Introduction
II. Presentation of CERN
III. Project proposal
IV. Contribution
a. Stage 1: Parsing
b. Stage 2: Cross-referencing
c. Stage 3: WEB application development
d. Improvements
V. Evaluation
VI. Conclusion
Table of Figures
Introduction
As a fourth-year engineering student in networking and telecommunications at the French engineering school INSA de Lyon, I applied to the 2016 CERN summer student programme on my own initiative. My application was motivated by the desire to gain an outstanding experience in an international environment while making good use of my summer.
I was assigned a project that was half engineering and half computing. The overall goal was to develop a web interface that shows the access rights of users and E-groups to the different Subversion repositories. I worked under the supervision of Mr. Brice Copy in the BE-ICS-CIC division located on the Prévessin site of CERN.
Presentation of CERN
CERN, whose name comes from “Conseil Européen pour la Recherche Nucléaire”, is home to the largest particle accelerator in the world. It is located a few kilometres from Geneva, Switzerland, on the French-Swiss border in Meyrin. The ring of the Large Hadron Collider (LHC), which lies in a 27-kilometre tunnel, extends beneath the French towns of Saint-Genis-Pouilly and Ferney-Voltaire.
Figure 1: One of the first Cisco routers introduced in Europe
Although CERN's reputation is mostly based on its physics research, it also holds an important place in the development of information technologies. The most famous example is certainly the creation of the World Wide Web at the end of the 1980s by Tim Berners-Lee and Robert Cailliau. First developed to facilitate the exchange of information between researchers, it is now used all over the world.
CERN also contributed to the introduction of the Internet in Europe by installing the first two Cisco routers on the European continent (see Figure 1) in 1987.
In addition, CERN developed grid computing technologies in order to process the large amount of data produced by the different physics experiments. Enabling Grids for e-Science (EGEE) is currently the most advanced project aiming to process the data generated by the LHC. It uses more than 41,000 processors spread across 45 countries.
Project proposal
As stated in the introduction, the aim of the project was to develop a web interface that shows the access rights to the Subversion (abbreviated SVN) repositories. SVN, distributed by the Apache Software Foundation under an open-source license, is version control software that manages the successive versions of one or more projects. The web-based interface, intended for CERN users, should offer three types of query: one for the access rights of a specific user, one for those of an E-group, and one for those granted on a repository.
Contribution
As shown in Figure 2, the project was divided into three main stages:
- The first stage consisted of parsing the AUTHZ file (which is used to manage the SVN repositories) to find out which repositories the E-groups and users have access to and what their access rights are;
- The second stage consisted of cross-referencing the information extracted from SVN with the LDAP information in order to know which E-groups a user belongs to or which users form an E-group;
- The third and last stage consisted of developing the web interface that lets users query the access rights.
Figure 2: Stages of the project
Stage 1: Parsing
First, to extract the information from SVN, I had to parse the AUTHZ file. Administrators use this file to specify users’ access rights for the different SVN repositories. The file starts by listing the E-groups, continues with the default access rights, and ends by specifying particular access rights for some repositories.
The E-groups in this list are automatically populated with their respective members three times a day. However, in my case, when a CERN member queries which E-groups a user belongs to, the response is retrieved from the LDAP server and not from the AUTHZ file.
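To illustrate this layout, the following minimal Python sketch parses a small AUTHZ-style sample. It is not the SUAFE tool used in the project. It assumes the standard Subversion layout (a [groups] section followed by per-path sections); the sample content, the function name parse_authz and all group and user names are hypothetical.

```python
import configparser

# Hypothetical sample: groups first, then default rights, then per-repository rights.
SAMPLE_AUTHZ = """\
[groups]
developers = alice, bob
reviewers = carol

[/]
* = r

[project:/trunk]
@developers = rw
@reviewers = r
dave =
"""


def parse_authz(text):
    """Split an AUTHZ-style text into a group table and a flat list of rules."""
    # Only '=' separates names from values, so ':' in section names is harmless.
    parser = configparser.ConfigParser(delimiters=("=",), interpolation=None)
    parser.optionxform = str  # keep user and group names case-sensitive
    parser.read_string(text)

    groups = {}
    rules = []
    for section in parser.sections():
        for name, value in parser.items(section):
            if section == "groups":
                groups[name] = [m.strip() for m in value.split(",") if m.strip()]
            else:
                # An empty value means "no access" in Subversion's format.
                rules.append({"path": section, "subject": name, "access": value.strip()})
    return groups, rules


groups, rules = parse_authz(SAMPLE_AUTHZ)
```

Here groups maps each group name to its members, and rules holds one record per (path, subject) pair, which is a convenient shape for the later cross-referencing stage.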
In order to parse the AUTHZ file I used two of the many command line options provided by the Subversion User Authorization File Editor (SUAFE). This script, available on GitHub and developed by Shaun Johnson, allowed me to query the AUTHZ file to get the list of E-groups and the list of access rights rules.
Once I had extracted the list of E-groups from the SVN configuration file, I used the ldap3 Python library to query CERN's LDAP server (reachable at xldap.cern.ch). I wrote a Python program that requests the members of each E-group appearing in the AUTHZ file.
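A request of this kind could look roughly like the sketch below. It is not the program written during the project: the search base, the "member" attribute and both function names are assumptions made for illustration, and the server is assumed to accept anonymous binds.

```python
# Assumed search base for E-groups; not taken from the report.
ASSUMED_BASE = "OU=e-groups,OU=Workgroups,DC=cern,DC=ch"


def login_from_dn(dn):
    """Return the value of the first RDN, e.g. 'CN=alice,OU=Users' -> 'alice'.

    Simplification: DNs containing escaped commas are not handled.
    """
    first_rdn = dn.split(",", 1)[0]
    _, _, value = first_rdn.partition("=")
    return value.strip()


def fetch_group_members(group, host="xldap.cern.ch", base=ASSUMED_BASE):
    """Ask the LDAP server for the members of one E-group (needs network access)."""
    # ldap3 is a third-party package; it is imported here so that the helper
    # above stays usable without it.
    from ldap3 import Connection, Server
    from ldap3.utils.conv import escape_filter_chars

    search_filter = "(cn=%s)" % escape_filter_chars(group)
    with Connection(Server(host), auto_bind=True) as conn:
        conn.search(base, search_filter, attributes=["member"])
        for entry in conn.response:
            if entry.get("type") == "searchResEntry":
                members = entry["attributes"].get("member", [])
                return [login_from_dn(dn) for dn in members]
    return []
```

Calling fetch_group_members once per group name found in the AUTHZ file would give the membership lists needed for the next stage.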
Stage 2: Cross-referencing
Figure 3: Overview of the NoSQL database
Secondly, in order to cross-reference the information extracted from SVN and LDAP, I gathered everything into a single NoSQL database using MongoDB. As the extracted data came from different sources in different formats, I used a data pipeline (Logstash) to normalize it into JavaScript Object Notation (JSON). I chose JSON as the output format because MongoDB stores its records as JSON-like documents.
As shown in Figure 3, the database called “database” is made of two collections: one for the SUAFE data and one for the LDAP data. Each collection is made of several unique documents, each with three distinct fields. However, I found it difficult to transform the data from its source format into JSON because Logstash provides little logging information. Nevertheless, I obtained the desired result by fully customizing the grok filter provided by Logstash.
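The normalization step can be pictured with the short Python sketch below. It stands in for the Logstash pipeline rather than reproducing it: like a grok pattern, it relies on named regular-expression captures, but the input line layout, the field names and the function name are all hypothetical.

```python
import json
import re

# Hypothetical input layout: "<repository path> <user or @group> <access>".
RULE_PATTERN = re.compile(
    r"^(?P<path>\S+)\s+(?P<subject>\S+)\s+(?P<access>rw|r|none)$"
)


def rule_line_to_json(line):
    """Turn one text line into a JSON document, or None if it does not match."""
    match = RULE_PATTERN.match(line.strip())
    if match is None:
        return None
    # Each named group becomes one field of the resulting document.
    return json.dumps(match.groupdict(), sort_keys=True)


document = rule_line_to_json("project:/trunk @developers rw")
```

Lines that do not fit the pattern are returned as None instead of raising an error, so malformed input can be counted or logged separately.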
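The cross-referencing itself can be sketched in plain Python, with two lists of dictionaries standing in for the two MongoDB collections. The report does not list the actual field names, so the ones used here (path, subject, access, group, members) and all sample values are assumptions.

```python
# Stand-in for the SUAFE collection: one document per access rule.
suafe_docs = [
    {"path": "project:/trunk", "subject": "@developers", "access": "rw"},
    {"path": "project:/trunk", "subject": "carol", "access": "r"},
    {"path": "docs:/", "subject": "@writers", "access": "rw"},
]

# Stand-in for the LDAP collection: one document per E-group.
ldap_docs = [
    {"group": "developers", "members": ["alice", "bob"]},
    {"group": "writers", "members": ["bob"]},
]


def groups_of(user, ldap_docs):
    """Return the names of the groups that list the user as a member."""
    return [doc["group"] for doc in ldap_docs if user in doc["members"]]


def rights_of_user(user, suafe_docs, ldap_docs):
    """Collect the rules that apply to a user directly or through a group."""
    subjects = {user} | {"@" + group for group in groups_of(user, ldap_docs)}
    return sorted(
        (doc["path"], doc["access"])
        for doc in suafe_docs
        if doc["subject"] in subjects
    )


bob_rights = rights_of_user("bob", suafe_docs, ldap_docs)
```

The same join run in the other direction (from a repository path to the users behind each group) would answer the repository query.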
Stage 3: WEB application development
Finally, I developed a very simple web interface using MongoDB and CherryPy for the back end and HTML for the front end. I tried to follow the representational state transfer (REST) architectural style while building the application.
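As an illustration of what a REST-style endpoint could look like with CherryPy, here is a minimal sketch. It is not the application built during the project: the URL layout (/users/<name>), the function names and the in-memory dictionary replacing MongoDB are assumptions.

```python
def lookup_rights(rights_by_user, name):
    """Build the JSON-serializable answer for one user."""
    return {"user": name, "rights": rights_by_user.get(name, [])}


def build_root(rights_by_user):
    """Create a CherryPy application tree exposing GET /users/<name>."""
    # CherryPy is a third-party package; it is imported here so that
    # lookup_rights stays usable without it.
    import cherrypy

    @cherrypy.popargs("name")  # maps the path segment after /users/ to 'name'
    class Users:
        @cherrypy.expose
        @cherrypy.tools.json_out()
        def index(self, name=None):
            if name is None:
                raise cherrypy.HTTPError(400, "Expected /users/<name>")
            return lookup_rights(rights_by_user, name)

    class Root:
        users = Users()

    return Root()
```

Passing the result of build_root to cherrypy.quickstart would start a local server; addressing each user as a resource under its own URL and answering in JSON is one way to stay close to REST conventions.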
Improvements
My final solution worked well but was still too simple with regard to the users’ expectations. Thus, I spent my remaining time (one week) improving it. First, I “dockerized” my project by creating four Docker containers (one for each job: SUAFE, ldap3, Logstash and MongoDB). This separation then allowed me to automate the back-end process with Jenkins.
Evaluation
For stages 1 and 2, I successfully met the project’s requirements. However, I found it difficult to work with Docker. As it was the first time I had heard of this platform, it took me some time to get used to it, e.g., understanding how to build a container and run an application in it, as well as how the different containers communicate with each other.
During the first stage I also had some problems with SUAFE, as it contained bugs. Its developer had not tested every option the script offers, so I had to contact him to get the bugs fixed.
The second stage was particularly challenging as well because I had to learn Logstash. The JSON output was not in the format I expected, so I had to filter the input files to eliminate some fields by customizing the grok filter. This filter uses regular expressions, which took me time to get used to.
Finally, I was not able to properly finish the web application. I have some doubts about the architecture of my solution, as it might not be RESTful.
Future work should check whether the web application respects the REST architecture. The web interface needs to be completely rethought. For example, it could directly show the information of the user opening the web page, offer autocompletion in the form the user fills in, and display the access rights of a user or an E-group as a tree with red and green indicators.
Conclusion
As a summer student at CERN, the main goal of my work project was to develop a web application that provides information about access rights to SVN repositories. Although I did not have enough time to polish the website I created and, more importantly, to deploy it and put it into production, I still learned a lot of new skills. This project gave me the chance to gain real-life experience with web development and information processing, using modern programming languages and tools. I also learned about CERN's computing infrastructure.
Through the lectures and workshops organized by the CERN summer student team, I also gained knowledge in IT and expanded my general knowledge of physics. I had the chance to meet amazing people from all over the world who influenced me in a positive way.