A new Grid workflow for data analysis within the ALICE project using containers and modern Cloud technologies

Year
2022
Degree
PhD
Author
Storetvedt, Maksim Melnik
Mail
maksim.melnik.storetvedt@cern.ch
Institution
CERN
Abstract

Grid computing is a method for automatically distributing batch-style computing tasks across a global network of heterogeneous computing centres. ALICE (A Large Ion Collider Experiment), one of the four large experiments at the CERN LHC, uses Grid resources to process large quantities of its collected data. Unlike the more centralised and often commercial Cloud Computing paradigm, to which it is frequently compared, the Grid tends to have its resources spread geographically across multiple sites, each containing computer clusters with different characteristics. Loosely coupled and generally heterogeneous in both hardware and software, these clusters come together to form a distributed system spanning multiple administrative domains. This heterogeneity may create challenges, as the Grid has to cater to multiple system requirements, configurations and deployment practices. To alleviate some of these challenges, concepts from Cloud Computing have recently been applied to “cloudify” the Grid infrastructure. Through technologies such as virtualisation, a desired hardware and software environment can be simulated on a range of underlying configurations. This allows homogeneous environments to be created within an otherwise heterogeneous Grid, an approach that is used today on numerous Grid sites. However, while virtualisation has gained ground within the Grid, new practices and technologies have since emerged. In particular, containers and elasticity have risen rapidly in both adoption and popularity, and are now common within the Cloud. Although often compared to traditional virtualisation, containers create an isolated environment while sharing the host operating system kernel, thus avoiding virtualisation overhead. The ALICE experiment is currently exploring the use of newer Cloud concepts within its Grid infrastructure.
This exploration is part of the development of a new Grid middleware framework, JAliEn (the Java ALICE Environment), which presents an opportunity to integrate these concepts directly within the core at the middleware level. This potential integration forms the basis of the research question and contribution of this thesis: to investigate ways to make ALICE offline computing more flexible and easier to administer through the use of Cloud concepts and technologies. The aim, in turn, is to construct a new and optimised Grid workflow that not only alleviates present challenges, but may also help satisfy the requirements of the ALICE collaboration in the upcoming LHC Run 3.

Supervisors
Kileng, Bjarte (Western Norway University of Applied Sciences)
Report number
CERN-THESIS-2022-387
Date of last update
2023-11-07