GSoC GANGA - 2019 Project Report

During Google Summer of Code 2019 I contributed to GANGA. The three months were fun and productive: I spent the summer implementing the flyweight design pattern and learning effective ways to reduce memory consumption. Google Summer of Code has been a great learning experience and has helped me develop my programming skills. While exploring the different areas of GANGA, I also learned how to maintain code quality and how to structure code properly.

Ganga

Ganga is a tool to make it easy to run data analysis jobs along with managing associated data files.

Ganga is a job-management tool that offers a simple, efficient and consistent user analysis tool in a variety of heterogeneous environments: from local clusters to global Grid systems. Experiment specific plug-ins allow Ganga to be customised for each experiment.

Project:


Flyweight Design Pattern

Flyweight Patterns

The flyweight design pattern uses sharing to support large numbers of fine-grained objects efficiently.

A flyweight is a shared object that can be used in multiple contexts simultaneously. The flyweight acts as an independent object in each context—it’s indistinguishable from an instance of the object that’s not shared. Flyweights cannot make assumptions about the context in which they operate. The key concept here is the distinction between intrinsic and extrinsic state. Intrinsic state is stored in the flyweight; it consists of information that’s independent of the flyweight’s context, thereby making it sharable. Extrinsic state depends on and varies with the flyweight’s context and therefore can’t be shared. Client objects are responsible for passing extrinsic state to the flyweight when it needs it.
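The split between intrinsic and extrinsic state can be illustrated with a minimal sketch. The `Glyph` class and `get_glyph` function are hypothetical names chosen for illustration and are not part of Ganga; `functools.lru_cache` stands in for the flyweight factory.

```python
from functools import lru_cache


class Glyph:
    """Hypothetical flyweight: holds only intrinsic (context-independent) state."""

    __slots__ = ("char",)

    def __init__(self, char):
        self.char = char  # intrinsic state, safe to share between contexts

    def render(self, x, y):
        # Extrinsic state (the position) is passed in by the client on each call.
        return f"{self.char}@({x},{y})"


@lru_cache(maxsize=None)
def get_glyph(char):
    """Return the single shared Glyph for this character (illustrative factory)."""
    return Glyph(char)


text = "banana"
glyphs = [get_glyph(c) for c in text]
rendered = [g.render(x, 0) for x, g in enumerate(glyphs)]
distinct = len({id(g) for g in glyphs})  # number of Glyph objects actually created
```

Six characters are rendered here, but only three `Glyph` objects exist, because each distinct character is created once and then shared.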

Copy-on-write, on the other hand, is an optimization strategy whose fundamental idea is that if multiple callers ask for resources that are initially indistinguishable, you can give them pointers to the same resource. This sharing can be maintained until a caller tries to modify its “copy” of the resource, at which point a true private copy is created to prevent the changes from becoming visible to everyone else.

We take the concept of the flyweight pattern and extend it with copy-on-write principles. Every attribute is then a pointer to a shared value, so identical values are stored only once and memory consumption is reduced.
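As a rough illustration of the copy-on-write idea, the sketch below wraps a value in a handle that shares it until the first write. `CowRef` and its methods are hypothetical names invented for this example, the dictionary contents are placeholder values, and this is not the code from the Ganga pull request.

```python
import copy


class CowRef:
    """Hypothetical copy-on-write handle around a shared value."""

    def __init__(self, value):
        self._value = value
        self._owned = False  # becomes True once this handle holds a private copy

    def share(self):
        # A new handle pointing at the same underlying value; nothing is copied.
        return CowRef(self._value)

    def read(self):
        return self._value

    def write(self):
        # Copy on the first write so that other handles never see the change.
        if not self._owned:
            self._value = copy.deepcopy(self._value)
            self._owned = True
        return self._value


original = CowRef({"backend": "Local", "inputs": ["a.root", "b.root"]})
clone = original.share()
shared_before = clone.read() is original.read()

clone.write()["backend"] = "Dirac"
shared_after = clone.read() is original.read()
```

Before the write both handles point at the same dictionary; after it, `clone` holds its own copy and `original` is unchanged.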

I have broadly covered the following areas during the course of this project:

1. Resource Profiling:

Implemented decorators to:

  • profile CPU consumption: used the Python module cProfile
  • profile memory consumption: used the Python package memory_profiler
  • count the number of function calls.

The logs are recorded and saved in the gangadir directory on the system.
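A minimal sketch of what such decorators might look like, using only the standard library. The names `count_calls`, `cpu_profile`, `CALL_COUNTS` and `CPU_REPORTS` are assumptions made for this example and are not the names used in the Ganga code. Memory profiling is left out because memory_profiler is a third-party package, and the reports are kept in memory rather than written to the gangadir.

```python
import cProfile
import collections
import functools
import io
import pstats

CALL_COUNTS = collections.Counter()  # function name -> number of calls
CPU_REPORTS = {}                     # function name -> latest cProfile report text


def count_calls(func):
    """Hypothetical decorator: count how often the wrapped function is called."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        CALL_COUNTS[func.__name__] += 1
        return func(*args, **kwargs)
    return wrapper


def cpu_profile(func):
    """Hypothetical decorator: profile each call with cProfile and keep the report."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        profiler = cProfile.Profile()
        try:
            return profiler.runcall(func, *args, **kwargs)
        finally:
            # Format the five most expensive entries, sorted by cumulative time.
            stream = io.StringIO()
            stats = pstats.Stats(profiler, stream=stream)
            stats.sort_stats("cumulative").print_stats(5)
            CPU_REPORTS[func.__name__] = stream.getvalue()
    return wrapper


@count_calls
@cpu_profile
def sum_of_squares(n):
    return sum(i * i for i in range(n))


results = [sum_of_squares(10), sum_of_squares(100)]
```

After the two calls, `CALL_COUNTS["sum_of_squares"]` is 2 and `CPU_REPORTS["sum_of_squares"]` holds the text report of the most recent call, which could then be written to a log file.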

Figure: memory-profile

PR #1471 (merged)

2. Reduce duplication of memory:

  • Implemented the flyweight design pattern, extended with copy-on-write principles, to reduce duplication of memory.
  • Migrated code to Python 3.
  • Integrated unit tests for copy-on-write with Ganga.

Implementation Details

  1. We use a hash map that stores references to the objects that have already been created.
  2. Every object is associated with a key.
  3. When a client wants to create an object, it simply passes the key associated with that object.
  4. If the object has already been created, we simply return the reference to it; otherwise a new object is created and its reference is returned to the client.
  5. Both parent and child processes initially share the same object in memory.
  6. Only when either process modifies a shared object is the object copied.
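The steps above can be sketched roughly as follows. `SharedRegistry`, `get` and `private_copy` are hypothetical names, the dictionary contents are placeholders, and two plain variables stand in for the parent and child processes; this is a simplified illustration under those assumptions, not the implementation in the pull request.

```python
import copy


class SharedRegistry:
    """Hypothetical registry: a dict mapping each key to its one shared object."""

    def __init__(self):
        self._objects = {}  # steps 1-2: key -> already created object

    def get(self, key, factory):
        # Steps 3-4: return the existing object for this key, or create it once.
        if key not in self._objects:
            self._objects[key] = factory()
        return self._objects[key]

    def private_copy(self, key, **changes):
        # Step 6: a writer gets its own copy; the shared object stays untouched.
        obj = copy.deepcopy(self._objects[key])
        obj.update(changes)
        return obj


def make_config():
    # Placeholder content, for illustration only.
    return {"application": "Executable", "args": ["hello"]}


registry = SharedRegistry()
parent = registry.get("job-config", make_config)
child = registry.get("job-config", make_config)
same_object = parent is child  # step 5: both clients share one object

child_private = registry.private_copy("job-config", args=["world"])
```

Only one object is stored for the key `"job-config"`; the modification produces a separate copy, so `parent` still sees the original arguments.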

PR #1480 (open)

To Do


  • Get PR merged after intense testing on CERN infrastructure

  • Extend the principles of copy-on-write to persistent storage to reduce data duplication on disk.

  • Continue contributing to GANGA

Acknowledgement


I would like to thank my mentors Prof. Ulrik Egede, Dr. Alexander Richards and Dr. Mark Smith for their constant guidance, code reviews, timely feedback and, most importantly, their encouragement throughout the duration of GSoC. I would like to continue contributing to GANGA in the future and give back to the community that has taught me so much.

I also express my gratitude to the people behind Google Summer of Code. I had a very enjoyable experience working in GSoC 2019.

References