GSoC GANGA - 2019 Project Report
Ganga is a tool that makes it easy to run data analysis jobs and manage the associated data files.
It is a job-management tool that offers a simple, efficient and consistent interface for user analysis across a variety of heterogeneous environments, from local clusters to global Grid systems. Experiment-specific plug-ins allow Ganga to be customised for each experiment.
Project:
Flyweight Design Pattern
The flyweight design pattern uses sharing to support large numbers of fine-grained objects efficiently.
A flyweight is a shared object that can be used in multiple contexts simultaneously. The flyweight acts as an independent object in each context—it’s indistinguishable from an instance of the object that’s not shared. Flyweights cannot make assumptions about the context in which they operate. The key concept here is the distinction between intrinsic and extrinsic state. Intrinsic state is stored in the flyweight; it consists of information that’s independent of the flyweight’s context, thereby making it sharable. Extrinsic state depends on and varies with the flyweight’s context and therefore can’t be shared. Client objects are responsible for passing extrinsic state to the flyweight when it needs it.
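To make the intrinsic/extrinsic split concrete, here is a minimal sketch. The `Glyph` class, its `_pool` cache and the `render` method are hypothetical names chosen for illustration and are not taken from the Ganga code base. The character is intrinsic state stored in the shared object, while the position is extrinsic state passed in by the caller.

```python
class Glyph:
    """Flyweight: holds only intrinsic state (the character itself)."""

    _pool = {}  # hypothetical cache of shared instances, keyed by character

    def __new__(cls, char):
        # Return the existing shared instance if one was already created.
        instance = cls._pool.get(char)
        if instance is None:
            instance = super().__new__(cls)
            instance.char = char
            cls._pool[char] = instance
        return instance

    def render(self, position):
        # Extrinsic state (position) is supplied by the caller on every call.
        return f"{self.char}@{position}"


a1 = Glyph("a")
a2 = Glyph("a")
b = Glyph("b")
```

Here `a1` and `a2` are the same object in memory, so a thousand occurrences of "a" at different positions would still need only one `Glyph`.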
Copy-on-write, on the other hand, is an optimization strategy whose fundamental idea is that if multiple callers ask for resources which are initially indistinguishable, you can give them pointers to the same resource. This sharing can be maintained until a caller tries to modify its “copy” of the resource, at which point a true private copy is created to prevent the changes becoming visible to everyone else.
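The idea can be sketched in a few lines of Python. `CowHandle` and its methods are hypothetical names used only for this illustration, and the sketch assumes the resource is a dictionary. Every handle points at the same resource until one of them writes, and only the writer pays for a copy.

```python
import copy


class CowHandle:
    """Hypothetical handle that shares a resource until the first write."""

    def __init__(self, resource):
        self._resource = resource
        self._owned = False  # True once this handle holds a private copy

    def share(self):
        # A new caller receives a handle to the very same resource.
        return CowHandle(self._resource)

    def read(self):
        return self._resource

    def write(self, key, value):
        # Copy lazily, only when this handle first modifies the resource.
        if not self._owned:
            self._resource = copy.deepcopy(self._resource)
            self._owned = True
        self._resource[key] = value


original = CowHandle({"backend": "Local"})
clone = original.share()
shared_before = clone.read() is original.read()  # both point at one dict
clone.write("backend", "Dirac")                  # clone now gets its own copy
```

After the write, `original` still sees the unmodified dictionary, while `clone` sees its private, modified copy.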
We take the concept of the flyweight pattern and extend it with copy-on-write principles. Every attribute is then a reference to a shared object, so identical values are stored only once and memory use is reduced.
I have broadly covered the following areas during the course of this project:
1. Resource Profiling:
Implemented decorators to:
- profile CPU consumption, using the Python module cProfile
- profile memory consumption, using the Python package memory_profiler
- count number of function calls.
The logs are recorded and saved in the gangadir directory on the user's system.
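Decorators of this kind could look roughly like the sketch below. It is not the code from the project: `profile_cpu`, `count_calls`, `last_report` and `calls` are hypothetical names, and the report is kept in memory rather than written to the gangadir. Memory profiling is left out because it relies on the third-party memory_profiler package.

```python
import cProfile
import functools
import io
import pstats


def profile_cpu(func):
    """Run func under cProfile and keep the text report on the wrapper."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        profiler = cProfile.Profile()
        result = profiler.runcall(func, *args, **kwargs)
        stream = io.StringIO()
        stats = pstats.Stats(profiler, stream=stream)
        stats.sort_stats("cumulative").print_stats(5)
        wrapper.last_report = stream.getvalue()  # could be saved to a log file
        return result
    wrapper.last_report = ""
    return wrapper


def count_calls(func):
    """Count how many times func has been called."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        wrapper.calls += 1
        return func(*args, **kwargs)
    wrapper.calls = 0
    return wrapper


@profile_cpu
def total(n):
    return sum(range(n))


@count_calls
def square(x):
    return x * x


total_result = total(10)
squares = [square(i) for i in range(3)]
```

After running this, `total.last_report` holds the cProfile summary for the last call and `square.calls` holds the number of calls made so far.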
2. Reduce duplication of memory:
- Implemented the flyweight design pattern, extended with copy-on-write principles, to reduce duplication of memory.
- Migrated the code to Python 3.
- Integrated unit tests for copy-on-write with Ganga.
Implementation Details
- We use a hash map that stores references to the objects that have already been created.
- Every object is associated with a key.
- When a client wants to create an object, it simply passes the key associated with that object.
- If the object has already been created, the client gets a reference to the existing object; otherwise a new object is created and its reference is returned to the client.
- Both parent and child processes initially share the same object in memory.
- If either process modifies a shared object, only then is the object copied.
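The steps above can be sketched as follows. This is an illustration under stated assumptions, not the implementation in the pull request: `SharedRegistry`, `Client`, `add_input` and the key `"dataset-1"` are all hypothetical, and the sketch works inside a single process rather than across parent and child processes.

```python
class SharedRegistry:
    """Hypothetical hash map from a key to the one shared object for it."""

    def __init__(self):
        self._objects = {}

    def get(self, key, factory):
        # Create the object only on the first request for this key.
        if key not in self._objects:
            self._objects[key] = factory()
        return self._objects[key]


class Client:
    """Hypothetical client whose attribute is a reference into the registry."""

    def __init__(self, registry, inputs_key, make_inputs):
        self._inputs = registry.get(inputs_key, make_inputs)
        self._private = False  # True once this client owns a private copy

    @property
    def inputs(self):
        return self._inputs

    def add_input(self, name):
        # Copy on write: detach from the shared list before the first change.
        if not self._private:
            self._inputs = list(self._inputs)
            self._private = True
        self._inputs.append(name)


def make_inputs():
    return ["a.root", "b.root"]


registry = SharedRegistry()
first = Client(registry, "dataset-1", make_inputs)
second = Client(registry, "dataset-1", make_inputs)
shared_initially = first.inputs is second.inputs  # one list, two references
second.add_input("c.root")                        # second gets its own copy
```

Only `second` pays for a copy, and only at the moment it modifies the list; `first` keeps using the shared object stored in the registry.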
To Do
Get PR merged after intense testing on CERN infrastructure
Extend the principles of copy-on-write to persistent storage to reduce data duplication on disk.
Continue contributing to GANGA
Acknowledgement
I would like to thank my mentors Prof. Ulrik Egede, Dr. Alexander Richards and Dr. Mark Smith for their code reviews, timely feedback and help, and most importantly for their dedicated guidance and encouragement throughout the duration of GSoC. I would like to continue contributing to GANGA in the future and give back to the community that has taught me so much.
I also express my gratitude to the people behind Google Summer of Code. I had a very enjoyable experience working on GSoC 2019.
References
- Design Patterns and Python Flyweight Pattern
- Design Patterns: Elements of Reusable Object-Oriented Software (Erich Gamma, Richard Helm, Ralph Johnson, and John Vlissides)