The SibIASA laboratory uses a computing cluster designed specifically for solving large-scale global optimization (LSGO) problems. The cluster was designed and is administered by Alexey Vakhnin, a staff member of the institute.
The cluster is used for a range of data analysis and modeling problems, but its main application is the development and study of new global optimization algorithms for black-box problems, where no information about the objective function is available and the problem is given only as an algorithm, a simulation, or similar. Such problems require long optimization runs, and traditional algorithms fail to find an acceptable solution. With the cluster, a large number of new hypotheses can be tested in a reasonable time, and more computations can be carried out when studying and comparing algorithms than is possible on standard desktop PCs, even high-performance ones.
Specialized software based on MPI has been developed to manage the cluster and to parallelize computational experiments and programs. The research problems being solved admit only coarse-grained parallelization: each computing unit in the cluster solves an independent optimization problem.
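The coarse-grained scheme can be illustrated with a minimal sketch. It is not the laboratory's software: the names `tasks_for_rank`, `random_search`, and `sphere` are hypothetical, the sphere function stands in for a real black-box objective, and plain random search stands in for a real optimizer. In an MPI program, `rank` and `size` would come from `MPI_Comm_rank` and `MPI_Comm_size`; here they are ordinary arguments so that the sketch compiles with the standard library alone.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <random>
#include <vector>

// Hypothetical black-box objective: the optimizer sees only input -> value.
double sphere(const std::vector<double>& x) {
    double sum = 0.0;
    for (double v : x) sum += v * v;
    return sum;
}

// One independent task: plain random search with its own seed.
// Returns the best objective value found within the evaluation budget.
double random_search(unsigned seed, std::size_t dim, int evaluations) {
    std::mt19937 rng(seed);
    std::uniform_real_distribution<double> dist(-5.0, 5.0);
    std::vector<double> x(dim);
    double best = std::numeric_limits<double>::infinity();
    for (int e = 0; e < evaluations; ++e) {
        for (double& v : x) v = dist(rng);
        double f = sphere(x);
        if (f < best) best = f;
    }
    return best;
}

// Round-robin split: the task indices handled by one rank.
// With MPI, rank and size would come from MPI_Comm_rank / MPI_Comm_size.
std::vector<int> tasks_for_rank(int rank, int size, int n_tasks) {
    assert(size > 0 && rank >= 0 && rank < size);
    std::vector<int> ids;
    for (int t = rank; t < n_tasks; t += size) ids.push_back(t);
    return ids;
}

// What a single rank would do: run its own tasks with no communication
// until the results are gathered at the end.
std::vector<double> run_rank(int rank, int size, int n_tasks) {
    std::vector<double> results;
    for (int t : tasks_for_rank(rank, size, n_tasks)) {
        // Assumed seeding rule: seed derived from the task index.
        results.push_back(random_search(1000u + static_cast<unsigned>(t), 10, 2000));
    }
    return results;
}
```

Because each task depends only on its index and seed, the ranks need no synchronization while computing; only the final results have to be collected, for example on the server node.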
The cluster runs Ubuntu and provides remote access and control, automatic collection and analysis of data obtained in numerical experiments, and report generation. The software is implemented in C++ and optimized for the cluster architecture to speed up computation.
The cluster is a single-level distributed system (GRID system) of 8 high-performance machines connected by a local network. One of the nodes acts as a server.
Some features:
- The total number of CPU cores is 64, the total number of computational threads is 128.
- The total amount of RAM is 72 GB.
- A 300 GB NFS distributed file system is used.
Current configuration:
- Master PC: AMD Ryzen 7 Pro 3700 (8 cores), 16 GB RAM
- Slave-1, Slave-2: AMD Ryzen 7 Pro 3700 (8 cores), 8 GB RAM each
- Slave-3 to Slave-7: AMD Ryzen 7 1700X (8 cores), 8 GB RAM each
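The totals listed under "Some features" can be checked against this configuration with a short sketch. The `Node` and `Totals` types and the function names are hypothetical. Two assumptions are made: RAM sizes are in GB, consistent with the stated 72 GB total, and each core provides two hardware threads, as implied by 64 cores giving 128 threads.

```cpp
#include <string>
#include <vector>

// Hypothetical description of one machine in the cluster.
struct Node {
    std::string name;
    std::string cpu;
    int cores;
    int ram_gb;  // assumed to be gigabytes
};

// Aggregate figures for the whole cluster.
struct Totals {
    int nodes;
    int cores;
    int threads;
    int ram_gb;
};

// The configuration from the list above, one entry per machine.
std::vector<Node> cluster_nodes() {
    std::vector<Node> nodes;
    nodes.push_back({"Master PC", "AMD Ryzen 7 Pro 3700", 8, 16});
    for (int i = 1; i <= 2; ++i)
        nodes.push_back({"Slave-" + std::to_string(i), "AMD Ryzen 7 Pro 3700", 8, 8});
    for (int i = 3; i <= 7; ++i)
        nodes.push_back({"Slave-" + std::to_string(i), "AMD Ryzen 7 1700X", 8, 8});
    return nodes;
}

// Sum cores and RAM; threads assume a fixed number of threads per core.
Totals summarize(const std::vector<Node>& nodes, int threads_per_core) {
    Totals t{0, 0, 0, 0};
    for (const Node& n : nodes) {
        t.nodes += 1;
        t.cores += n.cores;
        t.threads += n.cores * threads_per_core;
        t.ram_gb += n.ram_gb;
    }
    return t;
}
```

Calling `summarize(cluster_nodes(), 2)` gives 8 nodes, 64 cores, 128 threads, and 72 GB of RAM, matching the figures stated above.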