6. Using Multiple Processors

This section of the tutorial shows all the work that is needed to distribute operations in DEAP. Distribution relies on serialization of objects, which is usually done by pickling, so every object that is distributed (functions and arguments, e.g. individuals and parameters) must be picklable. It also means that a modification made to an object on a remote processing unit is not visible to the other processing units (including the master) unless it is explicitly communicated through function arguments and return values.
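The copy semantics described above can be illustrated with the standard pickle module alone. The following is a minimal sketch, not DEAP code: the list named individual is a hypothetical stand-in for an individual, and the pickle round trip imitates what happens when an argument is shipped to another process.

```python
import pickle

# Hypothetical stand-in for an individual (here a plain list of floats).
individual = [1.0, 2.0, 3.0]

# Serializing and deserializing imitates sending the object to a worker:
# the worker receives an independent copy, not the original object.
worker_copy = pickle.loads(pickle.dumps(individual))

# A change made "on the worker" affects only the copy.
worker_copy[0] = 99.0

print(individual)   # the original is unchanged
print(worker_copy)  # only the copy was modified
```

For the change to reach the master, the worker function would have to return the modified object (or the values derived from it).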

6.1. Scalable Concurrent Operations in Python (SCOOP)

SCOOP is a distributed task module for concurrent parallel programming in various environments, from heterogeneous grids to supercomputers. Its interface is similar to that of the concurrent.futures module introduced in Python 3.2. Its two main functions, submit() and map(), make it easy to distribute computation efficiently over a grid of computers.
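Because the interface resembles concurrent.futures, the shape of submit() and map() can be previewed with the standard library. The sketch below uses concurrent.futures.ThreadPoolExecutor, not SCOOP itself, and the square function is a hypothetical example task. It only illustrates the calling pattern, not how SCOOP schedules work.

```python
from concurrent.futures import ThreadPoolExecutor

def square(x):
    # Hypothetical task; any callable taking one argument would do.
    return x * x

with ThreadPoolExecutor(max_workers=2) as executor:
    # submit() schedules a single call and returns a Future immediately.
    future = executor.submit(square, 7)
    single_result = future.result()  # blocks until the call has finished

    # map() applies the function to every item and yields results in order.
    mapped_results = list(executor.map(square, [1, 2, 3, 4]))

print(single_result)
print(mapped_results)
```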

In the last section, a complete algorithm was presented with toolbox.map() left as the default map(). To distribute the evaluations, we replace this map with the one from SCOOP.

from scoop import futures

toolbox.register("map", futures.map)

Once this line is added, your program absolutely needs to be run from a main() function, as mentioned in the SCOOP documentation. To run your program, use scoop as the main module.

$ python -m scoop your_program.py

That is it: your program now runs in parallel on all available processors.
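The main() requirement can be sketched as follows. This is an illustrative layout rather than a complete DEAP program: evaluate, run and the tiny population are hypothetical, no toolbox is involved, and the ImportError fallback to the built-in map is an addition so that the sketch also runs (serially) where SCOOP is not installed. With SCOOP installed, the script is meant to be launched with python -m scoop as shown above.

```python
try:
    # SCOOP's parallel map; intended for use under "python -m scoop".
    from scoop import futures
    parallel_map = futures.map
except ImportError:
    # Fallback so the sketch still runs (serially) without SCOOP installed.
    parallel_map = map

def evaluate(individual):
    # Hypothetical fitness: count the ones, returned as a one-element tuple.
    return (sum(individual),)

def run(map_func, population):
    # All parallel work goes through the map function that is passed in.
    return list(map_func(evaluate, population))

def main():
    population = [[0, 1, 1], [1, 1, 1], [0, 0, 0]]
    print(run(parallel_map, population))

if __name__ == "__main__":
    main()
```

Keeping the work inside main() means that a process importing the module only gets the function definitions and does not start the algorithm again.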

6.2. Distributed Task Manager

Deprecated since version 0.9: Use SCOOP instead; it is a fork of DTM developed by the same group.

Distributing tasks over multiple computers is handled by the distributed task manager module, dtm. Its API, similar to that of the multiprocessing module, makes it very easy to use. In the last section, a complete algorithm was presented with toolbox.map() left as the default map(). To distribute the evaluations, simply replace this map with the one provided by the dtm module and tell dtm which function is the main program; here it is the main() function.

from deap import dtm

toolbox.register("map", dtm.map)

def main():
    # My evolutionary algorithm
    pass

if __name__ == "__main__":
    dtm.start(main)

That’s it. The map operation contained in the toolbox will now be distributed. The next time you run the algorithm, it will run on the number of cores specified to the mpirun command used to launch the Python script. The usual bash command to use dtm is:

$ mpirun [options] python my_script.py

6.3. Multiprocessing Module

Using the multiprocessing module is very similar to using the distributed task manager. Again, in the toolbox, replace the appropriate function with the distributed one.

import multiprocessing

pool = multiprocessing.Pool()
toolbox.register("map", pool.map)

# Continue on with the evolutionary algorithm

Warning

As stated in the multiprocessing guidelines, under Windows a process pool must be protected in an if __name__ == "__main__" section because of the way processes are initialized.
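A guarded layout along the lines of this warning might look like the sketch below. The evaluate function and the population are hypothetical placeholders, and no toolbox is used. In a DEAP program, the toolbox.register("map", pool.map) call would go inside main(), next to the pool creation.

```python
import multiprocessing

def evaluate(individual):
    # Hypothetical fitness function, defined at module level so that
    # worker processes can look it up by name.
    return (sum(individual),)

def main():
    population = [[0, 1, 1], [1, 1, 1], [0, 0, 0]]
    # The pool is created inside main(), never at import time.
    with multiprocessing.Pool(processes=2) as pool:
        fitnesses = pool.map(evaluate, population)
    print(fitnesses)

if __name__ == "__main__":
    # Child processes that re-import this module skip this branch,
    # so they do not try to create pools of their own.
    main()
```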

Note

While Python 2.6 is required for the multiprocessing module, pickling of partial functions is possible only since Python 2.7 (or 3.1). Earlier versions of Python may throw strange errors when partial functions are used with multiprocessing.Pool.map(). In Python 2.6, this can be avoided by defining the function yourself, outside of the toolbox.
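This note is relevant because functions registered in the toolbox are stored as functools.partial objects, and that is what gets pickled when a toolbox function is handed to a pool. A minimal check, assuming a Python version in which partial objects are picklable (see the versions above); the built-in pow is used only as a convenient picklable function:

```python
import functools
import pickle

# A partial object, similar in kind to what a toolbox alias holds.
power_of_two = functools.partial(pow, 2)

# Round-trip through pickle, as a process pool does when shipping a task.
restored = pickle.loads(pickle.dumps(power_of_two))

print(restored(10))  # calls pow(2, 10)
```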

Note

Lambda functions cannot be pickled by Python's standard pickle module.
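Whether a given object can be shipped to a worker can be checked directly. The helper is_picklable below is a hypothetical convenience function, not part of DEAP or the standard library:

```python
import pickle

def is_picklable(obj):
    # Returns True if the standard pickle module can serialize obj.
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False

print(is_picklable(lambda x: x * 2))  # a lambda has no importable name
print(is_picklable(abs))              # a built-in is pickled by reference
```

The usual workaround is to replace the lambda with a function defined with def at module level.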