.. _dtm_onemax:

==========================================
DTM + EAP = DEAP: a Distributed Evolution
==========================================

As part of the DEAP framework, EAP offers an easy DTM integration. As the EAP
algorithms use a map function stored in the toolbox to spawn the evaluations
of the individuals (by default, this is simply the traditional Python
:func:`map`), the parallelization can be done very easily, by replacing the
map operator in the toolbox::

    from deap import dtm

    toolbox.register("map", dtm.map)

Thereafter, ensure that your main code is enclosed in a Python function (for
instance, ``main``), and just add, as the last line::

    dtm.start(main)

For instance, take a look at the short version of the OneMax problem. This is
how it may be parallelized::

    import array
    import logging
    import random

    from deap import algorithms, base, creator, tools
    from deap import dtm

    # Configure logging so that the statistics and results are displayed
    logging.basicConfig(level=logging.INFO)

    creator.create("FitnessMax", base.Fitness, weights=(1.0,))
    creator.create("Individual", array.array, typecode='b',
                   fitness=creator.FitnessMax)

    toolbox = base.Toolbox()

    # Attribute generator
    toolbox.register("attr_bool", random.randint, 0, 1)

    # Structure initializers
    toolbox.register("individual", tools.initRepeat, creator.Individual,
                     toolbox.attr_bool, 100)
    toolbox.register("population", tools.initRepeat, list, toolbox.individual)

    def evalOneMax(individual):
        return sum(individual),

    toolbox.register("evaluate", evalOneMax)
    toolbox.register("mate", tools.cxTwoPoints)
    toolbox.register("mutate", tools.mutFlipBit, indpb=0.05)
    toolbox.register("select", tools.selTournament, tournsize=3)
    toolbox.register("map", dtm.map)

    def main():
        random.seed(64)

        pop = toolbox.population(n=300)
        hof = tools.HallOfFame(1)
        stats = tools.Statistics(lambda ind: ind.fitness.values)
        stats.register("Avg", tools.mean)
        stats.register("Std", tools.std)
        stats.register("Min", min)
        stats.register("Max", max)

        algorithms.eaSimple(toolbox, pop, cxpb=0.5, mutpb=0.2, ngen=40,
                            stats=stats, halloffame=hof)
        logging.info("Best individual is %s, %s", hof[0],
                     hof[0].fitness.values)

        return pop, stats, hof

    dtm.start(main)   # Launch the first task

As one can see, parallelization requires almost no changes at all (an import,
the registration of the distributed map, and the starting instruction), even
with a non-trivial program. This program can now be run on a multi-core
computer, on a small cluster, or on a supercomputer, without any change, as
long as those environments provide an MPI implementation.

.. note::
    In this specific case, the distributed version would actually be *slower*
    than the serial one, because of the extreme simplicity of the evaluation
    function (which takes *less than 0.1 ms* to execute): the small overhead
    generated by the serialization, load balancing, processing and transfer
    of the tasks and the results is not offset by any gain in evaluation
    time. In more complex, real-life problems (for instance sorting
    networks), the benefit of a distributed version is clearly noticeable.
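To get a feel for this trade-off, here is a minimal sketch: artificially
inflating the cost of the evaluation (with a hypothetical ``evalOneMaxSlow``
that simply sleeps for 10 ms, standing in for a real costly evaluation) is
enough to tip the balance in favor of the distributed version::

    import time

    def evalOneMaxSlow(individual):
        # Hypothetical stand-in for an expensive evaluation (a simulation,
        # a learning run, etc.); at 10 ms per task, the serialization and
        # transfer overhead becomes negligible.
        time.sleep(0.01)
        return sum(individual),

    toolbox.register("evaluate", evalOneMaxSlow)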
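Finally, the exact launching procedure depends on your MPI distribution. As a
sketch, with a standard launcher such as ``mpirun`` and a hypothetical script
name ``onemax_dtm.py``, the program would typically be started with one
Python process per worker::

    mpirun -n 4 python onemax_dtm.py

DTM then dispatches the tasks spawned by ``dtm.map`` among the available
processes.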