CGP acceleration

Principle

In order to evaluate the fitness function the candidate solution has to be simulated. This step involves the interpretation of a CGP genotype for each vector. Since the fitness evaluation time dominates the time of whole evolutionary process in vast majority of problems and moreover a large number of evaluations is usually needed to achieve satisfactory result, it is important to simulate candidate solutions effectively.

In order to maximize the overall performance, we have replaced the interpreter with corresponding optimized native machine code that directly calculates response for a single training vector. The machine code is in linear form, i.e., without conditional branches hurting performance. We have developed a method which allows the users to significantly accelerate the evaluation of candidate solutions without having hardly any knowledge of assembly language or target machine code. Moreover, the integration of the machine code compiler requires modifying only a few lines of code.

As it can be seen on a simple case study CGP can process several times higher number of evaluations on the same machine introducing native evaluation. Even if we used optimized C implementation of CGP, we were able to evaluate more than five times higher number of candidate solutions using the same machine. Even when we used a highly optimized C implementation of the interpreted CGP, we were still able to evaluate more than five times greater amount of candidate solutions by using the native code version.

Details can be found in our paper Efficient Phenotype Evaluation in Cartesian Genetic Programming presented at EuroGP 2012. This paper contains evaluation in fixed as well as floating-point domain where one can achieve even better results.

How to integrate native evaluation

The process of fitness value calculation consists of two phases. In the first phase the CGP chromosome is compiled or more precisely translated to machine code that resides in the application's address space. The second phase involves the calculation of response for the training vectors and fitness value.

The whole process is designed to maximize the performance a minimize overhead caused by translation. Thus the evaluation procedure consists of a simple call only which executes the obtained machine code for each training vector. It means that the process of compilation is executed once per fitness evaluation.

To integrate the native evaluation one has to modify function that evaluates fitness value of the whole population. The modification consists of calling code compiler and replacing the common interpreter with simple call that invokes correct compiled machine code. Apart from that it is of course necessary to initialize the arrays utilised by the native evaluation.

The changes you need to make are highlighted in the following code snippet.

+ #include "native.h"
 ...
+ //code generated using this website
+ unsigned char* consts;
+ void cgp_init_consts()
+ {
+ ...
+ }

+ unsigned char* code[MAX_POPSIZE];
+ #define MAXCODESIZE 20
+ void cgp_compile(unsigned char* pcode, chromosome p_chrom, int* isused) 
+ {
+ ...
+ }

void calc_fitness(int parentidx) 
{
    //variable declarations
    ...
    //initialize
    for (int i=0; i < params.popsize; i++) 
    {
        if (i == parentidx) continue;
        #ifdef DONOTEVALUATEUNUSEDNODES
        usednodes[i] = used_nodes((chromosome) population[i], isused[i]);
        #endif
        fitvalues[i] = 0;
+       cgp_compile(code[i], (chromosome) population[i], isused[i]);
    }
    for (int l=0; l < params.trainingvectors; l++) 
    {
        //copy the first part of a training vector to the primary inputs
        memcpy(nodeoutput, ptraindata, params.inputs*sizeof(nodetype));
        ptraindata += params.inputs;
        //determine and process the response of each candidate solution
        for (int i=0; i < params.popsize; i++) 
        {
            if (i == parentidx) continue;
-           cgp_eval((chromosome) population[i], isused[i]);
+           ((evalfunc *)(code[i]))();
            //compare the output values against training vector (calculate fit value)
	    ...
            fitvalues[i] += fit;
        }
        //next training vector
        ptraindata += params.outputs; 
    }
}
...
int main(int argc, char* argv[])
{
    ...
    //memory allocation
    for (int i=0; i < params.popsize; i++) 
    {
        population[i] = new chromosome [params.geneoutidx + params.outputs];
        isused[i] = new int [params.genes];
+       code[i] = malloc_aligned(params.cols*params.rows*MAXCODESIZE + 64);
    }
+   cgp_init_consts();
    ...
}

If you don't believe that the proposed acceleration technique is really simple to integrate then just compare source codes which are available in the next section.

Case study (evolutionary design of logic circuits)

The impact of the proposed method on the performance is evaluated using one of common CGP applications - evolutionary design of combinational circuits. Even if software implementations of CGP (intended for evolution of logic circuits) can strongly benefit from the so-called parallel simulation we can easily demonstrate that the performance can be increased by introducing native evaluation.

Results

We have implemented common CGP and accelerated version of CGP and evaluated the performance using common benchmark problem - evolutionary design of multipliers at gate-level. The tables below show the average number of evaluations evaluated per a second. The average has been calculated using twenty independent runs. Note that the flag DONOTEVALUATEUNUSEDNODES was not set to provide statistically significant results.

32-bit implementation

Hamming distance utilized in fitness function was calculated using either four lookups into 8-bit LUT table (LUT) or 32-bit popcnt instruction available in SSE4 (POPCNT)

benchmark accelerated CGP implementation common CGP implementation
LUT POPCNT LUT POPCNT
multiplier 5x5 71 000(5.3x) 78 568(5.7x) 13 318 13 675
multiplier 6x6 22 432(6.8x) 26 137(7.7x) 3 290 3 391
multiplier 7x7 5 576(6.9x) 6 657(7.9x) 808 842
CGP parameters: C=80, R=1, L=40 Node functions: BUF, NOT, AND, OR, XOR, NAND, NOR, XNOR
CPU: Intel Core 2 DUO P8400 @2.26 GHz OS: Win XP, 32-bit

The speedup vary between 5x and 8x depending on the problem size. The larger number of training vectors, the higher speedup. This effect is primarily caused by the overhead introduced by the compilation of CGP phenotypes into machine code.

64-bit implementation

Hamming distance utilized in fitness function was calculated using either eight lookups into 8-bit LUT table (LUT) or 64-bit popcnt instruction available in SSE4 (POPCNTL)

benchmark accelerated CGP implementation common CGP implementation
LUT POPCNT LUT POPCNT
multiplier 5x5 107 051(3.3x) 136 449(3.8x) 32 677 36 016
multiplier 6x6 40 318(5.0x) 62 794(6.9x) 8 126 9 123
multiplier 7x7 10 614(5.3x) 19 137(8.5x) 1 992 2 246
multiplier 8x8 2 442(5.0x) 4 714(8.4x) 492 560
CGP parameters: C=80, R=1, L=40 Node functions: BUF, NOT, AND, OR, XOR, NAND, NOR, XNOR
CPU: Intel Xeon X5650 @2.66 GHz (12 cores, 15 MB cache) OS: CentOS Linux, 64-bit

Similarily to the previous results the speedup vary between 3x and 8x depending on the problem size. If we compare the average number of evaluations evaluated per a second of the 32-bit and 64-bit common CGP implementation, the resulting ratio is approximately 2.4 which corresponds with our expectation as a) the 64-bit implementation evaluates 64 inputs vectors instead of 32 and b) the 64-bit CPU is running at approx. 1.17 times higher frequency.

Content

  1. Principle
  2. How to integrate
  3. Case study

Get code

Evolutionary design of combinational circuits Symbolic regression

Licence

Available code is licensed under BUT Open Source Licence

Source code (32-bit implementation)

show / hide source code annotation

cgp.cpp

Common CGP implementation (cgp.cpp)

Cartesian genetic programming, introduced by Julian Miller and Peter Thomson in 2000, is a variant of genetic programming where the genotype is represented as a list of integers that are mapped to directed oriented graphs. In order to evaluate the fitness function the response for each training vector has to be calculated. This step involves the interpretation of a CGP genotype for each vector. In this implementation we are using linear interpreter that interprets CGP chromosome.

/*===============================================================================
  cgp.cpp: Evolutionary design of combinational circuits using CGP
  ===============================================================================
  Copyright (C) 2012 Brno University of Technology,
                     Faculty of Information Technology

  Author(s): Zdenek Vasicek <vasicek AT fit.vutbr.cz>
  =============================================================================*/

/// #define HAVE_POPCNT
/// #define DONOTEVALUATEUNUSEDNODES
#define MAX_POPSIZE 100
typedef int fitvaltype;
typedef int nodetype;

#include "cgp.h"
#include "circ.h"

pchromosome population[MAX_POPSIZE];  
fitvaltype  fitvalues[MAX_POPSIZE];             
/// default parameters
tparams params = {5000000 /*generations*/, 5 /*pop.size*/, 1 /*mut.genes*/, 0, 0, 80 /*cols*/, 1 /*rows*/, 40 /*lback*/, 2, 1, 8 /*functions*/};

nodetype*  nodeoutput; //array of node outputs used for CGP evaluation
int*       isused[MAX_POPSIZE]; //array of marked used nodes for each individual
int        usednodes[MAX_POPSIZE]; //number of used nodes for each individual
int*       data;  //training data
col_validvalues **col_values; //valid gene values for each column 


CGP mutation operator

This function modifies params.mutations randomly selected genes of a given genotype. The modification is done in place.

The implementation of the mutation operator has to ensure that the modifications are legal and lead to a valid phenotype. This is done using col_values array which contains valid values that can occur in each column of CGP array.


inline void cgp_mutate(chromosome p_chrom) 
{
    int rnd = rand() % params.mutations;
    int genes = rnd + 1; 
    for (int j = 0; j < genes; j++) 
    {
        int i = rand() % (params.geneoutidx + params.outputs); 
        int col = (int) (i / params.genespercol);

        if (i < params.geneoutidx) 
        { 
           if ((i % params.nodeios) < params.nodeinputs) 
           {  //mutate a gene that encodes connection
              do { rnd = col_values[col]->values[(rand() % (col_values[col]->items))]; } while (rnd == p_chrom[i]);
              p_chrom[i] = rnd;
           } 
           else 
           {  //mutate a gene that encodes function
              if (params.nodefuncs > 1) {
                 do { rnd = rand() % params.nodefuncs; } while (rnd == p_chrom[i]);
                 p_chrom[i] = rnd;
              }
           }
        } 
        else 
        {  //mutate a primary output
           do { rnd = rand() % params.lastnodeidx; } while (rnd == p_chrom[i]);
           p_chrom[i] = rnd;
        }
    }
}


CGP evaluation

This function simulates the encoded candidate solution and calculates response for single input vector. The response for each CGP node is stored in array nodeoutput. The first params.inputs items contain input vector.

The interpreter consists of a loop that calculates the response for each CGP node according to the genes of CGP chromosome. The execution of the encoded graph starts from the first node and continues according to the increasing node index. This scheme represents the most efficient implementation as it does not introduce any overhead due to function calling that have to manipulate with stack. In contrast with the recursive approach, all the output values are calculated in one pass. In order to improve the performance, a simple preprocessing step that marks the utilized nodes only can be introduced (by enabling DONOTEVALUATEUNUSEDNODES parameter). Only the marked nodes are subsequently evaluated.

inline int cgp_eval(chromosome p_chrom, int* isused) 
{
    int in1,in2,fce,i,j;
    int *pnodeout = nodeoutput + params.inputs; 

    #ifdef DONOTEVALUATEUNUSEDNODES
    isused += params.inputs;
    #endif

    /// Evaluate the response of each node
    for (i=0; i < params.cols; i++)
        for (j=0; j < params.rows; j++) 
        {
            #ifdef DONOTEVALUATEUNUSEDNODES
            if (!*isused++) 
            {  //This node is not used, skip it
               p_chrom += 3;
               pnodeout++;
               continue;
            }
            #endif

            in1 = nodeoutput[*p_chrom++];
            in2 = nodeoutput[*p_chrom++];
            fce = *p_chrom++;
            switch (fce) 
            {
              case 0: *pnodeout++ = in1; break;          //in1
              case 1: *pnodeout++ = ~in1; break;         //not in1
              case 2: *pnodeout++ = in1 & in2; break;    //and
              case 3: *pnodeout++ = in1 | in2; break;    //or
              case 4: *pnodeout++ = in1 ^ in2; break;    //xor
              case 5: *pnodeout++ = ~(in1 & in2); break; //nand
              case 6: *pnodeout++ = ~(in1 | in2); break; //nor
              case 7: *pnodeout++ = ~(in1 ^ in2); break; //xnor
              default: abort();
            }
        }
}


Calculate fitness values

This function determines the fitness value of each candidate solution except parental solution parentidx. The calculated fitness values are stored in array denoted as fitvalues.

In order to calculate the fitness value of each candidate solution the response for each training data has to be evaluated. In this implementation the fitness value correspons with the Hamming distance between the calculated and desired response.

The fitness function calls cgp_eval procedure that, given the CGP calculated outputs data and a candidate solution, evaluates the response for a single input vector (i.e. single fitness case stored in array denoted as ptraindata). Then, according to the information about output connections stored in chromosome, the fitness value of a genotype for the utilized input vector is calculated. These steps are repeated until last training vector is evaluated.

inline void calc_fitness(int parentidx) 
{
    chromosome p_chrom;
    register int fit, vysl;
    int *ptraindata = data;

    for (int i=0; i < params.popsize; i++) 
    {
        if (i == parentidx) continue;

        #ifdef DONOTEVALUATEUNUSEDNODES
        usednodes[i] = used_nodes((chromosome) population[i], isused[i]);
        #endif

        fitvalues[i] = 0;
    }

    for (int l=0; l < params.trainingvectors; l++) 
    {
        ///copy the first part of a training vector to the primary inputs
        memcpy(nodeoutput, ptraindata, params.inputs*sizeof(nodetype));
        ptraindata += params.inputs;

        ///determine and check response of each candidate solution
        for (int i=0; i < params.popsize; i++) 
        {
            if (i == parentidx) continue;
            
            cgp_eval((chromosome) population[i], isused[i]);

            ///compare the output values against training vector (specification)
            fit = 0;
            p_chrom = (chromosome) population[i] + params.geneoutidx;
            for (int k=0; k < params.outputs; k++) 
            {  
                vysl = (nodeoutput[*p_chrom++] ^ *(ptraindata +k));
                fit += zeroscount(vysl);
            }

            fitvalues[i] += fit;
        }

        ///next training vector
        ptraindata += params.outputs; 
    }
}


Number of utilized nodes

This function calculates the number of nodes that contribute to the resulting phenotype that is encoded by a given genotype.

int used_nodes(chromosome p_chrom, int* isused) 
{
    int in, fce, idx, used = 0;
    int *pchrom;

    memset(isused, 0, params.lastnodeidx*sizeof(int));


mark nodes connected to the primary outputs

    pchrom = p_chrom + params.geneoutidx;
    for (int i=0; i < params.outputs; i++) 
        isused[*pchrom++] = 1;


go throught the cgp array

    pchrom = p_chrom + params.geneoutidx - 1;
    idx = params.lastnodeidx-1;
    for (int i=params.cols; i > 0; i--) 
    {
        for (int j=params.rows; j > 0; j--,idx--) 
        {
            fce = *pchrom--; //fce
            if (isused[idx] == 1) 
            {  

the current node is marked, mark also the connected nodes

               in = *pchrom--; // in2
               if (fce > 1)    // 2-input functions
                  isused[in] = 1;
               in = *pchrom--; // in1
               isused[in] = 1;

               used++;
            } else {

the current node is not market, skip it

               pchrom -= params.nodeinputs;
            }
        }
    }

    return used;
}


Main application

Evolutionary design of combinational circuits using Cartesian Genetic Programming.

Usage:

cgp truth_table.txt [-c COLUMNS] [-r ROWS] [-l LBACK] \
[-g GENERATIONS] [-p POPSIZE] [-m MUTATEDGENES]
int main(int argc, char* argv[])
{
    using namespace std; 
    int blk, bestblk, data_items, parentidx, fittest_idx;
    unsigned long int generation; 
    fitvaltype  bestfitval;

    strcpy(params.datafname, "data.txt"); 
    parse_options(argc, argv);
    ///load training data
    if ((data_items = parsefile(params.datafname, NULL, NULL, NULL)) < 1) 
    { 
       printf("Invalid data\n"); return 0; 
    }
    data = new int[data_items];
    parsefile(params.datafname, data, &params.inputs, &params.outputs);
    params.trainingvectors = data_items / (params.inputs+params.outputs); //Spocitani poctu pruchodu pro ohodnoceni
    params.maxfitval = params.trainingvectors*params.outputs*32;         //Vyitems max. fitness
    printf("Training data:\n   file:%s, items:%d, inputs:%d, outputs:%d\n", params.datafname, data_items, params.inputs, params.outputs);

    init_paramsandluts();
   
    ///memory allocation
    for (int i=0; i < params.popsize; i++) 
    {
        population[i] = new chromosome [params.geneoutidx + params.outputs];
        isused[i] = new int [params.genes];
    }
    nodeoutput = new nodetype [params.genes];
    memset(nodeoutput, 0, sizeof(nodetype) * params.genes);

    srand((unsigned) time(NULL));


Initial population and its evaluation

The initial population which consists of params.popsize individuals is generated randomly.


    for (int i=0; i < params.popsize; i++) 
    {
        chromosome p_chrom = (chromosome) population[i];
        for (int j=0; j < params.cols*params.rows; j++) 
        {
            int col = (int)(j / params.rows);
            for (int k=0; k < params.nodeinputs; k++)  // node inputs
                *p_chrom++ = col_values[col]->values[(rand() % (col_values[col]->items))];
            *p_chrom++ = rand() % params.nodefuncs; // node function
        }
        for (int j=0; j < params.outputs; j++) // primary outputs
            *p_chrom++ = rand() % params.lastnodeidx;
    }
  

Evaluate the initial population and find the fittest candidate solution that becomes parent

    calc_fitness(-1);
    bestfitval = fitvalues[0]; fittest_idx = 0;
    for (int i=1; i < params.popsize; i++) 
        if (fitvalues[i] > bestfitval) 
        {
           bestfitval = fitvalues[i];
           fittest_idx = i;
        }

    printf("CGP parameters:\n   l-back=%d, rows=%d, cols=%d, functions=%d\n", params.lback, params.rows, params.cols, params.nodefuncs);
    printf("   popsize=%d, mutgenes=%d, generations=%d\n", params.popsize, params.mutations, params.maxgenerations);
    #ifdef DONOTEVALUATEUNUSEDNODES
    printf("   evalunused=false\n");
    #endif
    printf("Evolutionary run:\n   initial fitness=%d, best fitness=%d\n", bestfitval, params.maxfitval);
    double time = cpuTime(); 


Evolutionary loop

    for (int generation = 0; generation < params.maxgenerations; generation++) 
    {


Step 1

Generate offsprings of the fittest individual

        for (int i=0; i < params.popsize;  i++) 
        {
            if (fittest_idx == i) continue;
            cgp_mutate(copy_chromozome(population[fittest_idx], population[i]));
        }


Step 2

Evaluate population and calculate fitness values

        calc_fitness(fittest_idx);


Step 3

Check if there is an offspring that should replace parental solution

        int newparidx = fittest_idx;
        for (int i=0; i < params.popsize; i++) 
        { 
            fitvaltype fit = fitvalues[i];
            if ((i == fittest_idx) || (fit < bestfitval)) continue; 


current individual is at least of the same quality as its parent

            if (fit < params.maxfitval) 
            {  //current individual does not 
               if (fit > bestfitval) 
                  printf("Generation: %-8d\tFitness: %d/%d\n",generation, fit, params.maxfitval);
               bestfitval = fit;
               bestblk = params.nodes;
               newparidx = i;
            } 
            else 
            {  //second optimization criterion - the number of utilized nodes
               #ifdef DONOTEVALUATEUNUSEDNODES
               blk = usednodes[i];
               #else
               blk = used_nodes((chromosome) population[i], isused[i]);
               #endif
               if (blk <= bestblk) 
               {
                  if ((blk < bestblk) || (fit > bestfitval))
                     printf("Generation: %-8d\tFitness: %d/%d\tNodes: %d\n",generation, fit, params.maxfitval, blk);

                  bestfitval = fit;
                  bestblk = blk;
                  newparidx = i;
               }
            }
            
        }

        fittest_idx = newparidx;
    }


End of evolution

    time = cpuTime() - time;
    printf("Best fitness: %d/%d\n",bestfitval,params.maxfitval);
    printf("Best individual: ");
    print_chrom(stdout, (chromosome)population[fittest_idx]);
    printf("Duration: %f Evaluations per sec: %f", time, params.maxgenerations*(params.popsize-1) / time);
    
    if (bestfitval == params.maxfitval) {

save the solution

        FILE *chrfil = fopen("result.chr","wb");
        print_chrom(chrfil, (chromosome)population[fittest_idx]);
        fclose(chrfil);
    }

    return 0;
}

show / hide source code annotation

cgp_native.cpp

Efficient CGP implementation (cgp_native.cpp)

Cartesian genetic programming, introduced by Julian Miller and Peter Thomson in 2000, is a variant of genetic programming where the genotype is represented as a list of integers that are mapped to directed oriented graphs. In order to evaluate the fitness function, the response for each training vector has to be calculated. This step involves the interpretation of a CGP genotype for each vector. To maximize the overall performance, we introduced JIT compilation of genotype to binary machine code.

/*===============================================================================
  cgp_native.cpp: Evolutionary design of combinational circuits using CGP
  ===============================================================================
  Copyright (C) 2012 Brno University of Technology,
                     Faculty of Information Technology

  Author(s): Zdenek Vasicek <vasicek AT fit.vutbr.cz>
  =============================================================================*/

/// #define HAVE_POPCNT
/// #define DONOTEVALUATEUNUSEDNODES
#define MAX_POPSIZE 100
typedef int fitvaltype;
typedef int nodetype;

#include "cgp.h"
#include "circ.h"
#include "native.h"

pchromosome population[MAX_POPSIZE];  
fitvaltype  fitvalues[MAX_POPSIZE];             
///default parameters
tparams params = {5000000 /*generations*/, 5 /*pop.size*/, 1 /*mut.genes*/, 0, 0, 80 /*cols*/, 1 /*rows*/, 40 /*lback*/, 2, 1, 8 /*functions*/};

nodetype*  nodeoutput; //array of node outputs used for CGP evaluation
int*       isused[MAX_POPSIZE]; //array of marked used nodes for each individual
int        usednodes[MAX_POPSIZE]; //number of used nodes for each individual
int*       data;  //training data
col_validvalues **col_values; //valid gene values for each column 


CGP mutation operator

This function modifies params.mutations randomly selected genes of a given genotype. The modification is done in place.

The implementation of the mutation operator has to ensure that the modifications are legal and lead to a valid phenotype. This is done using col_values array which contains valid values that can occur in each column of CGP array.

inline void cgp_mutate(chromosome p_chrom) 
{
    int rnd = rand() % params.mutations;
    int genes = rnd + 1; 
    for (int j = 0; j < genes; j++) 
    {
        int i = rand() % (params.geneoutidx + params.outputs); 
        int col = (int) (i / params.genespercol);

        if (i < params.geneoutidx) 
        { 
           if ((i % params.nodeios) < params.nodeinputs) 
          {  //mutate a gene that encodes connection
              do { rnd = col_values[col]->values[(rand() % (col_values[col]->items))]; } while (rnd == p_chrom[i]);
              p_chrom[i] = rnd;
           } 
           else 
         {  //mutate a gene that encodes function
              if (params.nodefuncs > 1) {
                 do { rnd = rand() % params.nodefuncs; } while (rnd == p_chrom[i]);
                 p_chrom[i] = rnd;
              }
           }
        } 
        else 
        {  //mutate a primary output
           do { rnd = rand() % params.lastnodeidx; } while (rnd == p_chrom[i]);
           p_chrom[i] = rnd;
        }
    }
}

unsigned char* consts;
void cgp_init_consts()
{
    #define B(val)  *(pconst++)=(val)
    unsigned char* pconst;
    pconst = consts = malloc_aligned(1024);
    ///Initialize constants
}


#define MAXCODESIZE 20
typedef void evalfunc(void); 
unsigned char* code[MAX_POPSIZE];


Native code generation

The process of CGP genotype compilation contains three sections.

int cgp_compile(unsigned char* pcode, chromosome p_chrom, int* isused) 
{
    #define C(val)  *(pcode++)=(val)
    #define CI(val) {*((uint32_t *)(pcode))= (uint32_t)(val); pcode += sizeof(uint32_t);}

    int pnodeout = 0;
    int in1,in2,fce;
    int out = params.inputs - 1;


Initialize stack

The code that preserves the utilized registers and handles the stack is generated in the first section.

    C(0x53); //push ebx 
    
    /// Load chromosome ptr to ebx
    C(0xbb);CI(nodeoutput);
    

Native code generation

Then, the CGP chromosome is interpreted and corresponding machine code is generated. The procedure that translates CGP chromosome to the machine code looks nearly the same as the structure of linear CGP interpreter. Instead of evaluation of each node, corresponding machine code is generated.

    for (int i=0; i < params.cols; i++) 
        for (int j=0; j < params.rows; j++) 
        { 
            in1 = *p_chrom++; in2 = *p_chrom++; fce = *p_chrom++; out++;
            #ifdef DONOTEVALUATEUNUSEDNODES
            if (!isused[out]) continue;
            #endif


The machine code for the given set of primitive functions was obtained using provided online generator. Register %ebx holds the pointer to first item of nodeoutput array which stores the node outputs. Variables in1 and in2 contain the index of a node that is connected to the first input and second input respectivelly. Variable out contains the index of current node.

            switch (fce) 
            {
              case 0: // BUFFER
                 /// data[out] = data[in1];
                 C(0x8b);C(0x83);CI(4*in1);                                  //mov    in1(%ebx),%eax
                 C(0x89);C(0x83);CI(4*out);                                  //mov    %eax,out(%ebx)
                 ///code size: 12, stack requirements: 0
                 ///registers: %ebx, %eax
                 break;

              case 1: // NOT
                 ///data[out] = ~data[in1];
                 C(0x8b);C(0x83);CI(4*in1);                                  //mov    in1(%ebx),%eax
                 C(0xf7);C(0xd0);                                            //not    %eax
                 C(0x89);C(0x83);CI(4*out);                                  //mov    %eax,out(%ebx)
                 ///code size: 14, stack requirements: 0
                 ///registers: %ebx, %eax
                 break;
                      
              case 2: // AND
                 ///data[out] = data[in1] & data[in2];
                 C(0x8b);C(0x83);CI(4*in1);                                  //mov    in1(%ebx),%eax
                 C(0x23);C(0x83);CI(4*in2);                                  //and    in2(%ebx),%eax
                 C(0x89);C(0x83);CI(4*out);                                  //mov    %eax,out(%ebx)
                 ///code size: 18, stack requirements: 0
                 ///registers: %ebx, %eax
                 break;

              case 3: // OR
                 ///data[out] = data[in1] | data[in2];
                 C(0x8b);C(0x83);CI(4*in1);                                  //mov    in1(%ebx),%eax
                 C(0x0b);C(0x83);CI(4*in2);                                  //or     in2(%ebx),%eax
                 C(0x89);C(0x83);CI(4*out);                                  //mov    %eax,out(%ebx)
                 ///code size: 18, stack requirements: 0
                 ///registers: %ebx, %eax
                 break;

              case 4: // XOR
                 ///data[out] = data[in1] ^ data[in2];
                 C(0x8b);C(0x83);CI(4*in1);                                  //mov    in1(%ebx),%eax
                 C(0x33);C(0x83);CI(4*in2);                                  //xor    in2(%ebx),%eax
                 C(0x89);C(0x83);CI(4*out);                                  //mov    %eax,out(%ebx)
                 ///code size: 18, stack requirements: 0
                 ///registers: %ebx, %eax
                 break;

              case 5: // NAND
                 ///data[out] = ~(data[in1] & data[in2]);
                 C(0x8b);C(0x83);CI(4*in1);                                  //mov    in1(%ebx),%eax
                 C(0x23);C(0x83);CI(4*in2);                                  //and    in2(%ebx),%eax
                 C(0xf7);C(0xd0);                                            //not    %eax
                 C(0x89);C(0x83);CI(4*out);                                  //mov    %eax,out(%ebx)
                 ///code size: 20, stack requirements: 0
                 ///registers: %ebx, %eax
                 break;

              case 6: // NOR
                 ///data[out] = ~(data[in1] | data[in2]);
                 C(0x8b);C(0x83);CI(4*in1);                                  //mov    in1(%ebx),%eax
                 C(0x0b);C(0x83);CI(4*in2);                                  //or     in2(%ebx),%eax
                 C(0xf7);C(0xd0);                                            //not    %eax
                 C(0x89);C(0x83);CI(4*out);                                  //mov    %eax,out(%ebx)
                 ///code size: 20, stack requirements: 0
                 ///registers: %ebx, %eax
                 break;

              case 7: // XNOR
                 ///data[out] = ~(data[in1] ^ data[in2]);
                 C(0x8b);C(0x83);CI(4*in1);                                  //mov    in1(%ebx),%eax
                 C(0x33);C(0x83);CI(4*in2);                                  //xor    in2(%ebx),%eax
                 C(0xf7);C(0xd0);                                            //not    %eax
                 C(0x89);C(0x83);CI(4*out);                                  //mov    %eax,out(%ebx)
                 ///code size: 20, stack requirements: 0
                 ///registers: %ebx, %eax
                 break;

              default:
                 abort();
            }
    }


Finalize stack and return

Finally, the instructions restoring the values of previously stored registers are generated.

    C(0x5B); //pop ebx 

    C(0xc3); //ret
}


Calculate fitness values

This function determines the fitness value of each candidate solution except parental solution parentidx. The calculated fitness values are stored in array denoted as fitvalues.

The process of fitness function calculation consists of two phases. In the first phase, the CGP chromosome is translated to machine code that resides in the application's address space. The process of translation exhibits linear time complexity with respect to the number of CGP nodes. The second phase involves the calculation of response for the training vectors and fitness value. This procedure executes the obtained machine code stored in array code for each training vector.

inline void calc_fitness(int parentidx) 
{
    chromosome p_chrom;
    register int fit, vysl;
    int *ptraindata = data;

    ///generate native code for each candidate solution
    for (int i=0; i < params.popsize; i++) 
    {
        if (i == parentidx) continue;

        #ifdef DONOTEVALUATEUNUSEDNODES
        usednodes[i] = used_nodes((chromosome) population[i], isused[i]);
        #endif
        cgp_compile(code[i], (chromosome) population[i], isused[i]);
        fitvalues[i] = 0;
    }
    
    for (int l=0; l < params.trainingvectors; l++) 
    {
        ///copy the first part of a training vector to the primary inputs
        memcpy(nodeoutput, ptraindata, params.inputs*sizeof(nodetype));
        ptraindata += params.inputs;

        ///determine and check response of each candidate solution
        for (int i=0; i < params.popsize; i++) 
        {
            if (i == parentidx) continue;
            
            /// cgp_eval((chromosome) population[i], isused[i]);
            ((evalfunc *)(code[i]))();

            ///compare the output values against training vector (specification)
            fit = 0;
            p_chrom = (chromosome) population[i] + params.geneoutidx;
            for (int k=0; k < params.outputs; k++) 
            {  
                vysl = (nodeoutput[*p_chrom++] ^ *(ptraindata +k));
                fit += zeroscount(vysl);
            }

            fitvalues[i] += fit;
        }

        ///next training vector
        ptraindata += params.outputs; 
    }
}


Number of utilized nodes

This function calculates the number of nodes that contribute to the resulting phenotype that is encoded by a given genotype.

int used_nodes(chromosome p_chrom, int* isused) 
{
    int in, fce, idx, used = 0;
    int *pchrom;

    memset(isused, 0, params.lastnodeidx*sizeof(int));


mark nodes connected to the primary outputs

    pchrom = p_chrom + params.geneoutidx;
    for (int i=0; i < params.outputs; i++) 
        isused[*pchrom++] = 1;


go throught the cgp array

    pchrom = p_chrom + params.geneoutidx - 1;
    idx = params.lastnodeidx-1;
    for (int i=params.cols; i > 0; i--) 
    {
        for (int j=params.rows; j > 0; j--,idx--) 
        {
            fce = *pchrom--; //fce
            if (isused[idx] == 1) 
            {  

the current node is marked, mark also the connected nodes

               in = *pchrom--; // in2
               if (fce > 1)    // 2-input functions
                  isused[in] = 1;
               in = *pchrom--; // in1
               isused[in] = 1;

               used++;
            } else {

the current node is not market, skip it

               pchrom -= params.nodeinputs;
            }
        }
    }

    return used;
}



Main application

Evolutionary design of combinational circuits using Cartesian Genetic Programming (32-bit i686 implementation).

Usage:

cgp_native truth_table.txt [-c COLUMNS] [-r ROWS] [-l LBACK] \
[-g GENERATIONS] [-p POPSIZE] [-m MUTATEDGENES]
int main(int argc, char* argv[])
{
    using namespace std; 
    int blk, bestblk, data_items, parentidx, fittest_idx;
    unsigned long int generation; 
    fitvaltype  bestfitval;

    strcpy(params.datafname, "data.txt"); 
    parse_options(argc, argv);
    ///load training data
    if ((data_items = parsefile(params.datafname, NULL, NULL, NULL)) < 1) 
    { 
       printf("Invalid data\n"); return 0; 
    }
    data = new int[data_items];
    parsefile(params.datafname, data, &params.inputs, &params.outputs);
    params.trainingvectors = data_items / (params.inputs+params.outputs); //Spocitani poctu pruchodu pro ohodnoceni
    params.maxfitval = params.trainingvectors*params.outputs*32;         //Vyitems max. fitness
    printf("Training data:\n   file:%s, items:%d, inputs:%d, outputs:%d\n", params.datafname, data_items, params.inputs, params.outputs);

    init_paramsandluts();
   
    ///memory allocation
    for (int i=0; i < params.popsize; i++) 
    {
        population[i] = new chromosome [params.geneoutidx + params.outputs];
        isused[i] = new int [params.genes];
        code[i] = malloc_aligned(params.cols*params.rows*MAXCODESIZE + 64);
    }
    nodeoutput = new nodetype [params.genes];
    memset(nodeoutput, 0, sizeof(nodetype) * params.genes);
    cgp_init_consts();

    srand((unsigned) time(NULL));


Initial population and its evaluation

The initial population which consists of params.popsize individuals is generated randomly.


    for (int i=0; i < params.popsize; i++) 
    {
        chromosome p_chrom = (chromosome) population[i];
        for (int j=0; j < params.cols*params.rows; j++) 
        {
            int col = (int)(j / params.rows);
            for (int k=0; k < params.nodeinputs; k++)  // node inputs
                *p_chrom++ = col_values[col]->values[(rand() % (col_values[col]->items))];
            *p_chrom++ = rand() % params.nodefuncs; // node function
        }
        for (int j=0; j < params.outputs; j++) // primary outputs
            *p_chrom++ = rand() % params.lastnodeidx;
    }
  

Evaluate the initial population and find the fittest candidate solution that becomes parent

    calc_fitness(-1);
    bestfitval = fitvalues[0]; fittest_idx = 0;
    for (int i=1; i < params.popsize; i++) 
        if (fitvalues[i] > bestfitval) 
        {
           bestfitval = fitvalues[i];
           fittest_idx = i;
        }

    printf("CGP parameters:\n   l-back=%d, rows=%d, cols=%d, functions=%d\n", params.lback, params.rows, params.cols, params.nodefuncs);
    printf("   popsize=%d, mutgenes=%d, generations=%d\n", params.popsize, params.mutations, params.maxgenerations);
    #ifdef DONOTEVALUATEUNUSEDNODES
    printf("   evalunused=false\n");
    #endif
    printf("Evolutionary run:\n   initial fitness=%d, best fitness=%d\n", bestfitval, params.maxfitval);
    double time = cpuTime(); 


Evolutionary loop

    for (int generation = 0; generation < params.maxgenerations; generation++) 
    {


Step 1

Generate offsprings of the fittest individual

        for (int i=0; i < params.popsize;  i++) 
        {
            if (fittest_idx == i) continue;
            cgp_mutate(copy_chromozome(population[fittest_idx], population[i]));
        }


Step 2

Evaluate population and calculate fitness values

        calc_fitness(fittest_idx);


Step 3

Check if there is an offspring that should replace parental solution

        int newparidx = fittest_idx;
        for (int i=0; i < params.popsize; i++) 
        { 
            fitvaltype fit = fitvalues[i];
            if ((i == fittest_idx) || (fit < bestfitval)) continue; 


current individual is at least of the same quality as its parent

            if (fit < params.maxfitval) 
            {  //current individual does not 
               if (fit > bestfitval) 
                  printf("Generation: %-8d\tFitness: %d/%d\n",generation, fit, params.maxfitval);
               bestfitval = fit;
               bestblk = params.nodes;
               newparidx = i;
            } 
            else 
            {  //second optimization criterion - the number of utilized nodes
               #ifdef DONOTEVALUATEUNUSEDNODES
               blk = usednodes[i];
               #else
               blk = used_nodes((chromosome) population[i], isused[i]);
               #endif
               if (blk <= bestblk) 
               {
                  if ((blk < bestblk) || (fit > bestfitval))
                     printf("Generation: %-8d\tFitness: %d/%d\tNodes: %d\n",generation, fit, params.maxfitval, blk);

                  bestfitval = fit;
                  bestblk = blk;
                  newparidx = i;
               }
            }
            
        }

        fittest_idx = newparidx;
    }


End of evolution

    time = cpuTime() - time;
    printf("Best fitness: %d/%d\n",bestfitval,params.maxfitval);
    printf("Best individual: ");
    print_chrom(stdout, (chromosome)population[fittest_idx]);
    printf("Duration: %f Evaluations per sec: %f", time, params.maxgenerations*(params.popsize-1) / time);
    
    if (bestfitval == params.maxfitval) {

save the solution

        FILE *chrfil = fopen("result.chr","wb");
        print_chrom(chrfil, (chromosome)population[fittest_idx]);
        fclose(chrfil);
    }

    return 0;
}