CGP acceleration

Principle

In order to evaluate the fitness function the candidate solution has to be simulated. This step involves the interpretation of a CGP genotype for each vector. Since the fitness evaluation time dominates the time of whole evolutionary process in vast majority of problems and moreover a large number of evaluations is usually needed to achieve satisfactory result, it is important to simulate candidate solutions effectively.

In order to maximize the overall performance, we have replaced the interpreter with corresponding optimized native machine code that directly calculates response for a single training vector. The machine code is in linear form, i.e., without conditional branches hurting performance. We have developed a method which allows the users to significantly accelerate the evaluation of candidate solutions without having hardly any knowledge of assembly language or target machine code. Moreover, the integration of the machine code compiler requires modifying only a few lines of code.

As it can be seen on a simple case study CGP can process several times higher number of evaluations on the same machine introducing native evaluation. Even if we used optimized C implementation of CGP, we were able to evaluate more than five times higher number of candidate solutions using the same machine. Even when we used a highly optimized C implementation of the interpreted CGP, we were still able to evaluate more than five times greater amount of candidate solutions by using the native code version.

Details can be found in our paper Efficient Phenotype Evaluation in Cartesian Genetic Programming presented at EuroGP 2012. This paper contains evaluation in fixed as well as floating-point domain where one can achieve even better results.

How to integrate native evaluation

The process of fitness value calculation consists of two phases. In the first phase the CGP chromosome is compiled or more precisely translated to machine code that resides in the application's address space. The second phase involves the calculation of response for the training vectors and fitness value.

The whole process is designed to maximize the performance a minimize overhead caused by translation. Thus the evaluation procedure consists of a simple call only which executes the obtained machine code for each training vector. It means that the process of compilation is executed once per fitness evaluation.

To integrate the native evaluation one has to modify function that evaluates fitness value of the whole population. The modification consists of calling code compiler and replacing the common interpreter with simple call that invokes correct compiled machine code. Apart from that it is of course necessary to initialize the arrays utilised by the native evaluation.

The changes you need to make are highlighted in the following code snippet.

+ #include "native.h"
 ...
+ //code generated using this website
+ unsigned char* consts;
+ void cgp_init_consts()
+ {
+ ...
+ }

+ unsigned char* code[MAX_POPSIZE];
+ #define MAXCODESIZE 20
+ void cgp_compile(unsigned char* pcode, chromosome p_chrom, int* isused) 
+ {
+ ...
+ }

void calc_fitness(int parentidx) 
{
    //variable declarations
    ...
    //initialize
    for (int i=0; i < params.popsize; i++) 
    {
        if (i == parentidx) continue;
        #ifdef DONOTEVALUATEUNUSEDNODES
        usednodes[i] = used_nodes((chromosome) population[i], isused[i]);
        #endif
        fitvalues[i] = 0;
+       cgp_compile(code[i], (chromosome) population[i], isused[i]);
    }
    for (int l=0; l < params.trainingvectors; l++) 
    {
        //copy the first part of a training vector to the primary inputs
        memcpy(nodeoutput, ptraindata, params.inputs*sizeof(nodetype));
        ptraindata += params.inputs;
        //determine and process the response of each candidate solution
        for (int i=0; i < params.popsize; i++) 
        {
            if (i == parentidx) continue;
-           cgp_eval((chromosome) population[i], isused[i]);
+           ((evalfunc *)(code[i]))();
            //compare the output values against training vector (calculate fit value)
	    ...
            fitvalues[i] += fit;
        }
        //next training vector
        ptraindata += params.outputs; 
    }
}
...
int main(int argc, char* argv[])
{
    ...
    //memory allocation
    for (int i=0; i < params.popsize; i++) 
    {
        population[i] = new chromosome [params.geneoutidx + params.outputs];
        isused[i] = new int [params.genes];
+       code[i] = malloc_aligned(params.cols*params.rows*MAXCODESIZE + 64);
    }
+   cgp_init_consts();
    ...
}

If you don't believe that the proposed acceleration technique is really simple to integrate then just compare source codes which are available in the next section.

Case study (evolutionary design of logic circuits)

The impact of the proposed method on the performance is evaluated using one of common CGP applications - evolutionary design of combinational circuits. Even if software implementations of CGP (intended for evolution of logic circuits) can strongly benefit from the so-called parallel simulation we can easily demonstrate that the performance can be increased by introducing native evaluation.

Results

We have implemented common CGP and accelerated version of CGP and evaluated the performance using common benchmark problem - evolutionary design of multipliers at gate-level. The tables below show the average number of evaluations evaluated per a second. The average has been calculated using twenty independent runs. Note that the flag DONOTEVALUATEUNUSEDNODES was not set to provide statistically significant results.

32-bit implementation

Hamming distance utilized in fitness function was calculated using either four lookups into 8-bit LUT table (LUT) or 32-bit popcnt instruction available in SSE4 (POPCNT)

benchmark	accelerated CGP implementation		common CGP implementation
benchmark	LUT	POPCNT	LUT	POPCNT
multiplier 5x5	71 000(5.3x)	78 568(5.7x)	13 318	13 675
multiplier 6x6	22 432(6.8x)	26 137(7.7x)	3 290	3 391
multiplier 7x7	5 576(6.9x)	6 657(7.9x)	808	842
CGP parameters: C=80, R=1, L=40 Node functions: BUF, NOT, AND, OR, XOR, NAND, NOR, XNOR CPU: Intel Core 2 DUO P8400 @2.26 GHz OS: Win XP, 32-bit

The speedup vary between 5x and 8x depending on the problem size. The larger number of training vectors, the higher speedup. This effect is primarily caused by the overhead introduced by the compilation of CGP phenotypes into machine code.

64-bit implementation

Hamming distance utilized in fitness function was calculated using either eight lookups into 8-bit LUT table (LUT) or 64-bit popcnt instruction available in SSE4 (POPCNTL)

benchmark	accelerated CGP implementation		common CGP implementation
benchmark	LUT	POPCNT	LUT	POPCNT
multiplier 5x5	107 051(3.3x)	136 449(3.8x)	32 677	36 016
multiplier 6x6	40 318(5.0x)	62 794(6.9x)	8 126	9 123
multiplier 7x7	10 614(5.3x)	19 137(8.5x)	1 992	2 246
multiplier 8x8	2 442(5.0x)	4 714(8.4x)	492	560
CGP parameters: C=80, R=1, L=40 Node functions: BUF, NOT, AND, OR, XOR, NAND, NOR, XNOR CPU: Intel Xeon X5650 @2.66 GHz (12 cores, 15 MB cache) OS: CentOS Linux, 64-bit

Similarily to the previous results the speedup vary between 3x and 8x depending on the problem size. If we compare the average number of evaluations evaluated per a second of the 32-bit and 64-bit common CGP implementation, the resulting ratio is approximately 2.4 which corresponds with our expectation as a) the 64-bit implementation evaluates 64 inputs vectors instead of 32 and b) the 64-bit CPU is running at approx. 1.17 times higher frequency.

Get code

Evolutionary design of combinational circuits

cgp.zip (common implementation)
cgp_native.zip (accelerated version)

Symbolic regression

cgp.zip (common implementation)
cgp_native.zip (accelerated version)

Licence

Available code is licensed under BUT Open Source Licence

Source code (32-bit implementation)

show / hide source code annotation

cgp.cpp
¶ Common CGP implementation (cgp.cpp) Cartesian genetic programming, introduced by Julian Miller and Peter Thomson in 2000, is a variant of genetic programming where the genotype is represented as a list of integers that are mapped to directed oriented graphs. In order to evaluate the fitness function the response for each training vector has to be calculated. This step involves the interpretation of a CGP genotype for each vector. In this implementation we are using linear interpreter that interprets CGP chromosome.	/=============================================================================== cgp.cpp: Evolutionary design of combinational circuits using CGP =============================================================================== Copyright (C) 2012 Brno University of Technology, Faculty of Information Technology Author(s): Zdenek Vasicek <vasicek AT fit.vutbr.cz> =============================================================================/ /// #define HAVE_POPCNT /// #define DONOTEVALUATEUNUSEDNODES #define MAX_POPSIZE 100 typedef int fitvaltype; typedef int nodetype; #include "cgp.h" #include "circ.h" pchromosome population[MAX_POPSIZE]; fitvaltype fitvalues[MAX_POPSIZE]; /// default parameters tparams params = {5000000 /generations/, 5 /pop.size/, 1 /mut.genes/, 0, 0, 80 /cols/, 1 /rows/, 40 /lback/, 2, 1, 8 /functions/}; nodetype* nodeoutput; //array of node outputs used for CGP evaluation int* isused[MAX_POPSIZE]; //array of marked used nodes for each individual int usednodes[MAX_POPSIZE]; //number of used nodes for each individual int* data; //training data col_validvalues **col_values; //valid gene values for each column
¶ CGP mutation operator This function modifies `params.mutations` randomly selected genes of a given genotype. The modification is done in place. The implementation of the mutation operator has to ensure that the modifications are legal and lead to a valid phenotype. This is done using `col_values` array which contains valid values that can occur in each column of CGP array.	inline void cgp_mutate(chromosome p_chrom) { int rnd = rand() % params.mutations; int genes = rnd + 1; for (int j = 0; j < genes; j++) { int i = rand() % (params.geneoutidx + params.outputs); int col = (int) (i / params.genespercol); if (i < params.geneoutidx) { if ((i % params.nodeios) < params.nodeinputs) { //mutate a gene that encodes connection do { rnd = col_values[col]->values[(rand() % (col_values[col]->items))]; } while (rnd == p_chrom[i]); p_chrom[i] = rnd; } else { //mutate a gene that encodes function if (params.nodefuncs > 1) { do { rnd = rand() % params.nodefuncs; } while (rnd == p_chrom[i]); p_chrom[i] = rnd; } } } else { //mutate a primary output do { rnd = rand() % params.lastnodeidx; } while (rnd == p_chrom[i]); p_chrom[i] = rnd; } } }
¶ CGP evaluation This function simulates the encoded candidate solution and calculates response for single input vector. The response for each CGP node is stored in array `nodeoutput`. The first `params.inputs` items contain input vector. The interpreter consists of a loop that calculates the response for each CGP node according to the genes of CGP chromosome. The execution of the encoded graph starts from the first node and continues according to the increasing node index. This scheme represents the most efficient implementation as it does not introduce any overhead due to function calling that have to manipulate with stack. In contrast with the recursive approach, all the output values are calculated in one pass. In order to improve the performance, a simple preprocessing step that marks the utilized nodes only can be introduced (by enabling `DONOTEVALUATEUNUSEDNODES` parameter). Only the marked nodes are subsequently evaluated.	inline int cgp_eval(chromosome p_chrom, int* isused) { int in1,in2,fce,i,j; int pnodeout = nodeoutput + params.inputs; #ifdef DONOTEVALUATEUNUSEDNODES isused += params.inputs; #endif /// Evaluate the response of each node for (i=0; i < params.cols; i++) for (j=0; j < params.rows; j++) { #ifdef DONOTEVALUATEUNUSEDNODES if (!isused++) { //This node is not used, skip it p_chrom += 3; pnodeout++; continue; } #endif in1 = nodeoutput[p_chrom++]; in2 = nodeoutput[p_chrom++]; fce = p_chrom++; switch (fce) { case 0: pnodeout++ = in1; break; //in1 case 1: pnodeout++ = ~in1; break; //not in1 case 2: pnodeout++ = in1 & in2; break; //and case 3: pnodeout++ = in1 \| in2; break; //or case 4: pnodeout++ = in1 ^ in2; break; //xor case 5: pnodeout++ = ~(in1 & in2); break; //nand case 6: pnodeout++ = ~(in1 \| in2); break; //nor case 7: *pnodeout++ = ~(in1 ^ in2); break; //xnor default: abort(); } } }
¶ Calculate fitness values This function determines the fitness value of each candidate solution except parental solution `parentidx`. The calculated fitness values are stored in array denoted as `fitvalues`. In order to calculate the fitness value of each candidate solution the response for each training data has to be evaluated. In this implementation the fitness value correspons with the Hamming distance between the calculated and desired response. The fitness function calls `cgp_eval` procedure that, given the CGP calculated outputs data and a candidate solution, evaluates the response for a single input vector (i.e. single fitness case stored in array denoted as `ptraindata`). Then, according to the information about output connections stored in chromosome, the fitness value of a genotype for the utilized input vector is calculated. These steps are repeated until last training vector is evaluated.	inline void calc_fitness(int parentidx) { chromosome p_chrom; register int fit, vysl; int ptraindata = data; for (int i=0; i < params.popsize; i++) { if (i == parentidx) continue; #ifdef DONOTEVALUATEUNUSEDNODES usednodes[i] = used_nodes((chromosome) population[i], isused[i]); #endif fitvalues[i] = 0; } for (int l=0; l < params.trainingvectors; l++) { ///copy the first part of a training vector to the primary inputs memcpy(nodeoutput, ptraindata, params.inputssizeof(nodetype)); ptraindata += params.inputs; ///determine and check response of each candidate solution for (int i=0; i < params.popsize; i++) { if (i == parentidx) continue; cgp_eval((chromosome) population[i], isused[i]); ///compare the output values against training vector (specification) fit = 0; p_chrom = (chromosome) population[i] + params.geneoutidx; for (int k=0; k < params.outputs; k++) { vysl = (nodeoutput[p_chrom++] ^ (ptraindata +k)); fit += zeroscount(vysl); } fitvalues[i] += fit; } ///next training vector ptraindata += params.outputs; } }
¶ Number of utilized nodes This function calculates the number of nodes that contribute to the resulting phenotype that is encoded by a given genotype.	int used_nodes(chromosome p_chrom, int* isused) { int in, fce, idx, used = 0; int pchrom; memset(isused, 0, params.lastnodeidxsizeof(int));
¶ mark nodes connected to the primary outputs	pchrom = p_chrom + params.geneoutidx; for (int i=0; i < params.outputs; i++) isused[*pchrom++] = 1;
¶ go throught the cgp array	pchrom = p_chrom + params.geneoutidx - 1; idx = params.lastnodeidx-1; for (int i=params.cols; i > 0; i--) { for (int j=params.rows; j > 0; j--,idx--) { fce = *pchrom--; //fce if (isused[idx] == 1) {
¶ the current node is marked, mark also the connected nodes	in = pchrom--; // in2 if (fce > 1) // 2-input functions isused[in] = 1; in = pchrom--; // in1 isused[in] = 1; used++; } else {
¶ the current node is not market, skip it	pchrom -= params.nodeinputs; } } } return used; }
¶ Main application Evolutionary design of combinational circuits using Cartesian Genetic Programming. Usage: `cgp truth_table.txt [-c COLUMNS] [-r ROWS] [-l LBACK] \ [-g GENERATIONS] [-p POPSIZE] [-m MUTATEDGENES]`	int main(int argc, char* argv[]) { using namespace std; int blk, bestblk, data_items, parentidx, fittest_idx; unsigned long int generation; fitvaltype bestfitval; strcpy(params.datafname, "data.txt"); parse_options(argc, argv); ///load training data if ((data_items = parsefile(params.datafname, NULL, NULL, NULL)) < 1) { printf("Invalid data\n"); return 0; } data = new int[data_items]; parsefile(params.datafname, data, &params.inputs, &params.outputs); params.trainingvectors = data_items / (params.inputs+params.outputs); //Spocitani poctu pruchodu pro ohodnoceni params.maxfitval = params.trainingvectorsparams.outputs32; //Vyitems max. fitness printf("Training data:\n file:%s, items:%d, inputs:%d, outputs:%d\n", params.datafname, data_items, params.inputs, params.outputs); init_paramsandluts(); ///memory allocation for (int i=0; i < params.popsize; i++) { population[i] = new chromosome [params.geneoutidx + params.outputs]; isused[i] = new int [params.genes]; } nodeoutput = new nodetype [params.genes]; memset(nodeoutput, 0, sizeof(nodetype) * params.genes); srand((unsigned) time(NULL));
¶ Initial population and its evaluation The initial population which consists of `params.popsize` individuals is generated randomly.	for (int i=0; i < params.popsize; i++) { chromosome p_chrom = (chromosome) population[i]; for (int j=0; j < params.colsparams.rows; j++) { int col = (int)(j / params.rows); for (int k=0; k < params.nodeinputs; k++) // node inputs p_chrom++ = col_values[col]->values[(rand() % (col_values[col]->items))]; p_chrom++ = rand() % params.nodefuncs; // node function } for (int j=0; j < params.outputs; j++) // primary outputs p_chrom++ = rand() % params.lastnodeidx; }
¶ Evaluate the initial population and find the fittest candidate solution that becomes parent	calc_fitness(-1); bestfitval = fitvalues[0]; fittest_idx = 0; for (int i=1; i < params.popsize; i++) if (fitvalues[i] > bestfitval) { bestfitval = fitvalues[i]; fittest_idx = i; } printf("CGP parameters:\n l-back=%d, rows=%d, cols=%d, functions=%d\n", params.lback, params.rows, params.cols, params.nodefuncs); printf(" popsize=%d, mutgenes=%d, generations=%d\n", params.popsize, params.mutations, params.maxgenerations); #ifdef DONOTEVALUATEUNUSEDNODES printf(" evalunused=false\n"); #endif printf("Evolutionary run:\n initial fitness=%d, best fitness=%d\n", bestfitval, params.maxfitval); double time = cpuTime();
¶ Evolutionary loop	for (int generation = 0; generation < params.maxgenerations; generation++) {
¶ Step 1 Generate offsprings of the fittest individual	for (int i=0; i < params.popsize; i++) { if (fittest_idx == i) continue; cgp_mutate(copy_chromozome(population[fittest_idx], population[i])); }
¶ Step 2 Evaluate population and calculate fitness values	calc_fitness(fittest_idx);
¶ Step 3 Check if there is an offspring that should replace parental solution	int newparidx = fittest_idx; for (int i=0; i < params.popsize; i++) { fitvaltype fit = fitvalues[i]; if ((i == fittest_idx) \|\| (fit < bestfitval)) continue;
¶ current individual is at least of the same quality as its parent	if (fit < params.maxfitval) { //current individual does not if (fit > bestfitval) printf("Generation: %-8d\tFitness: %d/%d\n",generation, fit, params.maxfitval); bestfitval = fit; bestblk = params.nodes; newparidx = i; } else { //second optimization criterion - the number of utilized nodes #ifdef DONOTEVALUATEUNUSEDNODES blk = usednodes[i]; #else blk = used_nodes((chromosome) population[i], isused[i]); #endif if (blk <= bestblk) { if ((blk < bestblk) \|\| (fit > bestfitval)) printf("Generation: %-8d\tFitness: %d/%d\tNodes: %d\n",generation, fit, params.maxfitval, blk); bestfitval = fit; bestblk = blk; newparidx = i; } } } fittest_idx = newparidx; }
¶ End of evolution	time = cpuTime() - time; printf("Best fitness: %d/%d\n",bestfitval,params.maxfitval); printf("Best individual: "); print_chrom(stdout, (chromosome)population[fittest_idx]); printf("Duration: %f Evaluations per sec: %f", time, params.maxgenerations*(params.popsize-1) / time); if (bestfitval == params.maxfitval) {
¶ save the solution	FILE *chrfil = fopen("result.chr","wb"); print_chrom(chrfil, (chromosome)population[fittest_idx]); fclose(chrfil); } return 0; }

show / hide source code annotation

cgp_native.cpp
¶ Efficient CGP implementation (cgp_native.cpp) Cartesian genetic programming, introduced by Julian Miller and Peter Thomson in 2000, is a variant of genetic programming where the genotype is represented as a list of integers that are mapped to directed oriented graphs. In order to evaluate the fitness function, the response for each training vector has to be calculated. This step involves the interpretation of a CGP genotype for each vector. To maximize the overall performance, we introduced JIT compilation of genotype to binary machine code.	/=============================================================================== cgp_native.cpp: Evolutionary design of combinational circuits using CGP =============================================================================== Copyright (C) 2012 Brno University of Technology, Faculty of Information Technology Author(s): Zdenek Vasicek <vasicek AT fit.vutbr.cz> =============================================================================/ /// #define HAVE_POPCNT /// #define DONOTEVALUATEUNUSEDNODES #define MAX_POPSIZE 100 typedef int fitvaltype; typedef int nodetype; #include "cgp.h" #include "circ.h" #include "native.h" pchromosome population[MAX_POPSIZE]; fitvaltype fitvalues[MAX_POPSIZE]; ///default parameters tparams params = {5000000 /generations/, 5 /pop.size/, 1 /mut.genes/, 0, 0, 80 /cols/, 1 /rows/, 40 /lback/, 2, 1, 8 /functions/}; nodetype* nodeoutput; //array of node outputs used for CGP evaluation int* isused[MAX_POPSIZE]; //array of marked used nodes for each individual int usednodes[MAX_POPSIZE]; //number of used nodes for each individual int* data; //training data col_validvalues **col_values; //valid gene values for each column
¶ CGP mutation operator This function modifies `params.mutations` randomly selected genes of a given genotype. The modification is done in place. The implementation of the mutation operator has to ensure that the modifications are legal and lead to a valid phenotype. This is done using `col_values` array which contains valid values that can occur in each column of CGP array.	inline void cgp_mutate(chromosome p_chrom) { int rnd = rand() % params.mutations; int genes = rnd + 1; for (int j = 0; j < genes; j++) { int i = rand() % (params.geneoutidx + params.outputs); int col = (int) (i / params.genespercol); if (i < params.geneoutidx) { if ((i % params.nodeios) < params.nodeinputs) { //mutate a gene that encodes connection do { rnd = col_values[col]->values[(rand() % (col_values[col]->items))]; } while (rnd == p_chrom[i]); p_chrom[i] = rnd; } else { //mutate a gene that encodes function if (params.nodefuncs > 1) { do { rnd = rand() % params.nodefuncs; } while (rnd == p_chrom[i]); p_chrom[i] = rnd; } } } else { //mutate a primary output do { rnd = rand() % params.lastnodeidx; } while (rnd == p_chrom[i]); p_chrom[i] = rnd; } } } unsigned char* consts; void cgp_init_consts() { #define B(val) (pconst++)=(val) unsigned char pconst; pconst = consts = malloc_aligned(1024); ///Initialize constants } #define MAXCODESIZE 20 typedef void evalfunc(void); unsigned char* code[MAX_POPSIZE];
¶ Native code generation The process of CGP genotype compilation contains three sections.	int cgp_compile(unsigned char* pcode, chromosome p_chrom, int* isused) { #define C(val) (pcode++)=(val) #define CI(val) {((uint32_t *)(pcode))= (uint32_t)(val); pcode += sizeof(uint32_t);} int pnodeout = 0; int in1,in2,fce; int out = params.inputs - 1;
¶ Initialize stack The code that preserves the utilized registers and handles the stack is generated in the first section.	C(0x53); //push ebx /// Load chromosome ptr to ebx C(0xbb);CI(nodeoutput);
¶ Native code generation Then, the CGP chromosome is interpreted and corresponding machine code is generated. The procedure that translates CGP chromosome to the machine code looks nearly the same as the structure of linear CGP interpreter. Instead of evaluation of each node, corresponding machine code is generated.	for (int i=0; i < params.cols; i++) for (int j=0; j < params.rows; j++) { in1 = p_chrom++; in2 = p_chrom++; fce = *p_chrom++; out++; #ifdef DONOTEVALUATEUNUSEDNODES if (!isused[out]) continue; #endif
¶ The machine code for the given set of primitive functions was obtained using provided online generator. Register `%ebx` holds the pointer to first item of `nodeoutput` array which stores the node outputs. Variables `in1` and `in2` contain the index of a node that is connected to the first input and second input respectivelly. Variable `out` contains the index of current node.	switch (fce) { case 0: // BUFFER /// data[out] = data[in1]; C(0x8b);C(0x83);CI(4in1); //mov in1(%ebx),%eax C(0x89);C(0x83);CI(4out); //mov %eax,out(%ebx) ///code size: 12, stack requirements: 0 ///registers: %ebx, %eax break; case 1: // NOT ///data[out] = ~data[in1]; C(0x8b);C(0x83);CI(4in1); //mov in1(%ebx),%eax C(0xf7);C(0xd0); //not %eax C(0x89);C(0x83);CI(4out); //mov %eax,out(%ebx) ///code size: 14, stack requirements: 0 ///registers: %ebx, %eax break; case 2: // AND ///data[out] = data[in1] & data[in2]; C(0x8b);C(0x83);CI(4in1); //mov in1(%ebx),%eax C(0x23);C(0x83);CI(4in2); //and in2(%ebx),%eax C(0x89);C(0x83);CI(4out); //mov %eax,out(%ebx) ///code size: 18, stack requirements: 0 ///registers: %ebx, %eax break; case 3: // OR ///data[out] = data[in1] \| data[in2]; C(0x8b);C(0x83);CI(4in1); //mov in1(%ebx),%eax C(0x0b);C(0x83);CI(4in2); //or in2(%ebx),%eax C(0x89);C(0x83);CI(4out); //mov %eax,out(%ebx) ///code size: 18, stack requirements: 0 ///registers: %ebx, %eax break; case 4: // XOR ///data[out] = data[in1] ^ data[in2]; C(0x8b);C(0x83);CI(4in1); //mov in1(%ebx),%eax C(0x33);C(0x83);CI(4in2); //xor in2(%ebx),%eax C(0x89);C(0x83);CI(4out); //mov %eax,out(%ebx) ///code size: 18, stack requirements: 0 ///registers: %ebx, %eax break; case 5: // NAND ///data[out] = ~(data[in1] & data[in2]); C(0x8b);C(0x83);CI(4in1); //mov in1(%ebx),%eax C(0x23);C(0x83);CI(4in2); //and in2(%ebx),%eax C(0xf7);C(0xd0); //not %eax C(0x89);C(0x83);CI(4out); //mov %eax,out(%ebx) ///code size: 20, stack requirements: 0 ///registers: %ebx, %eax break; case 6: // NOR ///data[out] = ~(data[in1] \| data[in2]); C(0x8b);C(0x83);CI(4in1); //mov in1(%ebx),%eax C(0x0b);C(0x83);CI(4in2); //or in2(%ebx),%eax C(0xf7);C(0xd0); //not %eax C(0x89);C(0x83);CI(4out); //mov %eax,out(%ebx) ///code size: 20, stack requirements: 0 ///registers: %ebx, %eax break; case 7: // XNOR ///data[out] = ~(data[in1] ^ data[in2]); C(0x8b);C(0x83);CI(4in1); //mov in1(%ebx),%eax C(0x33);C(0x83);CI(4in2); //xor in2(%ebx),%eax C(0xf7);C(0xd0); //not %eax C(0x89);C(0x83);CI(4out); //mov %eax,out(%ebx) ///code size: 20, stack requirements: 0 ///registers: %ebx, %eax break; default: abort(); } }
¶ Finalize stack and return Finally, the instructions restoring the values of previously stored registers are generated.	C(0x5B); //pop ebx C(0xc3); //ret }
¶ Calculate fitness values This function determines the fitness value of each candidate solution except parental solution `parentidx`. The calculated fitness values are stored in array denoted as `fitvalues`. The process of fitness function calculation consists of two phases. In the first phase, the CGP chromosome is translated to machine code that resides in the application's address space. The process of translation exhibits linear time complexity with respect to the number of CGP nodes. The second phase involves the calculation of response for the training vectors and fitness value. This procedure executes the obtained machine code stored in array `code` for each training vector.	inline void calc_fitness(int parentidx) { chromosome p_chrom; register int fit, vysl; int ptraindata = data; ///generate native code for each candidate solution for (int i=0; i < params.popsize; i++) { if (i == parentidx) continue; #ifdef DONOTEVALUATEUNUSEDNODES usednodes[i] = used_nodes((chromosome) population[i], isused[i]); #endif cgp_compile(code[i], (chromosome) population[i], isused[i]); fitvalues[i] = 0; } for (int l=0; l < params.trainingvectors; l++) { ///copy the first part of a training vector to the primary inputs memcpy(nodeoutput, ptraindata, params.inputssizeof(nodetype)); ptraindata += params.inputs; ///determine and check response of each candidate solution for (int i=0; i < params.popsize; i++) { if (i == parentidx) continue; /// cgp_eval((chromosome) population[i], isused[i]); ((evalfunc )(code[i]))(); ///compare the output values against training vector (specification) fit = 0; p_chrom = (chromosome) population[i] + params.geneoutidx; for (int k=0; k < params.outputs; k++) { vysl = (nodeoutput[p_chrom++] ^ *(ptraindata +k)); fit += zeroscount(vysl); } fitvalues[i] += fit; } ///next training vector ptraindata += params.outputs; } }
¶ Number of utilized nodes This function calculates the number of nodes that contribute to the resulting phenotype that is encoded by a given genotype.	int used_nodes(chromosome p_chrom, int* isused) { int in, fce, idx, used = 0; int pchrom; memset(isused, 0, params.lastnodeidxsizeof(int));
¶ mark nodes connected to the primary outputs	pchrom = p_chrom + params.geneoutidx; for (int i=0; i < params.outputs; i++) isused[*pchrom++] = 1;
¶ go throught the cgp array	pchrom = p_chrom + params.geneoutidx - 1; idx = params.lastnodeidx-1; for (int i=params.cols; i > 0; i--) { for (int j=params.rows; j > 0; j--,idx--) { fce = *pchrom--; //fce if (isused[idx] == 1) {
¶ the current node is marked, mark also the connected nodes	in = pchrom--; // in2 if (fce > 1) // 2-input functions isused[in] = 1; in = pchrom--; // in1 isused[in] = 1; used++; } else {
¶ the current node is not market, skip it	pchrom -= params.nodeinputs; } } } return used; }
¶ Main application Evolutionary design of combinational circuits using Cartesian Genetic Programming (32-bit i686 implementation). Usage: `cgp_native truth_table.txt [-c COLUMNS] [-r ROWS] [-l LBACK] \ [-g GENERATIONS] [-p POPSIZE] [-m MUTATEDGENES]`	int main(int argc, char* argv[]) { using namespace std; int blk, bestblk, data_items, parentidx, fittest_idx; unsigned long int generation; fitvaltype bestfitval; strcpy(params.datafname, "data.txt"); parse_options(argc, argv); ///load training data if ((data_items = parsefile(params.datafname, NULL, NULL, NULL)) < 1) { printf("Invalid data\n"); return 0; } data = new int[data_items]; parsefile(params.datafname, data, &params.inputs, &params.outputs); params.trainingvectors = data_items / (params.inputs+params.outputs); //Spocitani poctu pruchodu pro ohodnoceni params.maxfitval = params.trainingvectorsparams.outputs32; //Vyitems max. fitness printf("Training data:\n file:%s, items:%d, inputs:%d, outputs:%d\n", params.datafname, data_items, params.inputs, params.outputs); init_paramsandluts(); ///memory allocation for (int i=0; i < params.popsize; i++) { population[i] = new chromosome [params.geneoutidx + params.outputs]; isused[i] = new int [params.genes]; code[i] = malloc_aligned(params.colsparams.rowsMAXCODESIZE + 64); } nodeoutput = new nodetype [params.genes]; memset(nodeoutput, 0, sizeof(nodetype) * params.genes); cgp_init_consts(); srand((unsigned) time(NULL));
¶ Initial population and its evaluation The initial population which consists of `params.popsize` individuals is generated randomly.	for (int i=0; i < params.popsize; i++) { chromosome p_chrom = (chromosome) population[i]; for (int j=0; j < params.colsparams.rows; j++) { int col = (int)(j / params.rows); for (int k=0; k < params.nodeinputs; k++) // node inputs p_chrom++ = col_values[col]->values[(rand() % (col_values[col]->items))]; p_chrom++ = rand() % params.nodefuncs; // node function } for (int j=0; j < params.outputs; j++) // primary outputs p_chrom++ = rand() % params.lastnodeidx; }
¶ Evaluate the initial population and find the fittest candidate solution that becomes parent	calc_fitness(-1); bestfitval = fitvalues[0]; fittest_idx = 0; for (int i=1; i < params.popsize; i++) if (fitvalues[i] > bestfitval) { bestfitval = fitvalues[i]; fittest_idx = i; } printf("CGP parameters:\n l-back=%d, rows=%d, cols=%d, functions=%d\n", params.lback, params.rows, params.cols, params.nodefuncs); printf(" popsize=%d, mutgenes=%d, generations=%d\n", params.popsize, params.mutations, params.maxgenerations); #ifdef DONOTEVALUATEUNUSEDNODES printf(" evalunused=false\n"); #endif printf("Evolutionary run:\n initial fitness=%d, best fitness=%d\n", bestfitval, params.maxfitval); double time = cpuTime();
¶ Evolutionary loop	for (int generation = 0; generation < params.maxgenerations; generation++) {
¶ Step 1 Generate offsprings of the fittest individual	for (int i=0; i < params.popsize; i++) { if (fittest_idx == i) continue; cgp_mutate(copy_chromozome(population[fittest_idx], population[i])); }
¶ Step 2 Evaluate population and calculate fitness values	calc_fitness(fittest_idx);
¶ Step 3 Check if there is an offspring that should replace parental solution	int newparidx = fittest_idx; for (int i=0; i < params.popsize; i++) { fitvaltype fit = fitvalues[i]; if ((i == fittest_idx) \|\| (fit < bestfitval)) continue;
¶ current individual is at least of the same quality as its parent	if (fit < params.maxfitval) { //current individual does not if (fit > bestfitval) printf("Generation: %-8d\tFitness: %d/%d\n",generation, fit, params.maxfitval); bestfitval = fit; bestblk = params.nodes; newparidx = i; } else { //second optimization criterion - the number of utilized nodes #ifdef DONOTEVALUATEUNUSEDNODES blk = usednodes[i]; #else blk = used_nodes((chromosome) population[i], isused[i]); #endif if (blk <= bestblk) { if ((blk < bestblk) \|\| (fit > bestfitval)) printf("Generation: %-8d\tFitness: %d/%d\tNodes: %d\n",generation, fit, params.maxfitval, blk); bestfitval = fit; bestblk = blk; newparidx = i; } } } fittest_idx = newparidx; }
¶ End of evolution	time = cpuTime() - time; printf("Best fitness: %d/%d\n",bestfitval,params.maxfitval); printf("Best individual: "); print_chrom(stdout, (chromosome)population[fittest_idx]); printf("Duration: %f Evaluations per sec: %f", time, params.maxgenerations*(params.popsize-1) / time); if (bestfitval == params.maxfitval) {
¶ save the solution	FILE *chrfil = fopen("result.chr","wb"); print_chrom(chrfil, (chromosome)population[fittest_idx]); fclose(chrfil); } return 0; }

CGP acceleration

Principle

How to integrate native evaluation

Case study (evolutionary design of logic circuits)

Results

32-bit implementation

64-bit implementation

Content

Get code

Licence

Source code (32-bit implementation)

cgp.cpp

Common CGP implementation (cgp.cpp)

CGP mutation operator

CGP evaluation

Calculate fitness values

Number of utilized nodes

Main application

Initial population and its evaluation

Evolutionary loop

Step 1

Step 2

Step 3

End of evolution

cgp_native.cpp

Efficient CGP implementation (cgp_native.cpp)