Cartesian genetic programming (CGP), introduced by Julian Miller and Peter Thomson, is a form of genetic programming where the candidate solutions are represented as a string of integers of fixed length that is mapped to directed oriented graph. CGP can efficiently represent common computational structures including mathematical equations, computer programs, neural networks and generally digital circuits.
CGP encodes a candidate solution (typically a circuit or a program) using an array consisting of C x R programmable nodes. The C determines the number of columns whereas R determines the number of rows. Each programmable node has fixed number of inputs (NI) and outputs (NO) (usually NI=2 and NO=1) and can implement one of F predefined primitive functions. Each node input can be connected either to the output of a node placed in the previous L columns or to one of the program inputs. Because of the complicated evaluation, feedback is not allowed in the standard version of CGP. The main feature of CGP is that all the parameters including the number of programmable nodes are fixed. It means that the array of programmable nodes can be encoded as a string of integers which has the fixed number of items.
The candidate solutions are encoded using C x R x (NI + 1) + O integers, where O is the number of primary outputs. The first C x R (NI + 1)-tuples encode the configuration of the CGP nodes (i.e. connection of their inputs and their functions), the last O integers encode the connection of the primary outputs. The main advantage of CGP encoding is that even if the size of chromosome is fixed, the size of phenotype is variable since some nodes need not to be used.
The fitness function usually takes one of two forms. For the symbolic regression problems, a training set is used. The goal is to minimize the difference between the output of a candidate program and required output. For evolution of logic circuits, all possible input combinations are applied at the candidate circuit inputs, the outputs are collected and the goal to minimize the difference between obtained truth table and required truth table.
In both cases the response for each training vector has to be calculated. This step involves the interpretation of a CGP genotype for each vector. One of the key features of CGP encoding is that it can directly be used as an intermediate code that is processed by an interpreter. Two types of interpreters are usually utilized. The interpreter based on recursion and linear interpreter.
The advantage of recursion-based interpreter is that the unconnected nodes are not processed and do not affects the performance of the evaluation. The main disadvantage is large overhead caused by recursive processing. The linear interpreter represents more efficient approach which does not introduce any overhead due to function calling that have to manipulate with stack. In contrast with the recursive approach, all the output values are calculated in one pass. However, all the nodes are evaluated even if they are not connected. In order to improve the performance, a simple preprocessing step that marks the utilized nodes can be introduced. Only the marked nodes are subsequently evaluated.
CGP has been utilized in many areas, the typical problems include
10/2012 64-bit system supported
6/2012 CGP generator released (beta)
4/2012 Presentation at EuroGP2012
Available code is licensed under BUT Open Source Licence