English DNA Computing Czech

DNA Computing

The founder of DNA (deoxyribonucleic acid) computing, one of the most revolutionary discipline of present computer science, is an American mathematician Leopard Adleman. In 1994, he demonstrated the solution of the HPP (Hamiltonian Path Problem), that is, to solve the HPP for a given directed graph, with the aid of DNA strands.

The major benefit of L. Adleman’s idea lies in DNA molecule applications as a data medium. Operations, which can be performed with DNA molecules by means of molecular biology methods, can be used for a massive search of the state space. Theoretically, the molecular computers can surpass existing supercomputers. The molecular computers are estimated to surpass the existing computers not only in operating rate, but also in economical data storage and in low power consumption. However, the practical implementation of the operation with DNA meets many problems that are already familiar among professional geneticists.

Structure of DNA

DNA (deoxyribonucleic acid), found in every living creatures, is a storage medium for a genetic information. DNA is a polymer made of monomers called deoxyribonucleotides (for short nucleotides). Each of nucleotides consists of three basic items: dexyribose sugar, phosphate group and nitrogenous base. Dexyribose comprises five carbon atoms. The phosphate group is attached to the carbon atom 5', the nitrogenous base to the carbon atom 1'. A hydroxyl group (OH) is attached to the carbon atom 3'.

Struktura DNA
Fig.1.1: Structure of nucleotide with thymine base (T)

Various nucleotides differ from each other only with particular bases. There are two types: purins and pyrimidins. Purins are adenine (A) and guanine (G). Pyrimidins are cytosine (C) and thymine (T). Since the base is the only feature through which the nucleotides differ from each other, the abbreviations A, T, C and G are used to identify the individual nucleotides.

There are two ways the nucleotides are bound together: As for the first one, the phosphate group attached to the fifth dexyribose carbon atom is bound with the hydroxyl group attached to the third carbon atom of the next nucleotide. This covalent bond is called phosphodiester. The resulting molecule consists of 5'-phosphate group attached to one nucleotide and of 3'-hydroxil group attached to another nucleotide. In this manner, the direction of the molecule is defined. We speak about the 5'-3' or 3'-5' direction.

The latter way of nucleotide binding is the connection of two particular nucleotide bases by means of the hydrogen bond according to Watson-Crick complementarity. Complementary bases are A/T and C/G. This type of a bond is weak and can be split by mere warming of the molecule soup. All DNA computing applications are based on Watson-Crick complementarity.

Model DNA
Fig.1.2: A DNA molecule model

DNA is naturally not in the form of linear nucleotide clusters (only a phosphodiester bond connection). Under appropriate conditions, two linear DNA clusters are bound together by a hydrogen bond forming a double stranded DNA molecule.

Hamiltonian Path Problem (HPP)

This problem is defined in this way: Let G be an oriented graph with a denoted initial node Vin and a final node Vout. The path from the node Vin to the node Vout is called Hamiltonian one if and only if it includes every node of the graph G only once. Generally, the Hamiltonian Path Problem is formulated as a decision whether or not the given oriented graph contains the Hamiltonian path.

The Hamiltonian Path Problem is NP-complete, which means that the efficient solution of this problem, achievable in polynomial time, probably does not exist and that all general solutions of the problem lead to a massive search of the state space. The number of steps needed to solve the problem increases exponentially according to the size of the graph.

Orientovaný graf
Fig.2.1: Oriented graph G

For instance, let the initial node Vin = 0 and the final node Vout = 6 be in the graph (Fig. 2.1). The Hamiltonian path is created by the edges (0,1) (1,2) (2,3) (3,4) (4,5) and (5,6).

Generally, the HPP is known as a salesman problem. The salesman leaves his town (the initial node) and visits every existing town (node) just once. However, he may move along denoted paths only (along the edges of the graph). The destination of his journey is in a town (the final node) designated in advance. The problem is to find out if the salesman can visit all the towns and if so, which paths he has to move along.

Adleman’s experiment

Adleman’s experiment solves the Hamiltonian Path Problem (HPP) in an oriented graph. The principle lies in coding of information (nodes, edges) in DNA clusters and in use of enzymes for a simulation of simple calculations. The key for the solution of the HPP with the aid of DNA is a massive parallelism DNA calculation and Watson-Crick complementarity.

Orientovaný graf
Fig.3.1: Oriented graph used in Adleman’s experiment

In the experiment, L. Adleman used the following algorithm to solve HPP set by the figure 3.1:

The HPP solving algorithm

  • Input: oriented graph G with n vertices, the initial node Vin (0) and the final node Vout (0) are denoted in the graph.
  • Step 1: generate random paths in the graph (large quantity).
  • Step 2: remove all the paths that do not start in the vertex Vin and do not end in the vertex Vout.
  • Step 3: remove all the paths that do not involve exactly n vertices.
  • Step 4: for each of the n vertices v, remove all the paths that do not involve vertex v.
  • Output: if the set of remaining paths is empty, the result of HPP states „No, the Hamiltonian path does not exist“, otherwise „Yes, the Hamiltonian paths does exist“ and the resulting set contains the solution of HPP.

L. Adleman carried out individual steps of described algorithm with the aid of DNA molecules and of operations with them.

The HPP solution with the aid of DNA molecules

Oligonucleotid s(i) (a shorter artificially created linear DNA cluster) of length of 20 nucleotides (20-mer) is assigned to every node of the graph G. These oligonucleotides have random structure (random base arrangement) and direction 5'-3'. However, they must not be mutually identical.

JAVA applet no.1

App.3.1: Creation of oligonucleotides representing the nodes of the graph G

Also to every edge of the graph G the oligonucleotide e(i,j) of length 20-mer is assigned. Let us divide every oligonucleotide representing the node of the graph into two sections of length 10-mer. Then every existing edge is coded as an oligonucleotide whose first half (10-mer) is complementary to the node, where the edge ends. The orientation of the edges is thus ensured.

JAVA applet no.2

App.3.2: Creation of oligonucleotides representing the edges of the graph G

In the experiment, Adleman mixed a large quantity (50 pmol) of oligonucleotides e(I,j) and s(i) in a solution together with DNA-ligase. From these components, he let the DNA-ligase create different-long double stranded DNA molecules. Due to the Watson-Crick complementarity, the edge oligonucleotides e(i,j) interconnect the node oligonucleotides s(i) that are also interconnected in the actual graph, see Fig. 3.1. Due to the large quantity of the molecules s(i) and e(i,j), a massive parallel process takes place resulting in an amount of molecules coding random passages through the graph G – step 1. There is a possibility that the Hamiltonian path is among the generated paths.

JAVA applet no.3

App.3.3: DNA ligation reaction

Molecules coding the paths in the graph G that have been generated in the previous step are multiplied by a PCR technique (a polymerase chain reaction). This technique consists of three basic steps: denaturation, adding primers and extension. During the denaturation phase, all double stranded DNA molecules are divided by means of high temperature into two complementary molecules. During the primer adding phase, the primers are being tied up to these molecules. We shall use the primers h(s0) and s6 that are complementary to the node s0 resp. to the edge e(i,6). During the extension phase, every partial double stranded DNA molecule is complemented to the complete cluster. This procedure is repeated n-times, which gives a rise to 2^n copies of the molecules that code the paths starting in the node s0 and ending in the node s6 – step 2.

JAVA applet no.4

App.3.4: Polymerase chain reaction (PCR)

Products of the step 2 are sorted according to a size by a gel electrophoresis method (a molecule screen). Electrophoresis is based on the following principle: The DNA molecules are negatively charged. In a electrical field, they move towards the positive electrode. If they move in a solution (a gel) that exhibits the resistance against passing molecules, then the different-long molecules will move with a different velocity. After certain time, the electrical field is turned off and the molecules are located in clearly recognizable bands. The bands uniquely correspond with the lengths of the molecules. In Adleman’s experiment, there is a need to detect a molecul band of length of 140 base couples (7 nodes * 20-mer) coding the paths in the graph G of length 7. We then extract the molecules from the appropriate band – step 3.

JAVA applet no.5

App.3.5: Gel electrophoresis

We repeatedly apply an affinity purification process. First, we disunite the coding path double strands in the graph G to single-stranded DNA molecules. Next, we repeat this procedure for every vertex of the graph G: we generate a complementary copy of every vertex v(i) of the graph G and we attach this copy to magnetic particles. We let the molecules we get in the previous step reassociate with the sequences on the magnetic particles. Only the molecules coding the path passing through the vertex v(i) attach to the magnetic particles. Other molecules are washed away from the solution – step 4.

JAVA applet no.6

App.3.6: Affinity purification

We verify the existence of the molecule coding the Hamiltonian path in the graph G by multiplying the acquired molecules by means of the PCR (polymerase chain reaction) technique and by consequent use of the gel electrophoresis method. Both techniques have already been described (see the text above). If the set of molecules in the solution is empty, the result of the HPP runs: “No, there is no Hamiltonian path”, otherwise “Yes, there is a Hamiltonian path” and the solution contains just the molecules coding the HPP solution.

SAT problem

Work of L. Adleman was extended by an American mathematician R. J. Lipton. He investigated the solution of the satisfiability of propositional formulas – SAT Problem (Satisfiability Problem). A SAT problem is a classic example of NP-complete problem, i.e. it does not have the solution in polynomial time. The only method of how to solve SAT problem is a massive search of the state space of all parameters for a given propositional formula. It is another example pointing to new opportunities of DNA calculations.

Formally, this problem can be put as follows: Let F(x1,…,xn) be the propositional formula with length s. Let us resolve whether truth-values of Boolean variables x1,…,xn exist so that the formula F(x1,…,xn) is true.

Solution of a SAT problem with the aid of DNA molecules

Just as Adleman generated all possible passings through the graph, Lipton generated all possible evaluations of variables x1,…,xn in a similar way. Locations encoding the value of the variable xi are fixed. At every i-th location of DNA molecule only two different oligonucleotides can occur with one oligonucleotid encoding information xi = 0 and another one encoding information xi = 1. All oligonucleotides have to differ from each other. DNA molecules representing nodes ai serve as binary bit delimiters x1, x2,… ,xn.

Orientovaný graf
Fig.4.1: An oriented graph with paths representing n-bit binary number x1, x2,…,xn

To solve SAT problem the following information is required:

  • Extract – select all DNA molecules that include given base sequence.
  • Detect – find out whether or not a DNA molecule is present in the solution.
  • Multiply – multiply count of DNA molecule copies.

We have dealt with these operations by the description of Adleman’s experiment. The exact description of algorithm for a solution of SAT problem is beyond the scope of this problem explication. Note that the number of steps required for answering a question relating to satisfiability of arbitrary propositional formula is linear function of its size.

© 2004, DNA Computing
Tomaš Kantor (TEK), FIT VUT Brno, opv@quick.cz