# A systematic approach to the description of fault-tolerant systems

#### Jakub Lojda

#### Supervisor: Doc. Ing. Zdeněk Kotásek, CSc.

Brno University of Technology, Faculty of Information Technology Božetěchova 1/2, 612 66 Brno - Královo Pole ilojda@fit.vutbr.cz



December 17, 2015



- Fault-tolerance
- Formal definition of digital system
- Algorithms for transformation of digital system to its fault-tolerant version
  - TMR (Triple Modular Redundancy)
  - DwC (Duplex with Comparison)

### Fault-tolerance

T FIT

- Raising the level of integration on chip
- Increasing pressure on reliability
- Implementing fault-tolerance into digital circuits

• **Definition 1.1:** Fault tolerant system is a system, that has the capability to perform its function even in the presence of *faults* or *errors*.

#### Fault-tolerance



- **Definition 1.2:** Fault is a phenomenon of losing a capability to perform a function.
- **Definition 1.3:** Error is a concept of dissimilarity of measured value and proper value.

#### Fault-tolerance principles

- TMR (Triple Modular Redundancy)
  - The main circuit is triplicated
  - New component added voter
    - voter performs majority function
- DwC (Duplication with Comparison)
  - The main circuit is duplicated
  - New component added comparator
    - Basically exclusive or (XOR) per output

(and many others...)



- Digital system is generally composed of *pins*, *signals* and *components* performing a particular function.
- **Definition 2.1:** *Pin* is a place of a circuit allowing communication of a particular component with its neighborhood. Pin is represented by a *symbol* from *alphabet of pins*.
- **Definition 2.2:** *Signal* is a wire connecting pins of components. Signal is a pair  $(p_1, p_2), p_1, p_2 \in P, P$  is an *alphabet of pins*.



#### • Definition 2.3: Let

- I be a finite set of input pins,
- O be a finite set of output pins,
- Ta fully defined logic table of logic gate with |I| inputs, |O| outputs and 2\*s state variables, s ∈ N<sub>0</sub>,

then an ordered triple G = (I, O, T) is a basic logic gate.

• Example of basic logic gate G<sub>AND</sub>





#### • **Definition 2.4:** Let

- *I* be a finite set of input pins,
- *O* be a finite set of output pins,
  - I and O are mutually exclusive, I U O form alphabet of pins,
- S be a finite set of connecting signals,
- *U* be a finite set of components performing a particular function, a component can be
  - a digital system,
  - a basic logic gate,

then an ordered quadruple D = (I, O, S, U) is a digital system.

• Recursion allows us to choose a level of abstraction.

### Digital system



• Example of a simple digital system D = (I, O, S, U)performing the function  $o = (a \land b) \lor c$ 



 $G_{\mbox{\scriptsize OR}}$  defined in a similar way

#### Rename algorithm

- For better clarity we shall denote the application of algorithm 2.1 on a basic logic gate *G* and string *x* as **rename\_gate**(*G*, *x*).
- Algorithm 2.1:
  - Input: logic gate G = (I, O, T), arbitrarily chosen string X
  - Output: logic gate  $G_X = (I_X, O_X, T_X)$ , with all symbols relabeled according to string X
  - The steps of the algorithm:

1) 
$$I_x = \{i_x \mid i \in I\}$$

- $2) \quad O_{X} = \{ o_{X} \mid o \in O \}$
- 3) Developing  $T_x$  containing logic table T with all variable names renamed from V to  $V_x$  for each variable name.



#### Rename algorithm

- For better clarity we shall denote the application of algorithm 2.2 on digital system *D* and string *x* as **rename(***D***,** *x***)**.
- Algorithm 2.2:
  - Input: digital system D = (I, O, S, U), arbitrarily chosen string X
  - Output: digital system D<sub>x</sub> = (I<sub>x</sub>, O<sub>x</sub>, S<sub>x</sub>, U<sub>x</sub>), with all symbols relabeled according to string X
  - The steps of the algorithm:



### Rename algorithm

T FIT

- Both rename algorithms can be used for
  - making an instance of digital system or basic logic gate from a *template* object,
  - copying an instance of an object.



• The basic principles of TMR:





#### • Let

- $I_V = \{a, b, c\}$
- $O_V = \{0\}$
- $S_V = \{(a, AND1B), (a, AND2A), (b, AND2B), (b, AND3A), (c, AND3B), (c, AND1A), (AND10, ORA), (AND20, ORB), (AND30, ORC), (OR0, o)\}$
- $U_V = \{ DAND1, DAND2, DAND3, DOR \}$

then an ordered quadruple  $V_V = (I_V, O_V, S_V, U_V)$  represents



=  $(T_{V}, O_{V}, S_{V}, O_{V})$  represents a circuit performing a Boolean *majority function* (median operator).



#### • Algorithm 2.3:

- Input: number n ∈ N of output signals of the original component,
- Output: component of voter  $D_{VOT} = (I_{VOT}, O_{VOT}, S_{VOT}, U_{VOT})$ ,
- The steps of the algorithm:

1) 
$$i = 1; I_{VOT} O_{VOT} S_{VOT} U_{VOT} = \emptyset$$
  
2)  $I_{VOT} = I_{VOT} \cup \{v_{Ai'}, v_{Bi'}, v_{Ci}\}$   
3)  $O_{VOT} = O_{VOT} \cup \{w_i\}$   
4)  $U_{VOT} = U_{VOT} \cup \text{rename}(V_{I'}, i)$   
5)  $S_{VOT} = S_{VOT} \cup \{(v_{Ai'}, a_{i}), (v_{Bi'}, b_{i}), (v_{Ci'}, c_{i}), (o, w_{i})\}$   
6) If  $(i \neq n)$  then  $i = i + 1$ ; go to 2

• For better clarity we shall denote the application of algorithm 2.3 on number *n* of output signals as **voter**(*n*).

## • Algorithm 2.4:

- Input: not fault-tolerant digital system D = (I, O, S, U)
- Output: fault-tolerant digital system D' = (I, O, S', U'), protected by TMR
- The steps of the algorithm:

1) 
$$D_A = \text{rename}(D, A); D_B = \text{rename}(D, B); D_C = \text{rename}(D, C)$$

2) 
$$V_D = \text{voter}(|O|)$$

$$3) \quad U' = \{ D_A, D_B, D_C, V_D \}$$

4) 
$$S_{IN} = \bigcup_{i \in I} \{ (i, i_A), (i, i_B), (i, i_C) \}$$

5) Lets arbitrarily choose bijection f:  $O \rightarrow \{1, ..., |O|\}$ 

6) 
$$S_{VOT} = \bigcup_{o \in O} \{ (o_A, v_{Af(o)}), (o_B, v_{Bf(o)}), (o_C, v_{Cf(o)}) \}$$

7) 
$$S_{OUT} = \bigcup_{o \in O} \{ (w_{f(o)}, o) \}$$

8) 
$$S' = S_{IN} \cup S_{VOT} \cup S_{OUT}$$



• The basic principles of DwC:



- a) Input signals of components
- b) Component copies
- c) Output signals of components to comparator
- d) Comparator
- e) Output signals of one component and error vector of comparator



## • Algorithm 2.5:

- Input: number *n* of outputs,  $n \in \mathbb{N}$
- Output: component of comparator

 $D_{CMP} = (I_{CMP}, O_{CMP}, S_{CMP}, U_{CMP}).$ 

• The steps of the algorithm:

1) 
$$i = 1; I_{CMP}, O_{CMP}, S_{CMP}, U_{CMP} = \emptyset$$
  
2)  $I_{CMP} = I_{CMP} \cup \{v_{A,i}, v_{B,i}\}$   
3)  $O_{CMP} = O_{CMP} \cup \{e_i\}$   
4)  $U_{CMP} = U_{CMP} \cup \text{rename_gate}(G_{XOP}, i)$   
5)  $S_{CMP} = S_{CMP} \cup \{(v_{A,i}, XORA_i), (v_{B,i}, XORB_i), (XORO_i, w_i)\}$   
6) If  $(i \neq n)$  then  $i = i + 1$ ; go to 2

• Again, we shall denote the application of algorithm 2.5 on number *n* of output signals as **comparator**(*n*).

## • Algorithm 2.6:

- Input: not fault-tolerant digital system D = (I, O, S, U)
- Output: fault-tolerant digital system D' = (I, O', S', U'), protected by DwC
- The steps of the algorithm:

1) 
$$U' = \{\text{rename}(D, A), \text{rename}(D, B), \text{comparator}(|O|)\}$$

$$2) \quad E_{OUT} = \bigcup_{o \in O} \{e_o\}$$

$$3) \quad O' = E_{OUT} \cup O$$

4) 
$$S_{IN} = \bigcup_{i \in I} \{ (i, i_A), (i, i_B) \}$$

5) Lets arbitrarily choose bijection f:  $O \rightarrow \{1, ..., |O|\}$ 

6) 
$$S_{CO} = \bigcup_{o \in O} \{ (o_A, v_{Af(o)}), (o_B, v_{Bf(o)}), (o_A, o) \}$$

7) 
$$S_E = \bigcup_{o \in O} \{ (w_{f(o)}, e_o) \}$$

8)  $S' = S_{IN} \cup S_{CO} \cup S_E$ 





- Formal model of a digital system
- Two transformation algorithms working with proposed formal model
  - TMR (Triple Modular Redundancy)
  - DwC (Duplication with Comparison)



- Martin Straka: *Methodology of highly reliable systems design*, PhD thesis, Brno, FIT BUT, 2013
- Hlavička, J., Racek, S., Golan, P., Blažek, T.: Číslicové systémy odolné proti poruchám, ČVUT, Prague, 1992, ISBN 80-01-00852-5

## Thank You For Your Attention !