Syntax-Directed Translation in C++

In this appendix, we demonstrate how to implement an important compiler part in this completely realistic way. Specifically, we implement a precedence parser that guides a syntax-directed translation of infix logical expressions to its postfix equivalents. We program this implementation in C++, which definitely belongs to the most popular programming languages today.

We implement the syntax-directed translation in the following way. The input grammar may content an identifiers and an integer numbers as operands. Every positive integer represents the logical value true while the zero means false. Lexemes and tokens specifying logical operators and, or and not are denoted by &, |, and , ! respectively. The syntax directed translation is defined by this attribute grammar

C -> !C		{POSTFIX(!)}
C -> C | C 	{POSTFIX(|)}
C -> C & C 	{POSTFIX(&)}
C -> (C) 	{}
C -> i{$1}	{POSTFIX($1)}
C -> #{$1}	{POSTFIX($1)}

where #’s attributes are 0 or 1, representing true or false, respectively. For instance, this syntax directed translation converts 0 | ! (day & night) to #{0} i{?day} i{?night} & ! |.

Synopsis. First, we conceptualize the implementation in Concept. Then, in C++ Source Code, we give a C++ code of this implementation. As opposed to the previous parts of this book, we present both sections in somewhat schematic way because computer program code is usually commented in this way.

Concept

The implementation is conceptualized as an object oriented system based upon these classes Lex recognizes the next lexeme and produces its token that specifies the lexeme. Each identifier is recorded in the symbol table and represented by a token with an attribute, which is a symbol table address to the symbol table record of the identifier.

PushdownAutomaton performs the fundamental pushdown operations.

SymbolTable stores identifiers.

PrecedenceParser makes the syntax analysis of the tokenized infix expression, obtained by Lex, by using the precedence parsing method. In addition, it controls the semantic actions to produce the postfix equivalent to the infix expression. PrecedenceParser holds the instances of the Lex, PushdownAutomaton, and SymbolTable classes.

The following definitions represent the tokens together with the corresponding lexemes. They are specified in the Defines.h file as follows:

typedef enum PD_SYMBOLS
{
    UNDEF         = -1, // used as end of the rule marker
    TOKEN_IDENT   =  0, // 'i' identificator
    TOKEN_INTEGER =  1, // '#' integer
    TOKEN_LOG_AND =  2, // '&' disjunction
    TOKEN_LOG_OR  =  3, // '|' conjunction
    TOKEN_LOG_NEG =  4, // '!' negation
    TOKEN_B_LEFT  =  5, // '(' left parenthesis
    TOKEN_B_RIGHT =  6, // ')' right parenthesis
    BOTTOM        =  7, // '$' used as the pushdown bottom
    N_COND        =  8, // 'C' nonterminal CONDITION
    LSHARP        =  9, // '<' <
    RSHARP        =  10 // '>' >
};

UNDEF symbol is used as an endmarker, which marks, for instance, the end of a rule. LSHARP and RSHARP are pushdown symbols. The other symbols correspond to the input tokens.

Class interface

Lex performs

  • T_Token * Lex::getNextToken(): getting the next token
  • void Lex::setToken(int type, string name, int value): setting token with a given values
  • definition of the instance variable ‘token’ representing input symbol
typedef struct T_Token
{
    int    type;
    string name;
    int    value;
};
T_token token;

PushdownAutomaton is implemented as

// Definition of the pushdown symbol
typedef struct T_PdSymbol
{
    PD_SYMBOLS symbol; // type of the symbol
    T_Token *  token;  // pointer to the corresponding token if any
};
// definitions of the pushdown structure
typedef vector<T_PdSymbol *> T_PdString;
typedef T_PdString T_PdAutomaton;
typedef T_PdString::iterator T_PdAutomatonIter;
typedef T_PdString::reverse_iterator T_PdAutomatonIterRev;
T_PdAutomaton pdAutomaton;

PushdownAutomaton performs

  • void PushdownAutomaton::push(T_PdSymbol * symbol): pushing a symbol onto the pushdown top;
  • void pop(): popping the topmost pushdown symbol;
  • T_PdSymbol * PushdownAutomaton::getTopmostTerm(): getting the topmost pushdown terminal;
  • bool PushdownAutomaton::switchTop(T_PdString oldTop, T_PdString newTop): switching a string for another string on the pushdown top;
  • bool PushdownAutomaton::switchSymbol(PD_SYMBOLS symbol, T_PdString newTop): replacing the topmost pushdown symbol with a string;
  • bool PushdownAutomaton::compareTop(T_PdString top): comparing a topmost pushdown string with another string; if they coincide, it returns true, and if they do not, it returns false;
  • T_PdString PushdownAutomaton::getSymbolsToReduce(): getting the pushdown string that starts at the topmost occurrence of symbol < and continues up to the very top of the pushdown;
  • void PushdownAutomaton::setSymbol(T_PdSymbol * pdSymbol, PD_SYMBOLS symbol, T_Token * token): setting pdSymbol with a given values.

SymbolTable is implemented as

struct T_SymbolInfo
{
    int length;
};
typedef map<string, T_SymbolInfo> T_SymbolTable;
typedef map<string, T_SymbolInfo>::iterator T_SymbolTableIter;
T_SymbolTable symbolTable;''

SymbolTable performs

  • void SymbolTable::installID(string lexeme): storing new lexeme into the symbol table;
  • T_SymbolTableIter SymbolTable::getID(string lexeme): return of the symbol-table index addressing the given lexeme.

PrecedenceParser class stores the grammatical rules in this way

typedef vector<PD_SYMBOLS>                 T_Rule;
typedef vector<PD_SYMBOLS>::iterator       T_RuleIter;
typedef map<int, T_Rule>                   T_RuleTable;
typedef map<int, T_Rule>::iterator         T_RuleTableIter;

In PrecedenceParser, an operator precedence table is implemented as

char precedenceTable[PTABLEROW][PTABLECOL];

The sematic actions corresponding to the rule used during reductions are implemented as

char postfixAction[PTABLEROW];

Finally, PrecedenceParser contains

  • int PrecedenceParser::findRightSide(T_PdString rightSide): finding the rule according to its right-hand side;
  • T_Rule PrecedenceParser::getRule(int index): getting the rule according to its table index.
  • int PrecedenceParser::runParsing(): implementation of the syntax-directed translation based upon the precedence parser;

Demonstration

Here you can see demonstration of the compiler.

C++ Source Code

In this section we give the code of the 2 key compiler methods.

PrecedenceParser::PrecedenceParser()
{
    // initialization of the precedence table
    char precedenceTableInit[PTABLEROW][PTABLECOL] =
        { /*         i   #   &   |   ~   (   )   $  */
          /* i */  {'_','_','>','>','_','_','>','>'},
          /* # */  {'_','_','>','>','_','_','>','>'},
          /* & */  {'<','<','>','>','<','<','>','>'},
          /* | */  {'<','<','<','>','<','<','>','>'},
          /* ~ */  {'<','<','>','>','<','<','>','>'},
          /* ( */  {'<','<','<','<','<','<','=','_'},
	          /* ) */  {'_','_','>','>','_','_','>','>'},
          /* $ */  {'<','<','<','<','<','<','_','_'}
        };

    // filling PrecedenceTable with precedenceTableInit  
    int size = PTABLEROW * PTABLECOL * sizeof(char);
    memcpy(precedenceTable, precedenceTableInit, size);

    // Actions that generate the postfix notation. Index to the postfixAction array 
    // corresponds to the index of a rule in ruleTable. In this way, we associate each 
    // rule with its semantic action that generates the postfix notation.
    postfixAction[0] = '!';
    postfixAction[1] = '|';
    postfixAction[2] = '&';
    postfixAction[3] = '_'; // no postfix action defined for brackets
    postfixAction[4] = 'i';
    postfixAction[5] = '#';


    // Initialization of the grammatical rules follows next. The first symbol of 
    // each rule detones the left-hand side of the rule.
    PD_SYMBOLS rules[RULECOUNT][RULELENGTH]=  {
          { N_COND, TOKEN_LOG_NEG, N_COND       , UNDEF        , UNDEF },
          { N_COND, N_COND       , TOKEN_LOG_OR , N_COND       , UNDEF },
          { N_COND, N_COND       , TOKEN_LOG_AND, N_COND       , UNDEF },
          { N_COND, TOKEN_B_LEFT , N_COND       , TOKEN_B_RIGHT, UNDEF },
          { N_COND, TOKEN_IDENT  , UNDEF        , UNDEF        , UNDEF },
          { N_COND, TOKEN_INTEGER, UNDEF        , UNDEF        , UNDEF }
    };

    // Filling ruleTable with the rules 
    initRules(rules, RULECOUNT);

    // Initialization of SymbolTable, Lexical analyzer and pushdown automaton.
    symbolTable = new SymbolTable;
    lex = new Lex(symbolTable);
    pdAutomaton = new PushdownAutomaton();
} 


int PrecedenceParser::runParsing()
{
    T_Token    *token;
    T_PdSymbol *pdSymbol;
    T_PdSymbol *pdSymbolIns;

    PD_SYMBOLS inSymbol;
    T_PdString rightSide;
    T_PdString newTop;
    T_PdString oldTop;
    T_PdString stringToReduce;
    T_Rule     reduceRule;
    T_PdString leftSide;
    T_PdString::iterator iter2;
    T_PdSymbol * pdSymbolTmp;
    
    // pdSymbol represents a pushdown symbol. 
    pdSymbol = new T_PdSymbol;
    setSymbol(pdSymbol, BOTTOM, NULL);
    
    // pushing symbol $, which marks the pushdown bottom  
    pdAutomaton->push(pdSymbol);
    
    // getting the first token 
    token = lex->getNextToken();

    // Token is stored into inSymbol.
    inSymbol = (PD_SYMBOLS)token->type;

    do{
        // pdSymbol records the topmost terminal.
        pdSymbol = pdAutomaton->getTopmostTerm();

        // inSymbol and pdSymbol determines the corresponding precedenceTable 
        // entry, which then this entry species the parsing action to perform. 
        switch(precedenceTable[pdSymbol->symbol][inSymbol])
        {
            case '=': /* two computational actions are performed: 
                            (1):  push(inSymbol) & 
                            (2):  get the next token 
                      */
                
                /* (1):  push(inSymbol) */
                pdSymbol = new T_PdSymbol;
                setSymbol(pdSymbol, inSymbol, token);
                pdAutomaton->push(pdSymbol);
                
                /* (2): read the next token */
                token = lex->getNextToken();
                inSymbol = (PD_SYMBOLS)token->type;
                break;

            case '<': /* three computational actions are performed:
                      (1): replacement of pdSymbol with string consisting of
                           pdSymbol followed by symbol <
                      (2): push(inSymbol)
                      (3): reading the next token
                      */

                /* (1): replacement of pdSymbol with a string consisting of
                           pdSymbol followed by symbol <  */

                // preparing the string consisting of
                // pdSymbol followed by symbol <   
                newTop.clear();
                newTop.push_back(pdSymbol);
                pdSymbolIns = new T_PdSymbol;
                setSymbol(pdSymbolIns, LSHARP, NULL);
                newTop.push_back(pdSymbolIns); 

                // replacing pdSymbol with the string prepared above
                pdAutomaton->switchSymbol(pdSymbol->symbol, newTop)

                /* (2): push(inSymbol) */
                pdSymbolIns = new T_PdSymbol;
                setSymbol(pdSymbolIns, inSymbol, token);
                pdAutomaton->push(pdSymbolIns);

                /* (3): read the next token */
                token = lex->getNextToken();
                inSymbol = (PD_SYMBOLS)token->type;

                break;

            case '>': /* two computational actions are performed:
                      (1):  if the topmost occurrence of symbol < is followed by
                            string y, make sure there exists a rule, r, with 
                            the right-hand  side equal to y, and if this is the  
                            case, reduce <y to the the left-hand side of r on
                            the pushdown top;
                      (2):  write out r 
                      */

                /* (1): reduction */
                // finding <y on the pushdown top  
                stringToReduce.clear();
                stringToReduce = pdAutomaton->getSymbolsToReduce();

                // testing that a string starting with symbol < 
                // occurs on the pushdown top
                if(!stringToReduce.empty())
                {
                    // placing y into handle 
                    T_PdString handle = stringToReduce;
                    handle.erase(handle.begin());
                    
                    // finding rule r with y on the right-hand side 
                    int res = findRightSide(handle);
                    
                    // r with y on the right-hand side is found
                    if(res != -1)
                    {  
                        // getting rule r of the form C -> y  
                        reduceRule = getRule(res);
                    
                        pdSymbolIns = new T_PdSymbol;
                        setSymbol(pdSymbolIns, *(reduceRule.begin()), NULL);
                        leftSide.clear();
                        leftSide.push_back(pdSymbolIns);

                        //reducing y to C on the pushdown top  
                        pdAutomaton->switchTop(stringToReduce, leftSide);

                        // postfixAction[] specifies the semantic action 
                        // associated with the rule according to which a reduction
                        // is performed 

                        char c = postfixAction[res];
                        if(c != '_')
                        {
                            // some action prescribed
                            if(c == '#') // integer value
                            {
                            // As described in Lex, 0 represents the zero value
                            // while 1 represents any non-zero value in this 
                            // implementation.  The next line converts the value
                            // to the corresponding character that is concatenated
                            // with the current postfix expression string.
                            char cVal = 
                                    (*(stringToReduce.back())).token->value+'0';
                                postfixExpr.push_back(cVal);
                            }
                            else if(c == 'i')
                            {    
                                postfixExpr.push_back(postfixAction[res]);
                            }
                            else
                            {
                                postfixExpr.push_back(postfixAction[res]);
                            }
                        }
                    }
                    else
                    // no rule has y on its right-hand side
                    { 
                        cerr << "SYNTAX ERROR: the right-hand side of the 
                                 rule is missing" << endl;
                        return 0;
                    }
                }
                else
                // no symbol < occurs in the pushdown 
                {
                    cerr << "SYNTAX ERROR: no symbol < occurs in the pushdown,
                             so no handle is delimited"
                         << endl;
                    return 0;
                }
                break;
            default:
                cerr << "SYNTAX ERROR: table-detected error by a blank entry"  
                     << endl;
                return 0;
                break;
        }
        pdSymbolTmp = pdAutomaton->getTopmostTerm();
    }while( !((pdSymbolTmp->symbol == BOTTOM) && (inSymbol == BOTTOM)) );

    return 0;
}

The whole compiler source code including common routines you can download here.

lectures/books/sdt_cpp.txt · Last modified: 2007/10/31 16:18 by krivka
 
Except where otherwise noted, content on this wiki is licensed under the following license: CC Attribution-Noncommercial-Share Alike 4.0 International
Recent changes RSS feed Donate Powered by PHP Valid XHTML 1.0 Valid CSS Driven by DokuWiki