Compilers

Actual Parsing

Left associativity, the loop trick, and unary expressions

From theory to implementation

In the previous lecture we saw the recursive descent algorithm in theory. Now we implement it against the CatScript expression grammar, starting with primary expressions and building up to additive and unary expressions.


Primary expressions

The simplest parser: only integers.

expression        -> primary_expression ;
primary_expression -> INTEGER ;
Expression parsePrimaryExpression() {
    if (match(INTEGER)) {
        Token token = consumeToken();
        IntegerLiteralExpression expr = new IntegerLiteralExpression();
        expr.setToken(token);
        expr.setValue(Integer.parseInt(token.getStringValue()));
        return expr;
    }
    return new SyntaxErrorExpression(consumeToken());
}

The pattern for every primary expression:

  1. Check the current token type with match()
  2. Consume the token
  3. Construct the AST node and fill in its fields
  4. Return the node

For single-token expressions like integer literals, setToken() sets both the start and end token at once.


Additive expressions

additive_expression -> primary_expression, { ( '+' | '-' ), primary_expression } ;

(We skip factor_expression for now and go straight to primary_expression.)

First attempt: recursion

Expression parseAdditiveExpression() {
    Expression lhs = parsePrimaryExpression();

    if (match(PLUS, MINUS)) {
        AdditiveExpression expr = new AdditiveExpression();
        expr.setOperator(consumeToken());
        expr.setLeftHandSide(lhs);
        expr.setStart(lhs.getStart());
        Expression rhs = parseAdditiveExpression();  // recursive call
        expr.setRightHandSide(rhs);
        expr.setEnd(rhs.getEnd());
        return expr;
    }

    return lhs;
}

Debugging 4 - 3 - 2 with this code produces:

(- 4 (- 3 2))

That’s 4 - (3 - 2) = 4 - 1 = 3 - wrong. The recursive call greedily consumes all remaining minus tokens on the right side, producing right associativity.

The fix: use a loop

Expression parseAdditiveExpression() {
    Expression lhs = parsePrimaryExpression();

    while (match(PLUS, MINUS)) {
        AdditiveExpression expr = new AdditiveExpression();
        expr.setOperator(consumeToken());
        expr.setLeftHandSide(lhs);
        expr.setStart(lhs.getStart());
        Expression rhs = parsePrimaryExpression();  // NOT recursive
        expr.setRightHandSide(rhs);
        expr.setEnd(rhs.getEnd());
        lhs = expr;  // slide the new expression into the left-hand side
    }

    return lhs;
}

Two changes from the recursive version:

  Recursive (wrong) Loop (correct)
Control flow if + recursive call while loop
Right-hand side parseAdditiveExpression() parsePrimaryExpression()
After building node return immediately lhs = expr (slide under)

How the loop produces left associativity

For 4 - 3 - 2:

  1. Parse 4 as a primary expression -> lhs = 4
  2. Match -, consume it. Parse 3 as a primary. Build (- 4 3). Set lhs = (- 4 3)
  3. Match -, consume it. Parse 2 as a primary. Build (- (- 4 3) 2). Set lhs = (- (- 4 3) 2)
  4. No more + or - -> return (- (- 4 3) 2)

Result: (4 - 3) - 2 = -1 - correct.

The trick: each iteration takes the current lhs and makes it the left child of a new additive expression. This “slides” the previous result under the new operator, ensuring left-to-right grouping.


The mnemonic: loop left, recurse right

Left-associative operators -> use a loop. Right-associative operators -> use recursion.

This is the essential trick for recursive descent parsing of binary operators. The curly braces { ... } in the EBNF grammar (zero or more repetitions) correspond to a while loop in the implementation.


Unary expressions

unary_expression -> ( '!' | '-' ), unary_expression
                 | primary_expression ;

Unary expressions are right-associative - --5 means -(-(5)), so we use recursion:

Expression parseUnaryExpression() {
    if (match(MINUS, NOT)) {
        Token operator = consumeToken();
        UnaryExpression expr = new UnaryExpression();
        expr.setOperator(operator);
        expr.setStart(operator);
        Expression rhs = parseUnaryExpression();  // recursion -> right associative
        expr.setRightHandSide(rhs);
        expr.setEnd(rhs.getEnd());
        return expr;
    }
    return parsePrimaryExpression();
}

Wiring it into the grammar

Before unary expressions existed, parseAdditiveExpression called parsePrimaryExpression directly. Now we insert parseUnaryExpression in between:

parseAdditiveExpression -> calls parseUnaryExpression
parseUnaryExpression    -> calls parsePrimaryExpression

Each new precedence level gets wired into this chain. Eventually:

parseExpression -> parseEqualityExpression -> parseComparisonExpression
-> parseAdditiveExpression -> parseFactorExpression -> parseUnaryExpression
-> parseIndexExpression -> parsePrimaryExpression

Example: 5 - -5

  1. parseAdditiveExpression -> parse unary -> parse primary -> 5
  2. Match - (additive) -> build additive expression, left side = 5
  3. Parse unary for right side -> match - (unary!) -> build unary expression
  4. Parse unary again -> no minus/not -> parse primary -> 5
  5. Return unary (-5) as right side of additive

The same - token means subtraction in the additive context and negation in the unary context. Recursive descent resolves this naturally based on position in the grammar.


All binary expressions follow the same pattern

Equality, comparison, and factor expressions all look nearly identical to the additive expression - just different token types in the while condition and different “next level down” calls:

// Comparison expression (same pattern as additive)
Expression parseComparisonExpression() {
    Expression lhs = parseAdditiveExpression();

    while (match(GREATER, GREATER_EQUAL, LESS, LESS_EQUAL)) {
        ComparisonExpression expr = new ComparisonExpression();
        expr.setOperator(consumeToken());
        expr.setLeftHandSide(lhs);
        expr.setStart(lhs.getStart());
        Expression rhs = parseAdditiveExpression();
        expr.setRightHandSide(rhs);
        expr.setEnd(rhs.getEnd());
        lhs = expr;
    }

    return lhs;
}

Summary

Concept Key point
Primary expressions Match token type, consume, construct AST node, return
Recursion -> right associativity Recursive calls greedily consume operators on the right
Loop -> left associativity while loop slides each result into the left child
“Loop left, recurse right” The mnemonic for parsing binary operators correctly
Unary expressions Right-associative, so they use recursion (parseUnaryExpression calls itself)
Wiring precedence levels Each level calls the next level down; new levels are inserted into the chain
Same token, different meaning - is subtraction (additive) or negation (unary) depending on grammar position

Further reading