Actual Parsing
Left associativity, the loop trick, and unary expressions
From theory to implementation
In the previous lecture we saw the recursive descent algorithm in theory. Now we implement it against the CatScript expression grammar, starting with primary expressions and building up to additive and unary expressions.
Primary expressions
The simplest parser: only integers.
expression -> primary_expression ;
primary_expression -> INTEGER ;
Expression parsePrimaryExpression() {
if (match(INTEGER)) {
Token token = consumeToken();
IntegerLiteralExpression expr = new IntegerLiteralExpression();
expr.setToken(token);
expr.setValue(Integer.parseInt(token.getStringValue()));
return expr;
}
return new SyntaxErrorExpression(consumeToken());
}
The pattern for every primary expression:
- Check the current token type with
match() - Consume the token
- Construct the AST node and fill in its fields
- Return the node
For single-token expressions like integer literals, setToken() sets both the start and end token at once.
Additive expressions
additive_expression -> primary_expression, { ( '+' | '-' ), primary_expression } ;
(We skip factor_expression for now and go straight to primary_expression.)
First attempt: recursion
Expression parseAdditiveExpression() {
Expression lhs = parsePrimaryExpression();
if (match(PLUS, MINUS)) {
AdditiveExpression expr = new AdditiveExpression();
expr.setOperator(consumeToken());
expr.setLeftHandSide(lhs);
expr.setStart(lhs.getStart());
Expression rhs = parseAdditiveExpression(); // recursive call
expr.setRightHandSide(rhs);
expr.setEnd(rhs.getEnd());
return expr;
}
return lhs;
}
Debugging 4 - 3 - 2 with this code produces:
(- 4 (- 3 2))
That’s 4 - (3 - 2) = 4 - 1 = 3 - wrong. The recursive call greedily consumes all remaining minus tokens on the right side, producing right associativity.
The fix: use a loop
Expression parseAdditiveExpression() {
Expression lhs = parsePrimaryExpression();
while (match(PLUS, MINUS)) {
AdditiveExpression expr = new AdditiveExpression();
expr.setOperator(consumeToken());
expr.setLeftHandSide(lhs);
expr.setStart(lhs.getStart());
Expression rhs = parsePrimaryExpression(); // NOT recursive
expr.setRightHandSide(rhs);
expr.setEnd(rhs.getEnd());
lhs = expr; // slide the new expression into the left-hand side
}
return lhs;
}
Two changes from the recursive version:
| Recursive (wrong) | Loop (correct) | |
|---|---|---|
| Control flow | if + recursive call |
while loop |
| Right-hand side | parseAdditiveExpression() |
parsePrimaryExpression() |
| After building node | return immediately | lhs = expr (slide under) |
How the loop produces left associativity
For 4 - 3 - 2:
- Parse
4as a primary expression ->lhs = 4 - Match
-, consume it. Parse3as a primary. Build(- 4 3). Setlhs = (- 4 3) - Match
-, consume it. Parse2as a primary. Build(- (- 4 3) 2). Setlhs = (- (- 4 3) 2) - No more
+or--> return(- (- 4 3) 2)
Result: (4 - 3) - 2 = -1 - correct.
The trick: each iteration takes the current lhs and makes it the left child of a new additive expression. This “slides” the previous result under the new operator, ensuring left-to-right grouping.
The mnemonic: loop left, recurse right
Left-associative operators -> use a loop. Right-associative operators -> use recursion.
This is the essential trick for recursive descent parsing of binary operators. The curly braces { ... } in the EBNF grammar (zero or more repetitions) correspond to a while loop in the implementation.
Unary expressions
unary_expression -> ( '!' | '-' ), unary_expression
| primary_expression ;
Unary expressions are right-associative - --5 means -(-(5)), so we use recursion:
Expression parseUnaryExpression() {
if (match(MINUS, NOT)) {
Token operator = consumeToken();
UnaryExpression expr = new UnaryExpression();
expr.setOperator(operator);
expr.setStart(operator);
Expression rhs = parseUnaryExpression(); // recursion -> right associative
expr.setRightHandSide(rhs);
expr.setEnd(rhs.getEnd());
return expr;
}
return parsePrimaryExpression();
}
Wiring it into the grammar
Before unary expressions existed, parseAdditiveExpression called parsePrimaryExpression directly. Now we insert parseUnaryExpression in between:
parseAdditiveExpression -> calls parseUnaryExpression
parseUnaryExpression -> calls parsePrimaryExpression
Each new precedence level gets wired into this chain. Eventually:
parseExpression -> parseEqualityExpression -> parseComparisonExpression
-> parseAdditiveExpression -> parseFactorExpression -> parseUnaryExpression
-> parseIndexExpression -> parsePrimaryExpression
Example: 5 - -5
parseAdditiveExpression-> parse unary -> parse primary ->5- Match
-(additive) -> build additive expression, left side =5 - Parse unary for right side -> match
-(unary!) -> build unary expression - Parse unary again -> no minus/not -> parse primary ->
5 - Return unary
(-5)as right side of additive
The same - token means subtraction in the additive context and negation in the unary context. Recursive descent resolves this naturally based on position in the grammar.
All binary expressions follow the same pattern
Equality, comparison, and factor expressions all look nearly identical to the additive expression - just different token types in the while condition and different “next level down” calls:
// Comparison expression (same pattern as additive)
Expression parseComparisonExpression() {
Expression lhs = parseAdditiveExpression();
while (match(GREATER, GREATER_EQUAL, LESS, LESS_EQUAL)) {
ComparisonExpression expr = new ComparisonExpression();
expr.setOperator(consumeToken());
expr.setLeftHandSide(lhs);
expr.setStart(lhs.getStart());
Expression rhs = parseAdditiveExpression();
expr.setRightHandSide(rhs);
expr.setEnd(rhs.getEnd());
lhs = expr;
}
return lhs;
}
Summary
| Concept | Key point |
|---|---|
| Primary expressions | Match token type, consume, construct AST node, return |
| Recursion -> right associativity | Recursive calls greedily consume operators on the right |
| Loop -> left associativity | while loop slides each result into the left child |
| “Loop left, recurse right” | The mnemonic for parsing binary operators correctly |
| Unary expressions | Right-associative, so they use recursion (parseUnaryExpression calls itself) |
| Wiring precedence levels | Each level calls the next level down; new levels are inserted into the chain |
| Same token, different meaning | - is subtraction (additive) or negation (unary) depending on grammar position |