Parsing Statements
Print, var, for, while, if, and assignment statements
Expressions vs statements
| Expressions | Statements | |
|---|---|---|
| Produce | A value | Side effects |
| Action | Evaluate | Execute |
| Examples | 1 + 1, foo(x), x > 10 |
print(x), var x = 10, for (...) |
| Children | Other expressions | Expressions and other statements |
Statements form the core structure of imperative programming languages. They control flow (if, for, while), declare variables (var), and trigger side effects (print). Their arguments are typically expressions.
The CatScript statement grammar
statement -> for_statement
| while_statement
| if_statement
| print_statement
| variable_statement
| assignment_statement
| function_call_statement ;
Statement parsing strategy
Most statements start with a keyword - a strong, unambiguous start symbol:
| Statement | Start token |
|---|---|
for |
for keyword |
while |
while keyword |
if |
if keyword |
print |
print keyword |
var |
var keyword |
| assignment | IDENTIFIER (no keyword) |
| function call | IDENTIFIER (no keyword) |
The null-return convention
Each statement type gets its own parseXxxStatement() function. If the current token doesn’t match, the function returns null:
Statement parseStatement() {
Statement printStmt = parsePrintStatement();
if (printStmt != null) return printStmt;
Statement varStmt = parseVarStatement();
if (varStmt != null) return varStmt;
Statement forStmt = parseForStatement();
if (forStmt != null) return forStmt;
// ... and so on for each statement type
}
This gives a clean, regular structure - easy to extend by duplicating and modifying a block.
Print statement
print_statement -> 'print', '(', expression, ')' ;
Statement parsePrintStatement() {
if (match(PRINT)) {
PrintStatement stmt = new PrintStatement();
stmt.setStart(consumeToken());
require(LEFT_PAREN, stmt);
stmt.setExpression(parseExpression());
stmt.setEnd(require(RIGHT_PAREN, stmt));
return stmt;
}
return null;
}
The expression between the parentheses can be anything - an additive expression, a function call, a concatenation. We just call parseExpression() and whatever it is gets consumed and becomes a child of the print statement.
Variable statement
variable_statement -> 'var', IDENTIFIER, [ ':', type_expression ], '=', expression ;
Statement parseVarStatement() {
if (match(VAR)) {
VariableStatement stmt = new VariableStatement();
stmt.setStart(consumeToken());
Token identifier = require(IDENTIFIER, stmt);
stmt.setVariableName(identifier.getStringValue());
if (matchAndConsume(COLON)) {
TypeLiteral type = parseTypeExpression();
stmt.setExplicitType(type.getType());
}
require(EQUALS, stmt);
Expression expr = parseExpression();
stmt.setExpression(expr);
stmt.setEnd(expr.getEnd());
return stmt;
}
return null;
}
Key details:
- The identifier is required -
require(IDENTIFIER, stmt)adds a parse error if it’s missing - The type annotation (
: int) is optional - usematchAndConsume(COLON)to check - The colon and equals are syntactic delimiters - consumed but not stored as AST fields
- The end token is whatever the expression ends with (we don’t know or care what it is)
For statement
for_statement -> 'for', '(', IDENTIFIER, 'in', expression, ')',
'{', { statement }, '}' ;
Parsing a for loop:
- Match and consume the
forkeyword (set as start token) - Require
( - Require an
IDENTIFIER(the loop variable) - Require the
inkeyword - Parse an expression (the iterable)
- Require
) - Require
{ - Parse statements in a loop while the next token is not
} - Require
}(set as end token)
Statement bodies: the while-not-close-brace pattern
require(LEFT_BRACE, stmt);
List<Statement> body = new ArrayList<>();
while (!match(RIGHT_BRACE)) {
body.add(parseStatement());
}
stmt.setStatements(body);
stmt.setEnd(require(RIGHT_BRACE, stmt));
This pattern - while the next token is not a closing brace, keep parsing statements - recurs in every statement that has a body (for, while, if, else).
While statement
while_statement -> 'while', '(', expression, ')', '{', { statement }, '}' ;
Nearly identical structure to for. Match the while keyword, require parenthesized expression, require braced body.
If statement
if_statement -> 'if', '(', expression, ')', '{', { statement }, '}',
[ 'else', ( if_statement | '{', { statement }, '}' ) ] ;
After parsing the condition and body:
- Check for
else - If the token after
elseisif-> recursively callparseIfStatement()(handleselse ifchains) - Otherwise -> parse another braced statement body (the terminal
elseblock)
The dangling else problem
In C/Java, if statements don’t require braces:
if (val1)
if (val2) foo();
else bar(); // which if does this else belong to?
Without braces, it’s ambiguous whether else binds to the inner or outer if. C resolves this by convention (bind to the closest if), but it’s a source of bugs.
CatScript avoids this entirely by requiring braces for all if/else bodies. The grammar makes it unambiguous.
Assignment statement
assignment_statement -> IDENTIFIER, '=', expression ;
Unlike other statements, assignments don’t start with a keyword - they start with an identifier:
if (match(IDENTIFIER)) {
Token name = consumeToken();
require(EQUALS, stmt);
Expression expr = parseExpression();
// ...
return assignmentStmt;
}
Assignment vs function call: another ambiguity
Two statement types start with IDENTIFIER:
| Tokens | Statement type |
|---|---|
x = 10 |
Assignment |
foo(1, 2) |
Function call statement |
Resolution: after consuming the identifier, check the next token.
- Next token is
=-> assignment statement - Next token is
(-> function call statement (wraps a function call expression)
This is the same kind of one-token lookahead we saw with identifier expressions vs function call expressions at the expression level.
Function call statements
A function call statement wraps a function call expression and discards the return value. In Java, this is like calling doSomething() without assigning the result - the expression produces a value, but it’s used in a statement position for its side effects.
Statements containing statements
Statements can be nested:
for (x in list) {
if (x > 10) {
print(x)
}
}
The parse tree:
ForStatement
├── identifier: "x"
├── expression: IdentifierExpression("list")
└── body:
└── IfStatement
├── condition: ComparisonExpression(x > 10)
└── body:
└── PrintStatement
└── expression: IdentifierExpression("x")
This nesting happens naturally - parseStatement() calls itself recursively through statement body parsing.
Summary
| Concept | Key point |
|---|---|
| Expressions vs statements | Expressions produce values; statements have side effects and control flow |
| Keyword-started statements | for, while, if, print, var all start with a keyword - easy to distinguish |
| Null-return convention | Each parseXxxStatement() returns null if no match, enabling a chain of attempts |
| Statement bodies | while (!match(RIGHT_BRACE)) { parseStatement(); } |
| Dangling else | CatScript requires braces, avoiding the ambiguity entirely |
| Assignment ambiguity | IDENTIFIER could start assignment or function call - check next token |
| Nested statements | Statements can contain other statements; parse tree reflects the nesting |