Compilers

Semantic Analysis of Statements

Validating variable, assignment, if, for, return, and function definition statements

Statements vs. expressions in validation

  Expressions Statements
Have a type Yes - resolved during validate No - but enforce type rules on their children
Symbol table use Mainly lookups (identifier expressions) Lookups, insertions, and scope management
Scope changes Never for, while, if, function definitions all push/pop scopes

The validation pattern for statements

  1. Validate children – call validate on any child expressions and statements
  2. Manage scopes – push before entering a block, pop when leaving
  3. Check assignability – verify types are compatible according to the rules
  4. Register symbolsvar and for statements add new entries to the symbol table

The simplest statement – print(expression) accepts any type (everything is assignable to object):

void validate(SymbolTable symbolTable) {
    expression.validate(symbolTable);
}

Just validate the child expression so it resolves its type. No type errors possible at the statement level.


Variable statement

var x = 10                  // type inferred as int
var y : object = 10         // explicit type - int is assignable to object
var z : boolean = 10        // ERROR - int is not assignable to boolean

Validation steps:

  1. Validate the right-hand side expression (resolves its type)
  2. Check for duplicate names in the symbol table -> error if found
  3. Determine the type:
    • If explicit type is given -> use it, but verify the expression’s type is assignable to the explicit type
    • If no explicit type -> type inference – copy the expression’s type
  4. Register the variable name and resolved type in the symbol table

Type inference is remarkably simple: the type of var x = 10 is just the type of the expression 10, which is int. All the hard work was already done when the expression resolved its type.


Assignment statement

x = "hello"
  1. Look up the variable name in the symbol table
  2. If not found -> unknown name error (assigning to an undeclared variable)
  3. If found -> validate the expression, then check: is the expression’s type assignable to the symbol’s type?

If statement

if (condition) {
    // true branch
} else {
    // else branch
}
  1. Validate the condition expression – its type must be assignable to boolean
  2. Push scope for the true branch, validate all true statements, pop scope
  3. Push scope for the else branch, validate all else statements, pop scope

The two branches have independent scopes:

if (cond) {
    var x = 10      // scope A
} else {
    var x = "hello" // scope B - no conflict with scope A
}

For statement

for (i in [1, 2, 3]) {
    print(i)
}
  1. Validate the expression – its type must be an instance of ListType
  2. Extract the component type from the list (e.g. int from list<int>)
  3. Push scope (before registering the loop variable!)
  4. Register the loop variable with the component type in the symbol table
  5. Validate all body statements
  6. Pop scope – the loop variable is now out of scope

Critical: push the scope before registering the loop variable so that it’s removed when the scope pops. The loop variable shouldn’t be accessible after the loop.


While statement

A combination of if and for patterns:

  1. Validate the condition expression – must be assignable to boolean
  2. Push scope, validate body statements, pop scope

Index assignment statement

x[3] = 10

Three things to verify:

Part Rule
x Must be a list type
3 (index) Must be of type int
10 (value) Must be assignable to the component type of the list

Return statement

function foo() : string {
    return "hello"
}
  1. Walk up the parse tree via getParent() to find the enclosing FunctionDefinitionStatement
  2. If no enclosing function is found -> error (return outside of a function)
  3. If found -> verify the expression’s type is assignable to the function’s return type

Function definition statement

The most involved validation:

  1. Push scope
  2. Register all parameters (name + type pairs) in the symbol table
  3. Validate all body statements
  4. If return type is not void -> validate return coverage
  5. Pop scope

Return coverage

Non-void functions must return a value on every possible path. This is a recursive algorithm:

Last statement Coverage rule
return Covered
if/else Both branches must have return coverage (recursive)
for / while Body must have return coverage (recursive)
Anything else Not covered

Void functions skip this check – they can fall off the end without returning.


Function registration (revisited)

CatScriptProgram.validate() calls registerFunctions() before validating statements. This puts all function names and their CatScriptFunctionType into the global symbol table.

Result: functions can be called from anywhere in the program – no forward declaration needed. This is why CatScript doesn’t need header files like C.


The complete validation pattern

Statement Validates children Manages scope Adds symbols Checks assignability
print expression - -
var expression - variable name + type explicit type vs. expression type
assignment expression - - expression type vs. symbol type
if condition + branches push/pop for each branch - condition must be boolean
for list expression + body push/pop (includes loop var) loop variable expression must be list type
while condition + body push/pop - condition must be boolean
index assignment index + value - - value assignable to list component type
return expression - - expression type vs. function return type
function def body statements push/pop parameters return coverage if non-void