Semantic Analysis

Moving beyond syntax to meaning - error phases, control flow, and optimization

The compiler pipeline so far

Phase	Input	Output
Tokenizer	Raw text	List of tokens
Parser	List of tokens	Parse tree
Semantic analysis	Parse tree	Validated parse tree (with types and errors)

Semantic analysis asks: does this tree make sense? We’re no longer concerned with lexical or grammatical correctness - we’re evaluating meaning.

Three types of errors

$x                          
var x : boolean 10          
var y : boolean = 1

Line	Error type	Phase	Why
1	Lexical	Tokenizer	`$` has no meaning in CatScript outside of strings
2	Syntactic	Parser	Missing `=` - violates the grammar (`require` fails)
3	Semantic	Validation	`1` is not a boolean - lexically and grammatically valid, but meaningless

Errors come from different phases of the compiler. This distinction is fundamental and recurs throughout semantic analysis.

Where semantic analysis happens in CatScript

Unlike the tokenizer and parser which are separate components, semantic analysis code lives inside the parse tree elements themselves. Each parse element (expression or statement) implements a validate method:

// In AdditiveExpression.java
@Override
public void validate(SymbolTable symbolTable) {
    // type checking logic here
}

ParseElement is the base class. Statement and Expression extend it. Concrete classes like ForStatement and AdditiveExpression implement the abstract validate method with their own semantic rules.

Type errors vs. non-type errors

Semantic analysis in statically typed languages like CatScript focuses heavily on type systems, but type checking isn’t everything.

Use-before-declaration

print(x)
var x = 10

This is syntactically valid but semantically incorrect in CatScript - x is used before it’s declared.

In JavaScript with var, this is legal due to variable hoisting - variables are moved to the top of their scope. With let, it’s an error. CatScript follows Java-style semantics: variables must be declared before use.

Control flow analysis - return coverage

function sign(x : int) : string {
    if (x > 0) {
        return "positive"
    }
    if (x < 0) {
        return "negative"
    }
}

There’s no type error here. The problem is that if x equals zero, no value is returned. This is a control flow error - not every path through the function leads to a return statement.

The algorithm for validating return coverage is recursive:

Look at the last statement in the statement list
If it’s a return statement - covered
If it’s an if statement - recurse on both the true and else branches (both must have return coverage)
If it’s a for or while loop - recurse on its body
Otherwise - not covered

Void functions don’t need return coverage because they can fall off the end naturally.

Dead code elimination

Semantic analysis can also be used for optimization:

if (DEBUG) {
    logExpensiveState();
}

If DEBUG is a compile-time constant known to be false, the entire block can be eliminated. The compiler evaluates the condition at compile time and drops the unreachable code.

Similarly, Java optimizes constant expressions:

var x = 1 + 2;  // javac compiles this as: var x = 3;

No addition happens at runtime - the compiler does it during semantic analysis.

Other semantic errors in Java

Error	Description
Variable might not have been initialized	Control flow analysis - all paths between declaration and use must assign a value
Unhandled exception	Checked exceptions must be caught or declared thrown
Checked vs. runtime exceptions	`SQLException` (checked) must be handled; `RuntimeException` subclasses don’t

The 80/20 rule

Semantic analysis involves design decisions. Every language makes different choices:

JavaScript: no use-before-declaration error, missing returns produce undefined
Rust: allows variable shadowing
Haskell: very strict type systems, aim for zero type errors
CatScript: standard static typing, not sound but practical

The general principle: don’t overcook it. Catch the most impactful errors without making the type system so complex that it becomes a burden.

Summary

Concept	Key point
Three error phases	Lexical (tokenizer), syntactic (parser), semantic (validation)
Validate methods	Semantic analysis is implemented on parse tree elements, not in a separate visitor
Type errors	Detected by checking type assignability
Control flow errors	Return coverage, uninitialized variables - not about types
Dead code elimination	Compile-time evaluation of constants for optimization
Design decisions	Every language makes different semantic choices - there’s no single “right” answer