Semantic Analysis
Moving beyond syntax to meaning - error phases, control flow, and optimization
The compiler pipeline so far
| Phase | Input | Output |
|---|---|---|
| Tokenizer | Raw text | List of tokens |
| Parser | List of tokens | Parse tree |
| Semantic analysis | Parse tree | Validated parse tree (with types and errors) |
Semantic analysis asks: does this tree make sense? We’re no longer concerned with lexical or grammatical correctness - we’re evaluating meaning.
Three types of errors
$x
var x : boolean 10
var y : boolean = 1
| Line | Error type | Phase | Why |
|---|---|---|---|
| 1 | Lexical | Tokenizer | $ has no meaning in CatScript outside of strings |
| 2 | Syntactic | Parser | Missing = - violates the grammar (require fails) |
| 3 | Semantic | Validation | 1 is not a boolean - lexically and grammatically valid, but meaningless |
Errors come from different phases of the compiler. This distinction is fundamental and recurs throughout semantic analysis.
Where semantic analysis happens in CatScript
Unlike the tokenizer and parser which are separate components, semantic analysis code lives inside the parse tree elements themselves. Each parse element (expression or statement) implements a validate method:
// In AdditiveExpression.java
@Override
public void validate(SymbolTable symbolTable) {
// type checking logic here
}
ParseElement is the base class. Statement and Expression extend it. Concrete classes like ForStatement and AdditiveExpression implement the abstract validate method with their own semantic rules.
Type errors vs. non-type errors
Semantic analysis in statically typed languages like CatScript focuses heavily on type systems, but type checking isn’t everything.
Use-before-declaration
print(x)
var x = 10
This is syntactically valid but semantically incorrect in CatScript - x is used before it’s declared.
In JavaScript with var, this is legal due to variable hoisting - variables are moved to the top of their scope. With let, it’s an error. CatScript follows Java-style semantics: variables must be declared before use.
Control flow analysis - return coverage
function sign(x : int) : string {
if (x > 0) {
return "positive"
}
if (x < 0) {
return "negative"
}
}
There’s no type error here. The problem is that if x equals zero, no value is returned. This is a control flow error - not every path through the function leads to a return statement.
The algorithm for validating return coverage is recursive:
- Look at the last statement in the statement list
- If it’s a
returnstatement - covered - If it’s an
ifstatement - recurse on both the true and else branches (both must have return coverage) - If it’s a
fororwhileloop - recurse on its body - Otherwise - not covered
Void functions don’t need return coverage because they can fall off the end naturally.
Dead code elimination
Semantic analysis can also be used for optimization:
if (DEBUG) {
logExpensiveState();
}
If DEBUG is a compile-time constant known to be false, the entire block can be eliminated. The compiler evaluates the condition at compile time and drops the unreachable code.
Similarly, Java optimizes constant expressions:
var x = 1 + 2; // javac compiles this as: var x = 3;
No addition happens at runtime - the compiler does it during semantic analysis.
Other semantic errors in Java
| Error | Description |
|---|---|
| Variable might not have been initialized | Control flow analysis - all paths between declaration and use must assign a value |
| Unhandled exception | Checked exceptions must be caught or declared thrown |
| Checked vs. runtime exceptions | SQLException (checked) must be handled; RuntimeException subclasses don’t |
The 80/20 rule
Semantic analysis involves design decisions. Every language makes different choices:
- JavaScript: no use-before-declaration error, missing returns produce
undefined - Rust: allows variable shadowing
- Haskell: very strict type systems, aim for zero type errors
- CatScript: standard static typing, not sound but practical
The general principle: don’t overcook it. Catch the most impactful errors without making the type system so complex that it becomes a burden.
Summary
| Concept | Key point |
|---|---|
| Three error phases | Lexical (tokenizer), syntactic (parser), semantic (validation) |
| Validate methods | Semantic analysis is implemented on parse tree elements, not in a separate visitor |
| Type errors | Detected by checking type assignability |
| Control flow errors | Return coverage, uninitialized variables - not about types |
| Dead code elimination | Compile-time evaluation of constants for optimization |
| Design decisions | Every language makes different semantic choices - there’s no single “right” answer |