Compilers

Symbol Tables

Stacks of maps - managing scope, variables, and functions

What is a symbol table?

A symbol table is a map of names to types - but more precisely, it’s a stack of maps. Each map represents a scope, and the stack manages scope nesting.

class SymbolTable {
    LinkedList<Map<String, CatScriptType>> scopes = new LinkedList<>();

    SymbolTable() {
        scopes.push(new HashMap<>());  // global scope
    }
}

The constructor creates the first map - the global scope. Anything declared at the top level of a CatScript program ends up here.


Block-based scoping

                          // global scope
for (x in [1, 2, 3]) {   // block scope starts
    var x = 12
}                         // block scope ends
var x = "hello"           // back to global scope  -  no conflict

The two variables named x don’t conflict because they exist in different scopes. When the for loop’s block ends, everything declared inside it disappears.


Push and pop

Event Action
Enter a new block ({) Push a new map onto the stack
Exit a block (}) Pop the top map off the stack
symbolTable.pushScope();    // new block
symbolTable.registerSymbol("x", CatScriptType.INT);
// ... validate body statements ...
symbolTable.popScope();     // x is gone

Pushing creates a fresh map for new symbols. Popping removes all symbols introduced in that scope in one operation.


Where validation starts

// In CatScriptProgram.verify()
public void verify() {
    SymbolTable symbolTable = new SymbolTable();  // creates global scope
    this.validate(symbolTable);                    // recursive validation
    // collect errors and throw if any exist
}

The symbol table is created once and threaded through every validate call in the parse tree.


Function registration

Before validating any statements, CatScript pre-registers all functions in the symbol table:

// In CatScriptProgram.validate()
registerFunctions(symbolTable);  // put ALL functions into global scope first
for (Statement stmt : statements) {
    stmt.validate(symbolTable);
}

This means functions can be called from anywhere in a CatScript program - no forward declarations needed.

In C, functions must be declared before they’re used, which is why header files exist. CatScript avoids this by registering everything up front.


Variable statements and the symbol table

When a var statement is validated:

  1. Check if the name already exists in the symbol table -> duplicate name error
  2. Register the name with its type in the current scope
// Pseudo-code for var statement validation
if (symbolTable.hasSymbol(variableName)) {
    addError("duplicate variable name");
} else {
    symbolTable.registerSymbol(variableName, resolvedType);
}

For statements and scope

For loops introduce both a new scope and a new variable:

for (x in [1, 2, 3]) {
    print(x)
}

The loop variable x has its type derived from the component type of the list expression. If the list is list<int>, then x is int.

Important: push the scope before registering the loop variable, so it gets cleaned up when the scope pops:

symbolTable.pushScope();
symbolTable.registerSymbol("x", componentType);
// validate body statements
symbolTable.popScope();  // x is no longer in scope

Scoping rules - order matters

var x = "hello"
for (i in [1, 2, 3]) {
    var x = 12              // ERROR: x already exists in enclosing scope
}

vs.

for (i in [1, 2, 3]) {
    var x = 12              // fine  -  inner scope
}
var x = "hello"             // fine  -  inner x was popped

When x is declared in the global scope before the for loop, it’s visible inside the loop. Redeclaring it is a duplicate name error. When x is declared inside the loop first, it’s removed when the scope pops, so the global x is fine.


Shadowing

CatScript does not allow variable shadowing - redeclaring a name in an inner scope that already exists in an outer scope is an error.

Rust, by contrast, allows shadowing:

let x = "hello";
{
    let x = 12;   // shadows outer x  -  legal in Rust
}
// x is back to "hello"

Implementing shadowing would mean skipping the duplicate-name check and simply looking up variables from the top of the stack downward, returning the first match.


Java scoping quirks

Scenario Legal?
Two local variables with the same name in the same scope No
Local variable shadowing a class field Yes
Instance field accessed from a static method No - this doesn’t exist in static context
Static field accessed from any method Yes

Java effectively uses two symbol tables: one for static members and one for instance members.


Dynamic scoping (Perl)

In most modern languages, scoping is lexical - determined by where code is written. In Perl’s local keyword, scoping is dynamic - a variable’s value bleeds into called functions:

sub inner { print $var; }
sub outer { local $var = "outer"; inner(); }
$var = "global";
outer();   # prints "outer", not "global"

Dynamic scoping makes programs harder to reason about and is largely abandoned in modern languages. Even Lisp has moved to lexical scoping.


Thread-local scoping

Thread-local variables are visible only to a single thread, avoiding race conditions. Java supports this via the ThreadLocal class rather than language syntax. JavaScript operates this way by default due to its single-threaded event loop.


Summary

Concept Key point
Symbol table A stack of maps (name -> type), one map per scope
Push/pop Enter a block -> push; exit a block -> pop
Function registration All functions registered before validation - no forward declarations needed
Variable declaration Check for duplicates, then register name and type
For loops Push scope before registering loop variable so it’s cleaned up on pop
No shadowing CatScript doesn’t allow inner scopes to redeclare outer names
Lexical vs. dynamic Modern languages use lexical scoping; dynamic scoping (Perl) is rare today