Compilers

Semantic Analysis of Expressions

Type checking additive, comparison, function call, and list literal expressions

The CatScript type system in code

The type system is implemented as an object hierarchy rooted in CatScriptType:

public static final CatScriptType INT = new CatScriptType("int", Integer.class);
public static final CatScriptType STRING = new CatScriptType("string", String.class);
public static final CatScriptType BOOLEAN = new CatScriptType("bool", Boolean.class);
public static final CatScriptType OBJECT = new CatScriptType("object", Object.class);
public static final CatScriptType NULL = new CatScriptType("null", Object.class);
public static final CatScriptType VOID = new CatScriptType("void", Void.class);

ListType extends CatScriptType and adds a component type (e.g. list<int> has component type int).

CatScriptFunctionType extends CatScriptType with parameter types and a return type.


Assignability rules

The isAssignableFrom method on CatScriptType encodes the core rules:

Rule Result
Anything assigned to void Only void is assignable to void
null assigned to anything Always true - null is the bottom type
Otherwise Delegates to Java’s Class.isAssignableFrom on the backing class

For list types, assignability is covariant on the component type: list<string> is assignable to list<object> because string is assignable to object. This is convenient but unsound.


Expressions have types

Expression extends ParseElement and adds a type field. Statement does not have a type.

Every expression resolves its type during validate. The general pattern:

  1. Validate children first (so they resolve their types)
  2. Determine own type from the children’s types
  3. Check type rules and add errors for violations

Additive expression

Additive expressions are overloaded - they handle both integer addition and string concatenation:

void validate(SymbolTable symbolTable) {
    lhs.validate(symbolTable);
    rhs.validate(symbolTable);

    if (lhs.getType().equals(CatScriptType.STRING)
            || rhs.getType().equals(CatScriptType.STRING)) {
        setType(CatScriptType.STRING);
    } else {
        setType(CatScriptType.INT);
        if (!lhs.getType().equals(CatScriptType.INT)) {
            lhs.addError(/* incompatible type */);
        }
        if (!rhs.getType().equals(CatScriptType.INT)) {
            rhs.addError(/* incompatible type */);
        }
    }
}

If either side is a string, the result is a string (concatenation). Otherwise, both sides must be integers.


Factor (multiplication) expression

Simpler than addition - no overloading. Both sides must be int, result is always int.


Comparison expression

Both sides must be comparable types. The return type is always boolean.


Equality expression

Both sides can be anything - all types support equality comparison. The return type is boolean.


Unary expression

Operator Operand type Result type
- (negate) int int
not boolean boolean

Parenthesized expression

The type is simply the type of the inner expression. Parentheses affect evaluation order, not types.


Literals

Literal Type
42 int
"hello" string
true / false boolean
null null

Literals can’t have type errors themselves - errors arise from the position they’re used in (e.g. passing a string to a function that expects int).


List literal and type unification

[1, 2, 3]           // list<int>
[1, "two", 3]       // list<object>
[1, null, 3]        // list<int>  -  null is assignable to int

Validate all children first, then determine the list type by unifying the children’s types:

  • If all children have the same type -> list<that_type>
  • If children have mixed types -> unify up to a common supertype (ultimately object)
  • null is the bottom type and doesn’t force unification upward

The type diamond: null (bottom) -> concrete types (int, string, boolean) -> object (top). Unification walks up toward object as needed.


Identifier expression

The type of an identifier comes from the symbol table:

void validate(SymbolTable symbolTable) {
    CatScriptType type = symbolTable.getSymbolType(name);
    if (type == null) {
        addError(ErrorType.UNKNOWN_NAME);
    } else {
        setType(type);
    }
}

This is the main place where expressions interact with the symbol table. The type was registered earlier by a var statement, function parameter, or for-loop variable.


Index expression

x[5]
Part Requirement
x (the collection) Must be a ListType
5 (the index) Must be of type int
Result type The component type of the list (e.g. int if x is list<int>)

Function call expression

var result = foo(1, "hello")
  1. Look up the function name in the symbol table
  2. Verify the symbol is a CatScriptFunctionType (not an int or string - that’s an error)
  3. Set the expression’s type to the return type of the function
  4. Validate each argument expression
  5. Check that each argument’s type is assignable to the corresponding parameter type
CatScriptType symbolType = symbolTable.getSymbolType(name);
if (symbolType instanceof CatScriptFunctionType funcType) {
    setType(funcType.getReturnType());
    for (int i = 0; i < arguments.size(); i++) {
        arguments.get(i).validate(symbolTable);
        // check: argument type assignable to parameter type at position i
    }
}

The recursive pattern

Type resolution flows bottom-up through the parse tree:

  1. Literals know their types immediately
  2. Identifiers look up their types in the symbol table
  3. Binary expressions validate children, then derive their own type
  4. Function calls use the function’s declared return type
  5. List literals unify children’s types
  6. Parent expressions use their children’s resolved types

Each expression makes local decisions that globally verify correctness.


Summary

Expression Type resolution
Additive String if either side is string; otherwise int (both sides must be int)
Factor Always int; both sides must be int
Comparison Always boolean
Equality Always boolean; any types allowed
Unary - -> int; not -> boolean
Parenthesized Same as inner expression
Literals Implied by literal type
List literal Unify children’s types into list<unified_type>
Identifier Looked up from symbol table
Index Component type of the list being indexed
Function call Return type of the function