Parse Trees & Abstract Syntax Trees

Visitor pattern, expression problem, and AST design

Overview

This lecture covers what parse trees (abstract syntax trees) are, how they’re structured in code, and two competing approaches to organizing logic around them: the visitor pattern vs. methods directly on nodes. It also introduces the expression problem - a fundamental design trade-off in software engineering.

1. From Tokens to Parse Trees

The compiler pipeline so far:

Source Text  ->  Tokenizer  ->  Stream of Tokens  ->  Parser  ->  Parse Tree

Tokenization (covered previously): turning a stream of characters into a linear series of tokens.
Parsing (this section): feeding tokens into a parser to produce a parse tree (a.k.a. abstract syntax tree or syntax tree).

Terminology

Term	Usage
Parse Tree	Informal / practical term
Abstract Syntax Tree (AST)	Formal term
Syntax Tree	General-purpose term

There are subtle theoretical differences between these, but for a practical compilers course they are used interchangeably.

2. What Makes a Tree?

A tree has a root node.
Each node has a collection of children.
Children may have children of their own (recursive structure).
In most parse trees, a child also has a pointer to its parent (enables walking the tree in both directions).

Trees and Recursion

Lists / arrays -> favor iteration
Trees -> favor recursion

Many compiler algorithms are recursive walks over the parse tree.

Example: Expression Tree

The minus operator is the root; its children are the plus sub-expression and the literal 4.

3. Modeling Parse Trees in Java (CatScript Approach)

The `ParseElement` Base Class

Abstract base class that all AST nodes extend.
Contains:
- List<ParseElement> children - child nodes
- Token start - the first token this element spans
- Token end - the last token this element spans
- Parse error information

Start and End Tokens

Each parse element maps back to a span of tokens in the source:

Construct	Start Token	End Token
`for` loop	`for` keyword	`}` (closing brace)
`if` statement	`if` keyword	`}` (closing brace)

This connection between the parse tree and the underlying source text is maintained via the start/end token references.

Class Hierarchy Example

ParseElement (abstract)
  └── Expression (abstract)
        └── AdditiveExpression
              - leftHandSide: Expression
              - rightHandSide: Expression

For 1 + 1:

An AdditiveExpression object holds:
- leftHandSide -> integer literal expression (1)
- rightHandSide -> integer literal expression (1)

4. Code Generation Approach (Crafting Interpreters Book)

The book Crafting Interpreters (whose language is called Lox) uses a code generator to produce AST classes:

How It Works

Define classes as simple strings: "Binary : Expr left, Token operator, Expr right"
A code generator parses these strings (split on : for class name, split on , for fields)
The defineType function builds Java source code:
- Class declaration extending base class
- Constructor with field parameters
- Field assignments
- Field declarations

Why Use Code Generation?

Avoids writing repetitive boilerplate (constructors, field assignments, field declarations) for every AST node type.
This is metaprogramming - writing programs that write programs.

Downsides

Java is bad at metaprogramming in general.
The generated classes become black boxes - if you regenerate, any hand-written logic inside them gets overwritten.
This forces you to keep logic outside the AST nodes (leading to the visitor pattern).

Other forms of metaprogramming: C++ macros, Rust macros.

5. The Expression Problem

Definition

Given n operations and m types, you must choose one trade-off.

In the lecture’s example: 3 operations (interpret, resolve, analyze) x 4 types (binary, grouping, literal, unary).

	Add New Type	Add New Operation
OOP approach	Easy (just add a class, implement methods)	Hard (must modify every existing class)
Functional approach	Hard (must update every function)	Easy (just add a new function)

The Functional Approach

In functional programming, instead of objects with methods, you write top-level functions that use elaborate if / pattern-matching statements to handle each type. Adding a new function is easy (standalone code), but adding a new type means updating every existing function.

The Trade-off

There is no solution that makes both adding new types and adding new operations easy without modifying existing code.

In Practice

“With modern IDEs, I can add a new function to a class hierarchy very easily. Just add it to the base class, hit alt-enter, implement on subclasses. It’s really not a hard problem.”

6. The Visitor Pattern

From the book: “The visitor pattern is the most widely misunderstood pattern in all of design patterns, which is really saying something when you look at the software architecture excesses of the past couple decades.”

Structure

Base class defines an accept(Visitor v) method.
Visitor interface defines a visit method for every concrete subclass.
Each concrete subclass implements accept by calling visitor.visit(this).

Example: Vehicle Visitor

// Base class
abstract class Vehicle {
    abstract void accept(VehicleVisitor visitor);
}

// Visitor interface
interface VehicleVisitor {
    void visit(Car car);
    void visit(Truck truck);
}

// Concrete class
class Truck extends Vehicle {
    void accept(VehicleVisitor visitor) {
        visitor.visit(this);  // dispatches to visit(Truck)
    }
}

// Concrete visitor
class VehiclePrinter implements VehicleVisitor {
    void visit(Car car)   { print("Found a car"); }
    void visit(Truck truck) { print("Found a truck"); }
}

Key mechanism: this inside Truck.accept() is of type Truck, so visitor.visit(this) calls visit(Truck) - achieving double dispatch.

Simpler Alternatives

instanceof checks: if (vehicle instanceof Car) { ... } else if (vehicle instanceof Truck) { ... }
Put the method directly on the class: vehicle.print()

Why the Book Uses Visitors

Because the AST classes are code-generated, you can’t add logic to them (it would be overwritten).
Visitors let you define operations externally to the generated classes.

7. The CatScript Approach: Methods on Nodes

Instead of visitors, put operations directly on the AST node classes:

class AdditiveExpression extends Expression {
    Expression leftHandSide;
    Expression rightHandSide;

    // Evaluation logic  -  right here on the node
    Object evaluate(CatScriptRuntime runtime) {
        Object lhs = leftHandSide.evaluate(runtime);
        Object rhs = rightHandSide.evaluate(runtime);
        if (isAddition) return (int)lhs + (int)rhs;
        else return (int)lhs - (int)rhs;
    }

    // Transpilation  -  also right here
    void transpile(StringBuilder sb) { ... }

    // Compilation  -  also right here
    void compile(ByteCodeGenerator gen) { ... }
}

Four Operations on the Parse Tree

Operation	Purpose
Validate	Type checking / verification
Evaluate (Eval)	Interpret the program directly
Transpile	Translate to another language
Compile	Generate bytecode

Advantages

Dramatically simpler codebase - no visitors, no callbacks
Everything about a node type is in one place (high cohesion)
Easy to read, debug, and understand
Copy-paste with slight modifications is preferred over complex abstractions

The “Violation”

This violates separation of concerns - the AST knows about compilation, evaluation, etc. Carson argues this is a worthwhile trade-off:

“I don’t care when people say you’re not separating your concerns. I think there are more important things like simplicity, making it easy to debug, simplicity, locality of behavior, simplicity, easy reading, simplicity, and also simplicity.”

8. Software Design Philosophy

The Developer Experience Curve (attributed to Lea Verou)

How priorities shift over a programming career:

Concern	Early Career	~5 Years	~10 Years	Senior
“Does it work?”	Highest priority	Less important	Even less	Assumed
Code readability	Some	Low (focus on expressiveness)	Growing	Highest priority
DRY (Don’t Repeat Yourself)	Low (don’t know how)	Growing	Peak (no repeated code!)	Declining - copy-paste is OK

Key Takeaways

Fight hard to keep code simple.
Copy-paste with slight modifications > super complicated architecture with no repetition.
Three similar lines of code is better than a premature abstraction.
The visitor pattern and code generation add complexity that may not be justified.
Choose simplicity and readability over the “right” architecture.
The instructor’s ideal: the “grug brain” parser - plain Java classes with methods, no visitors, no code generators.

“Copy and paste with slight modifications is better than a super complicated architecture that has no repetition but is impossible to understand.”

Summary

Topic	Key Point
Parse Trees	Objects with children arranged as a tree; nodes have start/end tokens
Code Generation	Metaprogramming to avoid boilerplate; forces logic outside the AST
Expression Problem	Fundamental trade-off: easy new types OR easy new operations, not both
Visitor Pattern	Double-dispatch mechanism; powerful but adds significant complexity
CatScript Approach	Handwritten classes, logic on nodes, no visitors - simplicity wins
Design Philosophy	Readability and simplicity > DRY > architectural purity