This post documents the second step of my “write a relational database from scratch” journey: the parser.
The parser turns the token stream from the lexer into a small, typed AST we can execute later.

Why a Parser?

The lexer normalized raw text into tokens. The parser now checks those tokens against a grammar and builds a tree:

  • It validates shapes like CREATE TABLE ... (...) and INSERT ... VALUES (...).
  • It turns literals and identifiers into typed nodes, ready for semantic checks and execution.

Grammar (current scope)

program        := statement ";" EOF ;
statement      := create_table | insert_stmt | select_stmt ;

create_table   := "CREATE" "TABLE" ident "(" column_def ("," column_def)* ")" ;
column_def     := ident data_type ( "NOT" "NULL" | "NULL" | "DEFAULT" expr )* ;

data_type      := BOOLEAN | BOOL | INTEGER | INT | FLOAT | DOUBLE | STRING | TEXT | VARCHAR ;

insert_stmt    := "INSERT" "INTO" ident ["(" ident ("," ident)* ")"]
                  "VALUES" "(" expr ("," expr)* ")"
                  ("," "(" expr ("," expr)* ")")* ;

select_stmt    := "SELECT" "*" "FROM" ident ;

expr           := literal ;              // minimal: NUMBER | 'STRING' | TRUE | FALSE | NULL

The grammar is intentionally tiny but useful: it lets us create tables, insert seed rows, and fetch them back with SELECT *.


Token Stream & One-Token Lookahead

From the lexer we get Token { kind, text, keyword } where kind is one of
KEYWORD | IDENTITY | STRING | NUMBER | OPEN_PAREN | CLOSE_PAREN | COMMA | SEMICOLON | ASTERISK | PLUS | MINUS | SLASH.

The parser maintains one-token lookahead:

private final Lexer lexer;
private Token look; // null => EOF

public Parser(String input) {
  this.lexer = new Lexer(input);
  this.look = lexer.hasNext() ? lexer.nextToken() : null; // prime the first token
}

Two tiny contracts

  • peek() returns look without consuming it — use it to decide what to parse.
  • next() returns look and advances look with lexer.nextToken() — use it to consume once you decide.

AST model (short recap)

The AST is a handful of value types:

  • Statement = CreateTable | Insert | Select
  • CreateTable(name, columns) with Column(name, dataType, nullable?, defaultExpr?)
  • Insert(tableName, columns?, values) where values is a list of rows (each row is a list of Expression)
  • Select(tableName)
  • Expression = Const for now (NULL | BOOLEAN | INTEGER | FLOAT | STRING)

This keeps the parser simple and the next stage (semantic checks/execution) clean.


Implementation Highlights

1) Statement dispatch

private AST.Statement parseStatement() {
  Token t = peek();
  if (t.kind == TokenKind.KEYWORD) {
    return switch (t.keyword) {
      case Create -> parseDDL();
      case Select -> parseSelect();
      case Insert -> parseInsert();
      default -> throw new ParseException("[Parser] Unexpected keyword " + t);
    };
  }
  throw new ParseException("[Parser] Unexpected token " + t);
}

2) CREATE TABLE

private AST.Statement parseDDL() {
  nextExpect(Token.keyword(Keyword.Create));
  nextExpect(Token.keyword(Keyword.Table));
  return parseDDLCreateTable();
}

private AST.Statement parseDDLCreateTable() {
  String table = nextIdentity();
  nextExpect(Token.symbol(TokenKind.OPEN_PAREN, "("));

  List<AST.Column> cols = new ArrayList<>();
  while (true) {
    cols.add(parseDDLColumns());
    if (nextIfToken(Token.symbol(TokenKind.COMMA, ",")) == null) break;
  }
  nextExpect(Token.symbol(TokenKind.CLOSE_PAREN, ")"));
  return new AST.CreateTable(table, cols);
}

Column definition + constraints:

private AST.Column parseDDLColumns() {
  String name = nextIdentity();
  Token t = next(); // data type keyword
  AST.DataType ty = switch (t.keyword) {
    case Boolean, Bool -> AST.DataType.BOOLEAN;
    case Integer, Int  -> AST.DataType.INTEGER;
    case Float, Double -> AST.DataType.FLOAT;
    case String, Text, Varchar -> AST.DataType.STRING;
    default -> throw new ParseException("[Parser] Unexpected keyword " + t.keyword);
  };

  Boolean nullable = null;  // unspecified by default
  AST.Expression def = null;

  while (true) {
    Token k = nextIfKeyword();
    if (k == null) break;
    switch (k.keyword) {
      case Null -> nullable = Boolean.TRUE;
      case Not  -> { nextExpect(Token.keyword(Keyword.Null)); nullable = Boolean.FALSE; }
      case Default -> def = parseExpression();
      default -> throw new ParseException("[Parser] Unexpected keyword " + k.keyword);
    }
  }
  return new AST.Column(name, ty, nullable, def);
}

3) Expressions (minimal, literal-only)

private AST.Expression parseExpression() {
  Token t = next();
  return switch (t.kind) {
    case STRING -> AST.Const.ofString(t.text);
    case NUMBER -> {
      String raw = t.text;
      boolean isFloat = raw.indexOf('.') >= 0; // optionally also check e/E
      try {
        if (isFloat) yield AST.Const.ofFloat(Double.parseDouble(raw));
        int v = Integer.parseInt(raw);          // use Long.parseLong for larger integers
        yield AST.Const.ofInteger(v);
      } catch (NumberFormatException e) {
        throw new ParseException("[Parser] Invalid numeric literal: " + raw, e);
      }
    }
    case KEYWORD -> switch (t.keyword) {
      case True -> AST.Const.ofBoolean(true);
      case False -> AST.Const.ofBoolean(false);
      case Null -> AST.Const.ofNull();
      default -> throw new ParseException("[Parser] Unexpected keyword" + t.keyword);
    };
    default -> throw new ParseException("[Parser] Unexpected token" + t);
  };
}

4) SELECT & INSERT

// SELECT * FROM tbl;
private AST.Statement parseSelect() {
  nextExpect(Token.keyword(Keyword.Select));
  nextExpect(Token.symbol(TokenKind.ASTERISK, "*"));
  nextExpect(Token.keyword(Keyword.From));
  String table = nextIdentity();
  return new AST.Select(table);
}
// INSERT INTO t [(c1,c2)] VALUES (1,'a'),(2,'b');
private AST.Statement parseInsert() {
  nextExpect(Token.keyword(Keyword.Insert));
  nextExpect(Token.keyword(Keyword.Into));
  String table = nextIdentity();

  List<String> cols = null;
  if (nextIfToken(Token.symbol(TokenKind.OPEN_PAREN, "(")) != null) {
    cols = new ArrayList<>();
    while (true) {
      cols.add(nextIdentity());
      Token sep = next();
      if (sep.equals(Token.symbol(TokenKind.CLOSE_PAREN, ")"))) break;
      if (sep.equals(Token.symbol(TokenKind.COMMA, ","))) continue;
      throw new ParseException("[Parser] Unexpected token " + sep);
    }
  }

  nextExpect(Token.keyword(Keyword.Values));

  List<List<AST.Expression>> rows = new ArrayList<>();
  while (true) {
    nextExpect(Token.symbol(TokenKind.OPEN_PAREN, "("));
    List<AST.Expression> exprs = new ArrayList<>();
    while (true) {
      exprs.add(parseExpression());
      Token sep = next();
      if (sep.equals(Token.symbol(TokenKind.CLOSE_PAREN, ")"))) break;
      if (sep.equals(Token.symbol(TokenKind.COMMA, ","))) continue;
      throw new ParseException("[Parser] Unexpected token " + sep);
    }
    rows.add(exprs);
    if (nextIfToken(Token.symbol(TokenKind.COMMA, ",")) == null) break;
  }
  return new AST.Insert(table, cols, rows);
}

Error Handling

A tiny custom exception keeps messages focused:

public class ParseException extends RuntimeException {
  public ParseException(String message) { super(message); }
  public ParseException(String message, Throwable cause) { super(message, cause); }
}

Examples

Create table

CREATE TABLE users (
  id INT NOT NULL,
  name STRING DEFAULT 'guest',
  active BOOLEAN DEFAULT TRUE
);

Insert rows

INSERT INTO users (id, name) VALUES (1, 'a'), (2, 'b');

Select

SELECT * FROM users;