This post documents the second step of my “write a relational database from scratch” journey: the parser.
The parser turns the token stream from the lexer into a small, typed AST we can execute later.
Why a Parser?
The lexer normalized raw text into tokens. The parser now checks those tokens against a grammar and builds a tree:
- It validates shapes like
CREATE TABLE ... (...)andINSERT ... VALUES (...). - It turns literals and identifiers into typed nodes, ready for semantic checks and execution.
Grammar (current scope)
program := statement ";" EOF ;
statement := create_table | insert_stmt | select_stmt ;
create_table := "CREATE" "TABLE" ident "(" column_def ("," column_def)* ")" ;
column_def := ident data_type ( "NOT" "NULL" | "NULL" | "DEFAULT" expr )* ;
data_type := BOOLEAN | BOOL | INTEGER | INT | FLOAT | DOUBLE | STRING | TEXT | VARCHAR ;
insert_stmt := "INSERT" "INTO" ident ["(" ident ("," ident)* ")"]
"VALUES" "(" expr ("," expr)* ")"
("," "(" expr ("," expr)* ")")* ;
select_stmt := "SELECT" "*" "FROM" ident ;
expr := literal ; // minimal: NUMBER | 'STRING' | TRUE | FALSE | NULL
The grammar is intentionally tiny but useful: it lets us create tables, insert seed rows, and fetch them back with SELECT *.
Token Stream & One-Token Lookahead
From the lexer we get Token { kind, text, keyword } where kind is one ofKEYWORD | IDENTITY | STRING | NUMBER | OPEN_PAREN | CLOSE_PAREN | COMMA | SEMICOLON | ASTERISK | PLUS | MINUS | SLASH.
The parser maintains one-token lookahead:
private final Lexer lexer;
private Token look; // null => EOF
public Parser(String input) {
this.lexer = new Lexer(input);
this.look = lexer.hasNext() ? lexer.nextToken() : null; // prime the first token
}
Two tiny contracts
peek()returnslookwithout consuming it — use it to decide what to parse.next()returnslookand advanceslookwithlexer.nextToken()— use it to consume once you decide.
AST model (short recap)
The AST is a handful of value types:
Statement = CreateTable | Insert | SelectCreateTable(name, columns)withColumn(name, dataType, nullable?, defaultExpr?)Insert(tableName, columns?, values)wherevaluesis a list of rows (each row is a list ofExpression)Select(tableName)Expression = Constfor now (NULL | BOOLEAN | INTEGER | FLOAT | STRING)
This keeps the parser simple and the next stage (semantic checks/execution) clean.
Implementation Highlights
1) Statement dispatch
private AST.Statement parseStatement() {
Token t = peek();
if (t.kind == TokenKind.KEYWORD) {
return switch (t.keyword) {
case Create -> parseDDL();
case Select -> parseSelect();
case Insert -> parseInsert();
default -> throw new ParseException("[Parser] Unexpected keyword " + t);
};
}
throw new ParseException("[Parser] Unexpected token " + t);
}
2) CREATE TABLE
private AST.Statement parseDDL() {
nextExpect(Token.keyword(Keyword.Create));
nextExpect(Token.keyword(Keyword.Table));
return parseDDLCreateTable();
}
private AST.Statement parseDDLCreateTable() {
String table = nextIdentity();
nextExpect(Token.symbol(TokenKind.OPEN_PAREN, "("));
List<AST.Column> cols = new ArrayList<>();
while (true) {
cols.add(parseDDLColumns());
if (nextIfToken(Token.symbol(TokenKind.COMMA, ",")) == null) break;
}
nextExpect(Token.symbol(TokenKind.CLOSE_PAREN, ")"));
return new AST.CreateTable(table, cols);
}
Column definition + constraints:
private AST.Column parseDDLColumns() {
String name = nextIdentity();
Token t = next(); // data type keyword
AST.DataType ty = switch (t.keyword) {
case Boolean, Bool -> AST.DataType.BOOLEAN;
case Integer, Int -> AST.DataType.INTEGER;
case Float, Double -> AST.DataType.FLOAT;
case String, Text, Varchar -> AST.DataType.STRING;
default -> throw new ParseException("[Parser] Unexpected keyword " + t.keyword);
};
Boolean nullable = null; // unspecified by default
AST.Expression def = null;
while (true) {
Token k = nextIfKeyword();
if (k == null) break;
switch (k.keyword) {
case Null -> nullable = Boolean.TRUE;
case Not -> { nextExpect(Token.keyword(Keyword.Null)); nullable = Boolean.FALSE; }
case Default -> def = parseExpression();
default -> throw new ParseException("[Parser] Unexpected keyword " + k.keyword);
}
}
return new AST.Column(name, ty, nullable, def);
}
3) Expressions (minimal, literal-only)
private AST.Expression parseExpression() {
Token t = next();
return switch (t.kind) {
case STRING -> AST.Const.ofString(t.text);
case NUMBER -> {
String raw = t.text;
boolean isFloat = raw.indexOf('.') >= 0; // optionally also check e/E
try {
if (isFloat) yield AST.Const.ofFloat(Double.parseDouble(raw));
int v = Integer.parseInt(raw); // use Long.parseLong for larger integers
yield AST.Const.ofInteger(v);
} catch (NumberFormatException e) {
throw new ParseException("[Parser] Invalid numeric literal: " + raw, e);
}
}
case KEYWORD -> switch (t.keyword) {
case True -> AST.Const.ofBoolean(true);
case False -> AST.Const.ofBoolean(false);
case Null -> AST.Const.ofNull();
default -> throw new ParseException("[Parser] Unexpected keyword" + t.keyword);
};
default -> throw new ParseException("[Parser] Unexpected token" + t);
};
}
4) SELECT & INSERT
// SELECT * FROM tbl;
private AST.Statement parseSelect() {
nextExpect(Token.keyword(Keyword.Select));
nextExpect(Token.symbol(TokenKind.ASTERISK, "*"));
nextExpect(Token.keyword(Keyword.From));
String table = nextIdentity();
return new AST.Select(table);
}
// INSERT INTO t [(c1,c2)] VALUES (1,'a'),(2,'b');
private AST.Statement parseInsert() {
nextExpect(Token.keyword(Keyword.Insert));
nextExpect(Token.keyword(Keyword.Into));
String table = nextIdentity();
List<String> cols = null;
if (nextIfToken(Token.symbol(TokenKind.OPEN_PAREN, "(")) != null) {
cols = new ArrayList<>();
while (true) {
cols.add(nextIdentity());
Token sep = next();
if (sep.equals(Token.symbol(TokenKind.CLOSE_PAREN, ")"))) break;
if (sep.equals(Token.symbol(TokenKind.COMMA, ","))) continue;
throw new ParseException("[Parser] Unexpected token " + sep);
}
}
nextExpect(Token.keyword(Keyword.Values));
List<List<AST.Expression>> rows = new ArrayList<>();
while (true) {
nextExpect(Token.symbol(TokenKind.OPEN_PAREN, "("));
List<AST.Expression> exprs = new ArrayList<>();
while (true) {
exprs.add(parseExpression());
Token sep = next();
if (sep.equals(Token.symbol(TokenKind.CLOSE_PAREN, ")"))) break;
if (sep.equals(Token.symbol(TokenKind.COMMA, ","))) continue;
throw new ParseException("[Parser] Unexpected token " + sep);
}
rows.add(exprs);
if (nextIfToken(Token.symbol(TokenKind.COMMA, ",")) == null) break;
}
return new AST.Insert(table, cols, rows);
}
Error Handling
A tiny custom exception keeps messages focused:
public class ParseException extends RuntimeException {
public ParseException(String message) { super(message); }
public ParseException(String message, Throwable cause) { super(message, cause); }
}
Examples
Create table
CREATE TABLE users (
id INT NOT NULL,
name STRING DEFAULT 'guest',
active BOOLEAN DEFAULT TRUE
);
Insert rows
INSERT INTO users (id, name) VALUES (1, 'a'), (2, 'b');
Select
SELECT * FROM users;