B3 statements: Syntax, raw AST, and printing
K. Rustan M. Leino
Manuscript KRML 304, 5 July 2025

Abstract. This document describes the abstract syntax, concrete syntax, and printing related to RawAst statements in the B3 implementation.

Introduction

The abstract syntax captured by the types in the RawAst module of the B3 implementation are more general than the concrete syntax that is parsed. This gives rise to an abstract syntax tree (AST) that is easier to process, because it is more general and abstracts over restrictions in the syntax.

An AST is said to be anomalous if it's not the direct counterpart of what the parser accepts.

Printing is defined on the general AST, not just on the restricted way in which the parser creates such ASTs. Two desired properties of the printed output are:

This document describes the relation between these three components.

Blocks

In the AST, a block statement, or block for short, is a list of statements:

datatype Stmt =
  $\ldots$
  | Block(stmts: seq<Stmt>)

The concrete syntax for a block is a pair of curly braces surrounding a sequence of (zero of more) statements. Here and throughout, the concrete syntax shown uses S to denote a statement.

"{" S* "}"

Printing of a block is straightforward.

Procedure bodies

A procedure has an optional body:

datatype Procedure = Procedure($\ldots$, body: Option<Stmt>)

In the AST, the body is an optional Stmt. However, the parser only accepts a block. In an anomalous AST, where the body is some Stmt that is not a Block, the printer prints the anomalous body inside a pair of curly braces, as if it were a block containing a single statement.

Loops

A loop statement contains a body. In the AST, the body is a Stmt:

datatype Stmt =
  $\ldots$
  | Loop($\ldots$, body: Stmt)

The parser only accepts a block as a loop body. Thus, as for procedure bodies with an anomalous body, a loop with an anomalous body prints the body within a pair of curly braces, as if it were a block containing a single statement.

Variable declarations

A variable declaration introduces a new variable (whose name may shadow a previously declared variable) for use in a given statement body.

datatype Stmt =
  $\ldots$
  | VarDecl(name: string, isMutable: bool, typ: TypeName,
            init: Option<Expr>, body: Stmt)

Variable declarations are parsed in four forms of concrete syntax:

var x ":" T S*
var x ":" T ":=" E S*
val x ":" T S*
val x ":" T ":=" E S*

where x is an identifier, T is a type, and E is an expression. Of these, x and T are the name and typ of the VarDecl. The keyword var says isMutable := true and the keyword val says isMutable := false. The presence or absence of ":=" E correspond to the init in the AST.

In the parsed form, the body is not a single statement but a sequence of statements. This sequence is parsed as long as possible, which means that the parser only ever generates a variable declaration at the end of an enclosing statement sequence inside a block. The parsed statement sequence is placed in a Block and this Block is used as the body of the VarDecl.

To reverse this process for printing in the presence of a possibly anomalous AST, two considerations arise.

One consideration is that the concrete syntax expects the variable declaration's body to end in an enclosing block. This can be handled by printing the variable declaration inside a pair of curly braces. However, in the expected case where the variable declaration is already the last statement in an enclosing block, there is no need to add any curly braces.

The other consideration is that parsing introduced a Block to hold the statement sequences that form the body of the variable declaration. To reverse that process, only the Block's statement sequence is printed, without enclosing curly braces. An anomalous VarDecl, where body is not a Block, prints the body as the statement that it is.

Forall assertions

There is one more variable declaration. It is the forall statement, which introduces a bound variable for an assertion statement.

datatype Stmt =
  $\ldots$
  | AForall(name: string, typ: TypeName, body: Stmt)

Because of the way it tends to be used, namely to be more intentional about the end of the scope of its bound variable, the forall statement has a different concrete syntax than the other variable declarations. Using B to represent a block statement, the concrete syntax is

forall x ":" T B

The parser records the components of the AST as x, T, and B.

In the anomalous case where body is not a block statement, the printer prints the body within a pair of curly braces, as if it were a block containing a single statement.

Labeled statements

A labeled statement is a label followed by a statement:

datatype Stmt =
  $\ldots$
  | LabeledStmt(lbl: Label, stmt: Stmt)

The parser expects only blocks and loops can be labeled. Using to represent a label identifier, and L to represent a loop, the concrete syntax thus has just the following two forms of labeled statements:

 ":" B
 ":" L

To print an anomalous labeled statement, there the stmt labeled is neither a block nor a loop, the printer prints the labeled statement within a pair of curly braces, as if it were a block containing a single statement.

If statements

The language has two if statements:

datatype Stmt =
  $\ldots$
  | If(cond: Expr, thn: Stmt, els: Stmt)
  | IfCase(cases: seq<Case>)

datatype Case = Case(cond: Expr, body: Stmt)

The concrete syntax of If has four possible forms:

if E B
if E B0 else B1
if E B else If
if E B else IfCase

The parser turns the first of these forms into an instance of the second form: if E B else {}. When the printer gets an if statement where the else branch is an empty block, it omits the else clause altogether. This violates the second of our two stated desired properties of the printer, since if the parser gets the second form with an empty else block, then the printer will print it as the first form.

The concrete syntax of IfCase corresponds to the abstract syntax, but with the restriction that there always be at least one case. That is, the concrete syntax is

if C+

where C denotes

case E B

If the printer get an anomalous IfCase with an empty sequence, it prints it as

if case false {}

Assertion expressions

There are two forms of assertion expressions:

datatype AExpr =
  | AExpr(e: Expr)
  | AAssertion(s: Stmt)

In the second form, the concrete syntax expects the assertion statement s to be a block statement.

In the case of an anomalous AST where s is not a block statement, s is printed inside a pair of curly braces, as if it were a block containing a single statement.