Abstract. This document describes the abstract syntax, concrete syntax, and printing related
to RawAst statements in the B3 implementation.
The abstract syntax captured by the types in the RawAst module of the B3 implementation
are more general than the concrete syntax that is parsed. This gives rise to an abstract syntax
tree (AST) that is easier to process, because it is more general and abstracts over restrictions
in the syntax.
An AST is said to be anomalous if it's not the direct counterpart of what the parser accepts.
Printing is defined on the general AST, not just on the restricted way in which the parser creates such ASTs. Two desired properties of the printed output are:
This document describes the relation between these three components.
In the AST, a block statement, or block for short, is a list of statements:
datatype Stmt =
| Block(stmts: seq<Stmt>)
The concrete syntax for a block is a pair of curly braces surrounding a sequence of
(zero of more) statements. Here and throughout, the concrete syntax shown uses S
to denote a statement.
"{" S* "}"
Printing of a block is straightforward.
A procedure has an optional body:
datatype Procedure = Procedure(, body: Option<Stmt>)
In the AST, the body is an optional Stmt. However, the parser only accepts a block.
In an anomalous AST, where the body is some Stmt that is not a Block, the printer
prints the anomalous body inside a pair of curly braces, as if it were a block containing
a single statement.
A loop statement contains a body. In the AST, the body is a Stmt:
datatype Stmt =
| Loop(, body: Stmt)
The parser only accepts a block as a loop body. Thus, as for procedure bodies with an anomalous body, a loop with an anomalous body prints the body within a pair of curly braces, as if it were a block containing a single statement.
A variable declaration introduces a new variable (whose name may shadow a previously declared variable) for use in a given statement body.
datatype Stmt =
| VarDecl(name: string, isMutable: bool, typ: TypeName,
init: Option<Expr>, body: Stmt)
Variable declarations are parsed in four forms of concrete syntax:
var x ":" T S*
var x ":" T ":=" E S*
val x ":" T S*
val x ":" T ":=" E S*
where x is an identifier, T is a type, and E is an expression.
Of these, x and T are the name and typ of the VarDecl. The keyword
var says isMutable := true and the keyword val says isMutable := false.
The presence or absence of ":=" E correspond to the init in the AST.
In the parsed form, the body is not a single statement but a sequence of statements.
This sequence is parsed as long as possible, which means that the parser only ever
generates a variable declaration at the end of an enclosing statement sequence inside
a block. The parsed statement sequence is placed in a Block
and this Block is used as the body of the VarDecl.
To reverse this process for printing in the presence of a possibly anomalous AST, two considerations arise.
One consideration is that the concrete syntax expects the variable declaration's body to end in an enclosing block. This can be handled by printing the variable declaration inside a pair of curly braces. However, in the expected case where the variable declaration is already the last statement in an enclosing block, there is no need to add any curly braces.
The other consideration is that parsing introduced a Block to hold the statement
sequences that form the body of the variable declaration. To reverse that process,
only the Block's statement sequence is printed, without enclosing curly braces.
An anomalous VarDecl, where body is not a Block, prints the body as
the statement that it is.
There is one more variable declaration. It is the forall statement, which introduces a bound variable for an assertion statement.
datatype Stmt =
| AForall(name: string, typ: TypeName, body: Stmt)
Because of the way it tends to be used, namely to be more intentional about the
end of the scope of its bound variable, the forall statement has a different concrete
syntax than the other variable declarations. Using B to represent a block statement,
the concrete syntax is
forall x ":" T B
The parser records the components of the AST as x, T, and B.
In the anomalous case where body is not a block statement, the printer prints the
body within a pair of curly braces, as if it were a block containing a single statement.
A labeled statement is a label followed by a statement:
datatype Stmt =
| LabeledStmt(lbl: Label, stmt: Stmt)
The parser expects only blocks and loops can be labeled. Using ℓ to represent
a label identifier, and L to represent a loop, the
concrete syntax thus has just the following two forms of labeled statements:
ℓ ":" B
ℓ ":" L
To print an anomalous labeled statement, there the stmt labeled is neither a
block nor a loop, the printer prints the labeled statement within a pair of
curly braces, as if it were a block containing a single statement.
The language has two if statements:
datatype Stmt =
| If(cond: Expr, thn: Stmt, els: Stmt)
| IfCase(cases: seq<Case>)
datatype Case = Case(cond: Expr, body: Stmt)
The concrete syntax of If has four possible forms:
if E B
if E B0 else B1
if E B else If
if E B else IfCase
The parser turns the first of these forms into an instance of the second form: if E B else {}.
When the printer gets an if statement where the else branch is an empty block,
it omits the else clause altogether. This violates the second of our two stated
desired properties of the printer, since if the parser gets the second form with an
empty else block, then the printer will print it as the first form.
The concrete syntax of IfCase corresponds to the abstract syntax, but with the restriction
that there always be at least one case. That is, the concrete syntax is
if C+
where C denotes
case E B
If the printer get an anomalous IfCase with an empty sequence, it prints it as
if case false {}There are two forms of assertion expressions:
datatype AExpr =
| AExpr(e: Expr)
| AAssertion(s: Stmt)
In the second form, the concrete syntax expects the assertion statement s to be a block statement.
In the case of an anomalous AST where s is not a block statement, s is printed
inside a pair of curly braces, as if it were a block containing a single statement.