In this chapter, we will make use of grammars and grammar-based testing to systematically generate program code – for instance, to test a compiler or an interpreter. Not very surprisingly, we use Python and the Python interpreter as our domain.
We chose Python not only because the rest of the book is based on Python, but most importantly because Python brings lots of built-in infrastructure we can leverage – especially its ast module for parsing program code into abstract syntax trees (ASTs) and for turning ASTs back into code. This allows us to use grammars that operate on ASTs rather than concrete syntax, greatly reducing complexity.
Prerequisites
# ignore
import sys
# ignore
if sys.version_info < (3, 10):
    print("This code requires Python 3.10 or later")
    sys.exit(0)
To produce code, it is fairly easy to write a grammar with concrete syntax. If we want to produce, say, arithmetic expressions, we can easily create a concrete grammar which does precisely that.
We use the Fuzzingbook format for grammars, in which grammars are represented as dictionaries from symbols to lists of expansion alternatives.
EXPR_GRAMMAR: Grammar = {
    "<start>":
        ["<expr>"],

    "<expr>":
        ["<term> + <expr>", "<term> - <expr>", "<term>"],

    "<term>":
        ["<factor> * <term>", "<factor> / <term>", "<factor>"],

    "<factor>":
        ["+<factor>",
         "-<factor>",
         "(<expr>)",
         "<integer>.<integer>",
         "<integer>"],

    "<integer>":
        ["<digit><integer>", "<digit>"],

    "<digit>":
        ["0", "1", "2", "3", "4", "5", "6", "7", "8", "9"]
}
assert is_valid_grammar(EXPR_GRAMMAR)
We can use this grammar to produce syntactically valid arithmetic expressions. We use the ISLa solver as our generator, as it is the most powerful; but we could also use any of our other grammar fuzzers, such as GrammarFuzzer, at this point.
Here are some concrete inputs produced from the grammar:
expr_solver = ISLaSolver(EXPR_GRAMMAR)
for _ in range(10):
    print(expr_solver.solve())
4.3 + 512 / -(7 / 6 - 0 / 9 * 1 * 1) * +8.3 / 7 * 4 / 6
(4 / 7 + 1) / (4) / 9 / 8 + 4 / (3 + 6 - 7)
+--(--(-9) * (4 * 7 + (4) + 4) + --(+(3)) - 6 + 0 / 7 + 7)
(2 * 6 + 0 - 5) * 4 - +1 * (2 - 2) / 8 / 6
(+-(0 - (1) * 7 / 3)) / ((1 * 3 + 8) + 9 - +1 / --0) - 5 * (-+939.491)
+2.9 * 0 / 501.19814 / --+--(6.05002)
+-8.8 / (1) * -+1 + -8 + 9 - 3 / 8 * 6 + 4 * 3 * 5
(+(8 / 9 - 1 - 7)) + ---06.30 / +4.39
8786.82 - +01.170 / 9.2 - +(7) + 1 * 9 - 0
+-6 * 0 / 5 * (-(1.7 * +(-1 / +4.9 * 5 * 1 * 2) + -4.2 + (6 + -5) / (4 * 3 + 4)))
We could extend the grammar further to also produce assignments and other statements, and piece by piece cover the entire syntax of the programming language. However, this would be a not-so-great idea. Why?
The problem is that when testing compilers, you not only want to be able to produce code, but also to parse code, such that you can mutate and manipulate it at will. And this is where our "concrete" syntax will give us problems. While we can easily parse code (or expressions) that exactly adheres to the syntax...
expr_solver.check('2 + 2')
True
... a single extra space will already suffice to make it fail...
expr_solver.check('2 +  2')
Error parsing "2 +  2" starting with "<start>"
False
... as does the absence of spaces:
expr_solver.check('2+2')
Error parsing "2+2" starting with "<start>"
False
Indeed, spaces are optional in most programming languages. We could update our grammar such that it can handle optional spaces at all times, introducing a <space> nonterminal (a sketch of this follows after the next examples). But then, there are other features like comments...
expr_solver.check('2 + 2 # should be 4')
Error parsing "2 + 2 # should be 4" starting with "<start>"
False
... or continuation lines ...
expr_solver.check('2 + \\\n2') # An expression split over two lines
Error parsing "2 + \ 2" starting with "<start>"
False
that our grammar would have to cover.
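Coming back to the optional spaces mentioned above: here is a sketch of what such a grammar extension could look like. This snippet is illustrative only – EXPR_GRAMMAR_WITH_SPACES is a hypothetical name, it assumes the fuzzingbook extend_grammar() helper (used later in this chapter), and one would have to weave <space> into the <term> and <factor> rules as well.
# Sketch only: make spaces around operators optional.
# The same treatment would be needed for <term>, <factor>, etc.
EXPR_GRAMMAR_WITH_SPACES: Grammar = extend_grammar(EXPR_GRAMMAR, {
    "<expr>":
        ["<term><space>+<space><expr>", "<term><space>-<space><expr>", "<term>"],
    "<space>":
        ["", " <space>"],  # zero or more spaces
})
assert is_valid_grammar(EXPR_GRAMMAR_WITH_SPACES)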
On top, there are language features that cannot even be represented properly in a context-free grammar – Python's indentation-based block structure, for instance, is context-sensitive.
For this reason, it is often a good idea to make use of a dedicated parser (or preprocessor) to turn input into a more abstract representation - typically a tree structure. In programming languages, such a tree is called an abstract syntax tree (AST); it is the data structure that compilers operate on.
Abstract Syntax Trees (ASTs) that represent program code are among the most complex data structures in the world (if not the most complex data structures) - notably because they reflect all the complexity of the programming language and its features. The good news is that in Python, working with ASTs is particularly easy - one can work with them using standard language features.
Let us illustrate ASTs using an example. Here is a piece of code that we'd like to work with:
def main():
    print("Hello, world!") # A simple example

main()
Hello, world!
Let us obtain the source code of this function:
main_source = inspect.getsource(main)
print(main_source)
def main():
    print("Hello, world!") # A simple example
We make use of the Python AST module to convert this code string to an AST and back. With ast.parse(), we can parse the main() source into an AST:
main_tree = ast.parse(main_source)
This is what this tree looks like:
show_ast(main_tree)
We see how the function definition has become a FunctionDef node, whose third child is an Expr node, which in turn becomes a Call – of the "print" function with an argument of "Hello, world!".
Each of these AST nodes comes as a constructor – that is, we can invoke FunctionDef() to obtain a function definition node, or Call() to obtain a call node. These constructors take the AST children as arguments, but also lots of optional arguments (which we did not use so far). The dump of the AST into a string reveals all the arguments for each constructor:
print(ast.dump(main_tree, indent=4))
Module(
    body=[
        FunctionDef(
            name='main',
            args=arguments(
                posonlyargs=[],
                args=[],
                kwonlyargs=[],
                kw_defaults=[],
                defaults=[]),
            body=[
                Expr(
                    value=Call(
                        func=Name(id='print', ctx=Load()),
                        args=[
                            Constant(value='Hello, world!')],
                        keywords=[]))],
            decorator_list=[])],
    type_ignores=[])
The Python ast documentation lists all these constructors, which make up the abstract syntax. There are more than 100 individual constructors! (We said that ASTs are complex, right?)
The nice thing about the above string representation is that we can take it as is and turn it into a tree again:
my_main_tree = Module(
    body=[
        FunctionDef(
            name='main',
            args=arguments(
                posonlyargs=[],
                args=[],
                kwonlyargs=[],
                kw_defaults=[],
                defaults=[]),
            body=[
                Expr(
                    value=Call(
                        func=Name(id='print', ctx=Load()),
                        args=[
                            Constant(value='Hello, world!')],
                        keywords=[]))],
            decorator_list=[])],
    type_ignores=[])
We can take this tree and compile it into executable code:
my_main_tree = fix_missing_locations(my_main_tree) # required for trees built from constructors
my_main_code = compile(my_main_tree, filename='<unknown>', mode='exec')
del main # This deletes the definition of main()
exec(my_main_code) # This defines main() again from `code`
main()
Hello, world!
We can also unparse the tree (= turn it into source code again). (Note how the comment got lost during parsing.)
print(ast.unparse(my_main_tree))
def main():
    print('Hello, world!')
Hence, we can
1. parse code into an AST (using ast.parse());
2. generate and mutate ASTs; and
3. unparse ASTs into code again (using ast.unparse()).
To generate and mutate ASTs (step #2, above), we need means to produce correct ASTs, invoking all constructors with the correct arguments. The plan is thus to have a grammar for ASTs, which produces (and parses) ASTs as we like.
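As a quick sanity check combining steps 1 and 3, here is a minimal round trip – a sketch that assumes the ast node constructors (Module, Expr, Call, ...) are in scope, as in the examples above:
# Round trip: code -> AST -> constructor string -> AST -> code
round_trip_tree = ast.parse("print('Hello, world!')")
round_trip_str = ast.dump(round_trip_tree)   # a string of constructor calls
rebuilt_tree = eval(round_trip_str)          # re-create the AST from that string
rebuilt_tree = ast.fix_missing_locations(rebuilt_tree)
assert ast.unparse(rebuilt_tree) == ast.unparse(round_trip_tree)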
Programming language grammars are among the most complicated formal grammars around, and ASTs reflect much of this complexity. We will use the abstract AST grammar as specified in the Python documentation as a base, and build a formal context-free grammar step by step.
We will start with simple constants – strings and integers. Again, we use the fuzzingbook syntax for grammars, as it allows for easier extension.
ANYTHING_BUT_DOUBLE_QUOTES_AND_BACKSLASH = (string.digits + string.ascii_letters + string.punctuation + ' ').replace('"', '').replace('\\', '')
ANYTHING_BUT_SINGLE_QUOTES_AND_BACKSLASH = (string.digits + string.ascii_letters + string.punctuation + ' ').replace("'", '').replace('\\', '')
ANYTHING_BUT_DOUBLE_QUOTES_AND_BACKSLASH
"0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!#$%&'()*+,-./:;<=>?@[]^_`{|}~ "
ANYTHING_BUT_SINGLE_QUOTES_AND_BACKSLASH
'0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&()*+,-./:;<=>?@[]^_`{|}~ '
PYTHON_AST_CONSTANTS_GRAMMAR: Grammar = {
    '<start>': [ '<expr>' ],

    # Expressions
    '<expr>': [ '<Constant>', '<Expr>' ],
    '<Expr>': [ 'Expr(value=<expr>)' ],

    # Constants
    '<Constant>': [ 'Constant(value=<literal>)' ],
    '<literal>': [ '<string>', '<integer>', '<float>', '<bool>', '<none>' ],

    # Strings
    '<string>': [ '"<not_double_quotes>*"', "'<not_single_quotes>*'" ],
    '<not_double_quotes>': list(ANYTHING_BUT_DOUBLE_QUOTES_AND_BACKSLASH),
    '<not_single_quotes>': list(ANYTHING_BUT_SINGLE_QUOTES_AND_BACKSLASH),
    # FIXME: The actual rules for Python strings are also more complex:
    # https://docs.python.org/3/reference/lexical_analysis.html#string-and-bytes-literals

    # Numbers
    '<integer>': [ '<digit>', '<nonzerodigit><digits>' ],
    '<float>': [ '<integer>.<integer>' ],
    '<nonzerodigit>': ['1', '2', '3', '4', '5', '6', '7', '8', '9'],
    '<digits>': [ '<digit><digits>', '<digit>' ],
    '<digit>': ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9'],
    # FIXME: There are _many_ more ways to express numbers in Python; see
    # https://docs.python.org/3/reference/lexical_analysis.html#numeric-literals

    # More
    '<bool>': [ 'True', 'False' ],
    '<none>': [ 'None' ],
    # FIXME: Not supported: bytes, format strings, regex strings...
}
Note that we use extended Backus-Naur form in our grammars (here: in <string>):
- <elem>+ stands for one or more instances of <elem>;
- <elem>* stands for zero or more instances of <elem>;
- <elem>? stands for one or zero instances of <elem>.
A call to is_valid_grammar() ensures our grammar is free of common mistakes. Don't write grammars without it!
assert is_valid_grammar(PYTHON_AST_CONSTANTS_GRAMMAR)
constants_grammar = convert_ebnf_grammar(PYTHON_AST_CONSTANTS_GRAMMAR)
constants_solver = ISLaSolver(constants_grammar)
constants_tree_str = str(constants_solver.solve())
print(constants_tree_str)
Expr(value=Constant(value=None))
We can create an AST from this expression and turn it into Python code (well, a literal):
constants_tree = eval(constants_tree_str)
ast.unparse(constants_tree)
'None'
Let's do this a number of times:
def test_samples(grammar: Grammar, iterations: int = 10, start_symbol=None, log: bool = True):
    g = convert_ebnf_grammar(grammar)
    solver = ISLaSolver(g, start_symbol=start_symbol, max_number_free_instantiations=iterations)
    for i in range(iterations):
        tree_str = str(solver.solve())
        tree = eval(tree_str)
        ast.fix_missing_locations(tree)
        if log:
            code = ast.unparse(tree)
            print(f'{code:40} # {tree_str}')
test_samples(PYTHON_AST_CONSTANTS_GRAMMAR)
False                                    # Expr(value=Constant(value=False))
2                                        # Constant(value=2)
None                                     # Constant(value=None)
'#'                                      # Constant(value="#")
550.81                                   # Constant(value=550.81)
True                                     # Constant(value=True)
'.'                                      # Constant(value='.')
467                                      # Constant(value=467)
7894                                     # Constant(value=7894)
263                                      # Constant(value=263)
Our grammar can also parse ASTs obtained from concrete code.
sample_constant_code = "4711"
sample_constant_ast = ast.parse(sample_constant_code).body[0] # get the `Expr` node
sample_constant_ast_str = ast.dump(sample_constant_ast)
print(sample_constant_ast_str)
Expr(value=Constant(value=4711))
constant_solver = ISLaSolver(constants_grammar)
constant_solver.check(sample_constant_ast_str)
True
Let us now come up with a quiz question: does our grammar support negative numbers? For this, let's first find out whether the Constant() constructor can also take a negative number as an argument. It turns out it can:
ast.unparse(Constant(value=-1))
'-1'
But what happens if we parse a negative number, say -1? One might assume that this simply results in a Constant(-1), right? Try it out yourself!
quiz("If we parse a negative number, do we obtain ",
[
"a `Constant()` with a negative value, or",
"a unary `-` operator applied to a positive value?"
], 1 ** 0 + 1 ** 1)
The answer is that parsing -1 yields a unary minus USub() applied to a positive value:
print(ast.dump(ast.parse('-1')))
Module(body=[Expr(value=UnaryOp(op=USub(), operand=Constant(value=1)))], type_ignores=[])
As unary operators are not part of our grammar (yet), it cannot handle negative numbers:
sample_constant_code = "-1"
sample_constant_ast = ast.parse(sample_constant_code).body[0] # get the `Expr` node
sample_constant_ast_str = ast.dump(sample_constant_ast)
constant_solver = ISLaSolver(constants_grammar)
constant_solver.check(sample_constant_ast_str)
Error parsing "Expr(value=UnaryOp(op=USub(), operand=Constant(value=1)))" starting with "<start>"
False
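To give an idea of what such an extension could look like, here is a sketch that adds unary operators to the constants grammar. The name PYTHON_AST_UNARYOP_GRAMMAR is hypothetical; the full grammar developed below handles unary operators analogously.
# Sketch: add unary operators as an expression alternative
PYTHON_AST_UNARYOP_GRAMMAR: Grammar = extend_grammar(PYTHON_AST_CONSTANTS_GRAMMAR, {
    '<expr>': [ '<Constant>', '<Expr>', '<UnaryOp>' ],
    '<UnaryOp>': [ 'UnaryOp(op=<unaryop>, operand=<expr>)' ],
    '<unaryop>': [ 'UAdd()', 'USub()', 'Not()', 'Invert()' ],
})
assert is_valid_grammar(PYTHON_AST_UNARYOP_GRAMMAR)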
In the next sections, we will gradually expand our grammar with more and more Python features, eventually covering (almost) the entire language.
At this point, we have covered (almost) all AST elements of Python. There would be a few more Python elements to consider (marked as FIXME, above), but we'll leave these to the reader. Let us define PYTHON_AST_GRAMMAR as the official grammar coming out of this chapter.
PYTHON_AST_GRAMMAR = PYTHON_AST_MODULE_GRAMMAR
python_ast_grammar = convert_ebnf_grammar(PYTHON_AST_GRAMMAR)
Here are a few (very weird) examples of Python functions we can produce. All of these are valid, but only syntactically – very few of the code samples produced this way will actually result in something meaningful.
for elt in [ '<FunctionDef>' ]:
    print(elt)
    test_samples(PYTHON_AST_GRAMMAR, start_symbol=elt)
    print()
<FunctionDef> def w(): pass # FunctionDef(name='w', args=arguments(posonlyargs=[], args=[], kwonlyargs=[], kw_defaults=[], defaults=[]), body=[Pass()], decorator_list=[]) def a(): break # FunctionDef(name='a', args=arguments(posonlyargs=[], args=[], kwonlyargs=[], kw_defaults=[], defaults=[]), body=[Break()], decorator_list=[]) def o(): return # FunctionDef(name='o', args=arguments(posonlyargs=[], args=[], kwonlyargs=[], kw_defaults=[], defaults=[]), body=[Return()], decorator_list=[]) def v(): # type: continue # FunctionDef(name='v', args=arguments(posonlyargs=[], args=[], kwonlyargs=[], kw_defaults=[], defaults=[]), body=[Continue()], decorator_list=[], type_comment='') def j(): # type: return # FunctionDef(name='j', args=arguments(posonlyargs=[], args=[], kwonlyargs=[], kw_defaults=[], defaults=[]), body=[Return()], decorator_list=[], type_comment="") def k(): return return # FunctionDef(name='k', args=arguments(posonlyargs=[], args=[], kwonlyargs=[], kw_defaults=[], defaults=[]), body=[Return(), Return()], decorator_list=[]) def Q() -> set(): # type: return # FunctionDef(name='Q', args=arguments(posonlyargs=[], args=[], kwonlyargs=[], kw_defaults=[], defaults=[]), body=[Return()], decorator_list=[], returns=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), type_comment='') def d() -> None: return assert set(), set() return # FunctionDef(name='d', args=arguments(posonlyargs=[], args=[], kwonlyargs=[], kw_defaults=[], defaults=[]), body=[Return(), Assert(test=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[]), msg=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])), Return()], decorator_list=[], returns=Constant(value=None)) def K() -> set(): return # FunctionDef(name='K', args=arguments(posonlyargs=[], args=[], kwonlyargs=[], kw_defaults=[], defaults=[]), body=[Return()], decorator_list=[], returns=Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])) def y(): # type: return # FunctionDef(name='y', args=arguments(posonlyargs=[], args=[], kwonlyargs=[], kw_defaults=[], defaults=[]), body=[Return()], decorator_list=[], type_comment='')
For convenience, let us introduce a class PythonFuzzer that makes use of the above grammar in order to produce Python code. This will be fairly easy to use.
class PythonFuzzer(ISLaSolver):
    """Produce Python code."""

    def __init__(self,
                 start_symbol: Optional[str] = None, *,
                 grammar: Optional[Grammar] = None,
                 constraint: Optional[str] = None,
                 **kw_params) -> None:
        """Produce Python code. Parameters are:

        * `start_symbol`: the grammatical entity to be generated (default: `<FunctionDef>`)
        * `grammar`: the EBNF grammar to be used (default: `PYTHON_AST_GRAMMAR`); and
        * `constraint`: an ISLa constraint (if any).

        Additional keyword parameters are passed to the `ISLaSolver` superclass.
        """
        if start_symbol is None:
            start_symbol = '<FunctionDef>'
        if grammar is None:
            grammar = PYTHON_AST_GRAMMAR
        assert start_symbol in grammar

        g = convert_ebnf_grammar(grammar)
        if constraint is None:
            super().__init__(g, start_symbol=start_symbol, **kw_params)
        else:
            super().__init__(g, constraint, start_symbol=start_symbol, **kw_params)

    def fuzz(self) -> str:
        """Produce a Python code string."""
        abstract_syntax_tree = eval(str(self.solve()))
        ast.fix_missing_locations(abstract_syntax_tree)
        return ast.unparse(abstract_syntax_tree)
By default, the PythonFuzzer will produce a function definition – that is, a function header and body.
fuzzer = PythonFuzzer()
print(fuzzer.fuzz())
def L():
    continue
By passing a start symbol as a parameter, you can have PythonFuzzer produce arbitrary Python elements:
fuzzer = PythonFuzzer('<While>')
print(fuzzer.fuzz())
while (set()[set():set()], *(set())):
    if {}:
        while set():
            continue
        break
else:
    del
    return
Here is a list of all possible start symbols:
sorted(list(PYTHON_AST_GRAMMAR.keys()))
['<Assert>', '<Assign>', '<Attribute>', '<AugAssign>', '<BinOp>', '<BoolOp>', '<Break>', '<Call>', '<Compare>', '<Constant>', '<Continue>', '<Delete>', '<Dict>', '<EmptySet>', '<Expr>', '<For>', '<FunctionDef>', '<If>', '<List>', '<Module>', '<Name>', '<Pass>', '<Return>', '<Set>', '<Slice>', '<Starred>', '<Subscript>', '<Tuple>', '<UnaryOp>', '<While>', '<With>', '<arg>', '<arg_list>', '<args>', '<args_param>', '<arguments>', '<bool>', '<boolop>', '<cmpop>', '<cmpop_list>', '<cmpops>', '<decorator_list_param>', '<defaults_param>', '<digit>', '<digits>', '<expr>', '<expr_list>', '<exprs>', '<float>', '<func>', '<id>', '<id_continue>', '<id_start>', '<identifier>', '<integer>', '<keyword>', '<keyword_list>', '<keywords>', '<keywords_param>', '<kw_defaults_param>', '<kwarg>', '<kwonlyargs_param>', '<lhs_Attribute>', '<lhs_List>', '<lhs_Name>', '<lhs_Starred>', '<lhs_Subscript>', '<lhs_Tuple>', '<lhs_expr>', '<lhs_exprs>', '<literal>', '<mod>', '<none>', '<nonempty_expr_list>', '<nonempty_lhs_expr_list>', '<nonempty_stmt_list>', '<nonzerodigit>', '<not_double_quotes>', '<not_single_quotes>', '<operator>', '<orelse_param>', '<posonlyargs_param>', '<returns>', '<start>', '<stmt>', '<stmt_list>', '<stmts>', '<string>', '<type_comment>', '<type_ignore>', '<type_ignore_list>', '<type_ignore_param>', '<type_ignores>', '<unaryop>', '<vararg>', '<withitem>', '<withitem_list>', '<withitems>']
When fuzzing, you may be interested in specific properties of the produced output. How can we influence the code that PythonFuzzer produces? We explore two ways:
A simple way to adjust output generation is to adapt the grammar.
Let us assume you'd like to have function definitions without decorators. To achieve this, you can alter the rule that produces function definitions:
PYTHON_AST_GRAMMAR['<FunctionDef>']
['FunctionDef(name=<identifier>, args=<arguments>, body=<nonempty_stmt_list><decorator_list_param><returns>?<type_comment>?)']
Like any AST rule, it comes in abstract syntax, so we first have to identify the element we'd like to adjust. In our case, this is decorator_list. Since decorator_list is a list, we can alter the rule to produce empty lists only.
To create a new adapted grammar, we do not alter the existing PYTHON_AST_GRAMMAR. Instead, we use the extend_grammar() function to create a new grammar with a new, adapted rule for <FunctionDef>:
python_ast_grammar_without_decorators: Grammar = extend_grammar(PYTHON_AST_GRAMMAR,
    {
        '<FunctionDef>':
            ['FunctionDef(name=<identifier>, args=<arguments>, body=<nonempty_stmt_list>, decorator_list=[])']
    })
However, we're not done yet. We also need to ensure that our grammar is valid, as any misspelled nonterminal identifier will result in problems during production. For this, we use the is_valid_grammar() function:
with ExpectError():
    assert is_valid_grammar(python_ast_grammar_without_decorators)
'<decorator_list_param>': defined, but not used. Consider applying trim_grammar() on the grammar '<returns>': defined, but not used. Consider applying trim_grammar() on the grammar '<decorator_list_param>': unreachable from <start>. Consider applying trim_grammar() on the grammar '<returns>': unreachable from <start>. Consider applying trim_grammar() on the grammar Traceback (most recent call last): File "/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_32402/3611578183.py", line 2, in <cell line: 1> assert is_valid_grammar(python_ast_grammar_without_decorators) AssertionError (expected)
We see that with our change, our grammar now has orphaned rules: the <decorator_list_param> and <returns> rules are no longer used, as they occurred only in the <FunctionDef> rule we just replaced. To fix this, we need to delete these orphaned rules from our grammar. Fortunately, we have a function trim_grammar(), which deletes all orphaned rules:
python_ast_grammar_without_decorators = trim_grammar(python_ast_grammar_without_decorators)
With this, our grammar becomes valid...
assert is_valid_grammar(python_ast_grammar_without_decorators)
... and we can use it for fuzzing - now without decorators:
fuzzer = PythonFuzzer(grammar=python_ast_grammar_without_decorators)
print(fuzzer.fuzz())
def X():
    break
Adjusting the grammar is straightforward once you understand the grammar structure; but the AST grammar is complex, and your changes and extensions tie you closely to that structure. Carefully study how the individual rules are defined, above.
A more elegant alternative to altering the grammar is to make use of constraints that tune the grammar to your needs.
Since PythonFuzzer is derived from ISLaSolver, we can pass a constraint argument constraining the grammar, as discussed in the chapter on fuzzing with constraints.
If we want to have a function definition with 10 characters in each identifier, we make use of an ISLa constraint:
fuzzer = PythonFuzzer(constraint='str.len(<id>) = 10')
print(fuzzer.fuzz())
def yWOOLwypwp(): # type:
    return
We can also constrain individual children – say, the actual identifier of the function.
# Also works (the <identifier> has quotes)
fuzzer = PythonFuzzer(constraint='<FunctionDef>.<identifier> = "\'my_favorite_function\'"')
print(fuzzer.fuzz())
@[set(), set()]
@set() | {}
@(-*set())[set():(): set()[:]()]
def my_favorite_function(dlFf=Qr, l1M=set(), *) -> 942.5:
    return
Assume we want to test how the compiler handles large numbers. Let us define a constraint such that the function body (<nonempty_stmt_list>) contains at least one integer (<integer>) with a value greater than 1000:
fuzzer = PythonFuzzer(constraint=
    """
    exists <integer> x:
        (inside(x, <nonempty_stmt_list>) and str.to.int(x) > 1000)
    """)
print(fuzzer.fuzz())
@[set(), +set(), set()]
@{set(): set(), set(): set()}
@(set(), *set() & set())
def l(r, a, /, *uXLV, _=set()[:], **Z) -> sdTYWE9b or {set(), set().R}.Vy != z1vw([]):
    del 1008
Assume we'd like to test compilers with non-trivial functions. Here's how to define a constraint such that the function body has exactly three statements (<stmt>). Note that this can take more than a minute to resolve, but the result definitely is a non-trivial function.
# This will not work with ISLa 2
fuzzer = PythonFuzzer(constraint="""
    forall <FunctionDef> def: count(def, "<stmt>", "3")
""")
print(fuzzer.fuzz())
@3.91
def V8(w, /, *, t=set(), C5D=set(), **foT6):
    if *{}.S[:] - ((set()) not in set() in set()):
        break
    else:
        return
And finally, if we want the decorator list to be empty, as in our grammar-altering example, we can constrain the decorator list to be empty:
# ignore
# with ExpectError(mute=True):
#     # Triggers an ISLa error (AssertionError)
#     fuzzer = PythonFuzzer(constraint='''
#         str.contains(<FunctionDef>, "decorator_list=[]")
#     ''')
#     print(fuzzer.fuzz())
# ignore
# with ExpectError(mute=True):
#     # Triggers an ISLa error (AssertionError)
#     fuzzer = PythonFuzzer(constraint='<FunctionDef>.<expr_list> = "[]"')
#     print(fuzzer.fuzz())
fuzzer = PythonFuzzer(constraint='<FunctionDef>..<expr_list> = "[]"')
print(fuzzer.fuzz())
def l(Jws4IzSPx_O2ajk687obQB3mflULCTJWnAv9GHg0YRtVNycueKFDMihZ5rXd1pqEo, /, *, **g):
    return
When producing code for compilers (or actually, producing inputs in general), it is often a good idea to not just create everything from scratch, but rather to mutate existing inputs. This way, one can achieve a better balance between common inputs (the ones to mutate) and uncommon inputs (the new parts added via mutation).
To mutate inputs, we first need to be able to parse them. This is where a grammar is really put to the test: can it really parse all possible code? This is why relying on an existing parser that is tried and proven (in our case the Python parser) and operating on an abstraction (in our case the AST) is really handy.
We already have seen how to parse code into an AST, using ast.parse():
def sum(a, b): # A simple example
    the_sum = a + b
    return the_sum
sum_source = inspect.getsource(sum)
sum_tree = ast.parse(sum_source)
print(ast.unparse(sum_tree))
def sum(a, b):
    the_sum = a + b
    return the_sum
sum_str = ast.dump(sum_tree)
sum_str
"Module(body=[FunctionDef(name='sum', args=arguments(posonlyargs=[], args=[arg(arg='a'), arg(arg='b')], kwonlyargs=[], kw_defaults=[], defaults=[]), body=[Assign(targets=[Name(id='the_sum', ctx=Store())], value=BinOp(left=Name(id='a', ctx=Load()), op=Add(), right=Name(id='b', ctx=Load()))), Return(value=Name(id='the_sum', ctx=Load()))], decorator_list=[])], type_ignores=[])"
Our grammar is able to parse this (non-trivial) string:
solver = ISLaSolver(python_ast_grammar)
assert solver.check(sum_str)
To mutate the input, we first have to parse it into a derivation tree structure. This is (again) a tree representation of the code, but this time, using the elements of our grammar.
sum_tree = solver.parse(sum_str)
Let us inspect what a derivation tree looks like. Alas, the string representation is very long and not that useful:
len(repr(sum_tree))
8737
repr(sum_tree)[:200]
"DerivationTree('<start>', (DerivationTree('<mod>', (DerivationTree('<Module>', (DerivationTree('Module(body=', (), id=495073), DerivationTree('<nonempty_stmt_list>', (DerivationTree('[', (), id=495071"
However, we can visualize the derivation tree:
display_tree(sum_tree)
We see that a derivation tree consists of nonterminal nodes whose children make up an expansion from the grammar. For instance, at the very top, we see that a <start> nonterminal expands into a <mod> nonterminal, which again expands into a <Module> nonterminal.
This comes right from the grammar rules
python_ast_grammar['<start>']
['<mod>']
and
python_ast_grammar['<mod>']
['<Module>']
The child of <mod>
is a <Module>
, which expands into the nodes
(body=
<nonempty_stmt_list>
, type_ignores=
<type_ignore_list>
)
Here, nodes like (body=
or , type_ignores=
are called terminal nodes (because they have no more elements to expand).
The nonterminals like <nonempty_stmt_list>
get expanded further below – notably, <nonempty_stmt_list>
expands into a <FunctionDef>
node that represents the sum()
definition.
Again, the structure exactly follows the <Module> definition in our grammar:
python_ast_grammar['<Module>']
['Module(body=<nonempty_stmt_list><type_ignore_param>)']
If we traverse the tree depth-first, left to right, and only collect the terminal symbols, we obtain the original string we parsed. Applying the str() function to the derivation tree gets us exactly that string:
str(sum_tree)
"Module(body=[FunctionDef(name='sum', args=arguments(posonlyargs=[], args=[arg(arg='a'), arg(arg='b')], kwonlyargs=[], kw_defaults=[], defaults=[]), body=[Assign(targets=[Name(id='the_sum', ctx=Store())], value=BinOp(left=Name(id='a', ctx=Load()), op=Add(), right=Name(id='b', ctx=Load()))), Return(value=Name(id='the_sum', ctx=Load()))], decorator_list=[])], type_ignores=[])"
And again, we can convert this string into an AST and thus obtain our original function:
sum_ast = ast.fix_missing_locations(eval(str(sum_tree)))
print(ast.unparse(sum_ast))
def sum(a, b):
    the_sum = a + b
    return the_sum
With derivation trees, we can have a structured representation of our input. In our case, we already have that with ASTs, so why bother introducing a new one? The answer is simple: Derivation trees also allow us to synthesize new inputs, because we have a grammar that describes their structure.
Most notably, we can mutate inputs as follows:
1. Choose a nonterminal <symbol> in the derivation tree to be mutated.
2. Use the grammar to produce a new random expansion for <symbol>.
3. Replace the subtree of <symbol> by the expansion just generated.
This is a decent programming task, and if you'd like a blueprint, have a look at the FragmentMutator in this tutorial on greybox fuzzing with grammars.
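For a rough idea of the three steps above, here is a sketch based on the fuzzingbook tuple representation of derivation trees and the GrammarFuzzer class; all function names are illustrative, and this is not how ISLa implements mutation internally.
import random
from GrammarFuzzer import GrammarFuzzer

def nonterminal_paths(tree, path=()):
    """Yield the path of every nonterminal node in a (symbol, children) tree."""
    symbol, children = tree
    if symbol.startswith('<'):
        yield path
    for index, child in enumerate(children or []):
        yield from nonterminal_paths(child, path + (index,))

def subtree_at(tree, path):
    """Return the subtree at the given path."""
    for index in path:
        tree = tree[1][index]
    return tree

def replace_at(tree, path, new_subtree):
    """Return a copy of `tree` with the subtree at `path` replaced."""
    if not path:
        return new_subtree
    symbol, children = tree
    children = list(children)
    children[path[0]] = replace_at(children[path[0]], path[1:], new_subtree)
    return (symbol, children)

def naive_mutate(tree, grammar):
    """Steps 1-3: pick a random <symbol> node and regenerate it from the grammar."""
    path = random.choice(list(nonterminal_paths(tree)))
    symbol = subtree_at(tree, path)[0]
    fuzzer = GrammarFuzzer(grammar, start_symbol=symbol)
    fuzzer.fuzz()  # fills fuzzer.derivation_tree
    return replace_at(tree, path, fuzzer.derivation_tree)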
Fortunately, ISLa already provides us with functionality that does exactly this. The ISLaSolver.mutate() method takes an input and mutates it according to the rules in the grammar. The input to mutate can be given as a derivation tree or as a string; its output is a derivation tree (which can again be converted into a string). Let us apply mutate() on our sum() function. The min_mutations and max_mutations parameters define how many mutation steps should be performed; we set both to 1 in order to have exactly one mutation.
sum_mutated_tree = solver.mutate(sum_str, min_mutations=1, max_mutations=1)
sum_mutated_ast = ast.fix_missing_locations(eval(str(sum_mutated_tree)))
print(ast.unparse(sum_mutated_ast))
def sum(a, b):
    the_sum = a + b
    return the_sum
Toy with the above to see the effect of a mutation. Note that if one of the top-level nodes (like <FunctionDef> or <Module>) is selected for mutation, then sum() will be replaced by something entirely different. Otherwise, though, the code will still be pretty similar to the original sum() code.
Of course, the more we increase the number of mutations, the more different the code will look:
sum_mutated_tree = solver.mutate(sum_str, min_mutations=10, max_mutations=20)
sum_mutated_ast = ast.fix_missing_locations(eval(str(sum_mutated_tree)))
print(ast.unparse(sum_mutated_ast))
def sum(a, b):
    the_9GuWCvL4cpgyi37K5I_ = a + b
    return the_jXHPe1oqMG
By toying with the mutate() parameters, we can control how common or how uncommon our input should be.
Does mutating existing code help us in finding bugs?
Let us assume we have a buggy compiler that generates bad code for an expression of the form <elem> * (<elem> + <elem>). The code in has_distributive_law() checks an AST for the presence of this bug:
def has_distributive_law(tree) -> bool:
    for node in walk(tree): # iterate over all nodes in `tree`
        # print(node)
        if isinstance(node, ast.BinOp):
            if isinstance(node.op, ast.Mult):
                if isinstance(node.right, ast.BinOp):
                    if isinstance(node.right.op, ast.Add):
                        return True

                if isinstance(node.left, ast.BinOp):
                    if isinstance(node.left.op, ast.Add):
                        return True

    return False
To understand how this works, a visualization of the AST comes in handy:
show_ast(ast.parse("1 + (2 * 3)"))
has_distributive_law(ast.parse("1 * (2 + 3)"))
True
has_distributive_law(ast.parse("(1 + 2) * 3"))
True
has_distributive_law(ast.parse("1 + (2 * 3)"))
False
has_distributive_law(ast.parse("def f(a, b):\n return a * (b + 10)"))
True
How many attempts does it take for each input until we find a mutation that triggers the bug in has_distributive_law()? Let us write a function that computes this number.
def how_many_mutations(code: str) -> int:
    solver = ISLaSolver(python_ast_grammar)
    code_ast = ast.parse(code)
    code_ast = ast.fix_missing_locations(code_ast)
    code_ast_str = ast.dump(code_ast)
    code_derivation_tree = solver.parse(code_ast_str)

    mutations = 0
    mutated_code_ast = code_ast
    while not has_distributive_law(mutated_code_ast):
        mutations += 1
        if mutations % 100 == 0:
            print(f'{mutations}...', end='')
        mutated_code_str = str(solver.mutate(code_derivation_tree))
        mutated_code_ast = eval(mutated_code_str)
        # mutated_code_ast = ast.fix_missing_locations(mutated_code_ast)
        # print(ast.dump(mutated_code_ast))
        # print(ast.unparse(mutated_code_ast))

    return mutations
If we pass an input that already exhibits the bug, we do not need any mutation:
assert how_many_mutations('1 * (2 + 3)') == 0
However, the further we are away from the bug, the more mutations (and the more time) it takes to find it. Notably, mutating 2 + 2 until we obtain an instance of the distributive law is still much faster than mutating 2.
how_many_mutations('2 + 2') # <-- Note: this can take a minute
54
how_many_mutations('2') # <-- Note: this can take several minutes
100...200...300...400...500...600...700...800...900...1000...1100...1200...1300...1400...1500...1600...1700...1800...1900...2000...2100...2200...2300...2400...2500...
2500
We conclude that mutating existing code can indeed be helpful, especially if it is syntactically close to inputs that trigger bugs. If you want to have a good chance of finding bugs, focus on inputs that have triggered bugs before – sometimes a simple mutation of these already helps find a new bug.
One interesting application of mutating inputs is to use mutations for evolutionary fuzzing. The idea is to have a population of inputs, to apply mutations on them, and to check whether they improve on a particular goal (mostly code coverage). Those inputs that do improve are being retained ("survival of the fittest") as the next generation, and evolved further. By repeating this process often enough, we may obtain inputs that cover large parts of code and thus improve chances to uncover bugs.
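As a high-level sketch, the overall loop looks as follows; all names here are illustrative, and the concrete implementation is developed step by step in the remainder of this section:
def evolutionary_fuzz(initial_input, fitness, mutate, goal_reached, size=100):
    """Generic evolutionary loop: mutate, evaluate, select the fittest."""
    population = [initial_input]
    while not any(goal_reached(member) for member in population):
        offspring = [mutate(member) for member in population]
        population = sorted(population + offspring, key=fitness, reverse=True)[:size]
    return max(population, key=fitness)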
Let us assume we have a buggy compiler that generates bad code for an expression of the form <elem> * (<elem> + <elem>). The function has_distributive_law(), above, checks an AST for the presence of this bug. Our aim is to detect this bug via fuzzing. But if we simply generate random inputs from scratch, it may take a long time until we generate the exact combination of operators that triggers the bug.
To have our fuzzers guided by coverage, we first need to measure code coverage. We make use of the Coverage module from the Fuzzing Book, which is particularly easy to use: it simply uses a with clause to obtain coverage from the code in the with body. Here is how to obtain coverage for our has_distributive_law() code, above:
mult_ast = ast.parse("1 * 2")
with Coverage() as cov:
    has_distributive_law(mult_ast)
The coverage() method tells us which lines in the code actually have been reached. This includes lines from has_distributive_law(), but also lines from other functions called.
cov.coverage()
{('_handle_fromlist', 1063), ('_handle_fromlist', 1064), ('_handle_fromlist', 1071), ('_handle_fromlist', 1075), ('_handle_fromlist', 1087), ('has_distributive_law', 2), ('has_distributive_law', 4), ('has_distributive_law', 5), ('has_distributive_law', 6), ('has_distributive_law', 10), ('has_distributive_law', 14), ('iter_child_nodes', 264), ('iter_child_nodes', 265), ('iter_child_nodes', 266), ('iter_child_nodes', 267), ('iter_child_nodes', 268), ('iter_child_nodes', 269), ('iter_child_nodes', 270), ('iter_fields', 252), ('iter_fields', 253), ('iter_fields', 254), ('walk', 378), ('walk', 379), ('walk', 380), ('walk', 381), ('walk', 382), ('walk', 383)}
Which are the lines executed? With a bit of code inspection, we can easily visualize the covered lines:
def show_coverage(cov, fun):
    fun_lines, fun_start = inspect.getsourcelines(fun)
    fun_name = fun.__name__
    coverage = cov.coverage()
    for line in range(len(fun_lines)):
        if (fun_name, line + fun_start) in coverage:
            print('# ', end='')  # covered lines
        else:
            print('  ', end='')  # uncovered lines
        print(line + fun_start, fun_lines[line], end='')
show_coverage(cov, has_distributive_law)
  1 def has_distributive_law(tree) -> bool:
# 2     for node in walk(tree): # iterate over all nodes in `tree`
  3         # print(node)
# 4         if isinstance(node, ast.BinOp):
# 5             if isinstance(node.op, ast.Mult):
# 6                 if isinstance(node.right, ast.BinOp):
  7                     if isinstance(node.right.op, ast.Add):
  8                         return True
  9
# 10                if isinstance(node.left, ast.BinOp):
  11                    if isinstance(node.left.op, ast.Add):
  12                        return True
  13
# 14    return False
In this listing, a # indicates that the code has been executed (covered). We see that our input "1 * 2" satisfies the conditions in Lines 4 and 5, but does not satisfy the conditions in later lines.
Let us now use coverage as a fitness function to guide evolution. The higher the fitness (the coverage), the higher the chances of an input being retained for further evolution. Our ast_fitness() function simply counts the number of lines covered in has_distributive_law().
def ast_fitness(code_ast) -> int:
    with Coverage() as cov:
        has_distributive_law(code_ast)

    lines = set()
    for (name, line) in cov.coverage():
        if name == has_distributive_law.__name__:
            lines.add(line)
    return len(lines)
Here is the fitness of a number of given inputs:
ast_fitness(ast.parse("1"))
3
ast_fitness(ast.parse("1 + 1"))
4
ast_fitness(ast.parse("1 * 2"))
6
ast_fitness(ast.parse("1 * (2 + 3)"))
6
Now, let's set up a fitness function that takes derivation trees. Essentially, our tree_fitness() function is based on the ast_fitness() function, above; however, we also add a small component 1 / len(code_str) to give extra fitness to shorter inputs. Otherwise, our inputs may grow and keep on growing, making mutations inefficient.
def tree_fitness(tree) -> float:
    code_str = str(tree)
    code_ast = ast.fix_missing_locations(eval(code_str))
    fitness = ast_fitness(code_ast)
    # print(ast.unparse(code_ast), f"\n=> Fitness = {fitness}\n")
    return fitness + 1 / len(code_str)
tree_fitness(sum_tree)
4.002666666666666
Let us now make use of our fitness function to implement a simple evolutionary fuzzing algorithm. We start with evolution – that is, taking a population and adding offspring via mutations. Our initial population consists of a single candidate – in our case, sum_tree, reflecting the sum() function, above.
def initial_population(tree):
    return [ (tree, tree_fitness(tree)) ]
sum_population = initial_population(sum_tree)
len(sum_population)
1
Our evolve() function adds two new children to each population member.
OFFSPRING = 2
def evolve(population, min_fitness=-1):
    solver = ISLaSolver(python_ast_grammar)
    for (candidate, _) in list(population):
        for i in range(OFFSPRING):
            child = solver.mutate(candidate, min_mutations=1, max_mutations=1)
            child_fitness = tree_fitness(child)
            if child_fitness > min_fitness:
                population.append((child, child_fitness))
    return population
sum_population = evolve(sum_population)
len(sum_population)
3
As we can evolve all these, too, we get an exponential growth.
sum_population = evolve(sum_population)
len(sum_population)
9
sum_population = evolve(sum_population)
len(sum_population)
27
sum_population = evolve(sum_population)
len(sum_population)
81
sum_population = evolve(sum_population)
len(sum_population)
243
No population can expand forever and still survive. Let us thus limit the population to a certain size.
POPULATION_SIZE = 100
The select() function implements survival of the fittest: it limits the population to at most POPULATION_SIZE elements, sorting them by their fitness (highest to lowest). Members with low fitness beyond POPULATION_SIZE do not survive.
def get_fitness(elem):
    (candidate, fitness) = elem
    return fitness

def select(population):
    population = sorted(population, key=get_fitness, reverse=True)
    population = population[:POPULATION_SIZE]
    return population
We can use the following call to trim our sum_population to the fittest members:
sum_population = select(sum_population)
len(sum_population)
100
We now have everything in place:
- an initial population (sum_population);
- a function to add offspring via mutation (evolve()); and
- a function to keep only the fittest members (select()).
Let us repeat this process over several generations. We track whenever we have found a new "best" candidate and log them. If we find a candidate that triggers the bug, we stop. Note that this may take a long time, and not necessarily yield a perfect result.
As is common in search-based approaches, we stop and restart the search if we have not found a sufficient solution after a number of generations (here: GENERATIONS). Other than that, we keep searching until we have a solution.
GENERATIONS = 100  # Upper bound

trial = 1
found = False

while not found:
    sum_population = initial_population(sum_tree)
    prev_best_fitness = -1

    for generation in range(GENERATIONS):
        sum_population = evolve(sum_population, min_fitness=prev_best_fitness)
        sum_population = select(sum_population)
        best_candidate, best_fitness = sum_population[0]
        if best_fitness > prev_best_fitness:
            print(f"Generation {generation}: found new best candidate (fitness={best_fitness}):")
            best_ast = ast.fix_missing_locations(eval(str(best_candidate)))
            print(ast.unparse(best_ast))
            prev_best_fitness = best_fitness
        if has_distributive_law(best_ast):
            print("Done!")
            found = True
            break

    trial = trial + 1
    print(f"\n\nRestarting; trial #{trial}")
Generation 0: found new best candidate (fitness=4.002666666666666):
def sum(a, b):
    the_sum = a + b
    return the_sum
Generation 1: found new best candidate (fitness=4.0027027027027025):
def sum(a, b):
    the_sum = a + b
    return FE
Generation 4: found new best candidate (fitness=4.002865329512894):
def sum():
    the_sum = a + b
    return the_sum
Generation 5: found new best candidate (fitness=6.00094696969697):
if set()[:] * *set():
    def sum(a, b):
        mc = a + b
        return FE
else:
    M = set()
    continue
    set().f[set():set()]()
Generation 7: found new best candidate (fitness=7.002364066193853):
def sum(a, b):
    mc = (a + b) * ()
    return FE
Done!


Restarting; trial #2
Success! We found a piece of code that triggers the bug. Check it for occurrences of the distributive law.
print(ast.unparse(best_ast))
def sum(a, b):
    mc = (a + b) * ()
    return FE
assert has_distributive_law(best_ast)
You may note that not all of the code is required to trigger the bug. We could run our evolutionary fuzzer a bit longer to see whether it can be further reduced, or use a dedicated input reduction technique such as Delta Debugging.
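For illustration, here is a crude reduction sketch (the name crude_reduce is hypothetical): it greedily drops statements as long as has_distributive_law() still reports the bug. Statement order may change, and a principled approach would use Delta Debugging instead.
import copy

def crude_reduce(tree: ast.Module) -> ast.Module:
    """Greedily drop statements while the bug trigger is retained."""
    reduced = copy.deepcopy(tree)
    for node in ast.walk(reduced):
        body = getattr(node, 'body', None)
        if not isinstance(body, list):
            continue
        for stmt in list(body):
            if len(body) > 1:
                body.remove(stmt)
                if not has_distributive_law(reduced):
                    body.append(stmt)  # removal lost the trigger; put it back
    return reduced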
Could the bug in has_distributive_law() have been found without evolutionary guidance – i.e., simply by applying one mutation to sum()? When producing an expression (<expr>), we can calculate how big the chances are to
- produce a binary operation (<BinOp>),
- have its operator be a multiplication (*), and
- have one of its children be a binary addition (+).
Let's do a few queries on our grammar to compute the chances.
assert '<BinOp>' in python_ast_grammar['<expr>']
len(python_ast_grammar['<expr>'])
15
assert 'Add()' in python_ast_grammar['<operator>']
assert 'Mult()' in python_ast_grammar['<operator>']
len(python_ast_grammar['<operator>'])
13
(len(python_ast_grammar['<expr>']) # chances of choosing a `BinOp`
* len(python_ast_grammar['<operator>']) # chances of choosing a `*`
* len(python_ast_grammar['<expr>']) # chances of choosing a `BinOp` as a child
* len(python_ast_grammar['<operator>']) # chances of choosing a `+`
/ 2) # two chances - one for the left child, one for the right
19012.5
On average, it would take about 19000 (non-evolutionary) runs until we have an expression that triggers the distributive law. So it is definitely better to make use of additional information (say, coverage) in order to guide mutations towards a goal.
This chapter provides a PythonFuzzer class that allows producing arbitrary Python code elements:
fuzzer = PythonFuzzer()
print(fuzzer.fuzz())
def R():
    break
By default, PythonFuzzer produces a function definition – that is, a list of statements as above. You can pass a start_symbol argument to state which Python element you'd like to have:
fuzzer = PythonFuzzer('<While>')
print(fuzzer.fuzz())
while {set()[set():set():set()]}:
    C = set()
    D @= set()
    break
else:
    return
Here is a list of all possible start symbols. Their names reflect the nonterminals from the Python ast module documentation.
sorted(list(PYTHON_AST_GRAMMAR.keys()))
['<Assert>', '<Assign>', '<Attribute>', '<AugAssign>', '<BinOp>', '<BoolOp>', '<Break>', '<Call>', '<Compare>', '<Constant>', '<Continue>', '<Delete>', '<Dict>', '<EmptySet>', '<Expr>', '<For>', '<FunctionDef>', '<If>', '<List>', '<Module>', '<Name>', '<Pass>', '<Return>', '<Set>', '<Slice>', '<Starred>', '<Subscript>', '<Tuple>', '<UnaryOp>', '<While>', '<With>', '<arg>', '<arg_list>', '<args>', '<args_param>', '<arguments>', '<bool>', '<boolop>', '<cmpop>', '<cmpop_list>', '<cmpops>', '<decorator_list_param>', '<defaults_param>', '<digit>', '<digits>', '<expr>', '<expr_list>', '<exprs>', '<float>', '<func>', '<id>', '<id_continue>', '<id_start>', '<identifier>', '<integer>', '<keyword>', '<keyword_list>', '<keywords>', '<keywords_param>', '<kw_defaults_param>', '<kwarg>', '<kwonlyargs_param>', '<lhs_Attribute>', '<lhs_List>', '<lhs_Name>', '<lhs_Starred>', '<lhs_Subscript>', '<lhs_Tuple>', '<lhs_expr>', '<lhs_exprs>', '<literal>', '<mod>', '<none>', '<nonempty_expr_list>', '<nonempty_lhs_expr_list>', '<nonempty_stmt_list>', '<nonzerodigit>', '<not_double_quotes>', '<not_single_quotes>', '<operator>', '<orelse_param>', '<posonlyargs_param>', '<returns>', '<start>', '<stmt>', '<stmt_list>', '<stmts>', '<string>', '<type_comment>', '<type_ignore>', '<type_ignore_list>', '<type_ignore_param>', '<type_ignores>', '<unaryop>', '<vararg>', '<withitem>', '<withitem_list>', '<withitems>']
If you'd like more control over Python code generation, here is what is happening behind the scenes.
The EBNF grammar PYTHON_AST_GRAMMAR can parse and produce abstract syntax trees for Python. To produce a Python module without PythonFuzzer, you would take these steps:
Step 1: Create a non-EBNF grammar suitable for ISLaSolver (or any other grammar fuzzer):
python_ast_grammar = convert_ebnf_grammar(PYTHON_AST_GRAMMAR)
Step 2: Feed the resulting grammar into a grammar fuzzer such as ISLa:
solver = ISLaSolver(python_ast_grammar, start_symbol='<FunctionDef>')
Step 3: Have the grammar fuzzer produce a string. This string represents an AST.
ast_string = str(solver.solve())
ast_string
'FunctionDef(name=\'y\', args=arguments(posonlyargs=[], args=[], kwonlyargs=[], kw_defaults=[], defaults=[]), body=[Return()], decorator_list=[Call(func=Name(id="set", ctx=Load()), args=[], keywords=[])])'
Step 4: Convert the AST into an actual Python AST data structure.
abstract_syntax_tree = eval(ast_string)
Step 5: Finally, convert the AST structure back into readable Python code:
ast.fix_missing_locations(abstract_syntax_tree)
print(ast.unparse(abstract_syntax_tree))
@set()
def y():
    return
The chapter has many more applications, including parsing and mutating Python code, evolutionary fuzzing, and more.
Here are the details on the PythonFuzzer constructor:
# ignore
import inspect
import markdown
from bookutils import HTML
# ignore
sig = inspect.signature(PythonFuzzer.__init__)
sig_str = str(sig) if sig else ""
doc = inspect.getdoc(PythonFuzzer.__init__) or ""
HTML(markdown.markdown('`PythonFuzzer' + sig_str + '`\n\n' + doc))
PythonFuzzer(self, start_symbol: Optional[str] = None, *, grammar: Optional[Dict[str, List[Union[str, Tuple[str, Dict[str, Any]]]]]] = None, constraint: Optional[str] = None, **kw_params) -> None
Produce Python code. Parameters are:
- start_symbol: the grammatical entity to be generated (default: <FunctionDef>);
- grammar: the EBNF grammar to be used (default: PYTHON_AST_GRAMMAR); and
- constraint: an ISLa constraint (if any).
Additional keyword parameters are passed to the ISLaSolver superclass.
# ignore
from ClassDiagram import display_class_hierarchy
# ignore
display_class_hierarchy([PythonFuzzer],
                        public_methods=[
                            PythonFuzzer.__init__,
                            PythonFuzzer.fuzz,
                            ISLaSolver.__init__
                        ],
                        project='fuzzingbook')
The seminal work on compiler testing is Csmith \cite{Yang2011}, a generator of C programs. Csmith has been used to thoroughly test compilers such as Clang or GCC; beyond producing code that is syntactically correct, it also aims at semantic correctness as well as avoiding undefined and unspecified behaviors. This is a must-read for anyone in the field of compiler testing.