This is the manuscript of Andreas Zeller's keynote "Coding Effective Testing Tools Within Minutes" at the TAIC PART 2020 conference.
In our Fuzzing Book, we use Python to implement automated testing techniques, and also as the language for most of our test subjects. Why Python? The short answer is:
Python made us amazingly productive. Most techniques in this book took 2-3 days to implement. This is about 10-20 times faster than for "classic" languages like C or Java.
A factor of 10–20 in productivity is enormous, almost ridiculous. Why is that so, and what consequences does this have for research and teaching?
In this essay, we will explore some reasons, prototyping a symbolic test generator from scratch. This normally would be considered a very difficult task, taking months to build. Yet, developing the code in this chapter took less than two hours – and explaining it takes less than 20 minutes.
Python is a high-level language that allows one to focus on the actual algorithms rather than how individual bits and bytes are passed around in memory. For this book, this is important: We want to focus on how individual techniques work, and not so much their optimization. Focusing on algorithms allows you to toy and tinker with them, and quickly develop your own. Once you have found out how to do things, you can still port your approach to some other language or specialized setting.
As an example, take the (in)famous triangle program, which classifies a triangle of lengths $a$, $b$, $c$ into one of three categories. It reads like pseudocode; yet, we can easily execute it.
def triangle(a, b, c):
    if a == b:
        if b == c:
            return 'equilateral'
        else:
            return 'isosceles #1'
    else:
        if b == c:
            return 'isosceles #2'
        else:
            if a == c:
                return 'isosceles #3'
            else:
                return 'scalene'
Here's an example of executing the triangle() function:
triangle(2, 3, 4)
'scalene'
For the remainder of this chapter, we will use the triangle() function as an ongoing example of a program to be tested. Of course, the complexity of triangle() is a far cry from large systems, and what we show in this chapter will not apply to, say, an ecosystem of thousands of intertwined microservices. Its point, however, is to show how easy certain techniques can be – if you have the right language and environment.
If you want to test triangle() with random values, that's fairly easy to do. Just bring along one of the Python random number generators and feed its values into triangle().
from random import randrange

for i in range(10):
    a = randrange(1, 10)
    b = randrange(1, 10)
    c = randrange(1, 10)
    t = triangle(a, b, c)
    print(f"triangle({a}, {b}, {c}) = {repr(t)}")
triangle(9, 1, 1) = 'isosceles #2'
triangle(3, 8, 2) = 'scalene'
triangle(6, 1, 3) = 'scalene'
triangle(3, 9, 7) = 'scalene'
triangle(5, 9, 8) = 'scalene'
triangle(8, 6, 1) = 'scalene'
triangle(2, 7, 5) = 'scalene'
triangle(9, 6, 3) = 'scalene'
triangle(1, 1, 2) = 'isosceles #1'
triangle(9, 2, 4) = 'scalene'
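Ten runs say little about how often each branch is actually reached. The sketch below counts outcomes over 10,000 random runs, assuming values from 1 to 9 as above; the fixed seed (42) is our own choice, only there to make runs reproducible. Since all three values must coincide, 'equilateral' occurs with a probability of only 9/729 = 1/81 (about 1.2%) per run, so pure random testing rarely reaches that branch.

```python
from collections import Counter
from random import Random


def triangle(a, b, c):
    if a == b:
        if b == c:
            return 'equilateral'
        else:
            return 'isosceles #1'
    else:
        if b == c:
            return 'isosceles #2'
        else:
            if a == c:
                return 'isosceles #3'
            else:
                return 'scalene'


rng = Random(42)  # fixed seed, chosen only to make runs reproducible
outcomes = Counter(
    triangle(rng.randrange(1, 10), rng.randrange(1, 10), rng.randrange(1, 10))
    for _ in range(10_000)
)

# Print categories from most to least frequent, with percentages
for category, count in outcomes.most_common():
    print(f"{category:15} {count:6} ({count / 100:.1f}%)")
```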
So far, so good – but that's something you can do in pretty much any programming language. What is it that makes Python special?
Dynamic analysis is the ability to track what is happening during program execution. Python's sys.settrace() mechanism allows you to track all code lines, all variables, and all values as the program executes – and all this in a handful of lines of code. Our Coverage class from the chapter on coverage shows how to capture a trace of all lines executed in five lines of code; such a trace easily converts into sets of lines or branches executed. With two more lines, you can also track all functions, arguments, and variable values – see, for instance, our chapter on dynamic invariants. You can even access the source code of individual functions (and print it out, too!). All this takes 10, maybe 20 minutes to implement.
Here is a piece of Python that does it all. We track lines executed, and for every line, we print its source code and the current values of all local variables:
import inspect
import sys

def traceit(frame, event, arg):
    function_code = frame.f_code
    function_name = function_code.co_name
    lineno = frame.f_lineno
    local_vars = frame.f_locals

    # Retrieve the source line currently being executed
    source_lines, starting_line_no = inspect.getsourcelines(function_code)
    loc = f"{function_name}:{lineno} {source_lines[lineno - starting_line_no].rstrip()}"

    # Render all local variables as "name = value"
    vars_str = ", ".join(f"{name} = {local_vars[name]}" for name in local_vars)
    print(f"{loc:50} ({vars_str})")
    return traceit  # keep tracing the lines of this frame
The function sys.settrace() registers traceit() as a trace function; it will then trace the given invocation of triangle():
def triangle_traced():
    sys.settrace(traceit)
    triangle(2, 2, 1)
    sys.settrace(None)

triangle_traced()
triangle:1 def triangle(a, b, c):                  (a = 2, b = 2, c = 1)
triangle:2     if a == b:                          (a = 2, b = 2, c = 1)
triangle:3         if b == c:                      (a = 2, b = 2, c = 1)
triangle:6             return 'isosceles #1'       (a = 2, b = 2, c = 1)
triangle:6             return 'isosceles #1'       (a = 2, b = 2, c = 1)
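Instead of printing the trace, we can just as easily collect it. As a minimal sketch (trace_lines() is a hypothetical helper, not the book's Coverage class), the code below records the executed line numbers of one function, relative to its def line, in execution order. The set of these numbers gives line coverage; pairs of consecutive lines approximate the branches taken. For triangle(2, 2, 1), it should record the lines of if a == b, if b == c, and return 'isosceles #1'.

```python
import sys


def triangle(a, b, c):
    if a == b:
        if b == c:
            return 'equilateral'
        else:
            return 'isosceles #1'
    else:
        if b == c:
            return 'isosceles #2'
        else:
            if a == c:
                return 'isosceles #3'
            else:
                return 'scalene'


def trace_lines(func, *args):
    """Run func(*args); return executed line numbers of func, relative to its def line."""
    first_line = func.__code__.co_firstlineno
    trace = []

    def tracer(frame, event, arg):
        # Only record 'line' events from the function under test
        if event == "line" and frame.f_code is func.__code__:
            trace.append(frame.f_lineno - first_line)
        return tracer

    sys.settrace(tracer)
    try:
        func(*args)
    finally:
        sys.settrace(None)  # always switch tracing off again
    return trace


trace = trace_lines(triangle, 2, 2, 1)
lines = set(trace)                     # line coverage
branches = set(zip(trace, trace[1:]))  # consecutive line pairs as branches
print(sorted(lines), sorted(branches))
```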
In comparison, try to build such a dynamic analysis for, say, C. One option is to instrument the code to track all lines executed and record variable values, storing the resulting info in some database. This will take you weeks, if not months, to implement. You can also run your code through a debugger (step-print-step-print-step-print); but again, programming the interaction can take days. And once you have the first results, you'll probably realize you need something else or better, so you go back to the drawing board. Not fun.
Together with a dynamic analysis such as the one above, you can make fuzzing much smarter. Search-based testing, for instance, evolves a population of inputs towards a particular goal, such as coverage. With a good dynamic analysis, you can quickly implement search-based strategies for arbitrary goals.
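As a minimal sketch of such a search, assuming that "cover new lines" is the goal, the code below keeps a population of inputs and adds a mutant only if it executes lines of triangle() that no earlier input covered. The mutation operator and the parameters (1,000 generations, seed 0) are our own assumptions for illustration; dedicated search-based tools use finer-grained fitness functions such as branch distance.

```python
import random
import sys


def triangle(a, b, c):
    if a == b:
        if b == c:
            return 'equilateral'
        else:
            return 'isosceles #1'
    else:
        if b == c:
            return 'isosceles #2'
        else:
            if a == c:
                return 'isosceles #3'
            else:
                return 'scalene'


def lines_covered(func, *args):
    """Run func(*args) and return the set of func's line numbers executed."""
    covered = set()

    def tracer(frame, event, arg):
        if event == "line" and frame.f_code is func.__code__:
            covered.add(frame.f_lineno)
        return tracer

    sys.settrace(tracer)
    try:
        func(*args)
    finally:
        sys.settrace(None)
    return covered


def mutate(inputs, rng):
    """Change one argument: copy a neighbor's value (which helps satisfy
    equality conditions) or pick a fresh random value from 1 to 9."""
    values = list(inputs)
    pos = rng.randrange(len(values))
    if rng.random() < 0.5:
        values[pos] = values[(pos + 1) % len(values)]
    else:
        values[pos] = rng.randrange(1, 10)
    return tuple(values)


def coverage_guided_search(func, seed, generations=1000, rng_seed=0):
    """Evolve a population of inputs; keep a mutant only if it covers new lines."""
    rng = random.Random(rng_seed)
    population = [seed]
    covered = lines_covered(func, *seed)
    for _ in range(generations):
        child = mutate(rng.choice(population), rng)
        new_lines = lines_covered(func, *child)
        if not new_lines <= covered:  # fitness: does the child add coverage?
            population.append(child)
            covered |= new_lines
    return population, covered


population, covered = coverage_guided_search(triangle, (1, 2, 3))
for inputs in population:
    print(inputs, triangle(*inputs))
```

Because each path through triangle() ends in its own return statement, every input kept in the population should reach a different outcome.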
Static analysis refers to the ability to analyze program code without actually executing it. Statically analyzing Python code to deduce any property can be a nightmare, because the language is so highly dynamic. (More on that below.)
If your static analysis does not have to be sound (for instance, because you only use it to support and guide another technique such as testing), then a static analysis in Python can be very simple. The ast module allows you to turn any Python function into an abstract syntax tree (AST), which you can then traverse as you like. Here's the AST for our triangle() function:
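import ast
import inspect

try:
    # Normally, this will do: showast renders the tree graphically
    from showast import show_ast
except ImportError:
    # Fallback without showast: print a textual dump of the tree
    def show_ast(tree):
        print(ast.dump(tree, indent=4))

triangle_source = inspect.getsource(triangle)
triangle_ast = ast.parse(triangle_source)
show_ast(triangle_ast)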
Now suppose one wants to identify all branches of triangle() and their conditions using static analysis. You would traverse the AST, searching for If nodes, and take their first child (the condition, stored as node.test). This is easy as well:
def collect_conditions(tree):
    conditions = []

    def traverse(node):
        if isinstance(node, ast.If):
            cond = ast.unparse(node.test).strip()
            conditions.append(cond)
        for child in ast.iter_child_nodes(node):
            traverse(child)

    traverse(tree)
    return conditions
Here are the four if conditions occurring in the triangle() code:
collect_conditions(triangle_ast)
['a == b', 'b == c', 'b == c', 'a == c']
Not only can we extract individual program elements, we can also change them at will and convert the tree back into source code. Program transformations (say, for instrumentation or mutation analysis) are a breeze. The above code took five minutes to write. Again, try that in Java or C.
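To illustrate such a transformation, here is a minimal mutation analysis sketch. It assumes a single mutation operator of our own choosing (replace one == by !=), creates one mutant per == in triangle(), converts each mutated tree back into source code with ast.unparse() (Python 3.9 or later), and checks whether five tests, one per branch, detect ("kill") each mutant. The helper names are hypothetical.

```python
import ast

TRIANGLE_SOURCE = """
def triangle(a, b, c):
    if a == b:
        if b == c:
            return 'equilateral'
        else:
            return 'isosceles #1'
    else:
        if b == c:
            return 'isosceles #2'
        else:
            if a == c:
                return 'isosceles #3'
            else:
                return 'scalene'
"""


class EqToNotEq(ast.NodeTransformer):
    """Mutation operator (our own choice): turn the n-th '==' into '!='."""

    def __init__(self, target_index):
        self.target_index = target_index
        self.seen = 0

    def visit_Compare(self, node):
        self.generic_visit(node)
        for i, op in enumerate(node.ops):
            if isinstance(op, ast.Eq):
                if self.seen == self.target_index:
                    node.ops[i] = ast.NotEq()
                self.seen += 1
        return node


def count_eq(tree):
    return sum(isinstance(op, ast.Eq)
               for node in ast.walk(tree) if isinstance(node, ast.Compare)
               for op in node.ops)


def mutants(source):
    """Yield the source code of each single-'==' mutant."""
    for index in range(count_eq(ast.parse(source))):
        yield ast.unparse(EqToNotEq(index).visit(ast.parse(source)))


def load_function(source, name):
    namespace = {}
    exec(source, namespace)
    return namespace[name]


# One test per branch: (arguments, expected result)
TESTS = [
    ((1, 1, 1), 'equilateral'),
    ((1, 1, 2), 'isosceles #1'),
    ((1, 2, 2), 'isosceles #2'),
    ((1, 2, 1), 'isosceles #3'),
    ((1, 2, 3), 'scalene'),
]

results = []
for mutant_source in mutants(TRIANGLE_SOURCE):
    mutant = load_function(mutant_source, "triangle")
    killed = any(mutant(*args) != expected for args, expected in TESTS)
    results.append(killed)
    print(f"mutant {len(results)}: {'killed' if killed else 'survived'}")
```

A mutant that survives would point to a weakness in the test suite; here, one test per branch is enough to detect all four.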
Let's get back to testing. We have shown how to extract conditions from code. To reach a particular location in the triangle() function, one needs to find a solution for the path conditions leading to that branch. To reach the last line in triangle() (the 'scalene' branch), we have to find a solution for
$$a \ne b \land b \ne c \land a \ne c$$
We can make use of a constraint solver for this, such as Microsoft's Z3 solver:
Let us use Z3 to find a solution for the 'scalene' branch condition:
import z3

a = z3.Int('a')
b = z3.Int('b')
c = z3.Int('c')

s = z3.Solver()
s.add(z3.And(a > 0, b > 0, c > 0))  # Triangle edges are positive
s.add(z3.And(a != b, b != c, a != c))  # Our condition
s.check()  # yields sat
Z3 has shown us that there is a solution ("sat" = "satisfiable"). Let us get one:
m = s.model()
m
We can use this solution right away for testing the triangle() function and find that it indeed covers the 'scalene' branch. The method as_long() converts the Z3 results into numerical values.
triangle(m[a].as_long(), m[b].as_long(), m[c].as_long())
'scalene'
With what we have seen, we can now build a symbolic test generator – a tool that attempts to systematically create test inputs that cover all paths. Let us find all conditions we need to solve by exploring all paths in the tree. We turn these paths into Z3 format right away:
def collect_path_conditions(tree):
    paths = []

    def traverse_if_children(children, context, cond):
        old_paths = len(paths)
        for child in children:
            traverse(child, context + [cond])
        if len(paths) == old_paths:
            # No nested if below: the path ends here
            paths.append(context + [cond])

    def traverse(node, context):
        if isinstance(node, ast.If):
            cond = ast.unparse(node.test).strip()
            not_cond = "z3.Not(" + cond + ")"
            traverse_if_children(node.body, context, cond)
            traverse_if_children(node.orelse, context, not_cond)
        else:
            for child in ast.iter_child_nodes(node):
                traverse(child, context)

    traverse(tree, [])
    return ["z3.And(" + ", ".join(path) + ")" for path in paths]
path_conditions = collect_path_conditions(triangle_ast)
path_conditions
['z3.And(a == b, b == c)', 'z3.And(a == b, z3.Not(b == c))', 'z3.And(z3.Not(a == b), b == c)', 'z3.And(z3.Not(a == b), z3.Not(b == c), a == c)', 'z3.And(z3.Not(a == b), z3.Not(b == c), z3.Not(a == c))']
Now all we need to do is to feed these constraints into Z3. We see that we easily cover all branches:
for path_condition in path_conditions:
    s = z3.Solver()
    s.add(a > 0, b > 0, c > 0)
    eval(f"s.check({path_condition})")  # check, assuming the path condition
    m = s.model()
    print(m, triangle(m[a].as_long(), m[b].as_long(), m[c].as_long()))
[a = 1, c = 1, b = 1] equilateral
[c = 2, a = 1, b = 1] isosceles #1
[c = 2, a = 1, b = 2] isosceles #2
[c = 1, a = 1, b = 2] isosceles #3
[c = 3, a = 1, b = 2] scalene
Success! We have covered all branches of the triangle program!
Now, the above is still very limited – and tailored to the capabilities of the triangle() code. A full implementation would actually
Some of these may not be supported by the Z3 theories.
To make it easier for a constraint solver to find solutions, you could also provide concrete values observed from earlier executions that already are known to reach specific paths in the program. Such concrete values would be gathered from the tracing mechanisms above, and boom: you would have a pretty powerful and scalable concolic (concrete-symbolic) test generator.
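The following minimal sketch shows the "concrete" half of this idea under strong simplifying assumptions: branch conditions mention only the function's parameters, and there are no assignments or loops, which holds for triangle(). Rather than using the tracer, it evaluates each if condition of the AST on concrete argument values and records which branch the run takes. Negating the last decision yields the path condition of a neighboring path, which a solver such as Z3 could then be asked to satisfy. The helper names are hypothetical.

```python
import ast

TRIANGLE_SOURCE = """
def triangle(a, b, c):
    if a == b:
        if b == c:
            return 'equilateral'
        else:
            return 'isosceles #1'
    else:
        if b == c:
            return 'isosceles #2'
        else:
            if a == c:
                return 'isosceles #3'
            else:
                return 'scalene'
"""


def concrete_path_condition(source, env):
    """Follow the branches that the concrete input `env` takes through the
    first function in `source`; return the conditions along that path.
    Assumes conditions only use parameters (no assignments, no loops)."""
    func_def = ast.parse(source).body[0]
    path = []
    statements = list(func_def.body)
    while statements:
        stmt = statements.pop(0)
        if isinstance(stmt, ast.If):
            cond = ast.unparse(stmt.test)
            taken = eval(cond, {}, dict(env))  # evaluate on concrete values
            path.append(cond if taken else f"not ({cond})")
            # Continue with the branch taken, then with any following statements
            statements = list(stmt.body if taken else stmt.orelse) + statements
        elif isinstance(stmt, ast.Return):
            break
    return path


def negate_last(path):
    """Flip the last decision to describe a neighboring path."""
    *prefix, last = path
    if last.startswith("not (") and last.endswith(")"):
        flipped = last[len("not ("):-1]
    else:
        flipped = f"not ({last})"
    return prefix + [flipped]


observed = {"a": 1, "b": 2, "c": 3}  # e.g., values seen in an earlier run
path = concrete_path_condition(TRIANGLE_SOURCE, observed)
print(path)
print(negate_last(path))
```

Here the observed run takes the 'scalene' path; flipping its last decision describes the 'isosceles #3' path, and the concrete values already satisfy all but that last condition.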
Now, the above might take you a day or two, and as you expand your test generator beyond triangle(), you will add more and more features. The nice part is that each of these features might actually be a research contribution – something nobody has thought of before. Whatever idea you might have, you can quickly implement it and try it out in a prototype. And again, this will be orders of magnitude faster than for conventional languages.
Python has a reputation for being hard to analyze statically, and this is true; its dynamic nature makes it hard for traditional static analysis to exclude specific behaviors.
We see Python as a great language for prototyping automated testing and dynamic analysis techniques, and as a good language to illustrate lightweight static and symbolic analysis techniques that would be used to guide and support other techniques (say, generating software tests).
But if you want to prove specific properties (or the absence thereof) by static analysis of code only, Python is a challenge, to say the least; and there are areas for which we would definitely warn against using it.
Using Python to demonstrate static type checking will be suboptimal (to say the least) because, well, Python programs typically do not come with type annotations. You can, of course, annotate variables with types, as we assume in the chapter on Symbolic Fuzzing:
def typed_triangle(a: int, b: int, c: int) -> str:
    return triangle(a, b, c)
Most real-world Python code will not be annotated with types, though. While you can also retrofit them, as discussed in our chapter on dynamic invariants, Python simply is not a good domain to illustrate type checking. If you want to show the beauty and usefulness of type checking, use a statically typed language like Java, ML, or Haskell.
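Retrofitting types can start from runtime observations. The sketch below is a hypothetical decorator in the spirit of the dynamic invariants chapter, not its actual code: it records the types of arguments and return values seen in actual calls and turns them into a suggested signature. Like any dynamic inference, it only reflects the calls observed, not all possible ones.

```python
import functools
import inspect
from collections import defaultdict


def triangle(a, b, c):
    if a == b:
        if b == c:
            return 'equilateral'
        else:
            return 'isosceles #1'
    else:
        if b == c:
            return 'isosceles #2'
        else:
            if a == c:
                return 'isosceles #3'
            else:
                return 'scalene'


def record_types(func):
    """Hypothetical helper: remember argument and return types seen at runtime."""
    observed = defaultdict(set)
    signature = inspect.signature(func)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        bound = signature.bind(*args, **kwargs)
        for name, value in bound.arguments.items():
            observed[name].add(type(value))
        result = func(*args, **kwargs)
        observed["return"].add(type(result))
        return result

    wrapper.observed_types = observed
    return wrapper


def suggested_signature(wrapper):
    """Render the observed types as an annotated signature string."""
    observed = wrapper.observed_types

    def render(types):
        return " | ".join(sorted(t.__name__ for t in types))

    params = ", ".join(f"{name}: {render(types)}"
                       for name, types in observed.items() if name != "return")
    return f"{wrapper.__name__}({params}) -> {render(observed['return'])}"


traced_triangle = record_types(triangle)
traced_triangle(2, 3, 4)
traced_triangle(1, 1, 1)
print(suggested_signature(traced_triangle))
```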
Python is a highly dynamic language in which you can change anything at runtime. It is no problem assigning different types to a variable, as in
x = 42
x = "a string" # type: ignore
or change the existence (and scope) of a variable depending on some runtime condition:
p1, p2 = True, False

if p1:
    x = 42
if p2:
    del x

# Does x exist at this point?
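Wrapping this fragment in a function makes the point concrete: whether x exists depends on the runtime values of p1 and p2, which a static analysis generally does not know. This is a small illustrative sketch; x_exists() is our own name. Calling it with p1 = False and p2 = True would raise an UnboundLocalError, since del x then refers to a variable that was never assigned.

```python
def x_exists(p1, p2):
    """Return whether the local variable x exists after the two branches."""
    if p1:
        x = 42
    if p2:
        del x
    return "x" in locals()


for p1, p2 in [(True, False), (True, True), (False, False)]:
    print(f"p1={p1}, p2={p2}: x exists? {x_exists(p1, p2)}")
```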
Such properties make symbolic reasoning on code (including static analysis and type checking) much harder, if not outright impossible. If you need lightweight static and symbolic analysis techniques to guide other techniques (say, test generation), then imprecision may not hurt much. But if you want to derive guarantees from your code, do not use Python as a test subject; again, strongly statically typed languages like Java/ML/Haskell (or some very restricted toy language) are much better grounds for experimentation.
This does not mean that languages like Python should not be statically checked. On the contrary, the widespread usage of Python calls loudly for better static checking tools. But if you want to teach or research static and symbolic techniques, we definitely would not use Python as our language of choice.
One neat thing about prototyping (with Python or whatever) is that it allows you to fully focus on your approach, rather than on the infrastructure. Very obviously, this is useful for teaching – you can use examples such as the ones above in a lecture to very quickly communicate essential techniques of program analysis and test generation.
But prototyping has more advantages. A Jupyter Notebook (like this one) documents how you developed your approach, together with examples, experiments, and rationales – while still focusing on the essentials. If you write a tool the "classical" way, you will eventually deliver thousands of lines of code that do everything under the sun, but only once you have implemented everything will you know whether things actually work. This is a huge risk, and if you still have to change things, you will have to refactor again and again. Furthermore, anyone who works on that code later will need days, if not weeks, to re-extract the basic idea of the approach, as it will be buried under loads and loads of infrastructure and refactorings.
As a consequence, we now implement new ideas twice:
First, we implement things as a notebook (as this one), experimenting with various approaches and parameters until we get them right.
Only once we have the approach right, and if we have confidence that it works, we reimplement it in a tool that works on large scale programs. This can still take weeks to months, but at least we know we are on a good path.
Incidentally, it may well be that the original notebooks will have a longer life, as they are simpler, better documented, and capture the gist of our novel idea. And this is how several of the notebooks in this book came to be.
All the code examples above can be run by you – and changed as you like! From the Web page, the easiest way is to go to "Resources $\rightarrow$ Edit as Notebook", and you can experiment with the original Jupyter Notebook right within your browser. (Use Shift + Return to execute code.)
From the "Resources" menu, you can also download the Python code (.py) to run it within a Python environment, or download the notebook (.ipynb) to run it within Jupyter – and again, change them as you like. If you want to run this code on your own machine, you will need the following packages:
pip install showast
pip install z3-solver
Enjoy!
Python is a great language for prototyping testing and debugging tools:
Python is not recommended as a domain for pure symbolic code analysis, though.
However, even a potentially unsound symbolic analysis can still guide test generation – and this again is very easy to build.
Jupyter Notebooks (using Python or other languages) are great for prototyping:
If you want to see more examples of us using Python for prototyping – have a look at this book! Specifically,
There's lots to learn – enjoy the read!
The triangle problem is adapted from "The Art of Software Testing" by Myers and Sandler \cite{Myers2004}. It is an allegedly simple problem that reveals surprising depth when you think about all the things that might go wrong.
The Z3 solver we use in this chapter was developed at Microsoft Research under the lead of Leonardo de Moura and Nikolaj Bjørner \cite{z3}. It is one of the most powerful and most popular solvers.
Our path collector is still very limited. Things that do not work include:
Boolean operators: a and b needs to be translated into Z3 syntax, z3.And(a, b).
Early returns: after if A: return, the condition not A must hold for the following statements.
The more of these you implement, the closer you will get to a full-fledged symbolic test generator for Python. But at some point, your prototype may not be a prototype anymore, and then, Python may no longer be the best language to use. Find the right moment to switch from a prototype to a production tool.
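For Boolean operators, a starting point might look like the sketch below. It rewrites Python's and, or, and not into the z3.And(), z3.Or(), and z3.Not() syntax that collect_path_conditions() produces, working on the AST and returning a string, just like the path collector. It only handles these three operators; the class and function names are our own, and the resulting string still has to be evaluated with z3 imported, as in the path-condition loop above.

```python
import ast


def z3_call(name, args):
    """Build the AST for a call z3.<name>(*args)."""
    return ast.Call(
        func=ast.Attribute(value=ast.Name(id="z3", ctx=ast.Load()),
                           attr=name, ctx=ast.Load()),
        args=args,
        keywords=[],
    )


class BoolOpsToZ3(ast.NodeTransformer):
    """Rewrite `and`, `or`, and `not` into z3.And(), z3.Or(), and z3.Not()."""

    def visit_BoolOp(self, node):
        self.generic_visit(node)  # rewrite nested operators first
        name = "And" if isinstance(node.op, ast.And) else "Or"
        return z3_call(name, node.values)

    def visit_UnaryOp(self, node):
        self.generic_visit(node)
        if isinstance(node.op, ast.Not):
            return z3_call("Not", [node.operand])
        return node  # leave other unary operators (e.g. -x) alone


def to_z3_syntax(condition):
    """Translate a Python condition (as a string) into Z3 syntax (as a string)."""
    tree = ast.parse(condition, mode="eval")
    tree = BoolOpsToZ3().visit(tree)
    return ast.unparse(tree.body)


print(to_z3_syntax("a == b and not b == c"))
print(to_z3_syntax("a < b or (b < c and a != c)"))
```

Applying to_z3_syntax() to each condition inside collect_path_conditions() would be one way to lift this limitation, at least for conditions that only use comparisons and these operators.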