This book is massive. With more than 20,000 lines of code and 150,000 words of text, a printed version would cover more than 1,200 pages of text. Obviously, we do not assume that everybody wants to read everything.
While the chapters of this book can be read one after the other, there are many possible paths through the book. In this graph, an arrow $A \rightarrow B$ means that chapter $A$ is a prerequisite for chapter $B$. You can pick arbitrary paths in this graph to get to the topics that interest you most:
# ignore
from bookutils import rich_output, InteractiveSVG

# ignore
sitemap = None
if rich_output():
    sitemap = InteractiveSVG(filename='PICS/Sitemap.svg')

sitemap
But since even this map can be overwhelming, here are a few tours to get you started. Each of these tours allows you to focus on a particular view, depending on whether you are a programmer, student, or researcher.
You have a program to test. You want to generate tests as quickly as possible and as thoroughly as possible. You don't care so much how something is implemented, but it should get the job done. You want to learn how to use things.
Start with Introduction to Testing to get the basic concepts. (You may know most of these already, but quick reminders can't hurt.)
Use the simple fuzzers from the chapter on Fuzzers to test your program against the first random inputs.
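To give you a first taste, a bare-bones random fuzzer fits in a few lines. (This is an illustrative sketch; the function name and defaults are ours, not necessarily the chapter's exact API.)

```python
import random

def simple_fuzzer(max_length=100, char_start=32, char_range=95):
    """Return a random string of printable ASCII characters."""
    length = random.randrange(0, max_length + 1)
    return ''.join(chr(random.randrange(char_start, char_start + char_range))
                   for _ in range(length))

# Feed random inputs to the program under test, e.g.:
for _ in range(5):
    data = simple_fuzzer()
    # my_program(data)  # hypothetical program under test
```

Even inputs as simple as these can crash real programs, which is what makes this technique so attractive as a starting point.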
Get coverage from your program and use coverage information to guide test generation towards code coverage.
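The core idea can be sketched in plain Python: record which lines a run executes, and keep only those inputs that reach locations not covered before. (`program_under_test` is a hypothetical stand-in for your program; the chapter develops a full-fledged `Coverage` class.)

```python
import random
import sys

def run_with_coverage(func, arg):
    """Run func(arg) and return the set of (function, line) locations executed."""
    covered = set()
    def trace(frame, event, _arg):
        if event == 'line':
            covered.add((frame.f_code.co_name, frame.f_lineno))
        return trace
    sys.settrace(trace)
    try:
        func(arg)
    finally:
        sys.settrace(None)
    return covered

def program_under_test(s):  # hypothetical stand-in for your program
    if s.startswith('a'):
        return "branch A"
    return "branch B"

# Keep only inputs that reach code locations not seen before
seen, corpus = set(), []
for _ in range(100):
    inp = ''.join(random.choice('ab') for _ in range(3))
    cov = run_with_coverage(program_under_test, inp)
    if not cov <= seen:
        corpus.append(inp)
        seen |= cov
```

Inputs that do not increase coverage are discarded; the surviving corpus exercises distinct parts of the program.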
Define an input grammar for your program and use this grammar to thoroughly fuzz your program with syntactically correct inputs. As fuzzer, we would recommend a grammar coverage fuzzer, as this ensures coverage of input elements.
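As a sketch of the idea, a grammar fuzzer expands nonterminals recursively until only terminal symbols remain. (The toy grammar and function below are illustrative; the chapters define richer APIs with coverage guarantees.)

```python
import random
import re

# A hypothetical toy grammar for arithmetic expressions
EXPR_GRAMMAR = {
    '<expr>': ['<term> + <expr>', '<term> - <expr>', '<term>'],
    '<term>': ['<digit><term>', '<digit>'],
    '<digit>': [str(d) for d in range(10)],
}

def grammar_fuzz(symbol='<expr>', grammar=EXPR_GRAMMAR, depth=0, max_depth=10):
    """Expand `symbol` by picking random expansions; force short ones when deep."""
    expansions = grammar[symbol]
    if depth >= max_depth:
        expansions = [min(expansions, key=len)]  # shortest expansion terminates
    chosen = random.choice(expansions)
    # Recursively replace every nonterminal in the chosen expansion
    return re.sub(r'<[^<> ]+>',
                  lambda m: grammar_fuzz(m.group(0), grammar, depth + 1, max_depth),
                  chosen)
```

Every produced string is a syntactically valid expression, so the program under test gets past its parser and into the interesting code.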
If you want more control over the generated inputs, consider probabilistic fuzzing and fuzzing with generator functions.
If you want to deploy many fuzzers at once, learn how to manage a large set of fuzzers.
In each of these chapters, start with the "Synopsis" parts; these will give you quick introductions on how to use things, as well as point you to relevant usage examples. With this, enough said. Get back to work and enjoy!
These tours are how the book is organized. Having gone through the Introduction to Testing for the basic concepts, you can read your way through these parts:
The lexical tour focuses on lexical test generation techniques, i.e., techniques that compose an input character by character and byte by byte. These techniques are very fast and robust, with a minimum of bias.
The syntactical tour focuses on grammars as a means to specify the syntax of inputs. The resulting test generators produce syntactically correct inputs, making tests much faster, and provide lots of control mechanisms for the tester.
The semantic tour makes use of code semantics to shape and guide test generation. Advanced techniques include extracting input grammars, mining function specifications, and symbolic constraint solving to cover as many code paths as possible.
The application tour applies the techniques defined in the earlier parts on domains such as Web servers, user interfaces, APIs, or configurations.
The management tour finally focuses on how to handle and organize large sets of test generators, and when to stop fuzzing.
Most of these chapters start with a "Synopsis" section that explains how to use the most important concepts. You can choose whether you want a "usage" perspective (then just read the synopsis) or an "understanding" perspective (then read on).
You are a student of computer science and/or software engineering. You want to know basics of testing and related fields. Beyond just using techniques, you want to dig deeper into algorithms and implementations. We have the following recommendation for you:
Start with Introduction to Testing and Coverage to get the basic concepts. (You may know some of these already, but hey, you're a student, right?)
Learn how simple fuzzers work from the chapter on Fuzzers. This already gives you tools that took down 30% of UNIX utilities in the 90s. What happens if you test some tool that has never been fuzzed before?
Mutation-based fuzzing is pretty much the standard in fuzzing today: Take a set of seeds, and mutate them until we find a bug.
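In a nutshell, and with hypothetical names, such a mutation step might look like this:

```python
import random

def delete_char(s):
    pos = random.randrange(len(s))
    return s[:pos] + s[pos + 1:]

def insert_char(s):
    pos = random.randrange(len(s) + 1)
    return s[:pos] + chr(random.randrange(32, 127)) + s[pos:]

def flip_char(s):
    pos = random.randrange(len(s))
    flipped = chr(ord(s[pos]) ^ (1 << random.randrange(7)))
    return s[:pos] + flipped + s[pos + 1:]

def mutate(s):
    """Apply one random mutation: delete, insert, or flip a character."""
    mutators = [insert_char] if not s else [delete_char, insert_char, flip_char]
    return random.choice(mutators)(s)

seed_input = "http://www.example.com/"   # a hypothetical valid seed
population = [seed_input]
for _ in range(100):
    candidate = random.choice(population)
    for _ in range(random.randrange(1, 5)):
        candidate = mutate(candidate)
    population.append(candidate)
    # feed `candidate` to the program under test here
```

Because mutants stay close to valid seeds, they are far more likely to pass input validation than purely random strings.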
Learn how grammars can be used to generate syntactically correct inputs. This makes test generation much more efficient, but you have to write (or mine) a grammar in the first place.
Learn how to fuzz APIs and graphical user interfaces. Both of these are important domains for software test generation.
Learn how to reduce failure-inducing inputs to a minimum automatically. This is a great time saver for debugging, especially in conjunction with automated testing.
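The underlying delta debugging idea can be sketched as follows: repeatedly try to remove chunks of the input, keeping any reduction on which the failure persists. (This is a simplified variant of the chapter's algorithm; `crasher` is a hypothetical failure condition.)

```python
def ddmin(failing_input, test):
    """Reduce `failing_input` to a small string on which `test` still fails.
    `test(s)` returns True if s triggers the failure."""
    n = 2  # number of chunks to try removing
    while len(failing_input) >= 2:
        chunk = len(failing_input) // n
        reduced = False
        for start in range(0, len(failing_input), chunk):
            candidate = failing_input[:start] + failing_input[start + chunk:]
            if test(candidate):          # failure persists: keep the reduction
                failing_input = candidate
                n = max(n - 1, 2)
                reduced = True
                break
        if not reduced:
            if n == len(failing_input):  # already tried single characters
                break
            n = min(n * 2, len(failing_input))
    return failing_input

# Hypothetical failure: the program crashes on any input containing '<script>'
crasher = lambda s: '<script>' in s
minimized = ddmin('foo <b><script>alert(1)</script></b> bar', crasher)
```

Here, `minimized` boils the input down to the part that actually matters for the failure.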
For all these chapters, experiment with the implementations to understand the underlying concepts, and extend them as you wish.
If you are a teacher, the above chapters can be useful in programming and/or software engineering courses. Make use of slides and/or live programming, and have students work on exercises.
On top of the "Undergraduate" tour, you want to get deeper into test generation techniques, including techniques that are more demanding.
Search-based testing allows you to guide test generation towards specific goals, such as code coverage. Robust and efficient.
Get an introduction to configuration testing. How does one test and cover a system that comes with multiple configuration options?
Mutation analysis seeds synthetic defects (mutations) into program code to check whether the tests find them. If the tests do not find mutations, they likely won't find real bugs either.
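As an illustrative sketch (with a single, hypothetical mutation operator that turns `+` into `-`), one can see how a weak test lets a mutant survive while a stronger test kills it:

```python
def make_mutants(source):
    """Return all mutants in which a single '+' becomes '-'."""
    return [source[:i] + '-' + source[i + 1:]
            for i, c in enumerate(source) if c == '+']

SOURCE = "def add(a, b):\n    return a + b\n"

def survives(mutant_source, test):
    """Run `test` on the mutated `add`; return True if the mutant survives."""
    env = {}
    exec(mutant_source, env)
    try:
        return test(env['add'])
    except Exception:
        return False

weak_test   = lambda add: add(0, 0) == 0   # also passes on the 'a - b' mutant
strong_test = lambda add: add(2, 3) == 5   # kills the 'a - b' mutant

mutants = make_mutants(SOURCE)
surviving_weak   = [m for m in mutants if survives(m, weak_test)]
surviving_strong = [m for m in mutants if survives(m, strong_test)]
```

A surviving mutant points at a weakness in the test suite: the seeded defect went unnoticed, so a similar real defect likely would, too.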
Learn how to parse inputs using grammars. If you want to analyze, decompose, or mutate existing inputs, you need a parser for that.
Concolic and symbolic fuzzing solve constraints along program paths to reach code that is hard to test. Used wherever reliability is paramount; also a hot research topic.
Learn how to estimate when to stop fuzzing. There has to be a stop at some point, right?
For all these chapters, experiment with the code; feel free to create your own variations and extensions. This is how we get to research!
If you are a teacher, the above chapters can be useful in advanced courses on software engineering and testing. Again, you can make use of slides and/or live programming, and have students work on exercises.
This tour focuses on black-box fuzzing – that is, techniques that work without feedback from the program under test. Have a look at
Basic fuzzing. This already gives you tools that took down 30% of UNIX utilities in the 90s. What happens if you test some tool that has never been fuzzed before?
Syntactical fuzzing focuses on grammars as a means to specify the syntax of inputs. The resulting test generators produce syntactically correct inputs, making tests much faster, and provide lots of control mechanisms for the tester.
Semantic fuzzing attaches constraints to grammars, making inputs not only syntactically but also semantically valid, and empowering you to shape test inputs just as you want them.
Domain-specific fuzzing shows a number of applications of these techniques, from configurations to graphical user interfaces.
If you want to deploy many fuzzers at once, learn how to manage a large set of fuzzers.
This tour focuses on white-box fuzzing – that is, techniques that leverage feedback from the program under test. Have a look at
Coverage to get the basic concepts of coverage and how to measure it for Python.
Mutation-based fuzzing is pretty much the standard in fuzzing today: Take a set of seeds, and mutate them until we find a bug.
Greybox fuzzing with algorithms from the popular American Fuzzy Lop (AFL) fuzzer.
Information Flow and Concolic Fuzzing showing how to capture information flow in Python programs and how to leverage it to produce more intelligent test cases.
Symbolic Fuzzing, reasoning about the behavior of a program without executing it.
On top of the "Graduate" tour, you are looking for techniques that are somewhere between lab stage and widespread usage – in particular, techniques where there is still room for lots of improvement. If you look for research ideas, go for these topics.
Mining function specifications is a hot topic in research: Given a function, how can we infer an abstract model that describes its behavior? The conjunction with test generation offers several opportunities here, in particular for dynamic specification mining.
Mining input grammars promises to join the robustness and ease of use of lexical fuzzing with the efficiency and speed of syntactical fuzzing. The idea is to mine an input grammar from a program automatically, which then serves as base for syntactical fuzzing. Still in an early stage, but lots of potential.
Probabilistic grammar fuzzing gives programmers lots of control over which elements should be generated. Plenty of research possibilities at the intersection of probabilistic fuzzing and mining data from given tests, as sketched in this chapter.
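The basic mechanism is simple: attach a probability to each expansion and sample accordingly. (The grammar fragment below is a hypothetical example; the chapter integrates probabilities into full grammar fuzzing.)

```python
import random

# Hypothetical grammar fragment where each expansion carries a probability:
# generate '0' half of the time, other digits with equal remaining probability
PROB_DIGIT = {
    '<digit>': [('0', 0.5)] + [(str(d), 0.5 / 9) for d in range(1, 10)],
}

def prob_choice(expansions):
    """Pick an expansion according to its assigned probability."""
    strings = [s for s, _ in expansions]
    weights = [w for _, w in expansions]
    return random.choices(strings, weights=weights, k=1)[0]

# '0' should now appear in roughly half of all generated digits
sample = [prob_choice(PROB_DIGIT['<digit>']) for _ in range(1000)]
```

Skewing probabilities like this lets testers emphasize rare or suspicious input elements without rewriting the grammar itself.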
Fuzzing with generators and Fuzzing with constraints gives programmers the ultimate control over input generation, namely by allowing them to define their own generator functions or to define their own input constraints. The big challenge is: How can one best exploit the power of syntactic descriptions with a minimum of contextual constraints?
Carving unit tests brings the promise of speeding up test execution (and generation) dramatically, by extracting unit tests from program executions that replay only individual function calls (possibly with new, generated arguments). In Python, carving is simple to realize; here's plenty of potential to toy with.
Testing web servers and GUIs is a hot research field, fueled by the need of practitioners to test and secure their interfaces (and the need of other practitioners to break through these interfaces). Again, there's still lots of unexplored potential here.
Greybox fuzzing and greybox fuzzing with grammars bring in statistical estimators to guide test generation towards inputs and input properties that are most likely to discover new bugs. The intersection of testing, program analysis, and statistics offers lots of possibilities for future research.
For all these topics, having Python source available that implements and demonstrates the concepts is a major asset. You can easily extend the implementations with your own ideas and run evaluations right in a notebook. Once your approach is stable, consider porting it to a language with a wider range of available subjects (such as C, for example).
This is the ultimate tour – you have learned everything there is and want to contribute to the book. Then, you should read two more chapters:
The guide for authors gives an introduction on how to contribute to this book (coding styles, writing styles, conventions, and more).
The template chapter serves as a blueprint for your chapter.
If you want to contribute, feel free to contact us – preferably before writing, but after writing is fine as well. We will be happy to incorporate your material.