Most randomly generated inputs are syntactically invalid and thus are quickly rejected by the processing program. To exercise functionality beyond input processing, we must increase chances to obtain valid inputs. One such way is so-called mutational fuzzing – that is, introducing small changes to existing inputs that may still keep the input valid, yet exercise new behavior. We show how to create such mutations, and how to guide them towards yet uncovered code, applying central concepts from the popular AFL fuzzer.
In November 2013, the first version of American Fuzzy Lop (AFL) was released. Since then, AFL has become one of the most successful fuzzing tools and comes in many flavors, e.g., AFLFast, AFLGo, and AFLSmart (which are discussed in this book). AFL has made fuzzing a popular choice for automated vulnerability detection. It was the first to demonstrate that vulnerabilities can be detected automatically at a large scale in many security-critical, real-world applications.
In this chapter, we are going to introduce the basics of mutational fuzz testing; the next chapter will then further show how to direct fuzzing towards specific code goals.
Many programs expect their inputs to come in a very specific format before they would actually process them. As an example, think of a program that accepts a URL (a Web address). The URL has to be in a valid format (i.e., the URL format) such that the program can deal with it. When fuzzing with random inputs, what are our chances to actually produce a valid URL?
To get deeper into the problem, let us explore what URLs are made of. A URL consists of a number of elements:
scheme://netloc/path?query#fragment
where
scheme is the protocol to be used, including http, https, ftp, file, ...
netloc is the name of the host to connect to, such as www.google.com
path is the path on that very host, such as search
query is a list of key/value pairs, such as q=fuzzing
fragment is a marker for a location in the retrieved document, such as #result
In Python, we can use the urlparse() function to parse and decompose a URL into its parts.
from urllib.parse import urlparse
urlparse("http://www.google.com/search?q=fuzzing")
ParseResult(scheme='http', netloc='www.google.com', path='/search', params='', query='q=fuzzing', fragment='')
We see how the result encodes the individual parts of the URL in different attributes.
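As a small illustration, the following sketch reads the individual attributes of a parse result. The example URL (with an added #result fragment) and the variable names are ours; parse_qs() is the standard library helper that turns the query string into key/value pairs.

```python
from urllib.parse import urlparse, parse_qs

# Example URL chosen for illustration; it adds a fragment to the one above.
parts = urlparse("https://www.google.com/search?q=fuzzing#result")

print(parts.scheme)    # the protocol
print(parts.netloc)    # the host
print(parts.path)      # the path on that host
print(parts.query)     # the raw query string
print(parts.fragment)  # the fragment, without the leading '#'

# The query string can be decomposed further into key/value pairs.
query_pairs = parse_qs(parts.query)
print(query_pairs)
```

Note that the path is reported with its leading slash, and the fragment without its leading #.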
Let us now assume we have a program that takes a URL as input. To simplify things, we won't let it do very much; we simply have it check the passed URL for validity. If the URL is valid, it returns True; otherwise, it raises an exception.
def http_program(url: str) -> bool:
    supported_schemes = ["http", "https"]
    result = urlparse(url)
    if result.scheme not in supported_schemes:
        raise ValueError("Scheme must be one of " +
                         repr(supported_schemes))
    if result.netloc == '':
        raise ValueError("Host must be non-empty")

    # Do something with the URL
    return True
Let us now go and fuzz http_program(). To fuzz, we use the full range of printable ASCII characters, such that :, /, and lowercase letters are included.
fuzzer(char_start=32, char_range=96)
'"N&+slk%h\x7fyp5o\'@[3(rW*M5W]tMFPU4\\P@tz%[X?uo\\1?b4T;1bDeYtHx #UJ5w}pMmPodJM,_'
Let's try to fuzz with 1000 random inputs and see whether we have some success.
for i in range(1000):
    try:
        url = fuzzer()
        result = http_program(url)
        print("Success!")
    except ValueError:
        pass
What are the chances of actually getting a valid URL? We need our string to start with "http://" or "https://". Let's take the "http://" case first. These are seven very specific characters we need to start with. The chance of producing these seven characters randomly (with a character range of 96 different characters) is $1 : 96^7$, or
96 ** 7
75144747810816
The odds of producing a "https://" prefix are even worse, at $1 : 96^8$:
96 ** 8
7213895789838336
which gives us a total chance of
likelihood = 1 / (96 ** 7) + 1 / (96 ** 8)
likelihood
1.344627131107667e-14
And this is the number of runs (on average) we'd need to produce a valid URL scheme:
1 / likelihood
74370059689055.02
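The same numbers can be cross-checked with exact arithmetic. The sketch below assumes, as above, 96 equally likely characters per position; the variable names are ours. Since a string cannot start with both prefixes, the two probabilities add, and the expected number of runs until the first success is the reciprocal of the success probability (the mean of a geometric distribution).

```python
from fractions import Fraction

ALPHABET_SIZE = 96  # number of distinct characters, as assumed in the text

p_http = Fraction(1, ALPHABET_SIZE ** 7)    # "http://" has 7 characters
p_https = Fraction(1, ALPHABET_SIZE ** 8)   # "https://" has 8 characters

# The two prefixes are mutually exclusive, so the probabilities add.
p_valid_prefix = p_http + p_https

# Expected number of runs until the first valid prefix.
expected_runs = 1 / p_valid_prefix

print(p_valid_prefix)        # exact fraction, equal to 97 / 96**8
print(float(expected_runs))  # roughly 7.4e13
```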
Let's measure how long one run of http_program() takes:
trials = 1000
with Timer() as t:
    for i in range(trials):
        try:
            url = fuzzer()
            result = http_program(url)
            print("Success!")
        except ValueError:
            pass
duration_per_run_in_seconds = t.elapsed_time() / trials
duration_per_run_in_seconds
2.6061250013299285e-05
That's pretty fast, isn't it? Unfortunately, we have a lot of runs to cover.
seconds_until_success = duration_per_run_in_seconds * (1 / likelihood)
seconds_until_success
1938176719.0604537
which translates into
hours_until_success = seconds_until_success / 3600
days_until_success = hours_until_success / 24
years_until_success = days_until_success / 365.25
years_until_success
61.41711407269417
Even if we parallelize things a lot, we're still in for months to years of waiting. And that's for getting one successful run that will get deeper into http_program().
What basic fuzzing will do well is to test urlparse(), and if there is an error in this parsing function, it has good chances of uncovering it. But as long as we cannot produce a valid input, we are out of luck in reaching any deeper functionality.
The alternative to generating random strings from scratch is to start with a given valid input, and then to subsequently mutate it. A mutation in this context is a simple string manipulation - say, inserting a (random) character, deleting a character, or flipping a bit in a character representation. This is called mutational fuzzing – in contrast to the generational fuzzing techniques discussed earlier.
Here are some mutations to get you started:
import random

def delete_random_character(s: str) -> str:
    """Returns s with a random character deleted"""
    if s == "":
        return s

    pos = random.randint(0, len(s) - 1)
    # print("Deleting", repr(s[pos]), "at", pos)
    return s[:pos] + s[pos + 1:]
seed_input = "A quick brown fox"
for i in range(10):
    x = delete_random_character(seed_input)
    print(repr(x))
'A uick brown fox'
'A quic brown fox'
'A quick brown fo'
'A quic brown fox'
'A quick bown fox'
'A quick bown fox'
'A quick brown fx'
'A quick brown ox'
'A quick brow fox'
'A quic brown fox'
def insert_random_character(s: str) -> str:
    """Returns s with a random character inserted"""
    pos = random.randint(0, len(s))
    random_character = chr(random.randrange(32, 127))
    # print("Inserting", repr(random_character), "at", pos)
    return s[:pos] + random_character + s[pos:]
for i in range(10):
    print(repr(insert_random_character(seed_input)))
'A quick brvown fox'
'A quwick brown fox'
'A qBuick brown fox'
'A quick broSwn fox'
'A quick brown fvox'
'A quick brown 3fox'
'A quick brNown fox'
'A quick brow4n fox'
'A quick brown fox8'
'A equick brown fox'
def flip_random_character(s):
    """Returns s with a random bit flipped in a random position"""
    if s == "":
        return s

    pos = random.randint(0, len(s) - 1)
    c = s[pos]
    bit = 1 << random.randint(0, 6)
    new_c = chr(ord(c) ^ bit)
    # print("Flipping", bit, "in", repr(c) + ", giving", repr(new_c))
    return s[:pos] + new_c + s[pos + 1:]
for i in range(10):
    print(repr(flip_random_character(seed_input)))
'A quick bRown fox'
'A quici brown fox'
'A"quick brown fox'
'A quick brown$fox'
'A quick bpown fox'
'A quick brown!fox'
'A 1uick brown fox'
'@ quick brown fox'
'A quic+ brown fox'
'A quick bsown fox'
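To see what a single bit flip can do to one character, the following sketch lists all seven variants that flip_random_character() can produce for a given character. The helper name single_bit_flips is ours; it mirrors the expression chr(ord(c) ^ bit) used above.

```python
from typing import List


def single_bit_flips(c: str) -> List[str]:
    """Return the characters obtained by flipping one of the low 7 bits of c."""
    return [chr(ord(c) ^ (1 << bit)) for bit in range(7)]


# 'r' is the character flipped in 'bRown', 'bpown', and 'bsown' above.
flips_of_r = single_bit_flips("r")
print(flips_of_r)  # ['s', 'p', 'v', 'z', 'b', 'R', '2']
```

Because only bits 0 to 6 are flipped, an ASCII character always stays within the 7-bit ASCII range, though it may turn into a non-printable control character.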
Let us now create a random mutator that randomly chooses which mutation to apply:
def mutate(s: str) -> str:
    """Return s with a random mutation applied"""
    mutators = [
        delete_random_character,
        insert_random_character,
        flip_random_character
    ]
    mutator = random.choice(mutators)
    # print(mutator)
    return mutator(s)
for i in range(10):
    print(repr(mutate("A quick brown fox")))
'A qzuick brown fox'
' quick brown fox'
'A quick Brown fox'
'A qMuick brown fox'
'A qu_ick brown fox'
'A quick bXrown fox'
'A quick brown fx'
'A quick!brown fox'
'A! quick brown fox'
'A quick brownfox'
The idea is now that if we have some valid input(s) to begin with, we may create more input candidates by applying one of the above mutations. To see how this works, let's get back to URLs.
Let us create a function is_valid_url() that checks whether http_program() accepts the input.
def is_valid_url(url: str) -> bool:
    try:
        result = http_program(url)
        return True
    except ValueError:
        return False
assert is_valid_url("http://www.google.com/search?q=fuzzing")
assert not is_valid_url("xyzzy")
Let us now apply the mutate() function on a given URL and see how many valid inputs we obtain.
seed_input = "http://www.google.com/search?q=fuzzing"
valid_inputs = set()
trials = 20
for i in range(trials):
    inp = mutate(seed_input)
    if is_valid_url(inp):
        valid_inputs.add(inp)
We can now observe that by mutating the original input, we get a high proportion of valid inputs:
len(valid_inputs) / trials
0.8
What are the odds of also producing a https: prefix by mutating a http: sample seed input? We have to insert ($1 : 3$) the right character 's' ($1 : 96$) into the correct position ($1 : l$), where $l$ is the length of our seed input. (This is a slight approximation: insert_random_character() actually draws from 95 characters and $l + 1$ positions.) This means that on average, we need roughly this many runs:
trials = 3 * 96 * len(seed_input)
trials
10944
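As a cross-check on this estimate, the following sketch enumerates every possible outcome of a single insertion and counts those that start with "https://". It assumes the three mutators are chosen uniformly and uses the exact ranges of insert_random_character() (95 characters, len(s) + 1 positions); the helper name count_https_insertions is ours.

```python
from fractions import Fraction

seed_input = "http://www.google.com/search?q=fuzzing"


def count_https_insertions(s: str) -> int:
    """Count single-character insertions into s that yield an https:// prefix."""
    hits = 0
    for pos in range(len(s) + 1):    # positions of random.randint(0, len(s))
        for code in range(32, 127):  # characters of random.randrange(32, 127)
            candidate = s[:pos] + chr(code) + s[pos:]
            if candidate.startswith("https://"):
                hits += 1
    return hits


total_insertions = (len(seed_input) + 1) * 95
hits = count_https_insertions(seed_input)

# For this seed, neither a deletion nor a single bit flip can produce the
# prefix, so only the insertion mutator (chosen one time in three) counts.
p_https = Fraction(1, 3) * Fraction(hits, total_insertions)
expected_trials = 1 / p_https
print(hits, expected_trials)
```

Exactly one insertion succeeds (an 's' at position 4), which gives an expected 11115 trials, close to the rounded figure above.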
We can actually afford this. Let's try:
trials = 0
with Timer() as t:
    while True:
        trials += 1
        inp = mutate(seed_input)
        if inp.startswith("https://"):
            print(
                "Success after",
                trials,
                "trials in",
                t.elapsed_time(),
                "seconds")
            break
Success after 3656 trials in 0.005467624985612929 seconds
Of course, if we wanted to get, say, an "ftp://" prefix, we would need more mutations and more runs – most importantly, though, we would need to apply multiple mutations.
So far, we have only applied one single mutation on a sample string. However, we can also apply multiple mutations, further changing it. What happens, for instance, if we apply, say, 50 mutations on our sample string?
seed_input = "http://www.google.com/search?q=fuzzing"
mutations = 50
inp = seed_input
for i in range(mutations):
    if i % 5 == 0:
        print(i, "mutations:", repr(inp))
    inp = mutate(inp)
0 mutations: 'http://www.google.com/search?q=fuzzing'
5 mutations: 'http:/L/www.googlej.com/seaRchq=fuz:ing'
10 mutations: 'http:/L/www.ggoWglej.com/seaRchqfu:in'
15 mutations: 'http:/L/wwggoWglej.com/seaR3hqf,u:in'
20 mutations: 'htt://wwggoVgle"j.som/seaR3hqf,u:in'
25 mutations: 'htt://fwggoVgle"j.som/eaRd3hqf,u^:in'
30 mutations: 'htv://>fwggoVgle"j.qom/ea0Rd3hqf,u^:i'
35 mutations: 'htv://>fwggozVle"Bj.qom/eapRd[3hqf,u^:i'
40 mutations: 'htv://>fwgeo6zTle"Bj.\'qom/eapRd[3hqf,tu^:i'
45 mutations: 'htv://>fwgeo]6zTle"BjM.\'qom/eaR[3hqf,tu^:i'
As you see, the original seed input is hardly recognizable anymore. By mutating the input again and again, we get a higher variety in the input.
To implement such multiple mutations in a single package, let us introduce a MutationFuzzer class. It takes a seed (a list of strings) as well as a minimum and a maximum number of mutations.
class MutationFuzzer(Fuzzer):
    """Base class for mutational fuzzing"""

    def __init__(self, seed: List[str],
                 min_mutations: int = 2,
                 max_mutations: int = 10) -> None:
        """Constructor.
        `seed` - a list of (input) strings to mutate.
        `min_mutations` - the minimum number of mutations to apply.
        `max_mutations` - the maximum number of mutations to apply.
        """
        self.seed = seed
        self.min_mutations = min_mutations
        self.max_mutations = max_mutations
        self.reset()

    def reset(self) -> None:
        """Set population to initial seed.
        To be overloaded in subclasses."""
        self.population = self.seed
        self.seed_index = 0
In the following, let us develop MutationFuzzer further by adding more methods to it. The Python language requires us to define an entire class with all methods as a single, continuous unit; however, we would like to introduce one method after another. To avoid this problem, we use a special hack: Whenever we want to introduce a new method to some class C, we use the construct
class C(C):
    def new_method(self, args):
        pass
This seems to define C as a subclass of itself, which would make no sense – but actually, it introduces a new C class as a subclass of the old C class, and then shadows the old C definition. What this gets us is a C class with new_method() as a method, which is just what we want. (C objects defined earlier will retain the earlier C definition, though, and thus must be rebuilt.)
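The following toy example illustrates the effect of this construct, including the caveat about objects created earlier. The method names old_method and new_method and the variables early and late are ours, chosen only for this illustration.

```python
class C:
    def old_method(self) -> str:
        return "old"


early = C()  # created before C is extended


class C(C):  # new C: a subclass of the old C, shadowing its name
    def new_method(self) -> str:
        return "new"


late = C()  # created after C is extended

print(late.old_method(), late.new_method())  # both methods are available
print(hasattr(early, "new_method"))          # the earlier object lacks new_method
```

The object early still belongs to the old class and has to be created again to pick up new_method().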
Using this hack, we can now add a mutate() method that actually invokes the above mutate() function. Having mutate() as a method is useful when we want to extend a MutationFuzzer later.
class MutationFuzzer(MutationFuzzer):
    def mutate(self, inp: str) -> str:
        return mutate(inp)
Let's get back to our strategy, maximizing diversity in coverage in our population. First, let us create a method create_candidate(), which randomly picks some input from our current population (self.population), and then applies between min_mutations and max_mutations mutation steps, returning the final result:
class MutationFuzzer(MutationFuzzer):
    def create_candidate(self) -> str:
        """Create a new candidate by mutating a population member"""
        candidate = random.choice(self.population)
        trials = random.randint(self.min_mutations, self.max_mutations)
        for i in range(trials):
            candidate = self.mutate(candidate)
        return candidate
The fuzz() method is set to first pick the seeds; when these are gone, we mutate:
class MutationFuzzer(MutationFuzzer):
    def fuzz(self) -> str:
        if self.seed_index < len(self.seed):
            # Still seeding
            self.inp = self.seed[self.seed_index]
            self.seed_index += 1
        else:
            # Mutating
            self.inp = self.create_candidate()
        return self.inp
Here is the fuzz() method in action. With every new invocation of fuzz(), we get another variant with multiple mutations applied.
seed_input = "http://www.google.com/search?q=fuzzing"
mutation_fuzzer = MutationFuzzer(seed=[seed_input])
mutation_fuzzer.fuzz()
'http://www.google.com/search?q=fuzzing'
mutation_fuzzer.fuzz()
'http://www.gogl9ecom/earch?qfuzzing'
mutation_fuzzer.fuzz()
'htotq:/www.googleom/yseach?q=fzzijg'
The higher variety in inputs, though, increases the risk of having an invalid input. The key to success lies in the idea of guiding these mutations – that is, keeping those that are especially valuable.
To cover as much functionality as possible, one can rely on either specified or implemented functionality, as discussed in the "Coverage" chapter. For now, we will not assume that there is a specification of program behavior (although it definitely would be good to have one!). We will assume, though, that the program to be tested exists – and that we can leverage its structure to guide test generation.
Since testing always executes the program at hand, one can always gather information about its execution – the least is the information needed to decide whether a test passes or fails. Since coverage is frequently measured as well to determine test quality, let us also assume we can retrieve coverage of a test run. The question is then: How can we leverage coverage to guide test generation?
One particularly successful idea is implemented in the popular fuzzer named American fuzzy lop, or AFL for short. Just like our examples above, AFL evolves test cases that have been successful – but for AFL, "success" means finding a new path through the program execution. This way, AFL can keep on mutating inputs that so far have found new paths; and if an input finds another path, it will be retained as well.
Let us build such a strategy. We start by introducing a Runner class that captures the coverage for a given function. First, a FunctionRunner class:
class FunctionRunner(Runner):
    def __init__(self, function: Callable) -> None:
        """Initialize. `function` is a function to be executed"""
        self.function = function

    def run_function(self, inp: str) -> Any:
        return self.function(inp)

    def run(self, inp: str) -> Tuple[Any, str]:
        try:
            result = self.run_function(inp)
            outcome = self.PASS
        except Exception:
            result = None
            outcome = self.FAIL

        return result, outcome
http_runner = FunctionRunner(http_program)
http_runner.run("https://foo.bar/")
(True, 'PASS')
We can now extend the FunctionRunner class such that it also measures coverage. After invoking run(), the coverage() method returns the coverage achieved in the last run.
class FunctionCoverageRunner(FunctionRunner):
    def run_function(self, inp: str) -> Any:
        with Coverage() as cov:
            try:
                result = super().run_function(inp)
            except Exception as exc:
                self._coverage = cov.coverage()
                raise exc

        self._coverage = cov.coverage()
        return result

    def coverage(self) -> Set[Location]:
        return self._coverage
http_runner = FunctionCoverageRunner(http_program)
http_runner.run("https://foo.bar/")
(True, 'PASS')
Here are the first five locations covered:
print(list(http_runner.coverage())[:5])
[('urlparse', 395), ('urlparse', 392), ('urlparse', 398), ('urlsplit', 458), ('urlsplit', 464)]
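The Coverage class used above comes from the "Coverage" chapter and is not shown here. As a rough idea of how such locations can be collected, here is a minimal stand-in based on Python's sys.settrace(). This is a sketch under our own assumptions, not the book's implementation; the names trace_coverage and sign are hypothetical.

```python
import sys
from typing import Any, Callable, Set, Tuple

Location = Tuple[str, int]  # (function name, line number), as in the output above


def trace_coverage(function: Callable[[str], Any], inp: str) -> Set[Location]:
    """Run function(inp) and return the set of (function name, line) pairs executed."""
    covered: Set[Location] = set()

    def tracer(frame, event, arg):
        if event == "line":
            covered.add((frame.f_code.co_name, frame.f_lineno))
        return tracer  # keep tracing inside this frame and in nested calls

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        function(inp)
    except Exception:
        pass  # coverage collected up to the exception is still of interest
    finally:
        sys.settrace(previous)
    return covered


def sign(s: str) -> int:
    """Toy function under test with two distinct paths."""
    if s.startswith("-"):
        return -1
    return 1


coverage_negative = trace_coverage(sign, "-1")
coverage_positive = trace_coverage(sign, "1")
print(coverage_negative != coverage_positive)  # the two inputs take different paths
```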
Now for the main class. We maintain the population and a set of coverages already achieved (coverages_seen). The run() method takes a runner, produces an input, and has the runner execute the given function on it. If the run passes and its coverage is new (i.e., not in coverages_seen), the input is added to population and the coverage to coverages_seen.
class MutationCoverageFuzzer(MutationFuzzer):
    """Fuzz with mutated inputs based on coverage"""

    def reset(self) -> None:
        super().reset()
        self.coverages_seen: Set[frozenset] = set()
        # Now empty; we fill this with seed in the first fuzz runs
        self.population = []

    def run(self, runner: FunctionCoverageRunner) -> Any:  # type: ignore
        """Run function(inp) while tracking coverage.
           If we reach new coverage,
           add inp to population and its coverage to coverages_seen
        """
        result, outcome = super().run(runner)
        new_coverage = frozenset(runner.coverage())
        if outcome == Runner.PASS and new_coverage not in self.coverages_seen:
            # We have new coverage
            self.population.append(self.inp)
            self.coverages_seen.add(new_coverage)

        return result
Let us now put this to use:
seed_input = "http://www.google.com/search?q=fuzzing"
mutation_fuzzer = MutationCoverageFuzzer(seed=[seed_input])
mutation_fuzzer.runs(http_runner, trials=10000)
mutation_fuzzer.population
['http://www.google.com/search?q=fuzzing', 'http://www.goog.com/search;q=fuzzilng', 'http://ww.6goog\x0eoomosearch;/q=f}zzilng', 'http://uv.Lboo.comoseakrch;q=fuzilng', 'http://ww.6goog\x0eo/mosarch;/q=f}z{il~g', 'http://www.googme.com/sear#h?q=fuzzing', 'http://www.oogcom/sa3rchq=fuzlnv|', 'http://ww.6goog*./mosarch;/q=f}Zz{ilel~g', 'http://uv.Lboo.comoseakch;q=fuzilng', 'http://www.goom^e.2com/s?ear#h?q=fuzzing', 'http://hwww.coole.com+search?R=fuzzig', 'http://ww.6g7oog*./mosarch; #/q;f}Zz{ilel~gL', "http://ww.6'oog*R./mosarcx;/q=}Zz{ilel;~g", 'http://www.goofme.com/sear#h?q=fuzzi*yng', "http://sw.6'oog*R/msa'rcx;/qw?}Zz{ileRl;~g", "http://sw.6'oog*R/msa'rsx;/qw?}Zz{ileRUl;~g", "http://sw.6'oog*R/msa'rsx;qw?}Zz{ileRU;~g", 'http://wgw.gooBm^e.2com/s?&eir#h?q=]fuzzing', "http://sw.6'ooM*R/mDa'rsx;w?}Zz{ileU+~g", "http://sw.6L'ooM*R/mKD'rwx;w?}Z~{ileU#zg", 'http://ww6g7ooVg:./mosarc; #/q;f}ZzF{ielW~gL', "http://Jsw.6L'oM*R/mKD'r3w;w?~{ileU#zg", "http://sw.6'oog*R/msa'rsx;/qw?}Z#z{ileRYUl;~g", "http://sw6'oog*V/msa'rsx;/w\x7f}Z#zileRUl;~g", "http://sw6'oog*/msa'rsx;/g\x7fp}Z#zileRUl;~g"]
Success! In our population, each and every input now is valid and has a different coverage, coming from various combinations of schemes, paths, queries, and fragments.
all_coverage, cumulative_coverage = population_coverage(
mutation_fuzzer.population, http_program)
import matplotlib.pyplot as plt # type: ignore
plt.plot(cumulative_coverage)
plt.title('Coverage of urlparse() with random inputs')
plt.xlabel('# of inputs')
plt.ylabel('lines covered');
The nice thing about this strategy is that, applied to larger programs, it will happily explore one path after the other – covering functionality after functionality. All that is needed is a means to capture the coverage.
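To make the whole loop visible in one place, here is a compact, self-contained sketch of the same strategy that does not depend on the book's Fuzzer, Runner, or Coverage classes. Everything in it is a simplified stand-in under our own assumptions: check_url() mimics http_program(), mutate_once() merges the three mutators, run_with_coverage() uses sys.settrace(), and guided_fuzz() keeps an input only if it passes and yields a coverage set not seen before. All four names are hypothetical.

```python
import random
import sys
from typing import Any, Callable, FrozenSet, List, Set, Tuple
from urllib.parse import urlparse

Location = Tuple[str, int]


def check_url(url: str) -> bool:
    """Stand-in for http_program(): accept only http(s) URLs with a host."""
    result = urlparse(url)
    if result.scheme not in ("http", "https"):
        raise ValueError("unsupported scheme")
    if result.netloc == "":
        raise ValueError("empty host")
    return True


def mutate_once(s: str, rng: random.Random) -> str:
    """Delete, bit-flip, or insert one character at a random position."""
    choice = rng.randrange(3)
    if choice == 0 and s:
        pos = rng.randrange(len(s))
        return s[:pos] + s[pos + 1:]
    if choice == 1 and s:
        pos = rng.randrange(len(s))
        return s[:pos] + chr(ord(s[pos]) ^ (1 << rng.randrange(7))) + s[pos + 1:]
    pos = rng.randint(0, len(s))
    return s[:pos] + chr(rng.randrange(32, 127)) + s[pos:]


def run_with_coverage(function: Callable[[str], Any],
                      inp: str) -> Tuple[bool, FrozenSet[Location]]:
    """Run function(inp); return (passed, frozen set of executed lines)."""
    covered: Set[Location] = set()

    def tracer(frame, event, arg):
        if event == "line":
            covered.add((frame.f_code.co_name, frame.f_lineno))
        return tracer

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        function(inp)
        passed = True
    except Exception:
        passed = False
    finally:
        sys.settrace(previous)
    return passed, frozenset(covered)


def guided_fuzz(function: Callable[[str], Any], seeds: List[str],
                trials: int, rng: random.Random) -> List[str]:
    """Keep every passing input whose coverage set has not been seen before."""
    population: List[str] = []
    coverages_seen: Set[FrozenSet[Location]] = set()
    for trial in range(trials):
        if trial < len(seeds):
            inp = seeds[trial]                     # still seeding
        else:
            inp = rng.choice(population or seeds)  # mutate a retained input
            for _ in range(rng.randint(2, 10)):
                inp = mutate_once(inp, rng)
        passed, coverage = run_with_coverage(function, inp)
        if passed and coverage not in coverages_seen:
            population.append(inp)
            coverages_seen.add(coverage)
    return population


seed_url = "http://www.google.com/search?q=fuzzing"
population = guided_fuzz(check_url, [seed_url], trials=500,
                         rng=random.Random(0))
print(len(population), "inputs retained")
```

The random generator is passed in explicitly so that a run can be repeated with the same seed value; the exact population still depends on the Python version, since the coverage includes line numbers inside urllib.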
This chapter introduces a MutationFuzzer class that takes a list of seed inputs which are then mutated:
seed_input = "http://www.google.com/search?q=fuzzing"
mutation_fuzzer = MutationFuzzer(seed=[seed_input])
[mutation_fuzzer.fuzz() for i in range(10)]
['http://www.google.com/search?q=fuzzing', 'http://wwBw.google.com/searh?q=fuzzing', 'http8//wswgoRogle.am/secch?qU=fuzzing', 'ittp://www.googLe.com/serch?q=fuzzingZ', 'httP://wgw.google.com/seasch?Q=fuxzanmgY', 'http://www.google.cxcom/search?q=fuzzing', 'hFttp://ww.-g\x7fog+le.com/s%arch?q=f-uzz#ing', 'http://www\x0egoogle.com/seaNrch?q=fuZzing', 'http//www.Ygooge.comsarch?q=fuz~Ijg', 'http8//ww.goog5le.com/sezarc?q=fuzzing']
The MutationCoverageFuzzer maintains a population of inputs, which are then evolved in order to maximize coverage.
mutation_fuzzer = MutationCoverageFuzzer(seed=[seed_input])
mutation_fuzzer.runs(http_runner, trials=10000)
mutation_fuzzer.population[:5]
['http://www.google.com/search?q=fuzzing', 'http://wwv.oogle>co/search7Eq=fuzing', 'http://wwv\x0eOogleb>co/seakh7Eq\x1d;fuzing', 'http://wwv\x0eoglebkooqeakh7Eq\x1d;fuzing', 'http://wwv\x0eoglekol=oekh7Eq\x1d\x1bf~ing']
In the next chapter on greybox fuzzing, we further extend the concept of mutation-based testing with power schedules that allow spending more energy on seeds that exercise "unlikely" paths and seeds that are "closer" to a target location.
Apply the above guided mutation-based fuzzing technique on cgi_decode() from the "Coverage" chapter. How many trials do you need until you cover all variations of +, % (valid and invalid), and regular characters?
seed = ["Hello World"]
cgi_runner = FunctionCoverageRunner(cgi_decode)
m = MutationCoverageFuzzer(seed)
results = m.runs(cgi_runner, 10000)
m.population
['Hello World', 'he_<+llo(or<D', 'L}eml &Wol%dD', 'L)q<}aml &cWol%d3D+']
cgi_runner.coverage()
{('cgi_decode', 16), ('cgi_decode', 17), ('cgi_decode', 18), ('cgi_decode', 19), ('cgi_decode', 20), ('cgi_decode', 23), ('cgi_decode', 24), ('cgi_decode', 25), ('cgi_decode', 26), ('cgi_decode', 27), ('cgi_decode', 29), ('cgi_decode', 30), ('cgi_decode', 31), ('cgi_decode', 32), ('cgi_decode', 33), ('cgi_decode', 34), ('cgi_decode', 38), ('cgi_decode', 39), ('cgi_decode', 40), ('run_function', 7)}
all_coverage, cumulative_coverage = population_coverage(
m.population, cgi_decode)
import matplotlib.pyplot as plt
plt.plot(cumulative_coverage)
plt.title('Coverage of cgi_decode() with random inputs')
plt.xlabel('# of inputs')
plt.ylabel('lines covered');
After 10,000 runs, we have managed to synthesize a + character and a valid %xx form. We can still do better.
Apply the above mutation-based fuzzing technique on bc, as in the chapter "Introduction to Fuzzing".
Start with non-guided mutations. How many of the inputs are valid?
First, download and unpack the bc sources:
!curl -O mirrors.kernel.org/gnu/bc/bc-1.07.1.tar.gz
!tar xfz bc-1.07.1.tar.gz
Second, configure the package:
!cd bc-1.07.1; ./configure
Third, compile the package with special flags:
!cd bc-1.07.1 && make -k CFLAGS="--coverage"
The file bc/bc should now be executable...
!cd bc-1.07.1/bc; echo 2 + 2 | ./bc
4
...and you should be able to run the gcov program to retrieve coverage information.
!cd bc-1.07.1/bc; gcov main.c
File 'main.c' Lines executed:52.55% of 137 Creating 'main.c.gcov'
As sketched in the "Coverage" chapter, the file bc-1.07.1/bc/main.c.gcov now holds the coverage information for main.c. Each line is prefixed with the number of times it was executed. ##### means zero times; - means non-executable line.
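As a starting point for reading such a file, the following sketch extracts the executed lines from gcov-style text. It assumes the usual layout of count, line number, and source text separated by colons. The function name parse_gcov is ours, and the sample text is hand-written for illustration rather than taken from an actual bc run.

```python
from typing import Set, Tuple

Location = Tuple[str, int]  # (source file name, line number)


def parse_gcov(text: str, source_name: str) -> Set[Location]:
    """Return (source_name, line_number) for every line executed at least once."""
    covered: Set[Location] = set()
    for line in text.splitlines():
        fields = line.split(":", 2)  # count, line number, source text
        if len(fields) < 3:
            continue
        # Tolerate a trailing '*' marker, which some gcov versions may append.
        count = fields[0].strip().rstrip("*")
        if not count.isdigit() or int(count) == 0:
            continue  # '-' (non-executable) or '#####' (never executed)
        covered.add((source_name, int(fields[1].strip())))
    return covered


# Hand-written sample in the layout described above; not actual bc output.
sample = """\
        -:    0:Source:main.c
        -:    1:#include <stdio.h>
        1:    2:int main(int argc, char **argv) {
        1:    3:    if (argc > 5)
    #####:    4:        puts("many");
        1:    5:    return 0;
        -:    6:}
"""

executed = parse_gcov(sample, "main.c")
print(sorted(executed))  # [('main.c', 2), ('main.c', 3), ('main.c', 5)]
```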
Parse the GCOV file for bc and create a coverage set, as in FunctionCoverageRunner. Make this a ProgramCoverageRunner class that would be constructed with a list of source files (bc.c, main.c, load.c) to run gcov on.
When you're done, don't forget to clean up:
!rm -fr bc-1.07.1 bc-1.07.1.tar.gz
When adding a new element to the list of candidates, AFL does not actually compare the coverage, but adds an element if it exercises a new branch. Using branch coverage from the exercises of the "Coverage" chapter, implement this "branch" strategy and compare it against the "coverage" strategy, above.
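As a hint for this exercise, here is one possible sketch of the retention test. It assumes that a branch is modeled as a pair of consecutively executed locations and that the runner can provide the locations in execution order; the names branches_of and exercises_new_branch are ours. This only approximates AFL, which tracks edges between basic blocks in a compact bitmap.

```python
from typing import List, Set, Tuple

Location = Tuple[str, int]
Branch = Tuple[Location, Location]


def branches_of(trace: List[Location]) -> Set[Branch]:
    """Pairs of consecutively executed locations in an ordered trace."""
    return set(zip(trace, trace[1:]))


def exercises_new_branch(trace: List[Location],
                         branches_seen: Set[Branch]) -> bool:
    """True if the trace contains a branch not seen so far; records its branches."""
    new_branches = branches_of(trace) - branches_seen
    branches_seen |= new_branches
    return bool(new_branches)


# Two hypothetical traces through a function f() that cover the same lines,
# but the second one takes an additional loop iteration.
trace_once = [("f", 1), ("f", 2), ("f", 4)]
trace_loop = [("f", 1), ("f", 2), ("f", 1), ("f", 2), ("f", 4)]

seen: Set[Branch] = set()
print(exercises_new_branch(trace_once, seen))  # True: everything is new
print(exercises_new_branch(trace_once, seen))  # False: nothing new
print(exercises_new_branch(trace_loop, seen))  # True: new branch from line 2 to line 1
```

The last call shows the difference between the two strategies: both traces cover the same set of lines, yet the second one exercises an additional branch and would thus be retained.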
Design and implement a system that will gather a population of URLs from the Web. Can you achieve a higher coverage with these samples? What if you use them as initial population for further mutation?