In previous chapters, we have seen how Grammar-Based Fuzzing allows us to efficiently generate myriads of syntactically valid inputs. However, there are semantic input features that cannot be expressed in a context-free grammar, such as
In this chapter, we show how the ISLa framework allows us to express such features as constraints added to a grammar. By having ISLa solve these constraints automatically, we produce inputs that are not only syntactically valid, but actually semantically valid. Furthermore, such constraints allow us to very precisely shape the inputs we want for testing.
Prerequisites
# ignore
import sys
# ignore
if sys.version_info < (3, 10):
print("This code requires Python 3.10 or later")
sys.exit(0)
In this book, we have frequently used grammars to systematically generate inputs that cover input structure and more. But while it is relatively easy to express the syntax of an input using a grammar, there are input properties that cannot be expressed using a grammar. Such input properties are called semantic properties.
Let us illustrate semantic properties using a simple example. We want to test some system that is configured by two settings, a page size and a buffer size. Both these come as integer numbers as part of a human-readable configuration file. The syntax of this file is given by the following grammar:
CONFIG_GRAMMAR: Grammar = {
"<start>": ["<config>"],
"<config>": [
"pagesize=<pagesize>\n"
"bufsize=<bufsize>"
],
"<pagesize>": ["<int>"],
"<bufsize>": ["<int>"],
"<int>": ["<leaddigit><digits>"],
"<digits>": ["", "<digit><digits>"],
"<digit>": list("0123456789"),
"<leaddigit>": list("123456789")
}
assert is_valid_grammar(CONFIG_GRAMMAR)
Here's a visualization of this grammar as a railroad diagram, showing its structure:
# ignore
syntax_diagram(CONFIG_GRAMMAR)
start
config
pagesize
bufsize
int
digits
digit
leaddigit
Using this grammar, we can now use any of our grammar-based fuzzers to generate valid inputs. For instance:
fuzzer = GrammarFuzzer(CONFIG_GRAMMAR)
for i in range(10):
print(i)
print(fuzzer.fuzz())
0 pagesize=4057 bufsize=817 1 pagesize=9 bufsize=8 2 pagesize=5 bufsize=25 3 pagesize=1 bufsize=2 4 pagesize=62 bufsize=57 5 pagesize=2 bufsize=893 6 pagesize=1 bufsize=33 7 pagesize=7537 bufsize=3 8 pagesize=97 bufsize=983 9 pagesize=2 bufsize=2
So far, so good - and indeed, these random values will help us test our (hypothetical) system. But what if we want to control these values further, putting our system to the test?
A grammar gives us some control. If we want to ensure a page size of at least 100,000, for instance, a rule like
"<bufsize>": ["<leaddigit><digit><digit><digit><digit><digit>"]
would do the job. We could also express that the page size should be an odd number, by having it end in an odd digit. But if we want to state that the page size should be, say, a multiple of 8, or larger or less than the buffer size, we are out of luck.
In the chapter on fuzzing with generators, we have seen how to attach program code to individual rules - program code that would either generate individual elements right away or filter only these that satisfy specific conditions.
Attaching code makes things very flexible, but also has several disadvantages:
In essence, you have to code your own strategy for generating inputs, which at some point negates the advantage of having an abstract representation such as a grammar.
Hence, we are looking to a more general way to express semantic properties - and also a more declarative way to express semantic properties.
In recent work, Dominic Steinhöfel and Andreas Zeller (one of the authors of this book) have presented an infrastructure that allows producing inputs with arbitrary properties, but without having to go through the trouble of implementing producers or checkers. Instead, they suggest a dedicated language for specifying inputs, named ISLa (for input specification language). ISLa combines a standard context-free grammar with constraints that express semantic properties of the inputs and their elements. ISLa can be used as a fuzzer (producing inputs that satisfy the constraints) as well as a checker (checking inputs whether they satisfy the given constraints).
Let us illustrate ISLa by example. ISLa comes as a Python package named isla-solver
that can be easily installed using pip
:
$ pip install isla-solver
This also installs all dependent packages.
The core of ISLa is the ISLa Solver – the component that actually solves constraints to produce satisfying inputs.
The constructor of an ISLaSolver
takes two mandatory arguments.
To express a constraint, we have a variety of functions and predicates at our disposition.
These can be applied to individual elements of the grammar, notably their nonterminals.
The function str.len()
, for instance, returns the length of a string.
If we want to have inputs in which the page size has at least 6 digits, we can write:
solver = ISLaSolver(CONFIG_GRAMMAR, 'str.len(<pagesize>) >= 6')
The method solve()
returns the next produced string from the ISLa solver, as a derivation tree (seen in the Chapter on fuzzing with grammars).
To convert these into a string, we can use the str()
converter:
str(solver.solve())
'pagesize=989235516706537\nbufsize=3'
The print()
method converts its arguments to strings implicitly.
To get, say, the next 10 solutions, we can thus write
for _ in range(10):
print(solver.solve())
pagesize=989235516706537 bufsize=402493567181 pagesize=989235516706537 bufsize=1 pagesize=989235516706537 bufsize=96 pagesize=989235516706537 bufsize=81 pagesize=989235516706537 bufsize=514 pagesize=989235516706537 bufsize=2 pagesize=989235516706537 bufsize=635 pagesize=989235516706537 bufsize=7 pagesize=989235516706537 bufsize=3 pagesize=3608148 bufsize=44
... and we see that, indeed, each page size has exactly six digits.
To express a minimum numeric value, we can use a more elegant way.
The function str.to.int()
, for instance, converts a string into an integer.
To obtain a page size that of at least 100000, we can thus also write
solver = ISLaSolver(CONFIG_GRAMMAR,
'str.to.int(<pagesize>) >= 100000')
print(solver.solve())
pagesize=100008 bufsize=25
If we want the page size to be in the range of 100 to 200, we can state this as a logical conjunction (using and
)
solver = ISLaSolver(CONFIG_GRAMMAR,
'''
str.to.int(<pagesize>) >= 100 and
str.to.int(<pagesize>) <= 200
''')
print(solver.solve())
pagesize=199 bufsize=9
And if we want the page size to be a multiple of seven, we can write
solver = ISLaSolver(CONFIG_GRAMMAR,
'''
str.to.int(<pagesize>) mod 7 = 0
''')
print(solver.solve())
pagesize=14 bufsize=7
quiz("Which of the following constraints expresses "
"that the page size and the buffer size "
"have to be equal? Try it out!",
[
"`<pagesize> is <bufsize>`",
"`str.to.int(<pagesize>) = str.to.int(<bufsize>)`",
"`<pagesize> = <bufsize>`",
"`atoi(<pagesize>) == atoi(<bufsize>)`",
], "[4 ** 0.5, 9 ** 0.5]")
Indeed, ISLa constraints can also involve multiple elements. Expressing equality between two elements is easy, and uses a single equal sign. (There's no assignment in ISLa the =
could be confused with.)
solver = ISLaSolver(CONFIG_GRAMMAR,
'''
<pagesize> = <bufsize>
''')
print(solver.solve())
pagesize=2 bufsize=2
We can also use numerical constraints, stating that the buffer size should always be exactly one more than the page size:
solver = ISLaSolver(CONFIG_GRAMMAR,
'''
str.to.int(<pagesize>) > 1024 and
str.to.int(<bufsize>) = str.to.int(<pagesize>) + 1
''')
print(solver.solve())
pagesize=1025 bufsize=1026
All the above functions (like str.to.int()
) actually stem from the SMT-LIB library for satisfiability modulo theories (SMT), a standard for expressing constraints for constraint solvers (like ISLa).
The list of all theories defined in SMT-LIB lists dozens of functions and predicates that can be used in ISLa constraints.
quiz("Which constraints are necessary to "
"ensure that all digits are between 1 and 3?",
[
"`str.to.int(<digit>) >= 1`",
"`str.to.int(<digit>) <= 3`",
"`str.to.int(<leaddigit>) >= 1`",
"`str.to.int(<leaddigit>) <= 3`",
], "[1 ** 0, 4 ** 0.5, 16 ** 0.5]")
When you install isla-solver
, you also get an isla
command-line tool.
This allows you to create inputs from the command line or shell scripts.
Let us first create a grammar file suitable for isla
. isla
accepts grammars in Fuzzingbook format; they need to define a variable named grammar
.
with open('grammar.py', 'w') as f:
f.write('grammar = ')
f.write(str(CONFIG_GRAMMAR))
print_file('grammar.py')
grammar = {'<start>': ['<config>'], '<config>': ['pagesize=<pagesize>\nbufsize=<bufsize>'], '<pagesize>': ['<int>'], '<bufsize>': ['<int>'], '<int>': ['<leaddigit><digits>'], '<digits>': ['', '<digit><digits>'], '<digit>': ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9'], '<leaddigit>': ['1', '2', '3', '4', '5', '6', '7', '8', '9']}
With this, we can use isla
as a grammar fuzzer, plain and simple
By default, isla solve
produces one single matching output:
!isla solve grammar.py
pagesize=71 bufsize=13475698200
The true power of isla
, however, comes to be as we (again) add constraints to be solved - either in separate constraint files or (easier) directly on the command line:
!isla solve grammar.py --constraint '<pagesize> = <bufsize>'
pagesize=8 bufsize=8
!isla solve grammar.py \
--constraint '<pagesize> = <bufsize> and str.to.int(<pagesize>) > 10000'
pagesize=99290248 bufsize=99290248
The isla
executable provides several options and commands, and is a great alternative on the command line.
!isla --help
usage: isla [-h] [-v] {solve,fuzz,check,find,parse,repair,mutate,create,config} ... The ISLa command line interface. options: -h, --help show this help message and exit -v, --version Print the ISLa version number Commands: {solve,fuzz,check,find,parse,repair,mutate,create,config} solve create solutions to ISLa constraints or check their unsatisfiability fuzz pass solutions to an ISLa constraint to a test subject check check whether an input satisfies an ISLa constraint find filter files satisfying syntactic & semantic constraints parse parse an input into a derivation tree if it satisfies an ISLa constraint repair try to repair an existing input such that it satisfies an ISLa constraint mutate mutate an input such that the result satisfies an ISLa constraint create create grammar and constraint stubs config dumps the default configuration file
So far, we have accessed nonterminals simply by referencing their name, as in <bufsize>
or <pagesize>
.
However, in some cases, this simple method is not sufficient.
In our configuration grammar, for instance, we may want to access (or constrain) <int>
elements.
However, we do not want to constrain all integers at once, but only those in a particular context – say, those that occur as a part of a <pagesize>
element, or only those that occur as part of a <config>
element.
To this end, ISLa allows referencing parts of a given element, using two special operators.
The expression <a>.<b>
refers to the immediate subpart <b>
of some element <a>
.
That is, <b>
has to appear in one of the expansion rules of <a>
.
For instance, <pagesize>.<int>
refers to the <int>
element of a page size.
Here is an example using the dot operator:
solver = ISLaSolver(CONFIG_GRAMMAR,
'''
<pagesize>.<int> = <bufsize>.<int>
''')
print(solver.solve())
pagesize=8 bufsize=8
The expression <a>..<b>
, however, refers to any subpart <b>
of some element <a>
. That is, <b>
can appear in the expansion of <a>
, but also in the expansion of any subelement (and any subelement thereof).
Here is an example using the double dot operator, enforcing every digit in a <config>
element to be 7
:
solver = ISLaSolver(CONFIG_GRAMMAR,
'''
<config>..<digit> = "7" and <config>..<leaddigit> = "7"
''')
print(solver.solve())
pagesize=7 bufsize=77
To reason about dots and double dots, it helps to visualize the string in question as a derivation tree discussed in the chapter on grammar-based fuzzing. The derivation tree of the input
pagesize=12
bufsize=34
for instance, looks like this:
# ignore
inp = 'pagesize=12\nbufsize=34'
parser = EarleyParser(CONFIG_GRAMMAR)
tree = next(parser.parse(inp))
display_tree(tree)
In this tree, the .
syntax refers to immediate children. <bufsize>.<int>
is the one <int>
node that is the immediate descendant of <bufsize>
(but not any other <int>
node).
In contrast, <config>..<digit>
refers to all <digit>
descendants of the <config>
node.
If an element has multiple immediate children of the same type, one can use the special <a>[$n$]
syntax to access the $n$-th child of type <a>
. To access the first child, $n$ is equal to one, not zero, as in the XPath abbreviated syntax. In our configuration grammar, there is no expansion including the same nonterminal more than once, so we do not need this feature.
For a demonstration of indexed dots, consider the following grammar, which produces lines of three "a" or "b" characters:
LINES_OF_THREE_AS_OR_BS_GRAMMAR: Grammar = {
'<start>': ['<A>'],
'<A>': ['<B><B><B>', '<B><B><B>\n<A>'],
'<B>': ['a', 'b']
}
fuzzer = GrammarFuzzer(LINES_OF_THREE_AS_OR_BS_GRAMMAR)
for i in range(5):
print(i)
print(fuzzer.fuzz())
0 aab bab 1 aaa aba 2 aba baa bba 3 bbb abb 4 baa
We can force, say, the second character in a line to always be a "b:"
solver = ISLaSolver(LINES_OF_THREE_AS_OR_BS_GRAMMAR,
'''
<A>.<B>[2] = "b"
''')
for i in range(5):
print(i)
print(solver.solve())
0 abb 1 bbb 2 abb 3 bbb 4 abb
By default, all nonterminals in ISLa constraints are universally quantified - that is, any constraint applying to, say, some <int>
element applies to all <int>
elements in the resulting string.
If you only want to constrain one element, though, you have to (and can) specify this in ISLa, using an existential quantifier.
To use an existential quantifier in ISLa, use the construct
exists TYPE VARIABLE in CONTEXT:
(CONSTRAINT)
where VARIABLE
is some identifier, TYPE
is its type (as a nonterminal), and CONTEXT
is the context (again a nonterminal) in which the constraint should hold.
CONSTRAINT
is again a constraint expression, in which you now can make use of VARIABLE
as the element whose existence you assume.
Let us illustrate existential quantification again using a simple example. We want to make sure that at least one integer in our generated string has a value of more than 1000. So we write
solver = ISLaSolver(CONFIG_GRAMMAR,
'''
exists <int> i in <start>:
str.to.int(i) > 1000
''')
for _ in range(10):
print(solver.solve())
pagesize=1003 bufsize=5 pagesize=1003 bufsize=2280617349521 pagesize=1003 bufsize=8 pagesize=1003 bufsize=7 pagesize=1003 bufsize=3 pagesize=1003 bufsize=93 pagesize=1003 bufsize=630 pagesize=1003 bufsize=4 pagesize=1003 bufsize=14 pagesize=1003 bufsize=5
We note that all generated inputs satisfy the constraint of having at least one integer that satisfies the constraint.
Specifying a variable name is optional; if you omit it, you can use the quantified nonterminal instead. The above constraint can thus also be expressed in a more compact fashion:
solver = ISLaSolver(CONFIG_GRAMMAR,
'''
exists <int> in <start>:
str.to.int(<int>) > 1000
''')
print(solver.solve())
pagesize=1009 bufsize=64
Besides existential quantification, there also is universal quantification, using the forall
keyword instead of exists
.
If we want all elements in some context to satisfy a constraint, this comes in handy.
solver = ISLaSolver(CONFIG_GRAMMAR,
'''
forall <int> in <start>:
str.to.int(<int>) > 1000
''')
for _ in range(10):
print(solver.solve())
pagesize=1001 bufsize=1001 pagesize=1001 bufsize=1002 pagesize=1001 bufsize=1003 pagesize=1001 bufsize=1004 pagesize=1002 bufsize=1001 pagesize=1002 bufsize=1002 pagesize=1002 bufsize=1003 pagesize=1002 bufsize=1004 pagesize=1003 bufsize=1001 pagesize=1003 bufsize=1002
We see that all <int>
elements satisfy the constraint.
By default, all nonterminals that are re-used directly in constraints are universally quantified within the <start>
symbol, so the above can actually be simplified to
solver = ISLaSolver(CONFIG_GRAMMAR,
'''
str.to.int(<int>) > 1000
''')
for _ in range(10):
print(solver.solve())
pagesize=1001 bufsize=1001 pagesize=1001 bufsize=1002 pagesize=1001 bufsize=1003 pagesize=1001 bufsize=1004 pagesize=1002 bufsize=1001 pagesize=1002 bufsize=1002 pagesize=1002 bufsize=1003 pagesize=1002 bufsize=1004 pagesize=1003 bufsize=1001 pagesize=1003 bufsize=1002
... and you realize that in all our initial constraints, we always had an implicit universal quantification.
Sometimes, we'd like a quantifier to apply only for a specific expansion alternative of a nonterminal. The form
forall TYPE VARIABLE=PATTERN in CONTEXT:
(CONSTRAINT)
means that the CONSTRAINT only applies to a VARIABLE that actually matches the expansion given in PATTERN.
(Again, we can replace forall
with exists
, and make this an existential quantification rather than a universal quantification.)
Here's an example of using forall
:
solver = ISLaSolver(CONFIG_GRAMMAR,
'''
forall <int>="<leaddigit><digits>" in <start>:
(<leaddigit> = "7")
''')
This ensures that when <int>
is expanded to a lead digit followed by more digits, the lead digit becomes 7
.
The effect is that all <int>
values now start with a 7
digit:
str(solver.solve())
'pagesize=71\nbufsize=770835426929713'
Likewise, we can constrain <int>
as a whole, and thus ensure that all numbers are greater than 100:
solver = ISLaSolver(CONFIG_GRAMMAR,
'''
forall <int> in <start>:
(str.to.int(<int>) > 100)
''')
str(solver.solve())
'pagesize=101\nbufsize=101'
By default, all variables are universally quantified in <start>
, so the above can also be expressed as
solver = ISLaSolver(CONFIG_GRAMMAR,
'''
str.to.int(<int>) > 100
''')
str(solver.solve())
'pagesize=101\nbufsize=101'
In a quantification pattern, we can also name individual nonterminal elements and use them in our constraints.
This is done by replacing the nonterminal <ID>
with the special form {<ID> VARIABLE}
(in curly braces) which then makes the variable VARIABLE
a placeholder for the value matched by ID
; VARIABLE
can then be used in constraints.
Here is an example. In the expansion <leaddigit><int>
, we want to ensure that the <leaddigit>
is always 9
.
Using the special brace form, we make lead
a variable holding the value of <leaddigit>
, and can then use it in a constraint:
solver = ISLaSolver(CONFIG_GRAMMAR,
'''
forall <int> i="{<leaddigit> lead}<digits>" in <start>:
(lead = "9")
''')
This (again) ensures that all lead digits should be 9
:
for _ in range(10):
print(solver.solve())
pagesize=92 bufsize=9168435097796 pagesize=9 bufsize=9188 pagesize=981 bufsize=999 pagesize=9 bufsize=9 pagesize=9 bufsize=9 pagesize=9 bufsize=9 pagesize=91 bufsize=9 pagesize=9 bufsize=9 pagesize=90 bufsize=9 pagesize=9 bufsize=9242
Could we express the above in a simpler fashion? Yes! For one, we can refer to <leaddigit>
directly rather than introducing variables like i
and lead
:
solver = ISLaSolver(CONFIG_GRAMMAR,
'''
forall <int>="<leaddigit><digits>" in <start>:
(<leaddigit> = "9")
''')
print(solver.solve())
pagesize=9 bufsize=941890257631
Furthermore, using implicit universal quantification and the dot notation introduced earlier, we could write, for instance
solver = ISLaSolver(CONFIG_GRAMMAR,
'''
<int>.<leaddigit> = "9"
''')
print(solver.solve())
pagesize=99 bufsize=9501387624
or just
solver = ISLaSolver(CONFIG_GRAMMAR,
'''
<leaddigit> = "9"
''')
print(solver.solve())
pagesize=98 bufsize=93762401955
and obtain the same result (not necessarily the exact same values, though, due to randomness):
But while universal quantification and dot notation are sufficient for many cases, the pattern matching notation is more general and more flexible – even if it may be harder to read.
Using an ISLaSolver
, we can also check if an string satisfies the constraints.
This can be applied to inputs, but also to outputs; ISLa constraints can thus server as oracles – that is, predicates that check a test result.
Let us check if in a given string, <pagesize>
and <bufsize>
are the same.
constraint = '<pagesize> = <bufsize>'
solver = ISLaSolver(CONFIG_GRAMMAR, constraint)
To check the tree, we can pass it into the evaluate()
method of the solver
– and find that the given input does not satisfy our constraint.
solver.check('pagesize=12\nbufsize=34')
False
If we repeat the above, however, with an input that satisfies the constraint, we obtain a True
result.
solver.check('pagesize=27\nbufsize=27')
True
Checking constraints is much more efficient than solving them, as ISLa does not have to search for possible solutions.
Let us further illustrate ISLa using a few case studies.
The Extensible Markup Language (XML) is a typical example of an input language that cannot be fully expressed using a context-free grammar. The problem is not so much expressing the syntax of XML – the basics are fairly easy:
XML_GRAMMAR: Grammar = {
"<start>": ["<xml-tree>"],
"<xml-tree>": ["<open-tag><xml-content><close-tag>"],
"<open-tag>": ["<<id>>"],
"<close-tag>": ["</<id>>"],
"<xml-content>": ["Text", "<xml-tree>"],
"<id>": ["<letter>", "<id><letter>"],
"<letter>": crange('a', 'z')
}
assert is_valid_grammar(XML_GRAMMAR)
# ignore
syntax_diagram(XML_GRAMMAR)
start
xml-tree
open-tag
close-tag
xml-content
id
letter
The problem becomes evident when we produce inputs from the grammar: The <id>
elements in <open-tag>
and <close-tag>
do not match.
fuzzer = GrammarFuzzer(XML_GRAMMAR)
fuzzer.fuzz()
'<xdps><s><x><f>Text</g></ka></k></hk>'
If we want the tag IDs to match, we need to come up with a finite set of tags (as in, say, HTML); then we can extend the grammar with one rule for each tag - <body>...</body>
, <p>...</p>
, <strong>...</strong>
, and so on.
For an infinite set of tags, though, as in our grammar, expressing that the two tag IDs must match is not possible in a context-free grammar.
With ISLa, however, constraining the grammar is easy.
All we need is the rule that constrains the <xml-tree>
:
solver = ISLaSolver(XML_GRAMMAR,
'''
<xml-tree>.<open-tag>.<id> = <xml-tree>.<close-tag>.<id>
''', max_number_smt_instantiations=1)
for _ in range(3):
print(solver.solve())
<p>Text</p> <p><p>Text</p></p> <p><p><p>Text</p></p></p>
and we see that the <id>
tags now indeed match each other.
In contrast to the "input generator" solution in the chapter on generators, our constraint-based solution is purely declarative - and can also be used to parse and check inputs. Plus, of course, we can easily add more constraints:
solver = ISLaSolver(XML_GRAMMAR,
'''
<xml-tree>.<open-tag>.<id> = <xml-tree>.<close-tag>.<id>
and
str.len(<id>) > 10
''', max_number_smt_instantiations=1)
for _ in range(3):
print(solver.solve())
<ppnxhklftkp>Text</ppnxhklftkp> <ppnxhklftkp><ppmpkkqktdh>Text</ppmpkkqktdh></ppnxhklftkp> <ppnxhklftkp><ppmpkkqktdh><ppnpscmnqlk>Text</ppnpscmnqlk></ppmpkkqktdh></ppnxhklftkp>
When testing compilers with generated program code, one often encounters the problem that before using an identifier, one has to declare it first - specifying its type, some initial value, and more.
This problem is easily illustrated in the following grammar, which produces sequences of assignments. Variable names consist of a single lowercase letter; values can only be digits; assignments are separated by semicolons.
LANG_GRAMMAR: Grammar = {
"<start>":
["<stmt>"],
"<stmt>":
["<assgn>", "<assgn>; <stmt>"],
"<assgn>":
["<lhs> := <rhs>"],
"<lhs>":
["<var>"],
"<rhs>":
["<var>", "<digit>"],
"<var>": list(string.ascii_lowercase),
"<digit>": list(string.digits)
}
assert is_valid_grammar(LANG_GRAMMAR)
syntax_diagram(LANG_GRAMMAR)
start
stmt
assgn
lhs
rhs
var
digit
Here are some assignment sequences produced by the grammar:
fuzzer = GrammarFuzzer(LANG_GRAMMAR)
for _ in range(10):
print(fuzzer.fuzz())
w := 7; g := 2 f := m; x := 3 p := 3 h := f; h := u n := k; k := z; z := 6 m := x; g := h d := 3 h := 6 s := m k := q
We see that the assignment syntax is similar to what we have in common programming languages. The semantics, however, are, well, questionable, as we commonly access variables whose values have not been previously defined. Again, this is a semantic property that cannot be expressed in a context-free grammar alone.
What we need here is a constraint specifying that on the right-hand side of an assignment, we can only have variable names that occur on the left-hand side. In ISLa, we achieve this through the following constraint:
solver = ISLaSolver(LANG_GRAMMAR,
'''
forall <rhs> in <assgn>:
exists <assgn> declaration:
<rhs>.<var> = declaration.<lhs>.<var>
''',
max_number_smt_instantiations=1,
max_number_free_instantiations=1)
for _ in range(10):
print(solver.solve())
y := 1 a := 0; t := 5 p := p u := 2; k := 4; o := 7 b := 9; s := 6; d := d h := 8; q := 3; m := h n := 4; h := 4; g := h p := 8; w := p f := 6; p := p r := p; p := 8
This is much better already, but not perfect yet - we might still have assignments like a := a
or a := b; b := 5
.
That is because our constraints do not yet take care of ordering – in a <rhs>
element, we can only use variables that are defined earlier.
For this purpose, ISLa provides a before()
predicate: before(A, B)
expresses that the element A
must occur before the element B
.
With before()
, we can rewrite our constraint as
solver = ISLaSolver(LANG_GRAMMAR,
'''
forall <rhs> in <assgn>:
exists <assgn> declaration:
(before(declaration, <assgn>) and
<rhs>.<var> = declaration.<lhs>.<var>)
''',
max_number_free_instantiations=1,
max_number_smt_instantiations=1)
... and thus ensure that on the right-hand-side of assignments, we only use identifiers defined earlier.
for _ in range(10):
print(solver.solve())
b := 6 f := 8; v := 5 p := 7; p := p; p := p; p := p d := 9; d := d; i := d; z := 4 h := 1; e := h b := 3; l := b p := 0; k := p p := 2; m := p b := 5; b := 5; x := b p := 6; p := 9; w := p
In case you find that the assignment sequences are too short, you can use the ISLa count()
predicate.
count(VARIABLE, NONTERMINAL, N)
ensures that the number of NONTERMINALs in VARIABLE is exactly N.
To have statements with exactly 5 assignments, write
solver = ISLaSolver(LANG_GRAMMAR,
'''
forall <rhs> in <assgn>:
exists <assgn> declaration:
(before(declaration, <assgn>) and
<rhs>.<var> = declaration.<lhs>.<var>)
and
count(start, "<assgn>", "5")
''',
max_number_smt_instantiations=1,
max_number_free_instantiations=1)
for _ in range(10):
print(solver.solve())
p := 3; m := p; f := 9; l := 5; j := 8 p := 0; n := p; w := 2; p := 6; e := p p := 7; k := p; s := 1; d := 4; o := p p := 4; p := p; y := 5; h := 0; z := p p := 4; q := p; p := 7; x := 7; b := p d := 9; t := d; v := d; i := 3; r := 2 p := 0; a := p; c := a; a := 1; g := 4 h := 4; h := h; u := h; l := 5; i := h h := 5; o := h; b := h; l := 6; j := h p := 1; h := p; i := h; p := 9; u := p
This chapter introduces the ISLa framework, consisting of
A typical usage of the ISLa solver is as follows. First, install ISLa, using
$ pip install isla-solver
Then, you can import the solver as
The ISLa solver needs two things. First, a grammar - say, US phone numbers.
Second, you need constraints – a string expressing a condition over one or more grammar elements. Common functions include
str.len()
, returning the length of a stringstr.to.int()
, converting a string to an integerHere, we instantiate the ISLa solver with a constraint stating that the area code should be above 900:
solver = ISLaSolver(US_PHONE_GRAMMAR,
"""
str.to.int(<area>) > 900
""")
With that, invoking solver.solve()
returns a solution for the constraints.
str(solver.solve())
'(902)805-6934'
solve()
returns a derivation tree, which typically is converted into a string using str()
as above. The print()
function does this implicitly.
Subsequent calls of solve()
return more solutions:
for _ in range(10):
print(solver.solve())
(902)671-8520 (902)308-8044 (902)737-2584 (902)500-2834 (902)429-5794 (902)292-0499 (902)977-9111 (902)209-4775 (902)565-2710 (909)223-7794
We see that the solver produces a number of inputs that all satisfy the constraint - the area code is always more than 900.
The ISLaSolver()
constructor provides several additional parameters to configure the solver, as documented below.
Additional ISLaSolver
methods allow checking inputs against constraints, and provide additional functionality.
# ignore
from ClassDiagram import display_class_hierarchy
# ignore
display_class_hierarchy([ISLaSolver],
public_methods=[
ISLaSolver.__init__,
ISLaSolver.solve,
ISLaSolver.check,
ISLaSolver.parse,
])
The ISLa functionality is also available on the command line:
!isla --help
usage: isla [-h] [-v] {solve,fuzz,check,find,parse,repair,mutate,create,config} ... The ISLa command line interface. options: -h, --help show this help message and exit -v, --version Print the ISLa version number Commands: {solve,fuzz,check,find,parse,repair,mutate,create,config} solve create solutions to ISLa constraints or check their unsatisfiability fuzz pass solutions to an ISLa constraint to a test subject check check whether an input satisfies an ISLa constraint find filter files satisfying syntactic & semantic constraints parse parse an input into a derivation tree if it satisfies an ISLa constraint repair try to repair an existing input such that it satisfies an ISLa constraint mutate mutate an input such that the result satisfies an ISLa constraint create create grammar and constraint stubs config dumps the default configuration file
In the next chapters, we will continue to focus on semantics. Among others, we will learn how to
A common way of representing strings is length-prefixed strings, a representation made popular by the PASCAL programming language.
A length-prefixed string starts with a few bytes that encode the length $L$ of the string, followed by the $L$ actual characters.
For instance, assuming that two bytes are used to encode the length, the string "Hello"
could be represented as the sequence
0x00 0x05 'H' 'e' 'l' 'l' 'o'
Write a grammar that defines the syntax of length-prefixed strings.
Use ISLa to produce valid length-prefixed strings. Make use of the SMT-LIB string library to find appropriate conversion functions.