**Abstract**: A partial evaluator automatically specializes a program
with respect to some of its input. This article shows how the idea
comes up naturally when you write program generators by hand, then
builds a basic online partial evaluation library and puts it to work
transforming an interpreter into a compiler.

Mainstream programmers think of writing an interpreter or a compiler as a major job, worth doing only for a major problem. They know this because the languages they use every day have big, serious implementations, and the compiler class they took in school had them write just one big, semi-serious compiler. Lispers know better: all the textbooks show how to write a Lisp interpreter in about a page of code. While their underlying Lisp system is just as big and serious as the mainstream compiler, they know how and why to whip up a little special-purpose minilanguage.

But even in the Lisp world, partial evaluation is more obscure. For most hackers, if they've heard of it at all, it was through either a fancy general-purpose partial evaluator or a toy example. This tutorial article tries to introduce the subject in the same style as our familiar one-page interpreters: small examples that both show the ideas in a tractable context and suggest how to use them on real problems. After studying this, you should be able not just to apply a general-purpose partial evaluator like PGG, but also to write a small custom one targeted to your problem, as needed.

These ideas aren't new, not even the idea of popularizing them this way: see the further reading for original sources. (That's especially recommended if you don't know Scheme, as I present a lot of Scheme code without explanation. Some helper functions for this code are defined separately.)

Section 1 sets the stage by presenting a typical hand-written code
generation function, the sort of thing thoroughly familiar to anyone
who's written Lisp macros. We go through it in detail to bring out
why we'd want to automate the job better. Sections 2 and 3 refactor
this example and reformulate what we've done as symbolic execution.
Section 4 applies symbolic execution to a bigger program: the
rewrite-rule interpreter from *Paradigms of AI Programming*,
converting it into a compiler almost automatically. Section 5
reflects on the result and the principles it illustrates.

Say we have a `power` function:

```scheme
(define (power x n)
  (cond ((= n 0) 1)
        ((odd? n) (* x (power x (- n 1))))
        (else (square (power x (/ n 2))))))

(define (square x) (* x x))
```

and for speed we want custom versions of it for particular `n`, like
`(* x x)` when `n`=2, `(* x (* x x))` when `n`=3, and so on. Here's a
first cut at the solution:

```scheme
(define (emit-power x n)
  (cond ((= n 0) 1)
        ((odd? n) `(* x ,(emit-power x (- n 1))))
        (else `(square ,(emit-power x (/ n 2))))))

> (emit-power 'x 5)
(* x (square (square (* x 1))))
```

(Interactive input follows the `>` prompt; the rest is the system's response.)
The emitted code looks right, but we wanted to make it as efficient as
possible -- so, first off, the `(* x 1)` has got to go:

```scheme
(define (emit-power x n)
  (cond ((= n 0) 1)
        ((= n 1) 'x)
        ((odd? n) `(* x ,(emit-power x (- n 1))))
        (else `(square ,(emit-power x (/ n 2))))))

> (emit-power 'x 5)
(* x (square (square x)))
```

Better, but there are still those runtime calls to `square`. Let's
inline them:

```scheme
(define (emit-power x n)
  (cond ((= n 0) 1)
        ((= n 1) 'x)
        ((odd? n) `(* x ,(emit-power x (- n 1))))
        (else (emit-square (emit-power x (/ n 2))))))

(define (emit-square x) `(* ,x ,x))

> (emit-power 'x 5)
(* x (* (* x x) (* x x)))
```

Wait a minute, we wound up duplicating the `(* x x)`. We need to
factor it explicitly:

```scheme
(define (emit-square x) `(let ((y ,x)) (* y y)))

> (emit-power 'x 5)
(* x (let ((y (let ((y x)) (* y y))))
       (* y y)))
```

Argh, too many `let`s! Get rid of the redundant one:

```scheme
(define (emit-square x)
  (if (symbol? x)
      `(* ,x ,x)
      `(let ((y ,x)) (* y y))))

> (emit-power 'x 5)
(* x (let ((y (* x x))) (* y y)))
```

And we're done! Or are we? There's been a bug all along --
`emit-power` ignores its `x` parameter:

```scheme
> (emit-power 'input 2)
(* x x)
```

We need to unquote the `x`:

```scheme
(define (emit-power x n)
  (cond ((= n 0) 1)
        ((= n 1) x)
        ((odd? n) `(* ,x ,(emit-power x (- n 1))))
        (else (emit-square (emit-power x (/ n 2))))))

> (emit-power 'input 2)
(* input input)
```

That wasn't hard, but it was kind of annoying and fiddly. Surely
there's a better way? The code would have been simpler if we could
depend on our Scheme compiler to optimize out the redundant `let` and
the `(* x 1)` for us. In general, though, the compiler won't know all
the optimizations we need; this article covers techniques that go
beyond relying on the underlying compiler for everything. The next
most convenient choice would be to push the optimizations into an
underlying layer we could forget about, as if it were part of the
compiler. So let's try gathering all the emission code into a new
function, `emit-*`, where we'll put the optimizations:

```scheme
(define (emit-power x n)
  (cond ((= n 0) 1)
        ((odd? n) (emit-* x (emit-power x (- n 1))))
        (else (emit-square (emit-power x (/ n 2))))))

(define (emit-square x) (emit-* x x))

(define (emit-* x y) `(* ,x ,y))

> (emit-power 'x 5)
(* x (* (* (* x 1) (* x 1)) (* (* x 1) (* x 1))))
```

We're back to completely unoptimized output. We need to restore the
optimizations, putting them in `emit-*` this time:

```scheme
(define (emit-* x y)
  (cond ((eqv? y 1) x)
        ((and (equal? x y) (not (symbol? x)))
         `(let ((y ,x)) (* y y)))
        (else `(* ,x ,y))))

> (emit-power 'x 5)
(* x (let ((y (* x x))) (* y y)))
```

Now we can use the original code unchanged to emit its specialized
version, just by rebinding the `*` it uses:

```scheme
(define (emit-power x n)
  (let ((* emit-*))
    (define (power x n)
      (cond ((= n 0) 1)
            ((odd? n) (* x (power x (- n 1))))
            (else (square (power x (/ n 2))))))
    (define (square x) (* x x))
    (power x n)))
```

Looking back over this result suggests another way to think about what we're doing: to emit specialized code, run the original program on symbolic values instead of numbers. Where an operation takes a symbolic value, have it return code that computes the runtime value, instead of returning a value immediately. Running the program generates a trace of the operations to be done.
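Here's the idea in miniature, independent of the `emit-power` machinery above (a sketch; `sym-*` and `cube` are made-up names for this illustration): an operation that computes immediately when its operands are numbers, and returns residual code when any operand is symbolic.

```scheme
;; sym-* acts like * on numbers, but returns residual code when
;; handed a symbolic (non-numeric) operand.
(define (sym-* x y)
  (if (and (number? x) (number? y))
      (* x y)          ; static operands: compute now
      `(* ,x ,y)))     ; dynamic operand: emit code for runtime

;; An ordinary-looking program written against sym-*:
(define (cube x) (sym-* x (sym-* x x)))

(cube 2)   ; => 8
(cube 'a)  ; => (* a (* a a))
```

Running `cube` on the symbol `a` traces out exactly the multiplications the specialized code must do at runtime.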

If this is truly general it should work on another example, so let's try something else, specializing it the ``right'' way from the start. We evaluate a polynomial represented by a list of coefficients, highest-order first:

```scheme
(define (poly-value coeffs x)
  (foldl (lambda (value coeff) (+ (* value x) coeff))
         0
         coeffs))

> (poly-value '(5 0 1) 2)
21
```
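(In passing, this `foldl` formulation is Horner's rule: each step multiplies the accumulated value by `x` and adds the next coefficient. A self-contained check, with the `foldl` helper from Appendix A inlined -- since coefficients come highest-order first, evaluating base-10 digits at `x`=10 gives back the number itself:)

```scheme
;; foldl, as defined in Appendix A.
(define (foldl fn acc ls)
  (if (null? ls) acc (foldl fn (fn acc (car ls)) (cdr ls))))

(define (poly-value coeffs x)
  (foldl (lambda (value coeff) (+ (* value x) coeff)) 0 coeffs))

(poly-value '(1 2 3) 10)  ; => 123, i.e. 1*10^2 + 2*10 + 3
```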

If we specialize `poly-value` for particular polynomials, the residual
code will use `+` and `*`. We need to define an `emit-+`, and while
we're at it we'll factor out common code between the emitters and
invest in some more optimizations -- defined in Appendix B.

```scheme
(define (emit-poly-value coeffs x)
  (let ((* emit-*)
        (+ emit-+))
    (foldl (lambda (value coeff) (+ (* value x) coeff))
           0
           coeffs)))

> (emit-poly-value '(5 0 1) 'x)
(+ (* (* 5 x) x) 1)
```

It's rather nice that we could write `poly-value` in as high-level a
style as we liked, using `foldl` instead of a loop, without seeing any
trace of that abstraction in the output code. This works for the same
reason we rarely need to worry about the speed of macroexpansion, only
we didn't even have to make the decision up front to write a macro
instead of ordinary code.

There are several directions we could go from here:

- Do something about having to hand-collate the residual operators and then write emitters for them all.
- Build more optimizations into the emitters to generate smarter code for more complicated programs.
- Handle a wider variety of programs, yielding residual code with assignments, conditionals, loops, and so on.

We could tackle the first by writing emitters for all Scheme
functions, but if we did that we'd find it works poorly in general --
the `let` hack used in `emit-*` and `emit-+` is too weak. If instead
of `square` we'd started from something like

```scheme
(define (foo x) (* x (- x)))
```

then our `emit-*` would not have noticed the code duplication. Here's
a general common subexpression eliminator to solve this problem and
simultaneously attack the second item above.

```scheme
(define (with-cse receiver)
  (let ((bindings '()))
    (define (cseify emitter)
      (lambda operands
        (let ((exp (apply emitter operands)))
          (cond ((or (symbol? exp) (number? exp)) exp)
                ((assoc exp bindings) => cadr)
                (else (let ((name (fresh-name)))
                        (set! bindings (cons (list exp name) bindings))
                        name))))))
    (let ((exp (receiver cseify)))
      `(let* ,(reverse (map reverse bindings))
         ,exp))))
```

`(with-cse receiver)` returns an expression produced by calling
`(receiver cseify)`, but with common subexpressions eliminated.
`cseify` is a function that transforms an emitter into an equivalent
one that remembers and reuses the expressions produced, to avoid
redundant code. `receiver` should return an expression built partly
out of calls to emitters built with `cseify`.

This is clearer with an example:

```scheme
(define (emit-poly-value coeffs x)
  (with-cse
   (lambda (cseify)
     (let ((* (cseify emit-*))
           (+ (cseify emit-+)))
       (foldl (lambda (value coeff) (+ (* value x) coeff))
              0
              coeffs)))))

> (emit-poly-value '(5 0 1) 'x)
(let* ((x0 (* 5 x))
       (x1 (* x0 x))
       (x2 (+ x1 1)))
  x2)
```

*Exercise:* Improve `with-cse` so it doesn't bind the redundant
variable `x2` in the output code above.

*Exercise:* Specialize the *n*-point Fast Fourier Transform for
constant *n*. It's possible to write the FFT in a straightforward,
inefficient style (using O(*n*^{2}) operations) and let our specialization
library optimize it to O(*n* log *n*) automatically. You'll still need to
use the Danielson-Lanczos lemma: see Appendix C. (Answer)

*Exercise:* Extend the code to handle impure primitives like
`set-car!` and `write`. This might work like:

```scheme
> (with-cse (lambda (cseify cseify-impure)
              (let ((write (cseify-impure 'write)))
                (for-each write '(x y z)))))
(let* ((x0 (write x))
       (x1 (write y))
       (x2 (write z)))
  #f)
```

*Exercise:* Instead of using `with-cse`, write a `let` macro
shadowing the normal one. That is, when you use a costly computation
more than once, it's normal for your code to have a `let` naming that
computation, so you should be able to exploit this information instead
of reconstructing it with `with-cse`. What other special forms would
need to be shadowed to make this work transparently?

For our next trick, we'll take an interpreter and turn it into a
compiler. The example interpreter is based on the rewrite-rule
interpreter in Peter Norvig's *Paradigms of AI Programming* -- it's a nice example that also
comes with a hand-coded compiler we can compare ours to. His code is
in Common Lisp, so we reconstruct a simplified version in Scheme
first. (Alternatively I could've ported the symbolic execution code
to Common Lisp, which is probably simpler, but I just didn't feel like
it. And using two different Lisp dialects in this article would
narrow its audience even further.)

We can use rewrite rules to simplify algebraic expressions. For example,

```scheme
> (simplify '(+ (* 3 x) (* x 3)))
(* 6 x)
```

This works by applying a list of rules to all parts of the subject expression repeatedly until no more simplifications are possible:

```scheme
(define *simplification-rules*
  '(((+ ?x ?x)        (* 2 ?x))
    ((* ?s ?n)        (* ?n ?s))
    ((* ?n (* ?m ?x)) (* (* ?n ?m) ?x))
    ((* ?x (* ?n ?y)) (* ?n (* ?x ?y)))
    ((* (* ?n ?x) ?y) (* ?n (* ?x ?y)))))
```

The left-hand column has patterns to match, while the right-hand
column holds responses. The first rule says, if you see `(+ foo foo)`,
rewrite it into `(* 2 foo)`. Variables like `?x` can match anything,
while `?m` and `?n` can only match numbers. (Norvig uses lots more
rules; this is just a sample.)
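To make a single rule application concrete before diving into the implementation, here's the substitution step run by hand (a self-contained sketch; `sublis` is the same helper defined with the interpreter below). Matching `(+ (* a b) (* a b))` against the first rule's pattern `(+ ?x ?x)` binds `?x` to `(* a b)`; instantiating the response then gives the rewritten expression:

```scheme
;; sublis replaces variables in exp by their bindings in env.
(define (sublis env exp)
  (cond ((null? exp) '())
        ((pair? exp) (cons (sublis env (car exp))
                           (sublis env (cdr exp))))
        ((assq exp env) => cdr)
        (else exp)))

;; The env produced by the match, applied to the rule's response:
(sublis '((?x . (* a b))) '(* 2 ?x))  ; => (* 2 (* a b))
```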

So here's the code. I won't explain it in any detail, since this article isn't about writing interpreters.

```scheme
;; Return exp simplified until no more rules apply.
(define (simplify exp)
  (if (pair? exp)
      (simplify-exp (map simplify exp))
      exp))

;; Simplify an expression whose subexpressions are already simplified.
(define (simplify-exp exp)
  (or (arithmetic-eval exp)
      (rule-based-translator exp *simplification-rules*
        (lambda (env response)
          (simplify (sublis env response))))
      exp))

;; Find a rule that matches input, and apply action to the matching
;; bindings and the response part of the matching rule. If none
;; matches, return #f.
(define (rule-based-translator input rules action)
  (let checking ((rules rules))
    (and (not (null? rules))
         (let ((pattern (rule-pattern (car rules)))
               (response (rule-response (car rules))))
           (cond ((pat-match pattern input '())
                  => (lambda (env) (action env response)))
                 (else (checking (cdr rules))))))))

;; The fields of a rule.
(define rule-pattern car)
(define rule-response cadr)

;; Try to match pattern against input given existing bindings in
;; env, returning an extended env. If this is impossible, return
;; #f. env may be #f already, and then no match is possible.
(define (pat-match pattern input env)
  (cond ((not env) #f)
        ((pair? pattern)
         (and (pair? input)
              (pat-match (cdr pattern) (cdr input)
                         (pat-match (car pattern) (car input) env))))
        ((variable? pattern) (match-variable pattern input env))
        ((equal? pattern input) env)
        (else #f)))

;; Try to match a variable var against input given env.
(define (match-variable var input env)
  (cond ((assq var env)                ; already bound
         => (lambda (pair) (and (equal? input (cdr pair)) env)))
        ((memq var '(?m ?n))           ; numeric variables
         (and (number? input) (acons var input env)))
        (else (acons var input env))))

;; Return true iff x is a variable.
(define (variable? x)
  (and (symbol? x)
       (< 0 (string-length (symbol->string x)))
       (char=? #\? (string-ref (symbol->string x) 0))))

;; Perform the substitutions in env upon exp.
(define (sublis env exp)
  (cond ((null? exp) '())
        ((pair? exp) (cons (sublis env (car exp))
                           (sublis env (cdr exp))))
        ((assq exp env) => cdr)
        (else exp)))
```

Besides the rule-based translator, we also simplify with
`arithmetic-eval`, which just reduces constant expressions like
`(+ 2 3)` to numbers.

```scheme
;; Try to reduce exp as a constant expression; return #f if impossible.
(define (arithmetic-eval exp)
  (and (all number? (exp-args exp))
       (or (memq (exp-op exp) '(+ - * /))
           (and (eq? (exp-op exp) '^)
                (integer? (cadr (exp-args exp)))))
       (apply (eval-primitive-name (exp-op exp)) (exp-args exp))))

(define (eval-primitive-name op) (cadr (assq op op-meanings)))

(define op-meanings `((+ ,+) (- ,-) (* ,*) (/ ,/) (^ ,expt)))

;; The parts of a compound expression: operator and arguments.
(define exp-op car)
(define exp-args cdr)
```

Norvig sped up this simplifier by compiling out what
`rule-based-translator` does with `*simplification-rules*` --
converting a list of rules into Lisp code that tests a subject
expression and builds a new expression out of the response of the
matching rule, if any. This is just what we'd get by partially
evaluating `rule-based-translator` against `*simplification-rules*` --
can we do that, instead of hand-coding a compiler?

Let's look at which operations depend on the dynamic input. Each such operation will have to show up in the compiled code.

```scheme
(define (rule-based-translator input rules action)
  (let checking ((rules rules))
    (and (not (null? rules))
         (let ((pattern (rule-pattern (car rules)))
               (response (rule-response (car rules))))
           (cond ((pat-match pattern input '())
                  => (lambda (env) (action env response)))
                 (else (checking (cdr rules))))))))
```

Here, all the rules are static -- only `input` and `env` vary
dynamically. (`env` is an interesting case: we know that if we
succeed in matching a particular pattern like `(+ ?x ?x)`, we'll get
an env like `((?x . `*something*`))` back, where *something* depends
on dynamic input. This is called a *partially static* structure.)
The `cond` is dynamic -- it depends on the match against dynamic
data, and therefore we'll need a `cond` or equivalent in the compiled
code.

```scheme
(define (pat-match pattern input env)
  (cond ((not env) #f)
        ((pair? pattern)
         (and (pair? input)
              (pat-match (cdr pattern) (cdr input)
                         (pat-match (car pattern) (car input) env))))
        ((variable? pattern) (match-variable pattern input env))
        ((equal? pattern input) env)
        (else #f)))
```

For `pat-match`, the dynamic variables are `input` and `env`.
Operations on them are `(not env)`, `(pair? input)`, `(car input)`,
`(cdr input)`, and `(equal? pattern input)`. All but the first are
reasonable things to have to do at runtime, but why are we testing
`env` for failure? `env` will be false only if some earlier step in
the matching failed, and then we should simply return #f for the whole
match. Instead, this code will discover failure at some point in the
pattern tree, then repeatedly check for #f at each parent node back to
the root. What was convenient in the interpreter leads to lousy code
in the compiler, the root cause being that it unnecessarily let a
dynamic dependence leak into its control flow. We can factor that
dependence out with a continuation we'll name `succeed`:

```scheme
;; Try to match pattern against input given existing bindings in
;; env, returning (succeed env2) where env2 is an extended env.
;; If instead we fail to match, return #f.
(define (pat-match pattern input env succeed)
  (cond ((pair? pattern)
         (and (pair? input)
              (pat-match (car pattern) (car input) env
                         (lambda (env)
                           (pat-match (cdr pattern) (cdr input)
                                      env succeed)))))
        ((variable? pattern) (match-variable pattern input env succeed))
        (else (and (equal? pattern input) (succeed env)))))
```

Note that we're adding the new parameter to `match-variable` as well.
We need to change `rule-based-translator` to supply the new parameter:

```scheme
(define (rule-based-translator input rules action)
  (let checking ((rules rules))
    (and (not (null? rules))
         (let ((pattern (rule-pattern (car rules)))
               (response (rule-response (car rules))))
           (or (pat-match pattern input '()
                          (lambda (env) (action env response)))
               (checking (cdr rules)))))))
```

`match-variable` changes in the same way. It adds another dynamic
operation, `(number? input)`:

```scheme
(define (match-variable var input env succeed)
  (cond ((assq var env)
         => (lambda (pair)
              (and (equal? (cdr pair) input) (succeed env))))
        ((memq var '(?m ?n))
         (and (number? input) (succeed (acons var input env))))
        (else (succeed (acons var input env)))))
```

Finally, `sublis` adds a `cons` operation. What about the association
list functions `acons` and `assq`? These operate on partially static
structures here, as mentioned above. `assq` will look up a static
variable, yielding a dynamic result. As long as we're happy with that
dynamic expression appearing in the code wherever the `assq` generates
it, we're home free -- we don't need to build the association list at
runtime. Where is it safe for `assq` to move that expression?
Anywhere but outside the scope of any variables it references. (If
the code also used side effects and recursion, we'd have to worry
about evaluation order and termination, too.) But adding the
`succeed` continuation didn't just eliminate a test, it also moved our
build-a-result operation into the scope of the pattern match -- just
what we needed! Hurray.

Concretely, for a rule like `(foo ?x)` rewriting to `(bar ?x)`, the
`succeed` made the difference between output code like:

```scheme
(let ((env (and (pair? subject)
                (let ((t1 (car subject)) (t2 (cdr subject)))
                  (and (eq? t1 'foo)
                       (pair? t2)
                       (null? (cdr t2))
                       (acons '?x (car t2) '()))))))
  (and env (list 'bar (cdr (assq '?x env)))))
```

and the much better:

```scheme
(and (pair? subject)
     (let ((t1 (car subject)) (t2 (cdr subject)))
       (and (eq? t1 'foo)
            (pair? t2)
            (null? (cdr t2))
            (list 'bar (car t2)))))
```

Now we're ready to wrap the interpreter inside our partial evaluator, except for a couple things:

- We need to produce dynamic conditionals, and we don't have any emitters for them yet.
- Instead of numbers, this interpreter's values are general
s-expressions, so we need a general way to distinguish static values
at compile time from dynamic code to emit. (Previously we just called
`number?`.)

Tackling the second issue first: we leave static values alone and attach a unique tag to dynamic values. (Could we do it the other way around? Should we?)

```scheme
(define dynamic-tag (list 'dynamic))
(define (emit code) (list dynamic-tag code))
(define (dynamic.code dynamic) (cadr dynamic))
(define (dynamic? obj) (and (pair? obj) (eq? dynamic-tag (car obj))))
(define (static? obj) (not (dynamic? obj)))

(define (as-code obj)
  (cond ((dynamic? obj) (dynamic.code obj))
        ((or (null? obj) (pair? obj) (symbol? obj)) (list 'quote obj))
        (else obj)))

;; Generic emitter maker with constant folding.
(define (make-emitter op op-name)
  (lambda operands
    (if (all static? operands)
        (apply op operands)
        (emit `(,op-name ,@(map as-code operands))))))
```

Dynamic conditionals interact with common subexpression elimination:
if you generate code like `(if foo (bar ...) (baz ...))`, you don't
want to eliminate subexpressions of `(baz ...)` that happen to match
a subexpression of `(bar ...)`. We'll handle this by having
`with-cse` supply a new procedure, `(run thunk)`, that calls `(thunk)`
while keeping its subexpressions in a new subcontext of the current
context.

```scheme
(define (with-cse receiver)
  (let ((bindings '()))
    (define (run emitter-thunk)
      (let* ((old-bindings bindings)
             (value (emitter-thunk))
             (result (wrap-let* (list-difference bindings old-bindings)
                                value)))
        (set! bindings old-bindings)
        result))
    (define (cseify emitter)
      (lambda operands
        (let ((value (apply emitter operands)))
          (if (static? value)
              value
              (let ((code (dynamic.code value)))
                (cond ((symbol? code) value)
                      ((assoc code bindings)
                       => (lambda (pair) (emit (cadr pair))))
                      (else (let ((name (fresh-name)))
                              (set! bindings
                                    (cons (list code name) bindings))
                              (emit name)))))))))
    (define (wrap-let* bindings exp)
      (emit `(let* ,(reverse (map reverse bindings))
               ,(as-code exp))))
    (let ((value (receiver cseify run)))
      (as-code (wrap-let* bindings value)))))
```

Ready to roll!

```scheme
(define (compile-rules)
  (with-cse
   (lambda (cse run)
     (let ((cons    (cse (make-emitter cons 'cons)))
           (car     (cse (make-emitter car 'car)))
           (cdr     (cse (make-emitter cdr 'cdr)))
           (pair?   (cse (make-emitter pair? 'pair?)))
           (number? (cse (make-emitter number? 'number?)))
           (equal?  (cse (make-emitter equal? 'equal?)))
           (%and (lambda (test then-proc) (emit-and test (run then-proc))))
           (%or  (lambda (test else-proc) (emit-or test (run else-proc)))))
       (define (rule-based-translator input rules action)
         (let checking ((rules rules))
           (and (not (null? rules))
                (let ((pattern (rule-pattern (car rules)))
                      (response (rule-response (car rules))))
                  (%or (pat-match pattern input '()
                                  (lambda (env) (action env response)))
                       (lambda () (checking (cdr rules))))))))
       (define (pat-match pattern input env succeed)
         (cond ((pair? pattern)
                (%and (pair? input)
                      (lambda ()
                        (pat-match (car pattern) (car input) env
                                   (lambda (env)
                                     (pat-match (cdr pattern) (cdr input)
                                                env succeed))))))
               ((variable? pattern)
                (match-variable pattern input env succeed))
               (else (%and (equal? pattern input)
                           (lambda () (succeed env))))))
       (define (match-variable var input env succeed)
         (cond ((assq var env)
                => (lambda (pair)
                     (%and (equal? (cdr pair) input)
                           (lambda () (succeed env)))))
               ((memq var '(?m ?n))
                (%and (number? input)
                      (lambda () (succeed (acons var input env)))))
               (else (succeed (acons var input env)))))
       (define (sublis a-list exp)
         (cond ((null? exp) '())
               ((pair? exp) (cons (sublis a-list (car exp))
                                  (sublis a-list (cdr exp))))
               ((assq exp a-list) => cdr)
               (else exp)))
       (rule-based-translator (emit 'subject)
                              *simplification-rules*
                              sublis)))))
```

All we've done to the body is change dynamic instances of `and` and
`or` to `%and` and `%or` with thunked arguments; we could have
avoided that change as well by using macros. We still need emitters
for those dynamic conditionals:

```scheme
(define (emit-and . operands) (emit `(and ,@(map as-code operands))))
(define (emit-or . operands) (emit `(or ,@(map as-code operands))))
```

and a new top-level function to call the compiled code:

```scheme
(define (simplify-exp exp)
  (cond ((arithmetic-eval exp))
        ((apply-rules exp) => simplify)
        (else exp)))

(define apply-rules
  (eval `(lambda (subject) ,(compile-rules))))
```

How does this compiler compare to Norvig's? The structure is essentially the same, but his was built by looking at some hypothetical sample output and then writing new code by hand that would generate it. The continuation-passing came in as a eureka step. Our continuation was kind of eurekaish, too, but at least it wasn't needed to get a working compiler. A fancy partial evaluator might have found that optimization on its own, as a ``continuation-based binding-time improvement''.

Norvig does two optimizations we don't: first, his compiler assumes the subject expressions are arithmetic expressions with a fixed number of arguments, so the pattern-matching makes fewer tests on it. We could do the same with changes to the pattern matcher.

Second, different rules share tests they hold in common -- for
example, if two successive patterns are `(+ u v)` and `(+ x y)`, the
compiled code only checks for a `+` once. This is a form of common
subexpression elimination across branches, implemented with rewrite
rules on the output code like
`(or (and x e1) (and x e2)) --> (and x (or e1 e2))`. This is a
reusable kind of optimization we could profitably add to our library.

*Project:* Generalize the code emission library, letting users supply
custom optimization rules in the rewrite rule language we just
implemented. Look up Norvig's `and`/`or` optimizations and implement
them here; you'll find you need to fix `with-cse` to give variables
canonical names instead of monotonically increasing ones (otherwise
pattern-matching on output code would miss opportunities when
equivalent code uses different variables). Provide a way to generate
output code with loops and recursions.

More important than the quality of the code is the principle we've
just drawn out: a compiler can be derived from an interpreter *automatically* by analyzing how the interpreter depends on runtime
input. (There's an implication for how to write interpreters: write
so this analysis stays simple. Don't unnecessarily mix binding
times.) More generally, we can think of many, many code optimization
problems as moving binding times around, where the same logic applies.

We've also seen how simple, quickly understandable tools can deliver a significant fraction of the benefit of a sophisticated partial evaluator.

*Exercise:* Our simplifier still has a significant interpretive
component: we only compiled away the `rule-based-translator` part.
Write a compiler from a set of rules to a standalone Scheme program.
Benchmark it against the simplifier above.

- Ken Anderson, ``Freeing the Essence of a Computation.'' Presents a similar partial evaluator in Common Lisp along with a nice example using it -- I wrote this article to explore the ideas in Anderson's paper for myself.
- Peter Thiemann, PGG. A sophisticated program specializer for full Scheme.
- Berlin and Weise, ``Compiling Scientific Code Using Partial Evaluation.'' *IEEE Computer* 23(12):25-37, December 1990. The placeholder technique and its application to specializing real scientific programs beyond the range of the fancier partial evaluators of the day. Uses a hacked version of the MIT Scheme interpreter, with backends for C or RTL code as well as Scheme.
- Hatcliff, Mogensen, and Thiemann (editors), *Partial Evaluation: Practice and Theory: Proceedings of the 1998 DIKU International Summer School*. Volume 1706 of Lecture Notes in Computer Science, Springer-Verlag, 1999. This seems to be the most up-to-date book-length overview.
- Jones, Gomard, and Sestoft, *Partial Evaluation and Automatic Program Generation*. The standard text. Emphasizes offline techniques and self-application, neither of which is touched on here.
- Frank Pagan, *Partial Computation and the Construction of Language Processors*. Prentice-Hall, 1991. A basic textbook on this approach to compiler design, doing partial evaluation by hand. Examples include parsing, source-to-source translation, code generation, decompiling, and macroexpansion.
- Richard Fateman, ``Compiling Functional Pipe/Stream Abstractions Into Conventional Programs: Software Pipelines.'' Some more examples of generating numerical code in Lisp.
- Matteo Frigo, ``A Fast Fourier Transform Compiler.'' *PLDI* 1999. FFTW is an industrial-strength development of the line of thought shown in the FFT exercise in Section 3.
- Rajeev Surati, ``Practical Partial Evaluation.'' Masters Thesis, Massachusetts Institute of Technology, May 1995. Similar to the project in Section 5 for a user-customizable partial evaluator.

It might seem like I'm implying Norvig should have written his match compiler my way, but that's not so at all -- hand-coding it made perfect sense in a book where this material would be a lengthy detour. The book also included a reference to the Berlin/Weise paper above.

Thanks to Richard Uhtenwoldt, Attila Lendvai, and Gene Michael Stover for problem reports and suggestions.

```scheme
;; Extend an association list.
(define (acons x y a-list) (cons (cons x y) a-list))

;; Return true iff test? is true of each element in ls.
(define (all test? ls)
  (or (null? ls)
      (and (test? (car ls)) (all test? (cdr ls)))))

(define pi (* 2 (atan 1 0)))

(define fresh-name-counter -1)
(define (fresh-name)
  (set! fresh-name-counter (+ fresh-name-counter 1))
  (string->symbol (string-append "x" (number->string fresh-name-counter))))

;; Return the list of integers from 0 to p - 1.
(define (iota p)
  (let loop ((p p) (ls '()))
    (if (= p 0) ls (loop (- p 1) (cons (- p 1) ls)))))

(define (foldl fn acc ls)
  (if (null? ls) acc (foldl fn (fn acc (car ls)) (cdr ls))))

;; Pre: small is eq? to some tail of big.
;; Return the head of big before that tail.
(define (list-difference big small)
  (let loop ((ls big))
    (if (eq? ls small) '() (cons (car ls) (loop (cdr ls))))))
```

```scheme
(define (emit-* x y)
  (cond ((or (eqv? x 0) (eqv? y 0)) 0)
        ((eqv? x 1) y)
        ((eqv? y 1) x)
        (else (emit-binary-op * '* x y))))

(define (emit-+ x y)
  (cond ((eqv? x 0) y)
        ((eqv? y 0) x)
        (else (emit-binary-op + '+ x y))))

(define (emit-binary-op op op-name x y)
  (cond ((and (number? x) (number? y)) (op x y))
        ((and (equal? x y) (not (symbol? x)))
         `(let ((y ,x)) (,op-name y y)))
        (else `(,op-name ,x ,y))))
```

An *n*-point Fourier transform evaluates a polynomial of order *n*
at *n* points (the complex *n*th roots of 1), for a total of
O(*n*^{2}) complex multiplications. The time can be reduced to
O(*n* log *n*) when *n* is a power of 2, using a clever, tightly
coded algorithm that exploits commonalities among the *n*
evaluations. Or, if we're feeling lazy, we can just specialize the
simple algorithm for a particular *n* and let the common
subexpression eliminator take care of it all. Ours isn't smart
enough to do that unless we help it along a little with the
Danielson-Lanczos lemma:

for a polynomial *p*(*x*) = *c*_{0} + *c*_{1} *x* + *c*_{2} *x*^{2} + ...

we rewrite it as *p*(*x*) = *q*(*x*^{2}) + *x* *r*(*x*^{2})

where *q*(*x*) = *c*_{0} + *c*_{2} *x* + ... (using the even-numbered coeffs)

and *r*(*x*) = *c*_{1} + *c*_{3} *x* + ... (using the odd-numbered coeffs)

This recursion is the central insight of the Fast Fourier Transform; we leave the remaining coding details for the partial evaluator to work out.
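The lemma is easy to sanity-check numerically before trusting it with the FFT. This is an illustration only -- `p`, `q`, and `r` are made-up names for one concrete polynomial and its even/odd halves:

```scheme
;; p(x) = 1 + 2x + 3x^2 + 4x^3, split by the lemma into
;; q(x^2) + x * r(x^2), where q(y) = 1 + 3y (even-numbered
;; coefficients) and r(y) = 2 + 4y (odd-numbered coefficients).
(define (p x) (+ 1 (* 2 x) (* 3 x x) (* 4 x x x)))
(define (q y) (+ 1 (* 3 y)))
(define (r y) (+ 2 (* 4 y)))

(p 5)                              ; => 586
(+ (q (* 5 5)) (* 5 (r (* 5 5))))  ; => 586, agreeing with p(5)
```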

```scheme
;; Pre: (length inputs) is a power of 2.
(define (dft inputs)
  (let ((n (length inputs)))
    (map (lambda (i) (poly inputs i n)) (iota n))))

;; Evaluate the complex polynomial with the given coefficients
;; at argument (root-of-unity p n).
;; Pre: (length coeffs) is a power of 2.
(define (poly coeffs p n)
  (if (null? (cdr coeffs))
      (car coeffs)
      (alternate coeffs
        (lambda (evens odds)
          (let ((p2 (modulo (* 2 p) n)))
            (complex+ (poly evens p2 n)
                      (complex* (root-of-unity p n)
                                (poly odds p2 n))))))))

;; Return (receiver x y), where x is the elements of ls at
;; even positions, and y is the elements at odd positions.
;; E.g. for ls = '(A B C D), return (receiver '(A C) '(B D)).
;; Pre: (length ls) is even.
(define (alternate ls receiver)
  (do ((ls ls (cddr ls))
       (evens '() (cons (car ls) evens))
       (odds '() (cons (cadr ls) odds)))
      ((null? ls) (receiver (reverse evens) (reverse odds)))))

;; Complex numbers - constructor and accessors.
(define (make-complex re im) (list re im))
(define re car)
(define im cadr)

(define (complex+ a b)
  (make-complex (+ (re a) (re b)) (+ (im a) (im b))))

;; (a + ib)(x + iy) = ax - by + i(bx + ay)
(define (complex* a x)
  (make-complex (- (* (re a) (re x)) (* (im a) (im x)))
                (+ (* (im a) (re x)) (* (re a) (im x)))))

;; exp(2 pi i/n) = cos(...) + i sin(...)
(define (root-of-unity i n)
  (let ((theta (* (* 2 pi) (/ i n))))
    (make-complex (cos theta) (sin theta))))
```

```scheme
(define (emit-fft inputs)
  (with-cse
   (lambda (cse)
     (let ((* (cse emit-*))
           (+ (cse emit-+))
           (- (cse emit--)))
       ;; Return the discrete fourier transform of a list of complex
       ;; numbers (whose length must be a power of 2).
       (define (dft inputs)
         (let ((n (length inputs)))
           (map (lambda (i) (poly inputs i n)) (iota n))))
       ;; Evaluate the complex polynomial with the given coefficients
       ;; at argument (root-of-unity p n).
       (define (poly coeffs p n)
         (if (null? (cdr coeffs))
             (car coeffs)
             (alternate coeffs
               (lambda (evens odds)
                 (let ((p2 (modulo (* 2 p) n)))
                   (complex+ (poly evens p2 n)
                             (complex* (root-of-unity p n)
                                       (poly odds p2 n))))))))
       ;; Return (receiver x y), where x is the elements of ls at
       ;; even positions, and y is the elements at odd positions.
       ;; E.g. for ls = '(A B C D), return (receiver '(A C) '(B D)).
       (define (alternate ls receiver)
         (do ((ls ls (cddr ls))
              (evens '() (cons (car ls) evens))
              (odds '() (cons (cadr ls) odds)))
             ((null? ls) (receiver (reverse evens) (reverse odds)))))
       (define (complex+ a b)
         (make-complex (+ (re a) (re b)) (+ (im a) (im b))))
       (define (complex* a b)
         (make-complex (- (* (re a) (re b)) (* (im a) (im b)))
                       (+ (* (im a) (re b)) (* (re a) (im b)))))
       (dft inputs)))))

(define (make-complex real imag) (list real imag))
(define re car)
(define im cadr)

;; Return exp(i*2*pi*k/n).
(define (root-of-unity k n)
  (let ((theta (* (* 2 pi) (/ k n))))
    (make-complex (cos theta) (sin theta))))

> (emit-fft (list (make-complex 'r0 'i0) (make-complex 'r1 'i1)))
(let* ((x0 (+ r0 r1))
       (x1 (+ i0 i1))
       (x2 (* -1.0 r1))
       (x3 (* 1.2246063538223772e-16 i1))
       (x4 (- x2 x3))
       (x5 (* 1.2246063538223772e-16 r1))
       (x6 (* -1.0 i1))
       (x7 (+ x5 x6))
       (x8 (+ r0 x4))
       (x9 (+ i0 x7)))
  ((x0 x1) (x8 x9)))
```

Well, that didn't quite work, because `(sin pi)` is not 0 in our
floating-point math, and similarly for `cos`. A hack around this:

```scheme
;; exp(2 pi i/n) = cos(...) + i sin(...)
(define (root-of-unity i n)
  (let ((theta (* (* 2 pi) (/ i n))))
    (make-complex (cosine theta) (sine theta))))

(define (sine x)
  (cond ((= x pi) 0)
        (else (sin x))))

(define (cosine x)
  (cond ((= x (/ pi 2)) 0)
        ((= x (* 1.5 pi)) 0)
        (else (cos x))))
```

More interesting is the way we cheated, above, by using our ordinary
runtime representation of complex numbers, but with symbolic values in
the real and imaginary components. We could have written emitters for
`make-complex` and its accessors, and strictly speaking we should
have, since there are residual `make-complex` operations at the end --
the last line of the output ought to be

```scheme
((make-complex x0 x1) (make-complex x8 x9))
```

But we didn't bother because the emitters are just extra bureaucracy in this case -- leaving them out gives us the convenience of pretending the intermediate objects are ordinary complex numbers, and now we can patch up the last line at our leisure.

© 1994-2002 by Darius Bacon