|
|
|
|
File: [venge] / src / mkc / ols-lang-eng.mgp
(download)
Revision: 1.2, Mon Apr 29 05:37:16 2002 UTC (8 years, 4 months ago) by graydon Branch: MAIN Changes since 1.1: +217 -21 lines too tired to compose useful message |
%include "default.mgp"
%page
%nodefault
%center, size 7, font "standard", fore "white", vgap 20
"One-Day Compilers"
%size 4
or
%size 7
How I learned to stop worrying
and love static metaprogramming
%font "typewriter", size 4
graydon@redhat.com
%page
Greetings
This is a talk about rapidly constructing compilers for
%fore "red"
domain-specific languages
%cont, fore "white"
(DSLs). It is a little dense.
To get through it, I am going to start with a number
of facts which I will simply state and assume as true.
Domain-specific languages are good
"nearly ideal" software system abstractions
better than commands, APIs, object frameworks
very hard to identify and refine
even harder to implement well
DSL implementations are typically
a lot of effort
poor quality
slow
unsafe
incorrect
%page
Evidence
Considering the number of DSLs which have been developed,
the number which have succeeded widely is disappointing.
Make
Lex & Yacc
TeX & Postscript
Octave
RPM
Magicpoint
CGEN & Sid
%size 4, fore "lightblue"
uh.. LD scripts?
%pause, center, fore "yellow"
It is especially bad if you consider
their minimal feature sets, and
how much effort they took to make.
%page
Goals
I am going to show you how to make a DSL
%cont, fore "red"
compiler
%cont, fore "white"
in a single
sitting. It will be capable of:
consuming the DSL's concrete syntax
lexing and parsing
(you should already know how to do this)
constructing a semantic model
with all the bells and whistles
variables, functions, etc.
types and type inference
and strong safety properties
static compile-time checks
file:line:offset: error messages
emitting native code
which might actually be fast
%page
Example
I am going to make a compiler for a small subset of the Make(1)
language. It accepts simple Makefiles like this:
%font "typewriter", fore "yellow"
# my awesome example
pooka=fiddle faddle
zug=bleh $(pooka)
main: foo $(zug)
gcc -o $@ $<
foo: bar.o baz.o bleh.o $(zug)
gcc -o foo bar.o baz.o $(pooka) bleh.o
%page
Tools
I am going to use some weird tools.
They are not all from the FSF.
There is nothing magic about them.
Objective Caml (
%cont, fore "red"
ocaml
%cont, fore "white"
)
ML dialect
Functional / Imperative / OO language
From INRIA
LGPL runtime / QPL compiler
Camlp4
Ocaml pre-processor-pretty-printer
"like
%cont, font "typewriter"
(defmacro ...)
%cont, font "standard"
on steroids"
Cquot
C code quasiquoting for camlp4
written by me
%page
How it's going to work
Obviously we cannot write an entire compiler in one sitting
But we can steal parts of other compilers.
The two compilers we've got to work with are
%cont, fore "red", font "typewriter"
ocamlc
%cont, fore "white", font "standard"
and
%cont, fore "red", font "typewriter"
gcc
%cont, fore "white", font "standard"
.
We need to decide which to use, or whether to use both.
The decision depends on which runtime we'd prefer to use.
Ocaml runtime is
large
complex
relatively obscure
C runtime is
tiny
simple
universally available
I expect most people in this audience would prefer the C runtime.
%page
Getting to the C runtime
The Ocaml compiler and runtime is not for everyone.
Ocaml is really good at producing C code though.
So we can make a translator from Makefiles to Ocaml programs
which generate C code, then
%cont, fore "red"
run
%cont, fore "white"
those programs to get our C code.
The end-user, in fact, doesn't even need to know about ocaml.
You can just give them the C code.
This is what we're going to do with Makefiles.
%page
The Big Picture (textual)
We are going to stitch together a "virtual compiler"
Makefile is translated to Ocaml program
Lexed and Parsed
Ocaml AST produced
Ocaml program is fed into
%cont, fore "red", font "typewriter"
ocamlc
%fore "white", font "standard"
Typechecked and compiled
Compiled Ocaml program is executed
Produces a C program
C program is fed into
%cont, fore "red", font "typewriter"
gcc
%fore "white", font "standard"
Resulting object file depends on nearly nothing
%page
The Big Picture (graphical)
%center, image "vcompiler.png"
%pause, center, fore "yellow"
Isn't that incredibly stupid?
%pause, center, fore "yellow"
No, it's the Unix Philosophy in action
%pause, center, fore "yellow"
(Remember, we are on a budget here)
%page
%nodefault
%center, size 7, font "standard", fore "white", vgap 20
Act 1
Consumption
%page
Front End (part 1)
%size 3
Some easy stuff to warm up.
Makefiles have a pretty simple lexical structure.
Our lexer talks to
%cont, fore "red", font "typewriter"
camlp4
%cont, fore "white", font "standard"
so we need to return ("CLASS","lexeme") pairs.
This is the entire lexer. It is written in
%cont, fore "red", font "typewriter"
ocamllex
%cont, fore "white", font "standard"
language.
&code
let special = ['$' '@' '<' '^' '(' ')' '=' ':' '%']
let nq = ("\\\""|[^'"'])
let sym = ['-' '_' '.' '/' 'a'-'z''A'-'Z''0'-'9']*
rule token = parse
' '* { token lexbuf }
| ('#' [^'\n']*)? ('\n'+) { ("EOL", "") }
| '\t' { ("TAB", "") }
| eof { ("EOF", "") }
| (sym | '"' nq* '"') { ("WORD", lexeme lexbuf) }
| special { ("", lexeme lexbuf)}
| _ { ("", lexeme lexbuf)}
%page
Front End (part 2)
%size 3
More easy stuff.
Makefiles have a pretty simple syntactic structure.
Our parser is recursive-descent, extended LL(1).
It is written in the
%cont, fore "red", font "typewriter"
pa_extend
%cont, fore "white", font "standard"
extension language of
%cont, fore "red", font "typewriter"
camlp4
%fore "white", font "standard"
This language is designed for extending the Ocaml grammar.
But we are just going to replace the grammar altogether.
&code EXTEND
makefile_item:
[ [ EOL -> None
| s = WORD; "="; ws = words; EOL -> ...
| t = WORD; ":"; ws = words; EOL; a = actions -> ... ] ];
actions: [ [ az = LIST0 action -> ... ] ];
action: [ [ TAB; ws = words; EOL -> ... ] ];
words: [ [ ws = LIST0 word -> ... ] ];
word: [ [ w = WORD -> ...
| "$"; "^" -> ...
| "$"; "<" -> ...
| "$"; "@" -> ...
| "$"; "("; w = WORD; ")" -> ... ] ];
END
%page
Front end (part 3)
Let's ignore the
%cont, font "typewriter", fore "yellow", size 3
...
%cont, font "standard", fore "white", size 4
sections on the previous slide for now.
All we've got so far is a lexer and parser.
The parser builds "something" that represents the Makefile.
Now we have to decide
What to build
How to build it
What to do with the thing we build
%page
Semantic Model (part 1)
Our next step will be to build a semantic model
of the contents of a Makefile.
A collection of variables
Each var, a list of strings
%font "typewriter", fore "yellow"
doggies = dingo.o poodle.o
%font "standard", fore "white"
Or variable references
%font "typewriter", fore "yellow"
animals = llama.o $(doggies) loris.o
%font "standard", fore "white"
Referenced varialbes flatten
%font "typewriter", fore "yellow"
= llama.o dingo.o poodle.o loris.o
%font "standard", fore "white"
A dependency tree
Each node has a target and some dependencies
%font "typewriter", fore "yellow"
thezoo: $(animals) $(food) tourism.h
%font "standard", fore "white"
And an action to remake it
%font "typewriter", fore "yellow"
gcc -o $@ $(animals) $(food)
%font "standard", fore "white"
An action is a list of shell commands
which are just nested string lists, as with variables
%page
Semantic Model (part 2)
Assume for the moment that we have a technique for
synthesizing any Ocaml program we like, given a
Makefile as input.
We will get to that technique later.
For now, we're going to focus on building a semantic
model of a Makefile by embedding it in an Ocaml program.
By performing this embedding, we will get a lot of
semantic analysis from the Ocaml compiler for free
Variable binding
Rich native datatypes (lists, strings, etc.)
Functions
Lexical environments
Type inference and type checking
%page
Semantic Model (part 3)
We will embed the variable model directly in Ocaml semantics
Makefile string
Ocaml list with single string
Makefile variable
Ocaml function returning flattened list
Makefile variable reference
Ocaml function call
%fore "lightblue", size 2.5
There is an obscure type-inference reason why functions are used here instead of variables.
When in doubt, try it with functions.
%left, fore "white", size 3
Example Makefile Fragment:
&code doggies = dingo.o poodle.o
animals = llama.o $(doggies) loris.o
...
%fore "white", font "standard"
Becomes Ocaml Functions:
&code let rec
doggies _ = List.flatten [["dingo.o"]; ["poodle.o"]]
and animals _ = List.flatten [["llama.o"]; doggies (); ["loris.o"]]
...
%page
Semantic Model (part 4)
%size 3
For the dependency/action tree, we will define
an auxiliary Ocaml recursive datatype:
%fore "yellow", font "typewriter"
type file = string
type actions = string list
type rule = Rule of (file * rule list * actions)
%fore "white", font "standard"
Example Makefile Fragment:
&code lisp.o: lisp.c
gcc -c $<
emacs: emacs.c lisp.o buffer.h
gcc -o $@ emacs.c lisp.o
%fore "white", font "standard"
Becomes Ocaml Value:
&code Rule ("emacs",
[ Rule ("emacs.c", [], []);
Rule ("lisp.o",
[ Rule ("lisp.c", [], []) ],
[ "gcc -c lisp.c" ] );
Rule ("buffer.h", [] []); ],
[ "gcc -o emacs emacs.c lisp.o" ] )
%page
Semantic Model (part 5)
%size 3
Ok, that was a bit of a simplification.
What we're really going to do is construct a function which
constructs the rule tree. The example is more like this:
&code let rec
lisp_o _ =
let targ = "lisp.o" in
let dep_names = List.flatten [ ["lisp.c"] ] in
let deps = List.map resolve dep_names in
let actions = [String.concat " "
(List.flatten [ ["gcc"]; ["-c"];
[List.hd dep_names] ]) ] in
Rule (targ, deps, actions)
and emacs _ =
let targ = "emacs" in
let dep_names = List.flatten [ "emacs.c"; "lisp.o"; "buffer.h"] in
let deps = List.map resolve dep_names in
let actions = [String.concat " "
(List.flatten [ ["gcc"]; ["-o"]; targ;
["emacs.c"]; ["lisp.o"] ]) ] in
Rule (targ, deps, actions)
%page
Semantic Model (summary)
Make sure you have a semantic model worked out
Capture the meaning of your input language
Keep the model simple!
Translation is tricky without one
You won't get much typechecking either
Embed in native datatypes when possible
Lists, Strings, Tuples, Numbers, etc.
Variables
Functions
Make new types and constructors when necessary (or helpful)
Ocaml datatypes are recursive, branching, polymorphic
Very language-like
All user-visible errors will use your type names
Rename types when useful (eg: "file")
%page
%nodefault
%center, size 7, font "standard", fore "white", vgap 20
Act 2
Plagiarism
%page
Quoting (part 1)
So far, we have been talking about consuming code.
Lexing, parsing, modelling semantics.
Next, we will be dealing with producing code.
Computing Abstract Syntax Trees (ASTs).
Our front-end will compute Ocaml ASTs for
%cont, fore "red", font "typewriter"
ocamlc
%cont, fore "white", font "standard"
.
To take advantage of ocamlc's semantic analysis.
Our back-end will compute C ASTs for
%cont, fore "red", font "typewriter"
gcc
%cont, fore "white", font "standard"
.
To take advantage of gcc's code generator and runtime.
ASTs are very tedious to write down.
In fact, even
%cont, fore "lightblue"
small fragments
%cont, fore "white"
of ASTs are tedious to write down.
%page
Quoting (part 2)
For example, the AST for this Ocaml code:
&code List.map (fun x -> x+1) [1; 2; 3]
%size 4, font "standard", fore "white"
Is written (in Ocaml) as this value:
&code ExApp (loc, ExApp (loc, ExAcc (loc,
ExUid (loc, "List"), ExLid (loc, "map")),
ExFun (loc, [(PaLid (loc, "x"), None,
ExApp (loc, ExApp (loc, ExLid (loc, "+"),
ExLid (loc, "x")), ExInt (loc, "1")))])),
ExApp (loc, ExApp (loc, ExUid (loc, "::"), ExInt (loc, "1")),
ExApp (loc, ExApp (loc, ExUid (loc, "::"),
ExInt (loc, "2")), ExApp (loc,
ExApp (loc, ExUid (loc, "::"), ExInt (loc, "3")),
ExUid (loc, "[]")))))
%size 4, font "standard", fore "white"
Which is completely tedious to write down.
%page
Quoting (part 3)
We are are trying to avoid tedious stuff.
Luckily, two groups of people have seen this \
problem before.
Lisp hackers
Think about
%cont, fore "yellow", font "typewriter"
(defmacro ...)
%fore "white", font "standard"
Logicians
Think about proving programs
%size 4
They both invented the same thing: quoting.
%font "typewriter", fore "lightblue"
Quoting (v):
1. Denoting an AST by providing a piece
of text from the language you are trying
to construct.
%page
Quoting (part 4)
The way quoting works is simple and dirty.
You want an AST for some code:
%fore "green", font "typewriter", size 3
List.map (fun x -> x+1) [1; 2; 3]
&norm So you write that code in magic quotes:
&code <:expr<
%cont, fore "green"
List.map (fun x -> x+1) [1; 2; 3]
%cont, fore "yellow"
>>
&norm Then you run it through a pre-processor.
The pre-processor substitutes the AST of the quoted code.
The compiler never sees the magic quotes.
It is as though you wrote the AST there instead.
%page
Quoting (part 5)
In lisp
The magic quotes are
%cont, font "typewriter", fore "yellow"
(quote ... )
%fore "white", font "standard"
also written as
%cont, font "typewriter", fore "yellow"
'...
%fore "white", font "standard"
The pre-processor is built in to the lisp reader
You can only quote lisp
which if fine, for macros
%size 4
In Ocaml
The magic quotes are
%cont, font "typewriter", fore "yellow"
<:foo< ... >>
%fore "white", font "standard"
where foo is an "expander name"
The pre-processor is
%cont, font "typewriter", fore "red"
camlp4
%font "standard", fore "white"
You can quote any language with an expander
You can add expanders
%page
Quoting (part 6)
%size 5, center
The quoted language is called the
%cont, fore "red"
Object Language.
%size 3, fore "white"
(the word "object" has nothing to do with OOP here)
%fore "white", size 5
The quoting language is called the
%cont, fore "red"
Meta Language.
%fore "white"
%size 3
(subtle suggestion: ponder what the
%cont, fore "red"
ML
%cont, fore "white"
stands for in Oca
%cont, fore "red"
ml
%cont, fore "white"
)
%page
Quoting (final exam)
When we quote Ocaml code in Ocaml, like this:
%fore "yellow", font "typewriter", size 3
let my_fn = <:expr<
%cont, fore "green"
List.iter (fun x -> x+1) [1; 2; 3]
%cont, fore "yellow"
>>
%font "standard", fore "white", size 4
It is
%cont, fore "red"
NOT
%cont, fore "white"
equivalent to writing this:
%fore "yellow", font "typewriter", size 3
let my_fn = List.iter (fun x -> x+1) [1; 2; 3]
%font "standard", fore "white", size 4
But rather, it is equivalent to writing this:
%fore "yellow", font "typewriter", size 3
let my_fn = ExApp (loc, ExApp (loc, ExAcc (loc,
ExUid (loc, "List"), ExLid (loc, "iter")),
ExFun (loc, [(PaLid (loc, "x"), None,
ExApp (loc, ExApp (loc, ExLid (loc, "+"),
ExLid (loc, "x")), ExInt (loc, "1")))])),
ExApp (loc, ExApp (loc, ExUid (loc, "::"), ExInt (loc, "1")),
ExApp (loc, ExApp (loc, ExUid (loc, "::"),
ExInt (loc, "2")), ExApp (loc,
ExApp (loc, ExUid (loc, "::"), ExInt (loc, "3")),
ExUid (loc, "[]")))))
%page
Quoting (summary, Q&A)
Quoting makes synthesizing ASTs completely trivial.
Avoid it at your own peril.
Q: What if my Object language has the lexeme
%cont, font "typewriter", fore "yellow"
>>
%cont, font "standard", fore "white"
in it?
A: Escape it:
%cont, font "typewriter", fore "yellow"
\>\>
%font "standard", fore "white"
Q: What if there is a parse error in a quotation?
A: The pre-processor will emit an error.
Q: Doesn't pre-processing distort source co-ordinates?
A: No, the pre-processor preserves source co-ordinates.
Q: Is my code parsed a second time after pre-processing?
A: No, the pre-processor and compiler are tightly integrated.
%page
Quasiquoting (part 1)
Quoting is OK for entering ASTs you know already.
But you don't always know all ASTs in advance.
If you did, you wouldn't need a compiler.
At best, you know parameterized ASTs.
AST templates, in other words, with gaps in them.
Terminology:
A normal quoted AST is called a
%cont, fore "red"
quotation
%cont, fore "white"
.
A quotation with gaps to fill in is called a
%cont, fore "red"
quasiquotation
%cont, fore "white"
.
A gap in a quasiquotation is called an
%cont, fore "red"
antiquotation
%cont, fore "white"
.
%page
Quasiquoting (part 3)
&norm In
%cont, font "typewriter", fore "red"
camlp4
%cont, fore "white", font "standard"
, an antiquotation is written as
%cont, font "typewriter", fore "red"
$ty:var$
&norm It means "put the AST named
%cont, fore "red", font "typewriter"
var
%cont, fore "white", font "standard"
, of type
%cont, fore "red", font "typewriter"
ty
%cont, fore "white", font "standard"
, here".
For example, writing:
&code let ast1 = <:expr<
%cont, fore "green"
(fun y -> y + 1)
%cont, fore "yellow"
>>
let ast2 = <:expr<
%cont, fore "green"
List.map
%cont, fore "red"
$expr:ast1$
%cont, fore "green"
[1; 2; 3]
%cont, fore "yellow"
>>
&norm Is equivalent to writing:
&code let ast2 = <:expr<
%cont, fore "green"
List.map (fun y -> y + 1) [1; 2; 3]
%cont, fore "yellow"
>>
&norm Or in lisp, using backquote:
&code (define ast1 '
%cont, fore "green"
(lambda (y) (+ y 1))
%cont, fore "yellow"
)
(define ast2 `
%cont, fore "green"
(map
%cont, fore "red"
,ast1
%cont, fore "green"
(1 2 3))
%cont, fore "yellow"
)
%page
Quasiquoting (final exam)
Here is a more complex example, with C ASTs.
This is an Ocaml function which builds ASTs that represent
"calling a function on every value of an N-entry array"
&code let iterate n arr fn =
<:cstmt<
%fore "green"
for (i = 0; i <
%cont, fore "red"
$int:n$
%cont, fore "green"
; i++) {
%cont, fore "red"
$ident:fn$
%cont, fore "green"
(
%cont, fore "red"
$expr:arr$
%cont, fore "green"
[i]); }
%fore "yellow"
>>
&norm The pre-processor expands this to:
&code let iterate n arr fn =
FOR
(BINARY
(ASSIGN, VARIABLE "i", CONSTANT (CONST_INT "0")),
BINARY
(LT, VARIABLE "i",
CONSTANT (CONST_INT (string_of_int n))),
UNARY (POSINCR, VARIABLE "i"),
COMPUTATION
(CALL (VARIABLE fn, [INDEX (arr, VARIABLE "i")])))
%page
Quasiquotations (summary)
Quasiquotations permit combination of ASTs, without having to give up
the convenience of quotation.
With
%cont, fore "red", font "typewriter"
camlp4
%cont, fore "white", font "standard"
we can make new types of quasi and anti quotations.
Whenever you are faced with having to build ASTs, start by making sure
you have a good collection of quotations for your object language.
In the remaining section, we are going to complete the compiler.
The front end will produce quotations of Ocaml code.
The back end will produce quotations of C code.
Emacs hates quotations.
%page
%nodefault
%center, size 7, font "standard", fore "white", vgap 20
Act 3
Regurgitation
%page
Front end (part 4)
Recall, we left a number of unfinished
%cont, font "typewriter", fore "yellow"
...
%cont, font "standard", fore "white"
things in our parser.
&code word:
[ [ w = WORD -> ...
| "$"; "^" -> ...
| "$"; "<" -> ...
| "$"; "@" -> ...
| "$"; "("; w = WORD; ")" -> ... ]
];
&norm We now fill these in with Ocaml quasiquotations.
They come from our Ocaml model of Makefile semantics.
Aren't you glad we worked that out in advance?
&code word:
[ [ w = WORD -> <:expr<
%cont, fore "green"
[
%cont, fore "red"
$str:w$
%cont, fore "green"
]
%cont, fore "yellow"
>>
| "$"; "^" -> <:expr<
%cont, fore "green"
dep_names
%cont, fore "yellow"
>>
| "$"; "<" -> <:expr<
%cont, fore "green"
[List.hd dep_names]
%cont, fore "yellow"
>>
| "$"; "@" -> <:expr<
%cont, fore "green"
[targ]
%cont, fore "yellow"
>>
| "$"; "("; w = WORD; ")" -> let w2 = tidy w in
<:expr<
%cont, fore "green"
(
%cont, fore "red"
$lid:w2$
%cont, fore "green"
())
%cont, fore "yellow"
>> ]
];
%page
Front end (part 5)
Now we fill in the quasiquotations for the
%cont, font "typewriter", fore "yellow"
makefile_item
%cont, font "standard", fore "white"
parser:
%size 3, font "typewriter", fore "yellow"
makefile_item:
[ [ EOL -> None
| s = WORD; "="; ws = words; EOL ->
Some (s, <:expr<
%cont, fore "red"
$ws$
%cont, fore "yellow"
>>, ASSIGN)
| t = WORD; ":"; ws = words; EOL; a = actions ->
let ex = <:expr<
%fore "green"
let targ =
%cont, fore "red"
$str:t$
%cont, fore "green"
in
let dep_names =
%cont, fore "red"
$ws$
%cont, fore "green"
in
let deps = List.map resolve dep_names in
let actions =
%cont, fore "red"
$a$
%cont, fore "green"
in
Mk.Rule (targ, deps, actions)
%cont, fore "yellow"
>>
in Some (t, ex, RULE) ] ];
actions: [ [ az = LIST0 action -> (mk_list az loc) ] ];
action: [ [ TAB; ws = words; EOL ->
<:expr<
%cont, fore "green"
String.concat " "
%cont, fore "red"
$ws$
%cont, fore "yellow"
>> ] ];
words: [ [ ws = LIST0 word ->
(let ls = mk_list ws loc in
<:expr<
%cont, fore "green"
List.flatten
%cont, fore "red"
$ls$
%cont, fore "yellow"
>>) ] ];
%page
Front end (part 6)
Now we have a parser which constructs an AST for an Ocaml function
corresponding to an item in a Makefile (an assignment or a rule).
We are going to replace the core Ocaml grammar with a grammar
that reads a sequence of Makefile items, builds Ocaml functions,
and emits an AST containing the sequence of functions and a single
call into our backend with the result of those functions.
We are then going to plug this grammar into
%cont, font "typewriter", fore "red"
camlp4
%cont, font "standard", fore "white"
and
%cont, font "typewriter", fore "red"
ocamlc
%cont, font "standard", fore "white"
,
and we'll be done the front end.
The AST node
%cont, font "typewriter", fore "red"
Pcaml.implem
%cont, font "standard", fore "white"
is the root of the Ocaml grammar.
It represents a single compilation unit in Ocaml terminology.
It is what we need to make, to feed to
%cont, font "typewriter", fore "red"
ocamlc
%cont, font "standard", fore "white"
.
%page
Front end (finale)
Here is the code to build the
%cont, font "typewriter", fore "red"
Pcaml.implem
%cont, font "standard", fore "white"
AST node.
This part is very boring. It is about 20 more lines, and is very technical.
You aren't expected to read this now, just see that it is rather short.
%font "typewriter", size 2, fore "yellow"
Pcaml.implem: [ [ res = makefile -> ([(res, loc)], False) ] ];
makefile: [ [ maybe_items = LIST0 makefile_item; EOF ->
let items = some maybe_items in
let rule_p (_, _, x) = match x with RULE -> true | ASSIGN -> false in
let (rules, assigns) = partition rule_p items in
let dead_rule = (<:patt< x >>, None, <:expr< Mk.Rule (x, [], []) >>) in
let live_rules = map (fun (n,_,_) -> let n1 = tidy n in
(<:patt< $str:n$ >> , None, <:expr< $lid:n1$ () >>)) rules
in
let resz = live_rules @ [dead_rule] in
let resolver = (<:patt< resolve >>, <:expr< fun [ $list:resz$ ] >>) in
let bind (n, e, _) = let pwel = [<:patt< _ >>, None, e] in
let n1 = tidy n in <:patt< $lid:n1$ >> , <:expr< fun [$list:pwel$] >>
in
let (r_binds, a_binds) = (map bind rules, map bind assigns) in
let funs = resolver :: (a_binds @ r_binds) in
let (top,_,_) = hd rules in
let tree = <:expr< let rec $list:funs$ in $lid:top$ () >> in
let entry = [( <:patt< _ >>, <:expr< Be.trans $tree$ >> )] in
let recur = false in
let items = [ <:str_item< value $rec:recur$ $list:entry$ >> ] in
<:str_item< declare $list:items$ end >> ] ];
%page
Back end (part 1)
Recall that the compiler we are building is a 2-stage system.
Front End
Lex and parse Makefile
Syntax checking happens here
Synthesize Ocaml program
Compile Ocaml program with
%cont, font "typewriter", fore "red"
ocamlc
%font "standard", fore "white"
Static semantic checking happens here
Back End
Run compiled Ocaml program
Capture synthesized C program
Compile C program with
%cont, font "typewriter", fore "red"
gcc
%font "standard", fore "white", size 4
We have completed the guts of the front end.
The back end really only requires us to write one function.
It is the function that calculates a C program from an Ocaml value.
%page
Back end (part 2)
Recall also that our Ocaml program will produce a single value
of the following
%cont, font "typewriter", fore "yellow"
rule
%cont, font "standard", fore "white"
type:
&code type file = string
type actions = string list
type rule = Rule of (file * rule list * actions)
&norm For example:
&code Rule ("emacs",
[ Rule ("emacs.c", [], []);
Rule ("lisp.o",
[ Rule ("lisp.c", [], []) ],
[ "gcc -c lisp.c" ] );
Rule ("buffer.h", [] []); ],
[ "gcc -o emacs emacs.c lisp.o" ] )
&norm Which we must translate into a C program
%page
Back end (part 3)
The C program will do "the work" of a Makefile.
The recursive algorithm (vaguely) is:
&code int check_node_foo_c (time_t parent) {
time_t mtime = 0;
int rebuild = 0;
struct stat target;
if (stat ("foo.c", &target) == -1)
rebuild = 1;
else
mtime = statr.st_mtime;
rebuild = check_first_dep (mtime) || rebuild;
// ...
if (rebuild) {
write (1, "gcc -c foo.c\n", 13);
system ("gcc -c foo.c");
return 1
}
else
return mtime > parent;
}
&norm With some extra error handling thrown in.
%page
Back end (part 4)
Now that we've worked out the structure of the C program
we're going to generate, we just write it down as a bunch
of factored quasiquotations. Here is an example quote
from the back end:
&code let fstmt = <:cstmt<
%fore "green"
if (stat (
%cont, fore "red"
$str:file$
%cont, fore "green"
, &target) == -1) {
rebuild = 1;
} else {
mtime = target.st_mtime;
}
%fore "red"
$stmt:satisfy$
%fore "yellow"
>>
&norm Look familliar?
The back end is about 70 lines of code. It just produces
C quotes isomorphic to the
%cont, font "typewriter", fore "yellow"
rule
%cont, font "standard", fore "white"
tree constructed by the
front end.
%page
Back end (summary)
The back end is probably the easiest part to understand,
since we've already developed a keen understanding of
quasiquotation.
Produces a C function for each rule
Trades space for speed
Not that much space either: ~ 30 insns / rule at -O2
Pre-calculates a lot
Only needs
%cont, fore "yellow", font "typewriter"
stat()
%cont, fore "white", font "standard"
,
%cont, fore "yellow", font "typewriter"
system()
%cont, fore "white", font "standard"
, and
%cont, fore "yellow", font "typewriter"
write()
%fore "white", font "standard"
Many other tradeoffs possible
Can perform domain-specific optimizations
Topological sort done at compile time
Missing build rules directly fail when called
Dead build rules not compiled into executable
Common dependency DAG nodes refer to same function
All messages pre-formatted
All string lengths known, for
%cont, fore "yellow", font "typewriter"
write()
%page
Grand Finale
All that remains is to write a driver.
We use the "shell" module of Ocaml, since it looks a lot like
writing shell scripts with flexible I/O and exception handling.
It does the following:
Run
%cont, fore "red", font "typewriter"
ocamlc
%cont, font "standard", fore "white"
on input, with with custom
%cont, fore "red", font "typewriter"
camlp4
%cont, font "standard", fore "white"
pre-processor
Chmod output file
Run output file, capture output to C file
Run
%cont, fore "red", font "typewriter"
gcc
%cont, font "standard", fore "white"
on C file
Chmod output file
Delete temporaries
Some command-line option processing and exception handling,
make the driver about as large as the back end.
| graydon hoare |
Powered by ViewCVS 0.9.2 |