(file) Return to ols-lang-eng.mgp CVS log (file) (dir) Up to [venge] / src / mkc

File: [venge] / src / mkc / ols-lang-eng.mgp (download)
Revision: 1.2, Mon Apr 29 05:37:16 2002 UTC (8 years, 4 months ago) by graydon
Branch: MAIN
Changes since 1.1: +217 -21 lines
too tired to compose useful message

%include "default.mgp"
%page
%nodefault
%center, size 7, font "standard", fore "white", vgap 20



"One-Day Compilers"

%size 4
or

%size 7
How I learned to stop worrying
and love static metaprogramming

%font "typewriter", size 4
graydon@redhat.com


%page

Greetings

This is a talk about rapidly constructing compilers for
%fore "red"
domain-specific languages
%cont, fore "white"
 (DSLs). It is a little dense. 
To get through it, I am going to start with a number 
of facts which I will simply state and assume as true.

	Domain-specific languages are good
		"nearly ideal" software system abstractions
			better than commands, APIs, object frameworks
		very hard to identify and refine
			even harder to implement well

	DSL implementations are typically
		a lot of effort
		poor quality
			slow
			unsafe
			incorrect	
%page

Evidence

Considering the number of DSLs which have been developed,
the number which have succeeded widely is disappointing. 

	Make
	Lex & Yacc
	TeX & Postscript
	Octave
	RPM
	Magicpoint
	CGEN & Sid

%size 4, fore "lightblue"
	uh.. LD scripts?

%pause, center, fore "yellow"
It is especially bad if you consider 
their minimal feature sets, and 
how much effort they took to make.

%page

Goals

I am going to show you how to make a DSL 
%cont, fore "red"
compiler
%cont, fore "white"
 in a single
sitting. It will be capable of:

	consuming the DSL's concrete syntax
		lexing and parsing
			(you should already know how to do this)

	constructing a semantic model
		with all the bells and whistles
			variables, functions, etc.
			types and type inference
		and strong safety properties
			static compile-time checks
			file:line:offset: error messages			

	emitting native code
		which might actually be fast

%page

Example

I am going to make a compiler for a small subset of the Make(1)
language. It accepts simple Makefiles like this:

%font "typewriter", fore "yellow"

 # my awesome example

 pooka=fiddle faddle
 zug=bleh $(pooka)

 main: foo $(zug)
         gcc -o $@ $<

 foo: bar.o baz.o bleh.o $(zug)
         gcc -o foo bar.o baz.o $(pooka) bleh.o

%page

Tools

	I am going to use some weird tools.
		They are not all from the FSF.
		There is nothing magic about them.

	Objective Caml (
%cont, fore "red"
ocaml
%cont, fore "white"
)
		ML dialect
			Functional / Imperative / OO language
		From INRIA
			LGPL runtime / QPL compiler

	Camlp4
		Ocaml pre-processor-pretty-printer
			"like 
%cont, font "typewriter"
(defmacro ...)
%cont, font "standard"
 on steroids"

	Cquot
		C code quasiquoting for camlp4
			written by me

%page

How it's going to work

	Obviously we cannot write an entire compiler in one sitting
		But we can steal parts of other compilers.

	The two compilers we've got to work with are 
%cont, fore "red", font "typewriter"
ocamlc
%cont, fore "white", font "standard"
 and 
%cont, fore "red", font "typewriter"
gcc
%cont, fore "white", font "standard"
.
		We need to decide which to use, or whether to use both.

	The decision depends on which runtime we'd prefer to use.
		Ocaml runtime is 
			large
			complex
			relatively obscure
		C runtime is
			tiny
			simple
			universally available

	I expect most people in this audience would prefer the C runtime.
%page

Getting to the C runtime

The Ocaml compiler and runtime is not for everyone.

Ocaml is really good at producing C code though.

So we can make a translator from Makefiles to Ocaml programs
which generate C code, then 
%cont, fore "red"
run 
%cont, fore "white"
those programs to get our C code.

The end-user, in fact, doesn't even need to know about ocaml.

You can just give them the C code.

This is what we're going to do with Makefiles.

%page

The Big Picture (textual)

We are going to stitch together a "virtual compiler"

	Makefile is translated to Ocaml program
		Lexed and Parsed
		Ocaml AST produced

	Ocaml program is fed into 
%cont, fore "red", font "typewriter"
ocamlc
%fore "white", font "standard"
		Typechecked and compiled

	Compiled Ocaml program is executed
		Produces a C program

	C program is fed into 
%cont, fore "red", font "typewriter"
gcc
%fore "white", font "standard"
		Resulting object file depends on nearly nothing

%page

The Big Picture (graphical)

%center, image "vcompiler.png"


%pause, center, fore "yellow"
Isn't that incredibly stupid?
%pause, center, fore "yellow"
No, it's the Unix Philosophy in action
%pause, center, fore "yellow"
(Remember, we are on a budget here)

%page
%nodefault
%center, size 7, font "standard", fore "white", vgap 20




Act 1
Consumption

%page

Front End (part 1)

%size 3
Some easy stuff to warm up.
Makefiles have a pretty simple lexical structure.
Our lexer talks to
%cont, fore "red", font "typewriter"
 camlp4 
%cont, fore "white", font "standard"
so we need to return ("CLASS","lexeme") pairs.
This is the entire lexer. It is written in 
%cont, fore "red", font "typewriter"
 ocamllex 
%cont, fore "white", font "standard"
language.

&code 
let special = ['$' '@' '<' '^' '(' ')' '=' ':' '%']
let nq = ("\\\""|[^'"'])
let sym = ['-' '_' '.' '/' 'a'-'z''A'-'Z''0'-'9']*

rule token = parse
    ' '*                     { token lexbuf }
  | ('#' [^'\n']*)? ('\n'+)  { ("EOL", "") }
  | '\t'                     { ("TAB", "") }
  | eof                      { ("EOF", "") }
  | (sym | '"' nq* '"')      { ("WORD", lexeme lexbuf) }
  | special                  { ("", lexeme lexbuf)}
  | _                        { ("", lexeme lexbuf)}

%page

Front End (part 2)

%size 3
More easy stuff.
Makefiles have a pretty simple syntactic structure.
Our parser is recursive-descent, extended LL(1).
It is written in the 
%cont, fore "red", font "typewriter"
 pa_extend 
%cont, fore "white", font "standard"
extension language of
%cont, fore "red", font "typewriter"
 camlp4 
%fore "white", font "standard"
This language is designed for extending the Ocaml grammar.
But we are just going to replace the grammar altogether.

&code EXTEND
makefile_item:
         [ [ EOL -> None
           | s = WORD; "="; ws = words; EOL -> ... 
           | t = WORD; ":"; ws = words; EOL; a = actions -> ... ] ]; 
actions: [ [ az = LIST0 action -> ... ]  ];
action:  [ [ TAB; ws = words; EOL -> ... ] ];
words:   [ [ ws = LIST0 word -> ... ] ];
word:    [ [ w = WORD -> ...
           | "$"; "^" -> ...
           | "$"; "<" -> ...
           | "$"; "@" -> ...
           | "$"; "("; w = WORD; ")" -> ... ] ];
END

%page

Front end (part 3)

Let's ignore the 
%cont, font "typewriter", fore "yellow", size 3
... 
%cont, font "standard", fore "white", size 4
sections on the previous slide for now.
All we've got so far is a lexer and parser.
The parser builds "something" that represents the Makefile.

Now we have to decide 
	What to build
	How to build it
	What to do with the thing we build
%page

Semantic Model (part 1)

Our next step will be to build a semantic model 
of the contents of a Makefile.

	A collection of variables
		Each var, a list of strings
%font "typewriter",  fore "yellow"
			doggies = dingo.o poodle.o
%font "standard", fore "white"
		Or variable references
%font "typewriter",  fore "yellow"
			animals = llama.o $(doggies) loris.o 
%font "standard", fore "white"
		Referenced varialbes flatten
%font "typewriter",  fore "yellow"
			        = llama.o dingo.o poodle.o loris.o
%font "standard", fore "white"

	A dependency tree
		Each node has a target and some dependencies
%font "typewriter",  fore "yellow"
			thezoo: $(animals) $(food) tourism.h
%font "standard", fore "white"		
		And an action to remake it
%font "typewriter",  fore "yellow"
			gcc -o $@ $(animals) $(food)
%font "standard", fore "white"					
		An action is a list of shell commands
			which are just nested string lists, as with variables

%page

Semantic Model (part 2)

Assume for the moment that we have a technique for
synthesizing any Ocaml program we like, given a
Makefile as input. 

We will get to that technique later.

For now, we're going to focus on building a semantic
model of a Makefile by embedding it in an Ocaml program.

By performing this embedding, we will get a lot of 
semantic analysis from the Ocaml compiler for free 	
	Variable binding
	Rich native datatypes (lists, strings, etc.)
	Functions
	Lexical environments
	Type inference and type checking

%page

Semantic Model (part 3)

	We will embed the variable model directly in Ocaml semantics
		Makefile string
			Ocaml list with single string
		Makefile variable
			Ocaml function returning flattened list
		Makefile variable reference
			Ocaml function call

%fore "lightblue", size 2.5
There is an obscure type-inference reason why functions are used here instead of variables. 
When in doubt, try it with functions.
%left, fore "white", size 3

Example Makefile Fragment:
&code doggies = dingo.o poodle.o
animals = llama.o $(doggies) loris.o
...
%fore "white", font "standard"

Becomes Ocaml Functions:
&code let rec
     doggies _ = List.flatten [["dingo.o"]; ["poodle.o"]]
 and animals _ = List.flatten [["llama.o"]; doggies (); ["loris.o"]]
...



%page

Semantic Model (part 4)

%size 3
For the dependency/action tree, we will define 
an auxiliary Ocaml recursive datatype:

%fore "yellow", font "typewriter"
type file = string
type actions = string list
type rule = Rule of (file * rule list * actions)

%fore "white", font "standard"
Example Makefile Fragment:

&code lisp.o: lisp.c
       gcc -c $<
emacs: emacs.c lisp.o buffer.h
       gcc -o $@ emacs.c lisp.o
%fore "white", font "standard"

Becomes Ocaml Value:

&code Rule ("emacs", 
  [ Rule ("emacs.c", [], []);
    Rule ("lisp.o", 
      [ Rule ("lisp.c", [], []) ],
      [ "gcc -c lisp.c" ] );
    Rule ("buffer.h", [] []); ],
  [ "gcc -o emacs emacs.c lisp.o" ] )

%page

Semantic Model (part 5)

%size 3
Ok, that was a bit of a simplification.

What we're really going to do is construct a function which
constructs the rule tree. The example is more like this:

&code let rec
    lisp_o _ =
      let targ = "lisp.o" in
      let dep_names = List.flatten [ ["lisp.c"] ] in
      let deps = List.map resolve dep_names in
      let actions = [String.concat " "
                       (List.flatten [ ["gcc"]; ["-c"]; 
                                       [List.hd dep_names] ]) ] in
      Rule (targ, deps, actions)
and emacs _ =
      let targ = "emacs" in
      let dep_names = List.flatten [ "emacs.c"; "lisp.o"; "buffer.h"] in
      let deps = List.map resolve dep_names in
      let actions = [String.concat " "
                       (List.flatten [ ["gcc"]; ["-o"]; targ; 
                                       ["emacs.c"]; ["lisp.o"] ]) ] in
      Rule (targ, deps, actions)

%page

Semantic Model (summary)


	Make sure you have a semantic model worked out
		Capture the meaning of your input language
			Keep the model simple!
		Translation is tricky without one
			You won't get much typechecking either

	Embed in native datatypes when possible
		Lists, Strings, Tuples, Numbers, etc.
		Variables
		Functions

	Make new types and constructors when necessary (or helpful)
		Ocaml datatypes are recursive, branching, polymorphic
			Very language-like
		All user-visible errors will use your type names
			Rename types when useful (eg: "file")


%page
%nodefault
%center, size 7, font "standard", fore "white", vgap 20




Act 2
Plagiarism

%page

Quoting (part 1)

	So far, we have been talking about consuming code.
		Lexing, parsing, modelling semantics.

	Next, we will be dealing with producing code.
		Computing Abstract Syntax Trees (ASTs).

	Our front-end will compute Ocaml ASTs for 
%cont, fore "red", font "typewriter"
ocamlc
%cont, fore "white", font "standard"
.
		To take advantage of ocamlc's semantic analysis.

	Our back-end will compute C ASTs for 
%cont, fore "red", font "typewriter"
gcc
%cont, fore "white", font "standard"
.
		To take advantage of gcc's code generator and runtime.

	ASTs are very tedious to write down.
		In fact, even 
%cont, fore "lightblue"
small fragments 
%cont, fore "white"
of ASTs are tedious to write down.

%page

Quoting (part 2)

For example, the AST for this Ocaml code:

&code List.map (fun x -> x+1) [1; 2; 3] 

%size 4, font "standard", fore "white"
Is written (in Ocaml) as this value:

&code  ExApp (loc, ExApp (loc, ExAcc (loc, 
   ExUid (loc, "List"), ExLid (loc, "map")),
     ExFun (loc, [(PaLid (loc, "x"), None, 
       ExApp (loc, ExApp (loc, ExLid (loc, "+"),
         ExLid (loc, "x")), ExInt (loc, "1")))])),
   ExApp (loc, ExApp (loc, ExUid (loc, "::"), ExInt (loc, "1")),
     ExApp (loc, ExApp (loc, ExUid (loc, "::"),
       ExInt (loc, "2")), ExApp (loc,
         ExApp (loc, ExUid (loc, "::"), ExInt (loc, "3")),
         ExUid (loc, "[]")))))

%size 4, font "standard", fore "white"
Which is completely tedious to write down.

%page

Quoting (part 3)

We are are trying to avoid tedious stuff.

Luckily, two groups of people have seen this \
problem before.
	Lisp hackers 
		Think about
%cont, fore "yellow", font "typewriter"
(defmacro ...) 
%fore "white", font "standard"
	Logicians
		Think about proving programs

%size 4
They both invented the same thing: quoting.

%font "typewriter", fore "lightblue"
Quoting (v): 

1. Denoting an AST by providing a piece 
   of text from the language you are trying 
   to construct.

%page

Quoting (part 4)

The way quoting works is simple and dirty.

You want an AST for some code:

%fore "green", font "typewriter", size 3
List.map (fun x -> x+1) [1; 2; 3] 

&norm So you write that code in magic quotes:

&code <:expr< 
%cont, fore "green" 
List.map (fun x -> x+1) [1; 2; 3] 
%cont, fore "yellow"
>>

&norm Then you run it through a pre-processor.
The pre-processor substitutes the AST of the quoted code.
The compiler never sees the magic quotes.
It is as though you wrote the AST there instead.

%page

Quoting (part 5)

In lisp
	The magic quotes are 
%cont, font "typewriter", fore "yellow"
(quote ... )
%fore "white", font "standard"
		also written as 
%cont, font "typewriter", fore "yellow"
'...
%fore "white", font "standard"
	The pre-processor is built in to the lisp reader
	You can only quote lisp
		which if fine, for macros

%size 4
In Ocaml
	The magic quotes are 
%cont, font "typewriter", fore "yellow"
<:foo< ... >>
%fore "white", font "standard"
		where foo is an "expander name"
	The pre-processor is 
%cont, font "typewriter", fore "red"
camlp4
%font "standard", fore "white"
	You can quote any language with an expander
		You can add expanders


%page

Quoting (part 6)




%size 5, center
The quoted language is called the 
%cont, fore "red"
Object Language.
%size 3, fore "white"
(the word "object" has nothing to do with OOP here)



%fore "white", size 5
The quoting language is called the 
%cont, fore "red"
Meta Language.
%fore "white"
%size 3
(subtle suggestion: ponder what the 
%cont, fore "red"
ML 
%cont, fore "white"
stands for in Oca
%cont, fore "red"
ml
%cont, fore "white"
)

%page

Quoting (final exam)

When we quote Ocaml code in Ocaml, like this:

%fore "yellow", font "typewriter", size 3
let my_fn = <:expr<  
%cont, fore "green"
List.iter (fun x -> x+1) [1; 2; 3] 
%cont, fore "yellow"
>>

%font "standard", fore "white", size 4
It is 
%cont, fore "red"
NOT 
%cont, fore "white"
equivalent to writing this:

%fore "yellow", font "typewriter", size 3
let my_fn = List.iter (fun x -> x+1) [1; 2; 3] 

%font "standard", fore "white", size 4
But rather, it is equivalent to writing this:

%fore "yellow", font "typewriter", size 3
let my_fn = ExApp (loc, ExApp (loc, ExAcc (loc, 
   ExUid (loc, "List"), ExLid (loc, "iter")),
     ExFun (loc, [(PaLid (loc, "x"), None, 
       ExApp (loc, ExApp (loc, ExLid (loc, "+"),
         ExLid (loc, "x")), ExInt (loc, "1")))])),
   ExApp (loc, ExApp (loc, ExUid (loc, "::"), ExInt (loc, "1")),
     ExApp (loc, ExApp (loc, ExUid (loc, "::"),
       ExInt (loc, "2")), ExApp (loc,
         ExApp (loc, ExUid (loc, "::"), ExInt (loc, "3")),
         ExUid (loc, "[]")))))

%page

Quoting (summary, Q&A)

Quoting makes synthesizing ASTs completely trivial.
Avoid it at your own peril.

Q: What if my Object language has the lexeme 
%cont, font "typewriter", fore "yellow"
>> 
%cont, font "standard", fore "white"
in it?
A: Escape it: 
%cont, font "typewriter", fore "yellow"
\>\> 

%font "standard", fore "white"
Q: What if there is a parse error in a quotation?
A: The pre-processor will emit an error.

Q: Doesn't pre-processing distort source co-ordinates?
A: No, the pre-processor preserves source co-ordinates.

Q: Is my code parsed a second time after pre-processing?
A: No, the pre-processor and compiler are tightly integrated.

%page

Quasiquoting (part 1)

Quoting is OK for entering ASTs you know already.
But you don't always know all ASTs in advance.
If you did, you wouldn't need a compiler.

At best, you know parameterized ASTs.
AST templates, in other words, with gaps in them.


Terminology:

A normal quoted AST is called a 
%cont, fore "red"
quotation
%cont, fore "white"
.
A quotation with gaps to fill in is called a 
%cont, fore "red"
quasiquotation
%cont, fore "white"
.
A gap in a quasiquotation is called an 
%cont, fore "red"
antiquotation
%cont, fore "white"
.

%page

Quasiquoting (part 3)

&norm In 
%cont, font "typewriter", fore "red"
camlp4
%cont, fore "white", font "standard"
, an antiquotation is written as 
%cont, font "typewriter", fore "red"
$ty:var$

&norm It means "put the AST named 
%cont, fore "red", font "typewriter"
var
%cont, fore "white", font "standard"
, of type 
%cont, fore "red", font "typewriter"
ty
%cont, fore "white", font "standard"
, here".

For example, writing:

&code let ast1 = <:expr< 
%cont, fore "green"
(fun y -> y + 1)
%cont, fore "yellow"
 >>
let ast2 = <:expr< 
%cont, fore "green"
List.map 
%cont, fore "red"
$expr:ast1$ 
%cont, fore "green"
[1; 2; 3] 
%cont, fore "yellow"
>>

&norm Is equivalent to writing:

&code let ast2 = <:expr< 
%cont, fore "green"
List.map (fun y -> y + 1) [1; 2; 3] 
%cont, fore "yellow"
>>

&norm Or in lisp, using backquote:

&code (define ast1 '
%cont, fore "green"
(lambda (y) (+ y 1))
%cont, fore "yellow"
)
(define ast2 `
%cont, fore "green"
(map 
%cont, fore "red"
,ast1 
%cont, fore "green"
(1 2 3))
%cont, fore "yellow"
) 

%page

Quasiquoting (final exam)

Here is a more complex example, with C ASTs.
This is an Ocaml function which builds ASTs that represent
"calling a function on every value of an N-entry array"

&code let iterate n arr fn = 
<:cstmt< 
%fore "green"
  for (i = 0; i < 
%cont, fore "red"
$int:n$
%cont, fore "green"
; i++) { 
%cont, fore "red"
$ident:fn$
%cont, fore "green"
(
%cont, fore "red"
$expr:arr$
%cont, fore "green"
[i]); }
%fore "yellow"
>>

&norm The pre-processor expands this to:

&code let iterate n arr fn =
  FOR
    (BINARY
       (ASSIGN, VARIABLE "i", CONSTANT (CONST_INT "0")),
     BINARY
       (LT, VARIABLE "i",
        CONSTANT (CONST_INT (string_of_int n))),
     UNARY (POSINCR, VARIABLE "i"),
     COMPUTATION
       (CALL (VARIABLE fn, [INDEX (arr, VARIABLE "i")])))

%page

Quasiquotations (summary)

Quasiquotations permit combination of ASTs, without having to give up
the convenience of quotation.

With 
%cont, fore "red", font "typewriter"
camlp4
%cont, fore "white", font "standard"
 we can make new types of quasi and anti quotations.

Whenever you are faced with having to build ASTs, start by making sure
you have a good collection of quotations for your object language.

In the remaining section, we are going to complete the compiler.
The front end will produce quotations of Ocaml code.
The back end will produce quotations of C code.

Emacs hates quotations.

%page
%nodefault
%center, size 7, font "standard", fore "white", vgap 20




Act 3
Regurgitation

%page

Front end (part 4)

Recall, we left a number of unfinished  
%cont, font "typewriter", fore "yellow"
...
%cont, font "standard", fore "white"
 things in our parser.

&code word:
  [ [ w = WORD -> ...
    | "$"; "^" -> ...
    | "$"; "<" -> ...
    | "$"; "@" -> ...
    | "$"; "("; w = WORD; ")" -> ... ]
  ];

&norm We now fill these in with Ocaml quasiquotations.
They come from our Ocaml model of Makefile semantics.
Aren't you glad we worked that out in advance?

&code word:
  [ [ w = WORD -> <:expr< 
%cont, fore "green"
[
%cont, fore "red"
$str:w$
%cont, fore "green"
] 
%cont, fore "yellow"
>> 
    | "$"; "^" -> <:expr< 
%cont, fore "green"
dep_names 
%cont, fore "yellow"
>>
    | "$"; "<" -> <:expr< 
%cont, fore "green"
[List.hd dep_names] 
%cont, fore "yellow"
>>
    | "$"; "@" -> <:expr< 
%cont, fore "green"
[targ] 
%cont, fore "yellow"
>>
    | "$"; "("; w = WORD; ")" -> let w2 = tidy w in 
                                 <:expr< 
%cont, fore "green"
(
%cont, fore "red"
$lid:w2$ 
%cont, fore "green"
()) 
%cont, fore "yellow"
>> ]
  ];

%page

Front end (part 5)

Now we fill in the quasiquotations for the 
%cont, font "typewriter", fore "yellow"
makefile_item 
%cont, font "standard", fore "white"
parser:

%size 3, font "typewriter", fore "yellow"
makefile_item:
  [ [ EOL -> None
    | s = WORD; "="; ws = words; EOL -> 
          Some (s, <:expr< 
%cont, fore "red"
$ws$ 
%cont, fore "yellow"
>>, ASSIGN)
    | t = WORD; ":"; ws = words; EOL; a = actions -> 
          let ex = <:expr<
%fore "green" 
                 let targ = 
%cont, fore "red"
$str:t$ 
%cont, fore "green"
in
                 let dep_names = 
%cont, fore "red"
$ws$ 
%cont, fore "green"
in
                 let deps = List.map resolve dep_names in
                 let actions = 
%cont, fore "red"
$a$ 
%cont, fore "green"
in
                   Mk.Rule (targ, deps, actions) 
%cont, fore "yellow"
>>
        in Some (t, ex, RULE) ] ];
 
actions: [ [ az = LIST0 action -> (mk_list az loc) ]  ];
action:  [ [ TAB; ws = words; EOL -> 
             <:expr< 
%cont, fore "green"
String.concat " " 
%cont, fore "red"
$ws$ 
%cont, fore "yellow"
>> ] ];
words:   [ [ ws = LIST0 word -> 
             (let ls = mk_list ws loc in 
              <:expr< 
%cont, fore "green"
List.flatten 
%cont, fore "red"
$ls$ 
%cont, fore "yellow"
>>) ] ];


%page

Front end (part 6)

Now we have a parser which constructs an AST for an Ocaml function
corresponding to an item in a Makefile (an assignment or a rule).

We are going to replace the core Ocaml grammar with a grammar
that reads a sequence of Makefile items, builds Ocaml functions,
and emits an AST containing the sequence of functions and a single
call into our backend with the result of those functions.

We are then going to plug this grammar into 
%cont, font "typewriter", fore "red"
camlp4
%cont, font "standard", fore "white"
 and 
%cont, font "typewriter", fore "red"
ocamlc
%cont, font "standard", fore "white"
,
and we'll be done the front end.

The AST node 
%cont, font "typewriter", fore "red"
Pcaml.implem
%cont, font "standard", fore "white"
 is the root of the Ocaml grammar.
It represents a single compilation unit in Ocaml terminology.
It is what we need to make, to feed to  
%cont, font "typewriter", fore "red"
ocamlc
%cont, font "standard", fore "white"
.

%page

Front end (finale)

Here is the code to build the 
%cont, font "typewriter", fore "red"
Pcaml.implem
%cont, font "standard", fore "white"
 AST node.
This part is very boring. It is about 20 more lines, and is very technical.
You aren't expected to read this now, just see that it is rather short.

%font "typewriter", size 2, fore "yellow"
Pcaml.implem: [ [ res = makefile -> ([(res, loc)], False) ] ];

makefile: [ [ maybe_items = LIST0 makefile_item; EOF -> 
        let items = some maybe_items in
        let rule_p (_, _, x) = match x with RULE -> true | ASSIGN -> false in
        let (rules, assigns) = partition rule_p items in
        let dead_rule = (<:patt< x >>, None, <:expr< Mk.Rule (x, [], []) >>) in
        let live_rules = map (fun (n,_,_) -> let n1 = tidy n in 
            (<:patt< $str:n$ >> , None, <:expr< $lid:n1$ () >>)) rules 
        in
        let resz = live_rules @ [dead_rule] in
        let resolver = (<:patt< resolve >>, <:expr< fun [ $list:resz$ ] >>) in
        let bind (n, e, _) = let pwel = [<:patt< _ >>, None, e] in
            let n1 = tidy n in <:patt<  $lid:n1$ >> , <:expr< fun [$list:pwel$] >> 
        in
        let (r_binds, a_binds)  = (map bind rules, map bind assigns) in
        let funs = resolver :: (a_binds @ r_binds) in
        let (top,_,_) = hd rules in
        let tree = <:expr< let rec $list:funs$ in $lid:top$ () >> in
        let entry = [( <:patt< _ >>, <:expr< Be.trans $tree$ >> )] in
        let recur = false in
        let items = [ <:str_item< value $rec:recur$ $list:entry$ >> ] in
          <:str_item< declare $list:items$ end >> ] ];

%page

Back end (part 1)

Recall that the compiler we are building is a 2-stage system.

	Front End
		Lex and parse Makefile
			Syntax checking happens here
		Synthesize Ocaml program
		Compile Ocaml program with 
%cont, font "typewriter", fore "red"
ocamlc
%font "standard", fore "white"
			Static semantic checking happens here 

	Back End
		Run compiled Ocaml program
		Capture synthesized C program
		Compile C program with 
%cont, font "typewriter", fore "red"
gcc
%font "standard", fore "white", size 4

We have completed the guts of the front end.
The back end really only requires us to write one function.
It is the function that calculates a C program from an Ocaml value.

%page

Back end (part 2)

Recall also that our Ocaml program will produce a single value
of the following 
%cont, font "typewriter", fore "yellow"
rule
%cont, font "standard", fore "white"
 type:

&code type file = string
type actions = string list
type rule = Rule of (file * rule list * actions)

&norm For example:

&code Rule ("emacs", 
  [ Rule ("emacs.c", [], []);
    Rule ("lisp.o", 
      [ Rule ("lisp.c", [], []) ],
      [ "gcc -c lisp.c" ] );
    Rule ("buffer.h", [] []); ],
  [ "gcc -o emacs emacs.c lisp.o" ] )

&norm Which we must translate into a C program

%page

Back end (part 3)

The C program will do "the work" of a Makefile.
The recursive algorithm (vaguely) is:

&code int check_node_foo_c (time_t parent) {
        time_t mtime = 0;
        int rebuild = 0;
        struct stat target;
        if (stat ("foo.c", &target) == -1)
                rebuild = 1;
        else
                mtime = statr.st_mtime;
        rebuild = check_first_dep  (mtime) || rebuild;
        // ...
        if (rebuild) {
                write (1, "gcc -c foo.c\n", 13);
                system ("gcc -c foo.c");
                return 1
        }
        else
                return mtime > parent;
}

&norm With some extra error handling thrown in.

%page

Back end (part 4)

Now that we've worked out the structure of the C program 
we're going to generate, we just write it down as a bunch 
of factored quasiquotations. Here is an example quote
from the back end: 

&code let fstmt = <:cstmt< 
%fore "green"
        if (stat (
%cont, fore "red"
$str:file$
%cont, fore "green"
, &target) == -1) {
                rebuild = 1; 
        } else {
                mtime = target.st_mtime;
        }
%fore "red"
        $stmt:satisfy$
%fore "yellow"
>>

&norm Look familliar? 

The back end is about 70 lines of code. It just produces
C quotes isomorphic to the 
%cont, font "typewriter", fore "yellow"
rule
%cont, font "standard", fore "white"
 tree constructed by the 
front end.

%page

Back end (summary)

The back end is probably the easiest part to understand,
since we've already developed a keen understanding of 
quasiquotation.

	Produces a C function for each rule
		Trades space for speed
			Not that much space either: ~ 30 insns / rule at -O2
		Pre-calculates a lot
			Only needs 
%cont, fore "yellow", font "typewriter"
stat()
%cont, fore "white", font "standard"
, 
%cont, fore "yellow", font "typewriter"
system()
%cont, fore "white", font "standard"
, and 
%cont, fore "yellow", font "typewriter"
write()
%fore "white", font "standard"
		Many other tradeoffs possible

	Can perform domain-specific optimizations
		Topological sort done at compile time
			Missing build rules directly fail when called
			Dead build rules not compiled into executable
			Common dependency DAG nodes refer to same function
		All messages pre-formatted
			All string lengths known, for 
%cont, fore "yellow", font "typewriter"
write()

%page

Grand Finale

All that remains is to write a driver.
We use the "shell" module of Ocaml, since it looks a lot like
writing shell scripts with flexible I/O and exception handling.
It does the following:

	Run 
%cont, fore "red", font "typewriter"
ocamlc
%cont, font "standard", fore "white"
 on input, with with custom 
%cont, fore "red", font "typewriter"
camlp4
%cont, font "standard", fore "white"
 pre-processor
	Chmod output file
	Run output file, capture output to C file
	Run 
%cont, fore "red", font "typewriter"
gcc
%cont, font "standard", fore "white"
 on C file
	Chmod output file
	Delete temporaries

Some command-line option processing and exception handling,
make the driver about as large as the back end.

graydon hoare
Powered by
ViewCVS 0.9.2