The Alvilda Optimizing Compiler

Back in 1993 I had the good fortune to be part of a three-man team writing a compiler for a compiler course at DAIMI, Aarhus University, Denmark.  The group was Anders Skovsted Buch, Jan Hvid Sørensen, and me (Tommy Thorn).  Now, we weren't content with just making a compiler -- we wanted an optimizing compiler with state-of-the-art register allocation (scheduling wasn't a big thing at that time).  Furthermore, we set out to try the unproven internal representation CORTL, suggested by Carl McConnell in his Thesis proposal "Tree Based Optimizations".  The project was highly successful and vastly superior to any of the other groups (but probably also took orders more effort). A few useful lessons were learned on the way:
  1. Naively I had assumed that we didn't need an AST representation of the program but could generate CORTL directly as we parsed, via the parsing actions.  Well, we could, but it turned out to make the front much more convoluted than necessary.
  2. Using CORTL wasn't an obvious advantage.  The advantages of always having structured representation was offset by the pain of maintaining this representation.  Notably code motion was tedious.
  3. Ambition and deadlines don't match.  We didn't finish for the dead-line and rather than submitting a partial solution, we instead submitted nothing, re-enrolled in the course the following year, and submitted the completed project.
In May 2004 I dug out the sources again and only had to change three lines in the Makefile to make it compile.

The compiler (written in Gnu C) translates a program in the Pascal-like language Alvilda to SPARC assembly.

Download

  1. The complete project with both the Alvilda and Amanda compiler: Alvilda-2004-05-04.tar.gz
  2. The final rapport (alas, without the benchmark results).  In danish: dOvs-rapport-uden-kode.ps.gz

Using Alvilda

Build it with a simple make command.  The default operating mode is to compile with optimization and generate a SPARC assembly file, however there are a number of options to change that:

./alvilda -h
Usage: ./alvilda [-hptcrdD] [-O level] [-n num] [-m num] [-R symbol] [-o file] file

where options are
    -h                  print this text
    -p                  pipe mode. write code to standard output
    -t                  trivial memorybased register allocation
    -c                  check the consistency of the rtl-program
    -r                  include code for range checking
    -d                  include code for checking pointer dereferences
    -D                  debugging mode. write what the compiler is doing
    -O level            set optimizing level (default = 0)
    -n number           max number of arguments in registers
    -m number           max number of words to copy without using a loop
    -o filename         write code to named file
    -R symbol           output rtl code.
                        symbol must be a symbol printed in debugging mode or
                        REMINOR redefine register minor numbers

The option -R is useful to examine the result of a stage.  To know the name of a stage, first run alvilda with debugging enabled.  Examples:

./alvilda -D Test/C/unionfind.alv
Parsing Alvilda Program (PARSE)
Building Procedure Define and Use Information (BCDU)
Conversion to Static Single Assignment Form (SSA)
Common Subexpression Elimination (CSE)
Dead Code Elimination (DCE1)
Dead Assignment Elimination (DAE1)
Resolving Memory Registers (MEM)
Constant Folding (CF)
Common Subexpression Elimination (CSE)
Dead Code Elimination (DCE1)
Dead Assignment Elimination (DAE1)
Converting Comparisions to If-instructions (CTI)
Multiplication by Constant values (MCONST)
Multiplication by Procedure Call (MCALL)
Removing Big Constants (BCONST)
Combining Instructions (COMBINE)
Dead Assignment Elimination (DAE2)
Register Allocation (ALLOC)
Stripping SSA Information (STRIP)
Jump Elimination (JUMPELIM)
Dead Code Elimination (DCE2)

./alvilda -R DCE1  Test/C/unionfind.alv  
Procedure: 3 [use: %g7.1, m35.1, %fp.1, %i0.1, m149.1, %i1.1, m152.1, m153.1, m34.1, %sp.1][def: m35.3[UNDEF], m149.3[UNDEF], m1
52.3[UNDEF], m153.6]
221:11: { [dep:1, rep:0]
228:  12: { [dep:1, rep:0]
229:    r148.1 := %i0.1
233:    r151.1 := %i1.1
245:    m153.2 := update(m153.1, r151.1, r148.1)
246:    13: { [dep:1, rep:1]
742:      r239.0 := phi(r148.1, r162.1)
449:      m153.3 := phi(m153.2, m153.4)
250:      r159.4 := r239.0
252:      r162.1 := access(m34.1, r239.0)
752:      r249.0 := r162.1  =  0
253:      r161.1 := r249.0
753:      r250.0 := not r249.0
254:      r163.1 := r250.0
754:      r251.0 := not r250.0
255:      r165.1 := r251.0
256:      if r165.1 then
257:        depart 13 [0]
269:      m153.4 := update(m153.3, r151.1, r162.1)
270:      repeat 13 [0]
246:    }
447:    m153.5 := m153.3
271:    depart 12 [0]
228:  }
428:  m153.6 := m153.3
275:  depart 11 [0]
221:}
...

Notice, procedure 3 happens to be the Alvilda procedure find:

procedure find(p: in ufe; q: in out ufe) is
  q:=p;
  while not(q.up=null) loop
    q:=q.up;
  end loop;
end find;


./alvilda -R DCE2  Test/C/unionfind.alv

Procedure: 3 [use: %g7, m35, %fp, %i0, m149, %i1, m152, m153, m34, %sp][def: m35, m149, m152, m153]
221:11:
228:  12: {
245:    m153 := update(m153, %i1, %i0)
837:    %l0 := %i0
246:    13: {
252:      %l0 := access(m34, %l0)
818:      if %l0  =  0 then
257:        depart 13
269:      m153 := update(m153, %i1, %l0)
270:      repeat 13
246:    }
228:  }
...

./alvilda -p Test/C/unionfind.alv
.text
.global _main
.align 4
Proc3:
save %sp, -72, %sp
st %i0, [%i1]
mov %i0, %l0
L13repeat:
ld [%l0], %l0
subcc %l0, 0, %g0
be L13depart
nop
b L13repeat
st %l0, [%i1]
L13depart:
ret

restore
...

A few explanations: memory access (aka. pointer dereference) is modeled with an access operation which takes two parameters: the abstract memory (~ alias class) and the address.  Memory updates accordingly takes three arguments: the abstract memory, the address, and the value (always a full word).  Updating the memory creates a new abstract memory, thus everything fits into simple scheme where memory operations aren't special and the dependences are explicit (the only additional constraint is that abstract memories are single-threaded, thus an update cannot be moved ahead of any access using the same memory).

As can be seen from this little example, there's still room for improvement but in general it's hard to move the update out of the loop due to Alvilda's out parameter semantics which leads to interesting alias problems.

Comments and questions are most welcome: alvilda$numba-tu.com (replace the '$' with a '@').

Last update: 2006-03-20