Alien-LibJIT

libjit/doc/libjit.texi  view on Meta::CPAN

\input texinfo	@c -*-texinfo-*-
@c %** start of header
@setfilename libjit.info
@settitle Just-In-Time Compiler Library
@setchapternewpage off
@c %** end of header

@dircategory Libraries
@direntry
* Libjit: (libjit).                Just-In-Time Compiler Library
@end direntry

@ifinfo
The libjit library assists with the process of building
Just-In-Time compilers for languages, virtual machines,
and emulators.

Copyright @copyright{} 2004 Southern Storm Software, Pty Ltd
@end ifinfo

@titlepage
@sp 10
@center @titlefont{Just-In-Time Compiler Library}

@vskip 0pt plus 1fill
@center{Copyright @copyright{} 2004 Southern Storm Software, Pty Ltd}
@end titlepage

@syncodeindex fn cp
@syncodeindex vr cp
@syncodeindex tp cp

@c -----------------------------------------------------------------------

@node Top, Introduction, , (dir)
@menu
* Introduction::            Introduction and rationale for libjit
* Features::                Features of libjit
* Tutorials::               Tutorials in using libjit
* Initialization::          Initializing the JIT
* Functions::               Building and compiling functions with the JIT
* Types::                   Manipulating system types
* Values::                  Working with temporary values in the JIT
* Instructions::            Working with instructions in the JIT
* Basic Blocks::            Working with basic blocks in the JIT
* Intrinsics::              Intrinsic functions available to libjit users
* Exceptions::              Handling exceptions
* Breakpoint Debugging::    Hooking a breakpoint debugger into libjit
* ELF Binaries::            Manipulating ELF binaries
* Utility Routines::        Miscellaneous utility routines
* Diagnostic Routines::     Diagnostic routines
* C++ Interface::           Using libjit from C++
* Porting::                 Porting libjit to new architectures
* Index::                   Index of concepts and facilities
@end menu

@c -----------------------------------------------------------------------

@node Introduction, Features, Top, Top
@chapter Introduction and rationale for libjit
@cindex Introduction

Just-In-Time compilers are becoming increasingly popular for executing
dynamic languages like Perl and Python and for semi-dynamic languages
like Java and C#.  Studies have shown that JIT techniques can get close to,
and sometimes exceed, the performance of statically-compiled native code.

However, there is a problem with current JIT approaches.  In almost every
case, the JIT is specific to the object model, runtime support library,
garbage collector, or bytecode peculiarities of a particular system.
This inevitably leads to duplication of effort, where all of the good
JIT work that has gone into one virtual machine cannot be reused in another.

JITs are useful for more than implementing languages.  They can also be used
in other programming fields.  Graphical applications can achieve greater
performance if they can compile a special-purpose rendering routine
on the fly, customized to the rendering task at hand, rather than using
static routines.  Needless to say, such applications have no need for
object models, garbage collectors, or huge runtime class libraries.

Most of the work on a JIT is concerned with arithmetic, numeric type
conversion, memory loads/stores, looping, performing data flow analysis,
assigning registers, and generating the executable machine code.
Only a very small proportion of the work is concerned with language specifics.

The goal of the @code{libjit} project is to provide an extensive set of
routines that takes care of the bulk of the JIT process, without tying the
programmer down with language specifics.  Where we provide support for
common object models, we do so strictly in add-on libraries,
not as part of the core code.

Unlike other systems such as the JVM, .NET, and Parrot, @code{libjit}
is not a virtual machine in its own right.  It is the foundation upon which a
number of different virtual machines, dynamic scripting languages,
or customized rendering routines can be built.

The LLVM project (@uref{http://www.llvm.org/}) has some similar
characteristics to @code{libjit} in that its intermediate format is
generic across front-end languages.  It is written in C++ and provides
a large set of compiler development and optimization components,
considerably more than @code{libjit} itself provides.  According to its
author, Chris Lattner, a subset of its capabilities can be used to
build JITs.

Libjit should free developers to think about the design of their front
ends, and not get bogged down in the details of code execution.
Meanwhile, experts in the design and implementation of JITs can concentrate
on solving code execution problems, instead of front end support issues.

This document describes how to use the library in application programs.
We start with a list of features and some simple tutorials.  Finally,
we provide a complete reference guide for all of the API functions in
@code{libjit}, broken down by function category.

@section Obtaining libjit

The latest version of @code{libjit} can be obtained from Southern
Storm Software, Pty Ltd's Web site:

@quotation
@uref{http://www.southern-storm.com.au/libjit.html}
@end quotation

@section Further reading

While it isn't strictly necessary to know about compiler internals
to use @code{libjit}, you can make more effective use of the library
if you do.  We recommend the "Dragon Book" as an excellent resource
on compiler internals, particularly the sections on code generation
and optimization:

@quotation
Alfred V. Aho, Ravi Sethi, and Jeffrey D. Ullman, "Compilers:
Principles, Techniques, and Tools", Addison-Wesley, 1986.
@end quotation

IBM, Intel, and others have done a lot of research into JIT implementation
techniques over the years.  If you are interested in working on the
internals of @code{libjit}, then you may want to make yourself familiar
with the relevant literature (this is by no means a complete list):

@quotation
IBM's Jikes RVM (Research Virtual Machine), @*
@uref{http://www-124.ibm.com/developerworks/oss/jikesrvm/}.

Intel's ORP (Open Runtime Platform), @*
@uref{http://orp.sourceforge.net/}.
@end quotation

short tutorial exercises.  Full source for these tutorials can be found
in the @code{tutorial} directory of the @code{libjit} source tree.

For simplicity, we will ignore errors such as out-of-memory conditions,
but a real program would be expected to handle such errors.

@menu
* Tutorial 1::              Tutorial 1 - mul_add
* Tutorial 2::              Tutorial 2 - gcd
* Tutorial 3::              Tutorial 3 - compiling on-demand
* Tutorial 4::              Tutorial 4 - mul_add, C++ version
* Tutorial 5::              Tutorial 5 - gcd, with tail calls
* Dynamic Pascal::          Dynamic Pascal - A full JIT example
@end menu

@c -----------------------------------------------------------------------

@node Tutorial 1, Tutorial 2, Tutorials, Tutorials
@section Tutorial 1 - mul_add
@cindex mul_add tutorial

In the first tutorial, we will build and compile the following function
(the source code can be found in @code{tutorial/t1.c}):

@example
int mul_add(int x, int y, int z)
@{
    return x * y + z;
@}
@end example

@noindent
To use the JIT, we first include the @code{<jit/jit.h>} file:

@example
#include <jit/jit.h>
@end example

All of the header files are placed into the @code{jit} sub-directory,
to separate them out from regular system headers.  When @code{libjit}
is installed, you will typically find these headers in
@code{/usr/local/include/jit} or @code{/usr/include/jit}, depending upon
how your system is configured.  You should also link with the
@code{-ljit} option.

@noindent
Every program that uses @code{libjit} needs to call @code{jit_context_create}:

@example
jit_context_t context;
...
context = jit_context_create();
@end example

Almost everything that is done with @code{libjit} is done relative
to a context.  In particular, a context holds all of the functions
that you have built and compiled.

You can have multiple contexts at any one time, but normally you will
only need one.  Multiple contexts may be useful if you wish to
run multiple virtual machines side by side in the same process,
without them interfering with each other.

Whenever we are constructing a function, we need to lock down the
context to prevent multiple threads from using the builder at the same time:

@example
jit_context_build_start(context);
@end example

The next step is to construct the function object that will represent
our @code{mul_add} function:

@example
jit_function_t function;
...
function = jit_function_create(context, signature);
@end example

The @code{signature} is a @code{jit_type_t} object that describes the
function's parameters and return value.  This tells @code{libjit} how
to generate the proper calling conventions for the function:

@example
jit_type_t params[3];
jit_type_t signature;
...
params[0] = jit_type_int;
params[1] = jit_type_int;
params[2] = jit_type_int;
signature = jit_type_create_signature
    (jit_abi_cdecl, jit_type_int, params, 3, 1);
@end example

This declares a function that takes three parameters of type
@code{int} and returns a result of type @code{int}.  We've requested
that the function use the @code{cdecl} application binary interface (ABI),
which indicates normal C calling conventions.  @xref{Types}, for
more information on signature types.

Now that we have a function object, we need to construct the instructions
in its body.  First, we obtain references to each of the function's
parameter values:

@example
jit_value_t x, y, z;
...
x = jit_value_get_param(function, 0);
y = jit_value_get_param(function, 1);
z = jit_value_get_param(function, 2);
@end example

Values are one of the two cornerstones of the @code{libjit} process.
Values represent parameters, local variables, and intermediate
temporary results.  Once we have the parameters, we compute
the result of @code{x * y + z} as follows:

@example
jit_value_t temp1, temp2;
...
temp1 = jit_insn_mul(function, x, y);
temp2 = jit_insn_add(function, temp1, z);
@end example

This demonstrates the other cornerstone of the @code{libjit} process:
instructions.  Each of these instructions takes two values as arguments
and returns a new temporary value with the result.

Students of compiler design will notice that the above statements look
suspiciously like the "three address statements" that are described
in compiler textbooks.  That is indeed how @code{libjit} represents
them internally.

If you don't know what three address statements are, then don't worry.
The library hides most of the details from you.  All you need to do is
break your code up into simple operation steps (addition, multiplication,
negation, copy, etc).  Then perform the steps one at a time, using
the temporary values in subsequent steps.  @xref{Instructions}, for
a complete list of all instructions that are supported by @code{libjit}.
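
As a rough illustration of this step-by-step style, the following plain C
sketch does not use @code{libjit} at all.  The helper name
@code{mul_add_steps} is made up for this example.  It breaks
@code{x * y + z} into single-operation steps, each storing its result
in a fresh temporary, in the same order as the @code{jit_insn_mul} and
@code{jit_insn_add} calls above:

```c
/* Hypothetical helper (not part of libjit): x * y + z written as
   single-operation steps, mirroring the temp1/temp2 values above. */
int mul_add_steps(int x, int y, int z)
{
    int temp1 = x * y;      /* step 1: multiply, like jit_insn_mul */
    int temp2 = temp1 + z;  /* step 2: add, like jit_insn_add */
    return temp2;           /* hand back the last temporary */
}
```

Calling @code{mul_add_steps(3, 5, 2)} computes @code{temp1 = 15} and then
@code{temp2 = 17}.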

Now that we have computed the desired result, we return it to the caller
using @code{jit_insn_return}:

@example
jit_insn_return(function, temp2);
@end example

We have completed the process of building the function body.  Now we
compile it into its executable form:

@example
jit_function_compile(function);
jit_context_build_end(context);
@end example

As a side-effect, this will discard all of the memory associated with
the values and instructions that we constructed while building the
function.  They are no longer required, because we now have the
executable form that we require.

We also unlock the context, because it is now safe for other threads
to access the function building process.

Up until this point, we haven't executed the @code{mul_add} function.
All we have done is build and compile it, ready for execution.  To execute it,
we call @code{jit_function_apply}:

@example
jit_int arg1, arg2, arg3;
void *args[3];
jit_int result;
...
arg1 = 3;
arg2 = 5;
arg3 = 2;
args[0] = &arg1;
args[1] = &arg2;
args[2] = &arg3;
jit_function_apply(function, args, &result);
printf("mul_add(3, 5, 2) = %d\n", (int)result);
@end example

We pass an array of pointers to @code{jit_function_apply}, each one
pointing to the corresponding argument value.  This gives us a very
general purpose mechanism for calling any function that may be
built and compiled using @code{libjit}.  If all went well, the
program should print the following:

@example
mul_add(3, 5, 2) = 17
@end example
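
The pointer-array convention used with @code{jit_function_apply} above can
be imitated in plain C without @code{libjit}.  In the sketch below,
@code{apply_mul_add} is a hypothetical stand-in for the compiled function,
and @code{int32_t} is assumed as a stand-in for @code{jit_int}.  It reads
each argument through the pointer array and stores the return value
through the result pointer:

```c
#include <stdint.h>

/* Hypothetical stand-in for a compiled mul_add reached through an
   apply-style entry point: args[i] points at the i-th argument
   value, and the return value is stored through result. */
void apply_mul_add(void **args, void *result)
{
    int32_t x = *(int32_t *)args[0];
    int32_t y = *(int32_t *)args[1];
    int32_t z = *(int32_t *)args[2];

    *(int32_t *)result = x * y + z;
}
```

Because the caller only hands over an array of untyped pointers, the same
entry point shape can serve any number and mix of argument types.  How
each pointer is interpreted depends on the signature of the function
being called.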

You will notice that we used @code{jit_int} as the type of the arguments,
not @code{int}.  The @code{jit_int} type is guaranteed to be 32 bits
in size on all platforms, whereas @code{int} varies in size from platform
to platform.  Since we wanted our function to work the same everywhere,
we used a type with a predictable size.

If you really wanted the system @code{int} type, you would use
@code{jit_type_sys_int} instead of @code{jit_type_int} when you
created the function's signature.  The @code{jit_type_sys_int} type
is guaranteed to match the local system's @code{int} precision.

@noindent
Finally, we clean up the context and all of the memory that was used:

@example
jit_context_destroy(context);
@end example

@c -----------------------------------------------------------------------

@node Tutorial 2, Tutorial 3, Tutorial 1, Tutorials
@section Tutorial 2 - gcd
@cindex gcd tutorial

In this second tutorial, we implement the subtracting Euclidean
Greatest Common Divisor (GCD) algorithm over positive integers.
This tutorial demonstrates how to handle conditional branching
and function calls.  In C, the code for the @code{gcd} function
is as follows:

    mul_add_function(jit_context& context) : jit_function(context)
    @{
        create();
        set_recompilable();
    @}

    virtual void build();

protected:
    virtual jit_type_t create_signature();
@};
@end example

Where we used @code{jit_function_t} and @code{jit_context_t} before,
we now use the C++ @code{jit_function} and @code{jit_context} classes.

In our constructor, we attach ourselves to the context and then call
the @code{create()} method.  This in turn calls our overridden
virtual method @code{create_signature()} to obtain the signature:

@example
jit_type_t mul_add_function::create_signature()
@{
    // Return type, followed by three parameters,
    // terminated with "end_params".
    return signature_helper
        (jit_type_int, jit_type_int, jit_type_int,
         jit_type_int, end_params);
@}
@end example

The @code{signature_helper()} method is provided for your convenience,
to help with building function signatures.  You can create your own
signature manually using @code{jit_type_create_signature} if you wish.

The final thing we do in the constructor is call @code{set_recompilable()}
to mark the @code{mul_add} function as recompilable, just as we did in
Tutorial 3.

The C++ library will create the function as compilable on-demand for
us, so we don't have to do that explicitly.  But we do have to override
the virtual @code{build()} method to build the function's body on-demand:

@example
void mul_add_function::build()
@{
    jit_value x = get_param(0);
    jit_value y = get_param(1);
    jit_value z = get_param(2);

    insn_return(x * y + z);
@}
@end example

This is similar to the first version that we wrote in Tutorial 1.
Instructions are created with @code{insn_*} methods that correspond
to their @code{jit_insn_*} counterparts in the C library.

One of the nice things about the C++ API compared to the C API is that we
can use overloaded operators to manipulate @code{jit_value} objects.
This can simplify the function build process considerably when we
have lots of expressions to compile.  We could have used @code{insn_mul}
and @code{insn_add} instead in this example and the result would have
been the same.
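
To give a feel for how overloaded operators can build instructions, here
is a small self-contained C++ sketch.  It is not the @code{libjitplus}
implementation, and the names @code{builder}, @code{value}, and
@code{emit} are invented for this example.  Each operator records one
three-address step and returns a handle to the new temporary, so an
ordinary-looking expression leaves behind a list of instructions:

```cpp
#include <string>
#include <vector>

// Hypothetical miniature, not libjitplus: the builder collects the
// three-address steps produced while an expression is evaluated.
struct builder
{
    std::vector<std::string> insns;   // recorded steps, in order
    int next_temp = 0;                // number of the next temporary
};

// A handle to a parameter or temporary that belongs to a builder.
struct value
{
    builder *owner;
    std::string name;
};

// Record "tN = a op b" and return the temporary that holds the result.
static value emit(const value &a, const char *op, const value &b)
{
    builder *bld = a.owner;
    value result{bld, "t" + std::to_string(bld->next_temp++)};
    bld->insns.push_back
        (result.name + " = " + a.name + " " + op + " " + b.name);
    return result;
}

value operator*(const value &a, const value &b) { return emit(a, "*", b); }
value operator+(const value &a, const value &b) { return emit(a, "+", b); }
```

With three @code{value} objects named @code{x}, @code{y}, and @code{z}
attached to one @code{builder}, evaluating @code{x * y + z} records a
multiply step followed by an add step.  These are the same two steps that
Tutorial 1 created by hand.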

Now that we have our @code{mul_add_function} class, we can create
an instance of the function and apply it as follows:

@example
jit_context context;
mul_add_function mul_add(context);

jit_int arg1 = 3;
jit_int arg2 = 5;
jit_int arg3 = 2;
void *args[3];
jit_int result;
args[0] = &arg1;
args[1] = &arg2;
args[2] = &arg3;

mul_add.apply(args, &result);
@end example

@noindent
@xref{C++ Interface}, for more information on the @code{libjitplus}
library.

@c -----------------------------------------------------------------------

@node Tutorial 5, Dynamic Pascal, Tutorial 4, Tutorials
@section Tutorial 5 - gcd, with tail calls
@cindex gcd with tail calls

Astute readers will have noticed that Tutorial 2 included two instances
of "tail calls".  That is, calls to the same function that are immediately
followed by a @code{return} instruction.

Libjit can optimize tail calls if you provide the @code{JIT_CALL_TAIL}
flag to @code{jit_insn_call}.  Previously, we used the following code
to call @code{gcd} recursively:

@example
temp3 = jit_insn_call
    (function, "gcd", function, 0, temp_args, 2, 0);
jit_insn_return(function, temp3);
@end example

@noindent
In Tutorial 5, this is modified to the following:

@example
jit_insn_call(function, "gcd", function, 0, temp_args, 2, JIT_CALL_TAIL);
@end example

There is no need for the @code{jit_insn_return}, because the call
will never return to that point in the code.  Behind the scenes,
@code{libjit} will convert the call into a jump back to the head
of the function.

Tail calls can only be used in certain circumstances.  The source
and destination of the call must have the same function signature.
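
The conversion of a tail call into a jump, described above, can be
pictured with a plain C sketch that does not use @code{libjit}.  It
assumes that the subtracting GCD from Tutorial 2 has the recursive shape
shown in @code{gcd_recursive}; both function names are made up for this
example.  The second function, @code{gcd_loop}, is the same algorithm
with each tail call replaced by an update of the parameters and a jump
back to the head of the function:

```c
/* Assumed shape of the subtracting GCD over positive integers;
   both recursive calls are tail calls. */
unsigned int gcd_recursive(unsigned int x, unsigned int y)
{
    if (x == y)
        return x;
    else if (x < y)
        return gcd_recursive(x, y - x);
    else
        return gcd_recursive(x - y, y);
}

/* The same algorithm after tail-call conversion: overwrite the
   parameters and jump back to the head instead of calling. */
unsigned int gcd_loop(unsigned int x, unsigned int y)
{
head:
    if (x == y)
        return x;
    if (x < y)
        y = y - x;
    else
        x = x - y;
    goto head;
}
```

The loop form uses a fixed amount of stack no matter how many
subtraction steps are needed.  A function that calls itself trivially
satisfies the matching-signature requirement.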


