Language-SIOD

 view release on metacpan or  search on metacpan

siod.html  view on Meta::CPAN

<TR><TD>siod.scm</TD>
    <TD>mostly obsolete collection of utility subroutines</TD></TR>
<TR><TD>smtp.scm</TD>
    <TD>smtp client subroutines</TD></TR>
<TR><TD>sql_oracle.scm</TD>
    <TD>utilities for oracle database client</TD></TR>
<TR><TD>sql_rdb.scm</TD>
    <TD>utilities for rdb database client</TD></TR>
<TR><TD>sql_sybase.scm</TD>
    <TD>utilities for sybase database client</TD></TR>
<TR><TD>piechart.scm</TD>
    <TD>a CGI script that returns a piechart as a GIF</TD></TR>
</TABLE>

<A name="garbage"></a><H2>Garbage Collection</h2>

When SIOD was first released in April of 1988, it had a stop-and-copy
garbage collector which could only run at the toplevel of the
read-eval-print loop because it had no certain way of determine which
objects on the stack were truly pointers; unless of course the runtime
system was modified to make use of information the compiler provided to
the debugger, in a way that a real lisp compiler would be required to
provide. About that same time Jim O'Dell related to me the debugging
heuristic he had used in his port of Franz Lisp to the Macintosh, by
scanning the stack for items which appeared to be pointers but which
had not been properly found by the mark phase of the gc. I suggested
at the time that he might as well get rid of the explicit book-keeping
code spread throughout Franz Lisp and stick with the heuristic. But since
he had already found all the bugs in the book-keeping in the runtime
and compiler output there was little benefit and great risk to doing it.

<P>Then, by the time SIOD had been out for a year there had been
enough complaints about the lack of fully available GC that I was
motivated to utilize a stack heuristic, since I had no intention of
maintaining any explicit book-keeping code in the source. The published
arguments in favor of the conservative approach described by Hans
Boehm of Xerox Parc then reduced this design decision to a no-brainer,
and his implementation suggested the use of setjmp as a sufficiently
portable way for C code to get at the machine register set without
introducing assembling language. To SIOD then I added only about 300
bytes of VAX instructions to the size of the runtime system. 

<P>
There are two storage management techniques which may be chosen at runtime
by specifying the -g argument flag. 

<UL>
<LI>-g1 is stop-and-copy. This is the simplest and most
portable implementation. GC is only done at toplevel.
<LI>-g0 (the default) is mark-and-sweep. GC is done at any time, 
required or requested. However, the implementation is not as portable.
</UL>

<P>If you get strange errors on a machine architecture not listed
then you may be forced to use -g1 until you investigate and contact
the author for advise.

<h3>Stop and Copy</h3>

As one can see from the source, garbage collection is really quite an easy
thing. The procedure gc_relocate is about 25 lines of code, and
scan_newspace is about 15.

<P>
The real tricks in handling garbage collection are (in a copying gc):
<OL>
<LI>keeping track of locations containing objects
<LI>parsing the heap (in the space scanning)
</OL>

<P>The procedure gc_protect is called once (e.g. at startup) on each
<B>global</B> location which will contain a lisp object.

<P>That leaves the stack. The beleive is that if we had chosen not to 
use the argument
and return-value passing mechanism provided by the C-language
implementation, (also known as the "machine stack" and "machine
procedure calling mechanism) this lisp would be larger, slower, and
rather more difficult to read and understand. Furthermore it would be
considerably more painful to *add* functionality in the way of SUBR's
to the implementation.

<P>Aside from writing a very machine and compiler specific assembling language
routine for each C-language implementation, embodying assumptions about
the placement choices for arguments and local values, etc, we
are left with the following limitation: 

<BLOCKQUOTE>YOU CAN GC ONLY AT TOP-LEVEL</BLOCKQUOTE>

<P>However, this fits in perfectly with the programming style imposed in
many user interface implementations including the MIT X-Window Toolkit.
In the X Toolkit, a callback or work procedure is not supposed to spend
much time implementing the action. Therefore it cannot have allocated
much storage, and the callback trampoline mechanism can post a work
procedure to call the garbage collector when needed.

<P>Our simple object format makes parsing the heap rather trivial.
In more complex situations one ends up requiring object headers or markers
of some kind to keep track of the actual storage lengths of objects
and what components of objects are lisp pointers.

<P>Because of the usefulness of strings, they were added by default into
SIOD 2.6. The implementation requires a hook that calls the C library
memory free procedure when an object is in oldspace and never
got relocated to newspace. Obviously this slows down the stop-and-sweep
GC, and removes one of the usual advantages it has over mark-and-sweep.

<H3>Mark and Sweep</H3>

In a mark-and-sweep GC the objects are not relocated. Instead
one only has to LOOK at objects which are referenced by the argument
frames and local variables of the underlying (in this case C-coded)
implementation procedures. If a pointer "LOOKS" like it is a valid
lisp object (see the procedure mark_locations_array) then it may be marked,
and the objects it points to may be marked, as being in-use storage which
is not linked into the freelist in the gc_sweep phase.

<P>Another advantage of the mark_and_sweep storage management technique is
that only one heap is required.

<P>The main disadvantages are:
<OL>
<LI>start-up cost to initially link freelist.
    (can be avoided by more general but slower NEWCELL code).
<LI>does not COMPACT or LOCALIZE the use of storage. This is poor engineering
    practice in a virtual memory environment.
<LI>the entire heap must be looked at, not just the parts with useful storage.
</OL>

<P>In general, mark-and-sweep is slower in that it has to look at more
memory locations for a given heap size, however the heap size can
be smaller for a given problem being solved. More complex analysis
is required when READ-ONLY, STATIC, storage spaces are used
(which we do not support, currently).
Additionally the most sophisticated stop-and-copy storage management
techniques take into account considerations of object usage temporality.

<P>The technique assumes that all machine registers the GC needs to
look at will be saved by a setjmp call into the save_regs_gc_mark data,
and that every thing else is on the C runtime stack. Hence we
have some assumptions that impact portability.

<A name="porting"></a><H2>Porting</h2>

The most heavily ifdef'd module is slibu.c because it utilizes "unix"
runtime functionality. You may want to start out by porting
the "sample" main program along with the modules needed to be static
linked with it.

<P>If your system or C runtime needs to poll for the interrupt signal
mechanism to work, then define INTERRUPT_CHECK to be something
useful.

<P>The STACK_LIMIT and STACK_CHECK macros may need to be conditionized.
They currently assume stack growth downward in virtual address.
The subr (%%stack-limit setting non-verbose) may be used to change the
limits at runtime.

<P>The stack and register marking code used in the mark-and-sweep GC
is unlikely to work on machines that do not keep the procedure call
stack in main memory at all times. It is assumed that setjmp saves
all registers into the jmp_buff data structure. If your target machine
architecture is radically different, such as using linked procedure
call frames of some kind, not organized as a stack, then it would be
best if you could find vendor-supported routines for walking these
frames, such as would be utilized by a debugger. The mark_locations
procedure can then be invoked multiple times with the proper start
and end addresses.

<P>If the stack is not always aligned (in LISP-PTR sense) then the 

siod.html  view on Meta::CPAN

<LI>For runtime efficiency.
<LI>To take advantage of operating system, or other runtime library
provided functionality.
<LI>To play games with evaluator semantics.
</OL>

<P>Some examples of the first class are the functions memq, and nth,
study them. These extensions are straightforward, and easy to debug from the C
language debugger, with the functions <b>err0</b>, <b>pr</b>, and
<b>prp</b> being provided to call back into the lisp runtime system
from the C debugger. 

<P>Some examples from the second class are the <b>ndbm</b>
and <b>regex</b> modules, and the support for commercial database
client interfaces. In many cases it is convenient to define
new scheme data types to encapsulate the complex state of
an API. Study how to utilize allocate_user_tc,
set_gc_hooks and set_print_hooks. Careful ordering of
storage allocation and interrupt management are important.
Also don't forget that most C programming API functions do not
handle being longjump'd through very well, so beware of how
you handle callbacks and SIGINT.

<P>Functions such as get_c_string, get_c_string_dim, get_c_long,
get_c_double, and get_c_file are usually all you need to get at the
data you require to get the job done. But beware of spoofing the
garbage collector. For example, never do something equivalent to this:

<BLOCKQUOTE><PRE><TT>{LISP <b>x</b>;
 char *<b>z</b>;
 <B>x</B> = strcons(100,NULL);
 z = get_c_string(z);
 /* no further references to <B>x</B>, but <b>z</b> is used */
 }</TT></PRE></BLOCKQUOTE>

<P>Because there are no further references to <B>x</B>, the C compiler
might very well reuse the location on the stack in which <B>x</B> resided.
If there is any other consing then the garbage collector will go off
at some point in the future inside this function, and it will free
the memory pointed to by <b>z</b>. A potential example of this
sort of thing is the built-in procedure <B>lexec</b>. In theory
a C compiler might store envp and gcsafe in the same memory
location. But of course for other reasons it is impossible for that
to cause problems unless get_c_string was extended to
invoke the evaluator in some cases.


<P>If you want to play with evaluator semantics you need to study
the <b>leval</b> function and perhaps the <b>lapply</b> function too.
The <b>tc_fsubr</b> object is the conventional way to extend an
evaluator, but the <b>tc_msubr</b> is more powerfull and allows
for a modular tail recursion. The set_eval_hooks function allows for
arbitrary evalution semantics when the first element of a form
evaluates to a new datatype.

<H3>User Type Extension</H3>
If you use them then you must provide some information 
to the garbage collector.
To do this you can supply 4 functions,
<OL>
<LI>a user_relocate, takes an object and returns a new copy.
<LI>a user_scan, takes an object and calls relocate on its subparts.
<LI>a user_mark, takes an object and calls gc_mark on its subparts or
it may return one of these to avoid stack growth.
<LI>a user_free, takes an object to hack before it gets onto the freelist.
</OL>

<BLOCKQUOTE><PRE><TT>
set_gc_hooks(type,
             user_relocate_fcn,
             user_scan_fcn,
             user_mark_fcn,
             user_free_fcn,
             &amp;kind_of_gc);
</TT></PRE></BLOCKQUOTE>

<P>The variable kind_of_gc should be a long. 
It will receive 0 for mark-and-sweep, 1 for
stop-and-copy. Therefore set_gc_hooks should be called AFTER process_cla.
You must specify a relocate function with stop-and-copy. The scan
function may be NULL if your user types will not have lisp objects in them.
Under mark-and-sweep the mark function is required but the free function
may be NULL.

<P>You might also want to extend the printer. This is optional.

<BLOCKQUOTE><PRE><TT>
set_print_hooks(type,fcn);
</TT></PRE></BLOCKQUOTE>

The fcn receives the object which should be printed to its second
argument, a struct gen_printio *, using procedures such as gput_st.

<A name="extuse"></a><H2>LIBSIOD use as an extension language for C programs</h2>

See the modules <b>sample.c</b> and <b>siod.c</b> for example main
program usage. Once the storage system is initialized the
most simple and useful interface into the interpreter is probably
the <B>repl_c_string</b> function. Sample called with argument "xx"
illustrates the most general case of string->string transformation
using repl_c_string. Here is an even smaller example, with only two
procedure calls into the siod shared library, marked in bold font.
The return value of repl_c_string is zero unless an error took
place in either the read, eval, or print portions of execution.
This example wraps a call to *catch 'errobj around the expression, which
will cause evaluation errors to return a pair of error message string
and the offending object. 


<BLOCKQUOTE><PRE><TT>#include &lt;stdio.h&gt;
#include &lt;stdlib.h&gt;
#include &lt;string.h&gt;
#include "siod.h"

int main(int argc,char **argv)
{int j,retval = 0;
 long iobufflen = 1000;
 char *iobuff,*sargv[4];
 sargv[0] = argv[0];
 sargv[1] = "-v0";
 sargv[2] = "-g0";
 sargv[3] = "-h10000:10";
 <B>siod_init</B>(4,sargv);
 iobuff = (char *) malloc(iobufflen);
 retval = 0;
 for(j=1;j&lt;argc;++j)
   {sprintf(iobuff,"(*catch 'errobj (begin %s))",argv[j]);
    printf("Evaluating %s, ",argv[j]);
    retval = <B>repl_c_string</B>(iobuff,0,0,iobufflen);
    printf("retval = %d\n%s\n",retval,iobuff);}
 return(retval);}
</TT></PRE></BLOCKQUOTE>

<A name="EVAL"></a><H2>Implementation of EVAL and environment representation</H2>

Some procedures take an optional (last) argument which is a pointer to
an environment. Here is a table giving the scheme and C names of 
the most important ones:




( run in 0.404 second using v1.01-cache-2.11-cpan-5511b514fd6 )