Perl6-Pugs
view release on metacpan or search on metacpan
docs/talks/hw2005.tex view on Meta::CPAN
initial input to the design team, calling for improved functional,
object-oriented, concurrency-based, data-driven and logic programming support
in the base language. In order to support these requirements, the design team
focused on Perl 6's metaprogramming capabilities.
Unlike most languages, Perl 5 has a dynamic grammar: its parser is tightly
coupled with its evaluator, so compile-time computation can affect how the
parser handles the rest of the program. This allows for a limited form of
macros known as \emph{source filters}, where a Perl module can transform the
importing program's source code before its parsing.
Many CPAN modules use source filters to implement Perl dialects and embedded
domain-specific languages. However, as the transformation operates on the
source string level instead of the AST level, multiple filters cannot reliably
work together.
Perl 6 proposes an unusual execution model to address this problem. The parser
and compiler are implemented in Perl 6, with an initial set of Rules to parse
source code into ASTs. Rules are composable, first-class functions, written in
an embedded language in Perl 6 that defines parsers for context sensitive,
infinite look-ahead grammars.
The parser parses the program with a top-level \code{program} rule, composed of
sub-rules that handle parts of the program. Rules may contain embedded Perl 6
code to produce side effects. For example, the \code{BEGIN \{...\}} syntax
defines a block of code to be executed during compilation.
Right after parsing a \code{BEGIN} block, the compiler will generate object
code for the block, then immediately run that code. The generated code has
access to the compilation environment; it may rewrite the partially produced
parse tree, define new grammar rules, or replace the compiler entirely from the
next token onwards.
With a default grammar that supports user-defined operators of arbitrary fixity
and precedence, programmers can easily introduce new language semantics under
this execution model.
During runtime, the program still has access to the built-in parser and
compiler modules. The \code{eval} primitive triggers the compilation and linking
process again, with full access to the current runtime environment.
\subsection{Language Interoperability}
\label{sec:LanguageInteroperability}
Perl 6 is designed to operate in a mixed-language environment. The Inline.pm
framework already implements this capability in Perl 5; it provides a unified
interface to three strategies for interoperating with other languages.
\begin{enumerate}
\item Translate: Sometimes it is possible to transform the foreign language
into equivalent Perl code. In that case, it is a simple matter of applying a
source filter, or of \code{eval}ing the translated code in its own namespace.
\item Native call: Statically compiled languages (such as C, C++ and assembly)
are compiled into dynamically loaded shared libraries. Inline.pm automatically
generates wrappers to call foreign functions; it also exports Perl's runtime
API so the embedded language can call functions defined in Perl.
\item Runtime harness: For languages with a rich runtime (e.g. Java and Python),
Inline.pm will run them alongside Perl's own runtime, sharing the execution
context in a coroutine-like fashion. Types and global bindings are encoded
both ways; the Perl program can invoke a foreign object's methods, and vice
versa.
\end{enumerate}
In addition to these strategies, Perl 6 is also designed to run on a shared
high-level virtual machine, such as the Java VM or .Net CLR. As part of the
Perl 6 project, the Parrot VM is the reference implementation for a
cross-language runtime to support Perl 6 and other dynamic languages.
% % % % % % % % % % % % % % % % % % % % % % % % % % % % % % % % %
% % % % % % % % % % % % % % % % % % % % % % % % % % % % % % % % %
% 3.
\section{Related Work}
\label{sec:RelatedWork}
Perl 6 is designed to be self-hosting, unlike previous versions that were
implemented in C. Its execution model demands a mixed parsing, compilation and
execution cycle, which complicates the bootstrap process. In this section, we
briefly review various bootstrap strategies proposed over the years.
% 3.1
\subsection{Perl6::Rules}
\label{sec:Perl6Rules}
To explore and prototype Perl 6's Rules mechanism, Damian Conway implemented
\texttt{Perl6::Rules}, a Perl 5 module that translates Perl 6 Rules into Perl 5's
regular expressions. An unreleased prototype was written in 2003, which led to
extensive modifications of the design of Rules.
In 2004, Conway released a revised version of \texttt{Perl6::Rules}. The
implementation was constrained by limitations of Perl 5's non-reentrant regular
expression engine; moreover, \texttt{Perl6::Rules} relied on various
experimental features that no longer work with newer Perl 5 revisions. As
such, \texttt{Perl6::Rules} remains a partial prototype, not suitable as a
basis for implementing a parser for the Perl 6 grammar.
% 3.2.
\subsection{P6C}
\label{sec:P6C}
The initial plan was to use Perl 5 for bootstrapping: First write the Perl 6 to
Parrot compiler in Perl 5, then translate the compiler itself to Perl 6 by
instrumenting Perl 5's bytecode compiler to emit Perl 6 code.
Started in 2002, P6C was the first concrete prototype of Perl 6. It is
written in Perl 5 and distributed as one of the language prototypes in the
Parrot codebase. Instead of \texttt{Perl6::Rules}, P6C uses the
\texttt{Parse::RecDescent} module to construct a recursive-descent parser for
Perl 6.
P6C's design is based on multiple passes on the parse tree. In the first
pass, \texttt{Parse::RecDescent} produces \texttt{raw parse objects}: each node
has a \texttt{tree} method that turns itself to \emph{op-tree objects},
produced by recursively calling the \code{tree} method on its subnodes. In
the next pass, the compiler traverses the op-tree to annotate each term with
its inferred context. Finally, the \code{val} method is recursively called
on each node to produce the compiled code in PIR, a high-level assembly code
for Parrot.
With this design, each term's reduction rules are hard-coded in PIR assembly,
docs/talks/hw2005.tex view on Meta::CPAN
{-# OPTIONS_GHC -fth #-}
compiledAt :: String
compiledAt = $( runIO $
fmap (LitE . StringL . show) getClockTime )
\end{lstlisting}
Because the RuleParser monad is \emph{pure}, we need to use
\code{unsafePerformIO} to get to the evaluator:
\begin{lstlisting}
unsafeEvalExp :: Exp -> RuleParser Exp
unsafeEvalExp exp = do
env <- getRuleEnv
let val = unsafePerformIO $ do
runEvalIO (evalExp exp)
return (Val val)
\end{lstlisting}
The call to unsafePerformIO is \emph{safe}, because we immediately use its result
value in the strict constructor \code{Val}; the result is not reused anywhere else.
% 4.3.
\subsection{The Eval Monad}
\label{sec:TheEvalMonad}
The code snippet above uses \code{runEvalIO} to enter the Eval monad from IO.
Inside the Eval monad, it calls \code{evalExp} to reduce the AST from Exp to Val:
\begin{lstlisting}
runEvalIO :: Env -> Eval Val -> IO Val
evalExp :: Exp -> Eval Val
\end{lstlisting}
Currently, we define the Eval monad as:
\begin{lstlisting}
type Eval x =
EvalT (ContT Val (ReaderT Env SIO)) x
newtype EvalT m a = EvalT { runEvalT :: m a }
\end{lstlisting}
The \code{SIO} is an abstraction over IO and STM computations; we will discuss
it in section 5.2. On top of \code{SIO}, a \code{ReaderT} monad transformer
provides the evaluation environment, \code{Env}. We chose \code{ReaderT} over
\code{StateT}, because \code{ReaderT} facilitates lexical bindings to
sub-terms. Lexical and global symbol bindings are stored as fields in the
\code{Env} record structures:
\begin{lstlisting}
data Env = MkEnv
{ envLexical :: !Pad
, envGlobal :: !(TVar Pad)
, -- ...more fields...
}
\end{lstlisting}
As such, sub-terms can override global bindings by writing to the shared
\code{TVar} storage, but new lexical bindings will never leak out.
We implement control flow with the \code{ContT} monad transformer, because it
can directly support Perl 6's continuation and coroutine constructs. We
represent the dynamic scope with \code{resetT}/\code{shiftT}: calling a
subroutine will push a new prompt via \code{resetT}, so we can simply express
the \code{return} primitive as:
\begin{lstlisting}
doReturn :: Val -> Eval a
doReturn = shiftT . const . return
\end{lstlisting}
Finally, we layer a new transformer EvalT on top of \code{ContT}, so we can
override the monad transformer library's default \code{callCC}, \code{ask} and
\code{fail} definitions. For example, in order to provide full stack traces,
and support user-defined error propagation logic, \code{fail} is defined as:
\begin{lstlisting}
fail :: String -> Eval a
fail str = do
pos <- asks envPos
doReturn (VError str [pos])
\end{lstlisting}
Evaluation inside the Eval monad is controlled by \code{evalExp}:
\begin{lstlisting}
evalExp :: Exp -> Eval Val
evalExp exp = do
evl <- asks envEval
evl exp
\end{lstlisting}
This makes use of the envEval field in the Env structure, which defines the
active evaluator of type \code{(Exp $\to$ Eval Val)}. This allows the user to
change the reduction logic during evaluation, such as by adding watch points or
profiling instruments.
% % % % % % % % % % % % % % % % % % % % % % % % % % % % % % % % %
% % % % % % % % % % % % % % % % % % % % % % % % % % % % % % % % %
% 5.
\section{Experimental Features}
\label{sec:ExperimentalFeatures}
Pugs depends on the Glasgow Haskell Compiler (GHC), because GHC is the only
widely available Haskell implementation, of which we are aware, which offers
reasonable performance as a host language. Because of that, we are able to
improve clarity and efficiency in the Pugs source code with GHC-specific
extensions, such as pattern guards, liberalized type synonyms and generalized
derived instances for newtypes.
GHC 6.4 was released in March 2005, one month after Pugs's inception. Pugs
dropped support for GHC 6.2 in the same month, as we found it difficult to
keep source code portable between GHC 6.2 and GHC 6.4. This was due to
incompatible changes in Template Haskell, as well as new module interfaces
for Data.Map and Data.Set.
During the three months that followed, we explored GHC's new features to
clarify and expand on Perl 6's current design, as demonstrated with prototype
implementations in Pugs. Below we discuss three such uses of GHC's
experimental features.
docs/talks/hw2005.tex view on Meta::CPAN
instance MonadSTM SIO where
liftSTM stm = MkSTM stm
\end{lstlisting}
The SIO monad encapsulates three types of strict computations: pure functions
(\code{MkSIO}), actions that modify memory storage (\code{MkSTM}), and IO
actions with full side effects (\code{MkIO}). We use the \code{runIO} and
\code{runSTM} functions to convert SIO computations to IO and STM ones. One
can safely lift STM actions into the IO monad with GHC's \code{atomically}
function, but casting IO actions to STM will throw an exception:
\begin{lstlisting}
runIO (MkSTM stm) = atomically stm
runSTM (MkIO _) = fail "Unsafe IO in STM"
\end{lstlisting}
With this definition, we can implement the \code{atomically} primitive by
imposing a \emph{safe mode} that denies all IO actions inside it:
\begin{lstlisting}
runEvalSTM :: Env -> Eval Val -> STM Val
runEvalSTM env = runSTM . (`runReaderT` env) .
(`runContT` return) . runEvalT
opRunAtomically :: Exp -> Eval Val
opRunAtomically exp = do
env <- ask
liftSTM (runEvalSTM env $ evalExp exp)
\end{lstlisting}
The \code{async} primitive also becomes trivial to implement:
\begin{lstlisting}
runEvalIO :: Env -> Eval Val -> IO Val
runEvalIO env = runIO . (`runReaderT` env) .
(`runContT` return) . runEvalT
opRunAsync :: Exp -> Eval Val
opRunAsync exp = do
env <- ask
lock <- liftSTM newEmptyTMVar
let fork | rtsSupportsBoundThreads = forkOS
| otherwise = forkIO
tid <- liftIO . fork $ do
val <- runEvalIO env $ evalExp exp
liftSTM $ tryPutTMVar lock val
return ()
return $ VThread (MkThread tid lock)
\end{lstlisting}
Prompted by our success in introducing STM to Pugs, Parrot developers are
working to integrate STM support into the virtual machine, so Perl 6 programs
compiled to run on Parrot can continue to take advantage of STM.
% 5.3.
\subsection{Reified Continuations}
\label{sec:ReifiedContinuations}
In the previous subsection, we saw how Pugs uses \code{runEvalIO} and
\code{runEvalSTM} to resume execution from the IO and STM monad.
Unfortunately, although both can restore the evaluation environment,
they invalidate previously captured continuations by starting a new
\code{runContT}, rendering coroutines unusable across an \code{async} boundary:
\begin{lstlisting}
coro foo () { yield 1; yield 2 }
async { say foo() } # fails to resume "foo"
\end{lstlisting}
To solve this problem, we need to represent the continuation context as data
structure. Here we make use of the \code{CC\_2CPST} monad from Oleg Kiselyov
et al, based on previous work by Dybvig et al~\cite{Dybvig}.
We further use Kiselyov's \code{ZipperD} design to represent the evaluation as
a traversal on the \code{Exp} type, keeping the position as a
\code{[Direction]} type. This allows us to re-express the reduction logic as a
traversal function over the \code{Exp} type. We can freely augment the AST with
new nodes; each AST node's position is uniquely represented as a
\code{[Direction]}.
Because the new Eval monad will update the cursor at each reduction step, we can
define a pair of serialize/deserialize functions:
\begin{lstlisting}
data EvalD = MkEvalD Exp Env [Direction]
reifyEval :: Eval Val -> EvalD
runEval :: EvalD -> Eval Val
\end{lstlisting}
This \emph{suspend to memory} capability is sufficient to make coroutines work across
\code{async\{\}} calls. In the future, we may also implement a \emph{suspend
to disk} feature, by snapshotting the values of all \code{TVar}s in the
\code{Env} structure into a byte-string image:
\begin{lstlisting}
showEvalD :: EvalD -> STM String
readEvalD :: String -> STM EvalD
\end{lstlisting}
This has many interesting applications, including web continuations, omniscient
debugging, and mobile code. Since Parrot also supports reified continuation
objects, we plan to work with Parrot developers to add serialization
capabilities as well.
% % % % % % % % % % % % % % % % % % % % % % % % % % % % % % % % %
% % % % % % % % % % % % % % % % % % % % % % % % % % % % % % % % %
% 6.
\section{Cross-Language Interoperation}
\label{sec:Cross-LanguageInteroperation}
New computer languages often suffer from a lack of available libraries.
Conversely, programmers are much more likely to adopt a new language if they
can reuse libraries from an existing language. As one Perl developer put it:
\emph{CPAN is my programming language of choice; the rest is just syntax.}
Therefore, we were strongly motivated to work on Pugs's support for modules
written in languages other than Perl 6, as well as making Pugs embeddable from
other languages. As most Pugs developers came from Perl 5, Haskell and Parrot
backgrounds, those three were the first targets for interoperation.
% 6.1. Perl 5
\subsection{Perl 5}
\label{sec:Perl5}
Embedding Perl 5 is achieved by linking the \code{libperl.so} runtime with
Pugs. Perl 5 represents all values as \code{(SV *)} structures, similar to the
Val type in Pugs. As both sides allow first-class closure functions, their
C-side eval/apply interfaces are straightforward:
\begin{lstlisting}
/* foreign imports to FFI */
SV *perl5_eval (
char *code,
void *env, int cxt
);
SV **perl5_apply (
SV *sub, SV *inv, SV** args,
void *env, int cxt
);
/* foreign exports from FFI */
extern Val *pugs_eval (
char *code, int cxt
);
extern Val **pugs_apply (
Val *sub, Val *inv, Val **args, int cxt
);
\end{lstlisting}
The import and export APIs are almost identical: both use an optional invocant
( run in 0.613 second using v1.01-cache-2.11-cpan-e93a5daba3e )