FunctionalPerl

 view release on metacpan or  search on metacpan

docs/howto.md  view on Meta::CPAN

(yet); there will be various ways how the parallel evaluation could be
implemented in Perl, and there may be modules on CPAN already (todo:
see what's around and perhaps wrap it in a way consistent with the
rest of functional-perl).

### Comparison to generators

Generators (code that can produce a sequence by `yield`ing elements)
are en vogue in non-functional languages today. They also run code on
demand ('lazily') to produce data elements. How do they compare to
streams (lazy linked lists)?

* `yield` interrupts control flow; from the point of view
  of function application, it is magic, it's not part of what
  'straight' functions can do. The mechanism of their implementation
  is often not accessible; if it is (i.e. built as a library using
  only the host language), then first-class continuations are
  needed. Those have a bad reputation for being difficult to reason
  about. Even Schemers, the community where the concept originated,
  recommend to use them sparingly. In comparison, lazy evaluation is
  very straight-forward (and even in Scheme implementations with
  perfectly efficient first-class continuations, an implementation of
  lazy lists is faster than one of generators).

* One benefit (the only?) that generators have is that they don't need
  to introduce the concept of linked lists.

* The lazyness mechanism as described above is universal, it doesn't
  just apply to sequences, but works seamlessly for things like trees,
  or even individual (but expensive to calculate) values.


## Memory handling

The second area (after the multiple-namespaces) where Perl is
inconvenient to get functional programs working is for them to handle
memory correctly. Functional programming languages usually use tracing
garbage collection, have compilers that do live time analysis of
variables, and optimize tail calls by default (although some like
Clojure don't do the latter), the sum of which mostly resolves
concerns about memory. Perl 5 currently offers none of these three
features. But it still does provide all the features to handle these
issues manually, and often the places where they are needed are in
lower level parts of programs, which are often shared as libraries,
and hence you might not need to use the tricks described here
often. You need to know about them though, or you will end up
scratching your head about out of memory errors, segfaults (uh), and
undef values in unexpected places at some point.


### Reference cycles (and self-referential closures)

This is the classic issue with a system like Perl that uses reference
counting to determine when the last reference to a piece of data is
let go. Add a reference to the data structure to itself, and it will
never be freed. The leaked data structures are never reclaimed before
the exit of the process (or at least the perl interpreter) as they are
not reachable anymore (by normal programs). It's well-known in the
Perl community.

The solution is to strategically weaken a reference in the cycle
(usually the cyclic reference that's put inside the structure itself),
using `Scalar::Utils`'s `weaken`. `FP::Weak` also has `Weakened` which
is often handy, and `Keep` to protect a reference from such weakening
attacks in case it should remain strong.

The most frequent case involving reference cycles in functional
programs are self-referential closures:

    sub foo {
        my ($start) = @_;
        my $x = calculate_x;
        my $rec; $rec = sub {
            my ($y) = @_;
            is_bar $y ? $y : cons $y, &$rec(barify_a_bit_with $y, $x)
        };
        &{Weakened $rec} ($start)
    }

Without the `Weakened` call, this would leak the closure at $rec. (In
principle, setting `$rec = undef; ` when the subroutine is done would
work, too, but the subroutine might lose control due to an unavoidable
exception like out of memory or a signal handler that calls `die`, in
which case it would still be leaked.)

Note that alternative, and often better, solutions for
self-referential closures exist: `FP::fix`, and `_SUB_` from `use
v5.16`. Also, sometimes (well, always when one is fine with passing
all the context explicitely) a local subroutine can be moved to the
toplevel and bound to normal subroutine package variables, which makes
it visible to itself by default.


### Variable life times

Lexical variables in the current implementation of the perl
interpreter live until the scope in which they are defined is
exited. Note explicitely that this means they may still reference
their data even at points of the code within a scope from which on
they can never be used anymore. Example:

    {
        my $s = ["Hello"];
        print $$s[0], "\n";
        main_event_loop(); # the array remains allocated till the event
                           # loop returns, even though never (normally)
                           # accessible
    }

You may ask why you should care about a little data staying
around. The first answer is that the data might be big, but the more
important second answer in the context of functional programming is
that the data structure might be a hierarchical data structure like a
linked list that's passed on, and then appended to there (by way of
mutation, or in the case of lazy functional programming, by way of
mutation hidden in promises). The top (or head, first, in case of
linked lists) of the data structure might be released by the called
code as time goes on. But the variable in the calling scope will still
hold on to it, meaning, it will grow, possibly without
bounds. Example:

    {
        my $s = xfile_lines $path; # lazy linked list of lines
        print "# ".$s->first."\n";
        $s->for_each (sub { print "> $_[0]\n" });
    }

Without further ado, this will retain all lines of the file at `$path`
in `$s` while the `for_each` forces in (and itself releases) line
after line.

This is a problem that many programming language implementations (that
are not written to support lazy evaluation) have. Luckily in the case
of Perl, it can be worked around, by assigning `undef` or better
weakening the variable from within the called method:

    sub for_each {
        my ($s, $proc) = @_;
        weaken $_[0];
        ...
    }

`weaken` is a bit more friendly than `$_[0] = undef;` in that it
leaves the variable set if there's still another reference to the head
around.

With this trick (which is used in all of the relevant
functions/methods in `FP::Stream`), the above example actually *does*
release the head of the stream in a timely manner.

Now there may be situations where you actually really want to keep
`$s` alive. In such a case, you can protect its value from being
clobbered by passing it through the `Keep` function from `FP::Weak`:

    {
        my $s = xfile_lines $path; # lazy linked list of lines
        print "# ".$s->first."\n";
        Keep($s)->for_each (sub { print "> ".$_[0]."\n" });
        $s->for_each (sub { print "again: > ".$_[0]."\n" });
    }

Of course this *will* keep the whole file in memory until it reaches
the second `for_each`! So perhaps you'd really want to do the
following:

    {
        my $s = xfile_lines $path; # lazy linked list of lines
        print "# ".$s->first."\n";
        $s->for_each (sub { print "> ".$_[0]."\n" });
        $s = xfile_lines $path; # reopen the file from the start
        $s->for_each (sub { print "again: > ".$_[0]."\n" });
    }

This is probably the ugliest part when programming functionally in
Perl.  Perhaps the interpreter could be changed (or a lowlevel module
written) so that lexical variables are automatically cleared upon
their last access (and something like @_ = () is enough to clear it from
the perl calling stack, if not automatic). An argument against this is
inspection using debuggers or modules like `PadWalker`, so it will
have to be enabled explicitely (lexically scoped).


### Stack memory and tail calls

Another, closely related, place where the perl interpreter does not
release memory in a timely (enough for some programs) manner, are
subroutine calls in tail position. The tail position is the place of
the last expression or statement in a (subroutine) scope. There's no
need to remember the current context (other than, again, to aid
inspection for debugging), and hence the current context could be
released and the tail-called subroutine be made to return directly to
the parent context, but the interpreter doesn't do it.

    sub sum_map_to {
        my ($fn, $start, $end, $total) = @_;
        # this example only contains an expression in tail position
        # (ignoring the variable binding statement).
        $start < $end ?
            sum_map_to ($fn, $start + 1, $end, $total + &$fn($start))
          : $total
    }

This causes code using recursion to allocate stack memory proportional
to the number of recursive calls, even if the calls are all in tail
position. It keeps around a chain of return addresses, but also (due
to the issue described in the previous section) references to possibly
unused data.

See [`intro/tailcalls`](../intro/tailcalls) and
[`intro/more_tailcalls`](../intro/more_tailcalls) for solutions to
this problem.

(Perhaps a bytecode optimizer could be written that, given a pragma,
automatically turns calls in tail position into gotos.)

In simple cases like above, the code can also be changed to use Perl's
`while`, `for`, or `redo LABEL` constructs instead. The latter looks
closest to actual function calls, if that's something you'd like to
retain:

    sub sum_map_to {
    sum_map_to: {
        my ($fn, $start, $end, $total) = @_;
        # this example only contains an expression in tail position
        # (ignoring the variable binding statement).
        $start < $end ?
            do { @_ = ($fn, $start + 1, $end, $total + &$fn($start));
                 redo sum_map_to }
          : $total
    }}

(Automatically turning such simple self tail calls into redo may
perhaps also be doable by way of a bytecode optimizer.)


### C stack memory and freeing of nested data structures

When Perl deallocates nested data structures, it uses space on the C
(not Perl language) stack for the recursion into the data
structure. When a structure to be freed is nested deeply enough (like
with long linked lists), this will make the interpreter run out of
stack space, which will be reported as a segfault on most
systems. There are two different possible remedies for this:

  * __increase the system stack__ size by changing the corresponding
    resource limit (e.g. see `help ulimit` in Bash). (Since
    stack space is virtual memory, giving a huge number (like 1 GB) is
    not a problem, it will only actually use RAM once needed (alright,
    it won't be freed anymore afterwards and will eventually go to
    swap if the program doesn't exit before).)

  * being careful __not to let go of a deeply nested structure at
    once__. By using `FP::Stream` instead of `FP::List` for bigger
    lists and taking care that the head of the stream is not being
    retained, there will never be any long list in memory at any given
    time (it is being reclaimed piece after piece instead of all at
    once)


Note that the same workaround as used with streams (weakening entries
in `@_`) will help with incremental deallocation with non-lazy lists
as well, and hence avoid the need for a big C stack, and avoid the
cumulation of time needed to deallocate the list (bad for soft
real-time latency). But weakening of non-lazy lists will/would be more
painful to handle for users, as it's more common to reuse them than
their lazy cousins, and thus functions in the functional-perl
libraries for non-lazy lists do not weaken their arguments. Arguably
it would really be best to make the language handle lifetimes
automatically (lexical variable analysis), it would benefit both the
lazy and non-lazy cases. (Side note: *some* streams, i.e. those with a
definition that uses earlier elements of themselves, have reference
cycles, and will need weakening of the head even with the lexical
analysis; but this should be a special case that should be fine to
handle accordingly.)

### See also

* A [post](https://news.ycombinator.com/item?id=8734719) about streams
  in Scheme mentioning the memory retention issues that even some
  Scheme implementations can have.


## Object oriented functional programming

Note that "functional" in the context of "functional programming" does
not mean "using function call syntax instead of method call syntax". A
subroutine that prints something to stdout or modifies some of its
arguments or variables outside its inner scope is not a *pure*
function, which is what "functional" in "functional programming"
implies. A pure function's *only* effect (as observable by a purely
functional program, i.e. ignoring the use of machine resources and
time) is its result value, and is dependent *only* on its arguments
and immutable values (values that are never modified during a program
run).

Even aside the above confusion, there seems to be a rather widespread
notion that object oriented and functional programming are at odds
with each other. This is only the case if "object oriented" implies a
style that mutates the object(s) or has other side effects. So whether
there is a conflict depends on the definition of "object
orientation". The author of this text has never found a precise
definition. Wikipedia
[writes](https://en.wikipedia.org/wiki/Object_oriented_programming):

> *A distinguishing feature of objects is that an object's procedures
> can access and often modify the data fields of the object with which
> they are associated.*

It only says "often modify", not that modification is a required part
of the methodology. (Also, it doesn't say whether the modification is
carried out by side-effecting the object or by returning a modified
version of the object.)

If individual method implementations all follow the rules for a pure
function, then the method call only adds the dispatch on the object
type. The object can be thought of (and in Perl actually is) an
additional function argument and its type simply becomes part of the
algorithm, and the whole remains pure. Functional programming
languages often offer pattern matching that can be used to the same
effect, or other features that are even closer (e.g. Haskell's type
classes). Or to say it in a more pointed way: remove side effects from
your coding, and your code is purely functional, totally regardless
of whether it is implementing classes and objects or not.

To build purely functional classes easily, have a look at
`FP::Struct`. Classes generated with it automatically inherit from
`FP::Abstract::Pure` by default, so that `is_pure` from `FP::Predicates` will
return true (but currently it's your responsibility to not mutate the
objects by writing to their hash fields directly, or if you do so,
only in a manner that doesn't violate the purity as seen from the
outside (e.g. caching is OK if done correctly)). If you would prefer
to extend `Moose` for the same purposes, please

docs/howto.md  view on Meta::CPAN

are mutated, or even write a cache to files in a predetermined (and
documented as 'untouchable') location. But make sure that the purity
doesn't break in exceptional situations, like when (temporarily)
running out of memory, or when exceptions are thrown from signal
handlers; just like the safe way to write files on unix is to write
them to a temporary location then doing a rename to guarantee
atomicity, you shouldn't leave unfinished state linger around when
interrupted, or purity will break.


## Debugging

### General

* Write small functions, test them from an embedded `use FP::Repl;
  repl;` (or `FP::Repl::repl();` if already loaded elsewhere) placed in the
  module body. Write bigger functions by reusing smaller ones. Write
  test cases (using `Chj::TEST` to make it as easy as possible) for
  all of them so that when you need to adapt a function, you can
  refactor it to be parameterized (instead of doing copy-paste
  programming) without fear of breakage.

* You can still use "printf debugging" (plain old `warn` etc.) even
  in pure code. Treat STDERR as exempt from the purity
  requirements.

* When you don't understand what's going on inside a function, place a
  `repl` call right into it and use `:e`, `:b` etc. (see `:?`) to
  inspect the context and experiment with local function calls.

* Use `Chj::Backtrace` in your program to see errors with stack
  traces. Or use `FP::Repl::Trap` or better `FP::Repl::AutoTrap` to
  trap uncaught exceptions in a repl.

* Disable tail call optimizations to see the history of function calls
  (TODO: implement a `Sub::Call::Tail` variant that ignores the tail
  call declarations or lets them be turned on/off at runtime?)

* To get more detail on why predicates don't accept a value
  (e.g. those used to restrict `FP::Struct` fields), run this son
  startup (should we make this more convenient via a wrapper loader
  module?):
  
        use FP::Failure; 
        $FP::Failure::use_failure=1;

### Lazy code

Debugging lazy code is more challenging since the order of evaluation
appears pretty chaotic and at least unexpected. Hence, in addition to
the tips above:

* Try to get the code working without lazyness first (don't use `lazy`
  forms, use `FP::List` instead of `FP::Stream`). There's also
  `FP::noLazy` that makes `lazy` a no-op in the current module. See
  the documentation in this module.

* If you're getting `undef` in some place (like a subtree in a `PXML`
  document missing (maybe triggering an undef warning in the
  serializer)) and you don't know where it happens, and think it's
  because of stream weakening, and just want to get the program
  working first without worrying about memory retention, then disable
  weakening using the offerings in `FP::Weak`.

## Tips and recommendations

* Both `Method::Signatures` and `Function::Parameters` offer
  easy subroutine and method argument
  declarations. `Function::Parameters` has the advantage that it's
  lexically scoped, whereas `Method::Signatures` seems to be tied to
  the namespace it was imported into. This means that in a file with
  `{ package Foo; ... }` style sub-namespace parts, it's still enough
  to import `Function::Parameters` just once at the top of the file,
  whereas `Method::Signatures` needs to be imported into every
  package. (Also, it showed confusing error reports when one forgets.) 
  (Todo: did I check the newest version?)

  (Todo: see how argument type assertments using predicate functions
  could be done.)

  Note that Perl now also offers `use experimental "signatures"`,
  which is what is now generally used over `Method::Signatures` and
  `Function::Parameters`.

</with_toc>



( run in 0.544 second using v1.01-cache-2.11-cpan-39bf76dae61 )