FunctionalPerl
view release on metacpan or search on metacpan
docs/howto.md view on Meta::CPAN
(yet); there will be various ways how the parallel evaluation could be
implemented in Perl, and there may be modules on CPAN already (todo:
see what's around and perhaps wrap it in a way consistent with the
rest of functional-perl).
### Comparison to generators
Generators (code that can produce a sequence by `yield`ing elements)
are en vogue in non-functional languages today. They also run code on
demand ('lazily') to produce data elements. How do they compare to
streams (lazy linked lists)?
* `yield` interrupts control flow; from the point of view
of function application, it is magic, it's not part of what
'straight' functions can do. The mechanism of their implementation
is often not accessible; if it is (i.e. built as a library using
only the host language), then first-class continuations are
needed. Those have a bad reputation for being difficult to reason
about. Even Schemers, the community where the concept originated,
recommend to use them sparingly. In comparison, lazy evaluation is
very straight-forward (and even in Scheme implementations with
perfectly efficient first-class continuations, an implementation of
lazy lists is faster than one of generators).
* One benefit (the only?) that generators have is that they don't need
to introduce the concept of linked lists.
* The lazyness mechanism as described above is universal, it doesn't
just apply to sequences, but works seamlessly for things like trees,
or even individual (but expensive to calculate) values.
## Memory handling
The second area (after the multiple-namespaces) where Perl is
inconvenient to get functional programs working is for them to handle
memory correctly. Functional programming languages usually use tracing
garbage collection, have compilers that do live time analysis of
variables, and optimize tail calls by default (although some like
Clojure don't do the latter), the sum of which mostly resolves
concerns about memory. Perl 5 currently offers none of these three
features. But it still does provide all the features to handle these
issues manually, and often the places where they are needed are in
lower level parts of programs, which are often shared as libraries,
and hence you might not need to use the tricks described here
often. You need to know about them though, or you will end up
scratching your head about out of memory errors, segfaults (uh), and
undef values in unexpected places at some point.
### Reference cycles (and self-referential closures)
This is the classic issue with a system like Perl that uses reference
counting to determine when the last reference to a piece of data is
let go. Add a reference to the data structure to itself, and it will
never be freed. The leaked data structures are never reclaimed before
the exit of the process (or at least the perl interpreter) as they are
not reachable anymore (by normal programs). It's well-known in the
Perl community.
The solution is to strategically weaken a reference in the cycle
(usually the cyclic reference that's put inside the structure itself),
using `Scalar::Utils`'s `weaken`. `FP::Weak` also has `Weakened` which
is often handy, and `Keep` to protect a reference from such weakening
attacks in case it should remain strong.
The most frequent case involving reference cycles in functional
programs are self-referential closures:
sub foo {
my ($start) = @_;
my $x = calculate_x;
my $rec; $rec = sub {
my ($y) = @_;
is_bar $y ? $y : cons $y, &$rec(barify_a_bit_with $y, $x)
};
&{Weakened $rec} ($start)
}
Without the `Weakened` call, this would leak the closure at $rec. (In
principle, setting `$rec = undef; ` when the subroutine is done would
work, too, but the subroutine might lose control due to an unavoidable
exception like out of memory or a signal handler that calls `die`, in
which case it would still be leaked.)
Note that alternative, and often better, solutions for
self-referential closures exist: `FP::fix`, and `_SUB_` from `use
v5.16`. Also, sometimes (well, always when one is fine with passing
all the context explicitely) a local subroutine can be moved to the
toplevel and bound to normal subroutine package variables, which makes
it visible to itself by default.
### Variable life times
Lexical variables in the current implementation of the perl
interpreter live until the scope in which they are defined is
exited. Note explicitely that this means they may still reference
their data even at points of the code within a scope from which on
they can never be used anymore. Example:
{
my $s = ["Hello"];
print $$s[0], "\n";
main_event_loop(); # the array remains allocated till the event
# loop returns, even though never (normally)
# accessible
}
You may ask why you should care about a little data staying
around. The first answer is that the data might be big, but the more
important second answer in the context of functional programming is
that the data structure might be a hierarchical data structure like a
linked list that's passed on, and then appended to there (by way of
mutation, or in the case of lazy functional programming, by way of
mutation hidden in promises). The top (or head, first, in case of
linked lists) of the data structure might be released by the called
code as time goes on. But the variable in the calling scope will still
hold on to it, meaning, it will grow, possibly without
bounds. Example:
{
my $s = xfile_lines $path; # lazy linked list of lines
print "# ".$s->first."\n";
$s->for_each (sub { print "> $_[0]\n" });
}
Without further ado, this will retain all lines of the file at `$path`
in `$s` while the `for_each` forces in (and itself releases) line
after line.
This is a problem that many programming language implementations (that
are not written to support lazy evaluation) have. Luckily in the case
of Perl, it can be worked around, by assigning `undef` or better
weakening the variable from within the called method:
sub for_each {
my ($s, $proc) = @_;
weaken $_[0];
...
}
`weaken` is a bit more friendly than `$_[0] = undef;` in that it
leaves the variable set if there's still another reference to the head
around.
With this trick (which is used in all of the relevant
functions/methods in `FP::Stream`), the above example actually *does*
release the head of the stream in a timely manner.
Now there may be situations where you actually really want to keep
`$s` alive. In such a case, you can protect its value from being
clobbered by passing it through the `Keep` function from `FP::Weak`:
{
my $s = xfile_lines $path; # lazy linked list of lines
print "# ".$s->first."\n";
Keep($s)->for_each (sub { print "> ".$_[0]."\n" });
$s->for_each (sub { print "again: > ".$_[0]."\n" });
}
Of course this *will* keep the whole file in memory until it reaches
the second `for_each`! So perhaps you'd really want to do the
following:
{
my $s = xfile_lines $path; # lazy linked list of lines
print "# ".$s->first."\n";
$s->for_each (sub { print "> ".$_[0]."\n" });
$s = xfile_lines $path; # reopen the file from the start
$s->for_each (sub { print "again: > ".$_[0]."\n" });
}
This is probably the ugliest part when programming functionally in
Perl. Perhaps the interpreter could be changed (or a lowlevel module
written) so that lexical variables are automatically cleared upon
their last access (and something like @_ = () is enough to clear it from
the perl calling stack, if not automatic). An argument against this is
inspection using debuggers or modules like `PadWalker`, so it will
have to be enabled explicitely (lexically scoped).
### Stack memory and tail calls
Another, closely related, place where the perl interpreter does not
release memory in a timely (enough for some programs) manner, are
subroutine calls in tail position. The tail position is the place of
the last expression or statement in a (subroutine) scope. There's no
need to remember the current context (other than, again, to aid
inspection for debugging), and hence the current context could be
released and the tail-called subroutine be made to return directly to
the parent context, but the interpreter doesn't do it.
sub sum_map_to {
my ($fn, $start, $end, $total) = @_;
# this example only contains an expression in tail position
# (ignoring the variable binding statement).
$start < $end ?
sum_map_to ($fn, $start + 1, $end, $total + &$fn($start))
: $total
}
This causes code using recursion to allocate stack memory proportional
to the number of recursive calls, even if the calls are all in tail
position. It keeps around a chain of return addresses, but also (due
to the issue described in the previous section) references to possibly
unused data.
See [`intro/tailcalls`](../intro/tailcalls) and
[`intro/more_tailcalls`](../intro/more_tailcalls) for solutions to
this problem.
(Perhaps a bytecode optimizer could be written that, given a pragma,
automatically turns calls in tail position into gotos.)
In simple cases like above, the code can also be changed to use Perl's
`while`, `for`, or `redo LABEL` constructs instead. The latter looks
closest to actual function calls, if that's something you'd like to
retain:
sub sum_map_to {
sum_map_to: {
my ($fn, $start, $end, $total) = @_;
# this example only contains an expression in tail position
# (ignoring the variable binding statement).
$start < $end ?
do { @_ = ($fn, $start + 1, $end, $total + &$fn($start));
redo sum_map_to }
: $total
}}
(Automatically turning such simple self tail calls into redo may
perhaps also be doable by way of a bytecode optimizer.)
### C stack memory and freeing of nested data structures
When Perl deallocates nested data structures, it uses space on the C
(not Perl language) stack for the recursion into the data
structure. When a structure to be freed is nested deeply enough (like
with long linked lists), this will make the interpreter run out of
stack space, which will be reported as a segfault on most
systems. There are two different possible remedies for this:
* __increase the system stack__ size by changing the corresponding
resource limit (e.g. see `help ulimit` in Bash). (Since
stack space is virtual memory, giving a huge number (like 1 GB) is
not a problem, it will only actually use RAM once needed (alright,
it won't be freed anymore afterwards and will eventually go to
swap if the program doesn't exit before).)
* being careful __not to let go of a deeply nested structure at
once__. By using `FP::Stream` instead of `FP::List` for bigger
lists and taking care that the head of the stream is not being
retained, there will never be any long list in memory at any given
time (it is being reclaimed piece after piece instead of all at
once)
Note that the same workaround as used with streams (weakening entries
in `@_`) will help with incremental deallocation with non-lazy lists
as well, and hence avoid the need for a big C stack, and avoid the
cumulation of time needed to deallocate the list (bad for soft
real-time latency). But weakening of non-lazy lists will/would be more
painful to handle for users, as it's more common to reuse them than
their lazy cousins, and thus functions in the functional-perl
libraries for non-lazy lists do not weaken their arguments. Arguably
it would really be best to make the language handle lifetimes
automatically (lexical variable analysis), it would benefit both the
lazy and non-lazy cases. (Side note: *some* streams, i.e. those with a
definition that uses earlier elements of themselves, have reference
cycles, and will need weakening of the head even with the lexical
analysis; but this should be a special case that should be fine to
handle accordingly.)
### See also
* A [post](https://news.ycombinator.com/item?id=8734719) about streams
in Scheme mentioning the memory retention issues that even some
Scheme implementations can have.
## Object oriented functional programming
Note that "functional" in the context of "functional programming" does
not mean "using function call syntax instead of method call syntax". A
subroutine that prints something to stdout or modifies some of its
arguments or variables outside its inner scope is not a *pure*
function, which is what "functional" in "functional programming"
implies. A pure function's *only* effect (as observable by a purely
functional program, i.e. ignoring the use of machine resources and
time) is its result value, and is dependent *only* on its arguments
and immutable values (values that are never modified during a program
run).
Even aside the above confusion, there seems to be a rather widespread
notion that object oriented and functional programming are at odds
with each other. This is only the case if "object oriented" implies a
style that mutates the object(s) or has other side effects. So whether
there is a conflict depends on the definition of "object
orientation". The author of this text has never found a precise
definition. Wikipedia
[writes](https://en.wikipedia.org/wiki/Object_oriented_programming):
> *A distinguishing feature of objects is that an object's procedures
> can access and often modify the data fields of the object with which
> they are associated.*
It only says "often modify", not that modification is a required part
of the methodology. (Also, it doesn't say whether the modification is
carried out by side-effecting the object or by returning a modified
version of the object.)
If individual method implementations all follow the rules for a pure
function, then the method call only adds the dispatch on the object
type. The object can be thought of (and in Perl actually is) an
additional function argument and its type simply becomes part of the
algorithm, and the whole remains pure. Functional programming
languages often offer pattern matching that can be used to the same
effect, or other features that are even closer (e.g. Haskell's type
classes). Or to say it in a more pointed way: remove side effects from
your coding, and your code is purely functional, totally regardless
of whether it is implementing classes and objects or not.
To build purely functional classes easily, have a look at
`FP::Struct`. Classes generated with it automatically inherit from
`FP::Abstract::Pure` by default, so that `is_pure` from `FP::Predicates` will
return true (but currently it's your responsibility to not mutate the
objects by writing to their hash fields directly, or if you do so,
only in a manner that doesn't violate the purity as seen from the
outside (e.g. caching is OK if done correctly)). If you would prefer
to extend `Moose` for the same purposes, please
docs/howto.md view on Meta::CPAN
are mutated, or even write a cache to files in a predetermined (and
documented as 'untouchable') location. But make sure that the purity
doesn't break in exceptional situations, like when (temporarily)
running out of memory, or when exceptions are thrown from signal
handlers; just like the safe way to write files on unix is to write
them to a temporary location then doing a rename to guarantee
atomicity, you shouldn't leave unfinished state linger around when
interrupted, or purity will break.
## Debugging
### General
* Write small functions, test them from an embedded `use FP::Repl;
repl;` (or `FP::Repl::repl();` if already loaded elsewhere) placed in the
module body. Write bigger functions by reusing smaller ones. Write
test cases (using `Chj::TEST` to make it as easy as possible) for
all of them so that when you need to adapt a function, you can
refactor it to be parameterized (instead of doing copy-paste
programming) without fear of breakage.
* You can still use "printf debugging" (plain old `warn` etc.) even
in pure code. Treat STDERR as exempt from the purity
requirements.
* When you don't understand what's going on inside a function, place a
`repl` call right into it and use `:e`, `:b` etc. (see `:?`) to
inspect the context and experiment with local function calls.
* Use `Chj::Backtrace` in your program to see errors with stack
traces. Or use `FP::Repl::Trap` or better `FP::Repl::AutoTrap` to
trap uncaught exceptions in a repl.
* Disable tail call optimizations to see the history of function calls
(TODO: implement a `Sub::Call::Tail` variant that ignores the tail
call declarations or lets them be turned on/off at runtime?)
* To get more detail on why predicates don't accept a value
(e.g. those used to restrict `FP::Struct` fields), run this son
startup (should we make this more convenient via a wrapper loader
module?):
use FP::Failure;
$FP::Failure::use_failure=1;
### Lazy code
Debugging lazy code is more challenging since the order of evaluation
appears pretty chaotic and at least unexpected. Hence, in addition to
the tips above:
* Try to get the code working without lazyness first (don't use `lazy`
forms, use `FP::List` instead of `FP::Stream`). There's also
`FP::noLazy` that makes `lazy` a no-op in the current module. See
the documentation in this module.
* If you're getting `undef` in some place (like a subtree in a `PXML`
document missing (maybe triggering an undef warning in the
serializer)) and you don't know where it happens, and think it's
because of stream weakening, and just want to get the program
working first without worrying about memory retention, then disable
weakening using the offerings in `FP::Weak`.
## Tips and recommendations
* Both `Method::Signatures` and `Function::Parameters` offer
easy subroutine and method argument
declarations. `Function::Parameters` has the advantage that it's
lexically scoped, whereas `Method::Signatures` seems to be tied to
the namespace it was imported into. This means that in a file with
`{ package Foo; ... }` style sub-namespace parts, it's still enough
to import `Function::Parameters` just once at the top of the file,
whereas `Method::Signatures` needs to be imported into every
package. (Also, it showed confusing error reports when one forgets.)
(Todo: did I check the newest version?)
(Todo: see how argument type assertments using predicate functions
could be done.)
Note that Perl now also offers `use experimental "signatures"`,
which is what is now generally used over `Method::Signatures` and
`Function::Parameters`.
</with_toc>
( run in 0.544 second using v1.01-cache-2.11-cpan-39bf76dae61 )