Fsdb
Fsdb(3) User Contributed Perl Documentation Fsdb(3)
NAME
Fsdb - a flat-text database for shell scripting
SYNOPSIS
Fsdb, the flatfile streaming database, is a package of commands for
manipulating flat-ASCII databases from shell scripts. Fsdb is useful
for processing medium amounts of data (with very little data you'd do
it by hand, with megabytes you might want a real database). Fsdb was
known as Jdb from 1991 to Oct. 2008.
Fsdb is very good at doing things like:
+o extracting measurements from experimental output
+o examining data to address different hypotheses
+o joining data from different experiments
+o eliminating/detecting outliers
+o computing statistics on data (mean, confidence intervals,
correlations, histograms)
+o reformatting data for graphing programs
Fsdb is built around the idea of a flat text file as a database. Fsdb
files (by convention, with the extension .fsdb) have a header
documenting the schema (what the columns mean), and then each line
represents a database record (or row).
For example:
#fsdb experiment duration
ufs_mab_sys 37.2
ufs_mab_sys 37.3
ufs_rcp_real 264.5
ufs_rcp_real 277.9
This is a simple file with four experiments (the rows), each with a
description and a run time in the first and second columns.
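To make the header-plus-rows layout concrete, here is a minimal sketch
in plain shell and awk (not Fsdb itself) that uses the header to find a
column by name and then averages it. The function name fsdb_col_mean is
hypothetical, and the sketch assumes a #fsdb header with no options and
whitespace-separated fields; real Fsdb files may declare other
separators in the header.

```shell
# Hypothetical helper, not part of Fsdb: mean of one named column.
# Assumes a "#fsdb" header with no options and whitespace-separated fields.
fsdb_col_mean() {
    awk -v want="$1" '
        NR == 1 && $1 == "#fsdb" {
            # Map the column name to its position in the data rows.
            for (i = 2; i <= NF; i++)
                if ($i == want) col = i - 1
            next
        }
        /^#/ { next }                  # skip comment lines
        col  { sum += $col; n++ }      # accumulate the named column
        END  {
            if (!col || !n) exit 1     # unknown column or no data rows
            printf "%.3f\n", sum / n
        }
    '
}

# The example file from above, fed on standard input.
printf '%s\n' \
    '#fsdb experiment duration' \
    'ufs_mab_sys 37.2' \
    'ufs_mab_sys 37.3' \
    'ufs_rcp_real 264.5' \
    'ufs_rcp_real 277.9' |
    fsdb_col_mean duration        # prints 154.225
```

The sketch only illustrates how a header lets a tool address columns
by name rather than by position; with Fsdb installed, its own commands
are the intended way to do this kind of work.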
Rather than hand-code scripts to do each special case, Fsdb provides
higher-level functions. Although it's often easy to throw together a
custom script to do any single task, I believe that there are several
advantages to using Fsdb:
+o these programs provide a higher level interface than plain Perl, so
** Fewer lines of simpler code:
dbrow '_experiment eq "ufs_mab_sys"' | dbcolstats duration
Picks out just one type of experiment and computes statistics
on it, rather than:
while (<>) { split; $sum+=$F[1]; $ss+=$F[1]**2; $n++; }
$mean = $sum / $n; $std_dev = ...
in dozens of places.
+o the library uses names for columns, so
** No more $F[1], use "_duration".
fsdb-2.x.
SEMI-COMPATIBLE CHANGE
The header of fsdb files has changed: it is now #fsdb, not #h (or
#L), and parsing of -F and -R is also different. See dbfilealter
for the new specification. The v1 file format will be read
compatibly, but not written.
BUG FIX
dbmapreduce now tolerates comments that precede the first key,
instead of failing with an error message.
2.9, 6-Aug-08
Still in beta; just a quick bug-fix for dbmapreduce.
ENHANCEMENT
dbmapreduce now generates plausible output when given no rows of
input.
2.10, 23-Sep-08
Still in beta, but picking up some bug fixes.
ENHANCEMENT
dbmapreduce now generates plausible output when given no rows of
input.
ENHANCEMENT
The dbroweval warnings option was backwards; it is now corrected.
As a result, warnings in user code now default off (like in fsdb-1.x).
BUG FIX
dbcolpercentile now defaults to assuming the target column is
numeric. The new option "-N" allows selection of a non-numeric
target.
BUG FIX
dbcolscorrelate now includes "--sample" and "--nosample" options to
compute the sample or full population correlation coefficients.
Thanks to Xue Cai for finding this bug.
2.11, 14-Oct-08
Still in beta, but picking up some bug fixes.
ENHANCEMENT
html_table_to_db is now more aggressive about filling in empty
cells with the official empty value, rather than leaving them blank
or as whitespace.
ENHANCEMENT
dbpipeline now catches failures during pipeline element setup and
exits reasonably gracefully.
BUG FIX
dbsubprocess now reaps child processes, thus avoiding running out
of processes when used a lot.
2.12, 16-Oct-08
Finally, a full (non-beta) 2.x release!
INCOMPATIBLE CHANGE
Jdb has been renamed Fsdb, the flatfile-streaming database. This
change affects all internal Perl APIs, but no shell command-level
APIs. While Jdb served well for more than ten years, it is easily
confused with the Java debugger (even though Jdb was there first!).
It also is too generic to work well in web search engines.
Finally, Jdb stands for ``John's database'', and we're a bit beyond
that. (However, some call me the ``file-system guy'', so one could
argue it retains that meaning.)
If you just used the shell commands, this change should not affect
you. If you used the Perl-level libraries directly in your code,
you should be able to rename "Jdb" to "Fsdb" to move to 2.12.
The jdb-announce list has not yet been renamed, but it will be shortly.
With this release I've accomplished everything I wanted to in
fsdb-2.x. I therefore expect to return to boring, bugfix releases.
2.13, 30-Oct-08
BUG FIX
dbrowaccumulate now treats non-numeric data as zero by default.
BUG FIX
Fixed a perl-5.10ism in dbmapreduce that breaks that program under
5.8. Thanks to Martin Lukac for reporting the bug.
2.14, 26-Nov-08
BUG FIX
Improved documentation for dbmapreduce's "-f" option.
ENHANCEMENT
dbcolmovingstats now computes a moving standard deviation in
addition to a moving mean.
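As an illustration of what a moving mean and moving standard deviation
look like, here is a minimal sketch in shell and awk. It is not
dbcolmovingstats and does not claim to match its algorithm or output
format. The function name moving_stats, the window argument, and the
choice of the sample (n - 1) standard deviation are all assumptions.

```shell
# Hypothetical helper, not dbcolmovingstats itself.
# Usage: moving_stats WINDOW   (reads one number per input line)
moving_stats() {
    awk -v w="$1" '
        {
            slot = NR % w
            if (NR > w) {
                # Window is full: drop the oldest value from the sums.
                sum   -= buf[slot]
                sumsq -= buf[slot] * buf[slot]
            }
            buf[slot] = $1
            sum   += $1
            sumsq += $1 * $1
            n    = (NR < w) ? NR : w
            mean = sum / n
            # Sample variance (n - 1 divisor); defined as 0 for one value.
            var = (n > 1) ? (sumsq - n * mean * mean) / (n - 1) : 0
            if (var < 0) var = 0       # guard against rounding error
            printf "%s %.3f %.3f\n", $1, mean, sqrt(var)
        }
    '
}

# Each output line is: value, moving mean, moving standard deviation.
printf '%s\n' 2 4 6 8 | moving_stats 2
```

Keeping running sums of the values and their squares means each new
row costs constant work regardless of the window size, at the price of
some numerical precision on long or wide-ranging inputs.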
2.15, 13-Apr-09
BUG FIX
Fix a make install bug reported by Shalindra Fernando.
2.16, 14-Apr-09
BUG FIX
Another minor release bug: on some systems programize_module loses
executable permissions. Again reported by Shalindra Fernando.
2.17, 25-Jun-09
TYPO FIXES
Typo in the dbroweval manual fixed.
IMPROVEMENT
There is no longer a comment line to label columns in dbcolneaten;
instead the header line is tweaked to line up. This change
restores the Jdb-1.x behavior, and means that repeated runs of
dbcolneaten no longer add comment lines each time.
BUG FIX
It turns out dbcolneaten was not correctly handling trailing
spaces when given the "-E" option to suppress them. This
regression is now fixed.
EXTENSION
dbroweval(1) can now handle direct references to the last row via
$lfref, a dubious but now documented feature.