unicode results from the CPAN

unicode

DBD-SQLcipher

view release on metacpan or search on metacpan

        DBD::SQLcipher::db->install_method('sqlite_profile', { O => 0x0004 });
        DBD::SQLcipher::db->install_method('sqlite_table_column_metadata', { O => 0x0004 });
        DBD::SQLcipher::db->install_method('sqlite_db_filename', { O => 0x0004 });
        DBD::SQLcipher::db->install_method('sqlite_db_status', { O => 0x0004 });
        DBD::SQLcipher::st->install_method('sqlite_st_status', { O => 0x0004 });
        DBD::SQLcipher::db->install_method('sqlite_create_module');

        $methods_are_installed++;
    }

    $drh = DBI::_new_drh( "$_[0]::dr", {
        Name        => 'SQLcipher',
        Version     => $VERSION,
        Attribution => 'DBD::SQLcipher by Matt Sergeant et al',
    } );

    return $drh;
}

sub CLONE {
    undef $drh;
}


package # hide from PAUSE
    DBD::SQLcipher::dr;

sub connect {
    my ($drh, $dbname, $user, $auth, $attr) = @_;

    # Default PrintWarn to the value of $^W
    # unless ( defined $attr->{PrintWarn} ) {
    #    $attr->{PrintWarn} = $^W ? 1 : 0;
    # }

    my $dbh = DBI::_new_dbh( $drh, {
        Name => $dbname,
    } );

    my $real = $dbname;
    if ( $dbname =~ /=/ ) {
        foreach my $attrib ( split(/;/, $dbname) ) {
            my ($key, $value) = split(/=/, $attrib, 2);
            if ( $key =~ /^(?:db(?:name)?|database)$/ ) {
                $real = $value;
            } elsif ( $key eq 'uri' ) {
                $real = $value;
                $attr->{sqlite_open_flags} |= DBD::SQLcipher::OPEN_URI();
            } else {
                $attr->{$key} = $value;
            }
        }
    }

    if (my $flags = $attr->{sqlite_open_flags}) {
        unless ($flags & (DBD::SQLcipher::OPEN_READONLY() | DBD::SQLcipher::OPEN_READWRITE())) {
            $attr->{sqlite_open_flags} |= DBD::SQLcipher::OPEN_READWRITE() | DBD::SQLcipher::OPEN_CREATE();
        }
    }

    # To avoid unicode and long file name problems on Windows,
    # convert to the shortname if the file (or parent directory) exists.
    if ( $^O =~ /MSWin32/ and $real ne ':memory:' and $real ne '' and $real !~ /^file:/ and !-f $real ) {
        require File::Basename;
        my ($file, $dir, $suffix) = File::Basename::fileparse($real);
        # We are creating a new file.
        # Does the directory it's in at least exist?
        if ( -d $dir ) {
            require Win32;
            $real = join '', grep { defined } Win32::GetShortPathName($dir), $file, $suffix;
        } else {
            # SQLcipher can't do mkpath anyway.
            # So let it go through as it and fail.
        }
    }

    # Hand off to the actual login function
    DBD::SQLcipher::db::_login($dbh, $real, $user, $auth, $attr) or return undef;

    # Register the on-demand collation installer, REGEXP function and
    # perl tokenizer
    if ( DBD::SQLcipher::NEWAPI ) {
        $dbh->sqlite_collation_needed( \&install_collation );
        $dbh->sqlite_create_function( "REGEXP", 2, \&regexp );
        $dbh->sqlite_register_fts3_perl_tokenizer();
    } else {
        $dbh->func( \&install_collation, "collation_needed"  );
        $dbh->func( "REGEXP", 2, \&regexp, "create_function" );
        $dbh->func( "register_fts3_perl_tokenizer" );
    }

    # HACK: Since PrintWarn = 0 doesn't seem to actually prevent warnings
    # in DBD::SQLcipher we set Warn to false if PrintWarn is false.

    # NOTE: According to the explanation by timbunce,
    # "Warn is meant to report on bad practices or problems with
    # the DBI itself (hence always on by default), while PrintWarn
    # is meant to report warnings coming from the database."
    # That is, if you want to disable an ineffective rollback warning
    # etc (due to bad practices), you should turn off Warn,
    # and to silence other warnings, turn off PrintWarn.
    # Warn and PrintWarn are independent, and turning off PrintWarn
    # does not silence those warnings that should be controlled by
    # Warn.

    # unless ( $attr->{PrintWarn} ) {
    #     $attr->{Warn} = 0;
    # }

    return $dbh;
}

sub install_collation {
    my $dbh       = shift;
    my $name      = shift;
    my $collation = $DBD::SQLcipher::COLLATION{$name};
    unless ($collation) {
        warn "Can't install unknown collation: $name" if $dbh->{PrintWarn};
        return;
    }
    if ( DBD::SQLcipher::NEWAPI ) {

lib/DBD/SQLcipher.pm view on Meta::CPAN

Since 1.30_01, you can retrieve those ignored (unprepared) statements
via C<< $sth->{sqlite_unprepared_statements} >>. It usually contains
nothing but white spaces, but if you really care, you can check this
attribute to see if there's anything left undone. Also, if you set
a C<sqlite_allow_multiple_statements> attribute of a database handle
to true when you connect to a database, C<do> method automatically
checks the C<sqlite_unprepared_statements> attribute, and if it finds
anything undone (even if what's left is just a single white space),
it repeats the process again, to the end.

=head2 Performance

SQLcipher is fast, very fast. Matt processed his 72MB log file with it,
inserting the data (400,000+ rows) by using transactions and only
committing every 1000 rows (otherwise the insertion is quite slow),
and then performing queries on the data.

Queries like count(*) and avg(bytes) took fractions of a second to
return, but what surprised him most of all was:

  SELECT url, count(*) as count
  FROM access_log
  GROUP BY url
  ORDER BY count desc
  LIMIT 20

To discover the top 20 hit URLs on the site (L<http://axkit.org>),
and it returned within 2 seconds. He was seriously considering
switching his log analysis code to use this little speed demon!

Oh yeah, and that was with no indexes on the table, on a 400MHz PIII.

For best performance be sure to tune your hdparm settings if you
are using linux. Also you might want to set:

  PRAGMA synchronous = OFF

Which will prevent SQLcipher from doing fsync's when writing (which
slows down non-transactional writes significantly) at the expense
of some peace of mind. Also try playing with the cache_size pragma.

The memory usage of SQLcipher can also be tuned using the cache_size
pragma.

  $dbh->do("PRAGMA cache_size = 800000");

The above will allocate 800M for DB cache; the default is 2M.
Your sweet spot probably lies somewhere in between.

=head1 DRIVER PRIVATE ATTRIBUTES

=head2 Database Handle Attributes

=over 4

=item sqlite_version

Returns the version of the SQLcipher library which B<DBD::SQLcipher> is using,
e.g., "2.8.0". Can only be read.

=item sqlite_unicode

If set to a true value, B<DBD::SQLcipher> will turn the UTF-8 flag on for all
text strings coming out of the database (this feature is currently disabled
for perl < 5.8.5). For more details on the UTF-8 flag see
L<perlunicode>. The default is for the UTF-8 flag to be turned off.

Also note that due to some bizarreness in SQLcipher's type system (see
L<http://www.sqlite.org/datatype3.html>), if you want to retain
blob-style behavior for B<some> columns under C<< $dbh->{sqlite_unicode} = 1
>> (say, to store images in the database), you have to state so
explicitly using the 3-argument form of L<DBI/bind_param> when doing
updates:

  use DBI qw(:sql_types);
  $dbh->{sqlite_unicode} = 1;
  my $sth = $dbh->prepare("INSERT INTO mytable (blobcolumn) VALUES (?)");
  
  # Binary_data will be stored as is.
  $sth->bind_param(1, $binary_data, SQL_BLOB);

Defining the column type as C<BLOB> in the DDL is B<not> sufficient.

This attribute was originally named as C<unicode>, and renamed to
C<sqlite_unicode> for integrity since version 1.26_06. Old C<unicode>
attribute is still accessible but will be deprecated in the near future.

=item sqlite_allow_multiple_statements

If you set this to true, C<do> method will process multiple
statements at one go. This may be handy, but with performance
penalty. See above for details.

=item sqlite_use_immediate_transaction

If you set this to true, DBD::SQLcipher tries to issue a C<begin
immediate transaction> (instead of C<begin transaction>) when
necessary. See above for details.

As of version 1.38_01, this attribute is set to true by default.
If you really need to use C<deferred> transactions for some reasons,
set this to false explicitly.

=item sqlite_see_if_its_a_number

If you set this to true, DBD::SQLcipher tries to see if the bind values
are number or not, and does not quote if they are numbers. See above
for details.

=back

=head2 Statement Handle Attributes

=over 4

=item sqlite_unprepared_statements

Returns an unprepared part of the statement you pass to C<prepare>.
Typically this contains nothing but white spaces after a semicolon.
See above for details.

=back

=head1 METHODS

See also to the L<DBI> documentation for the details of other common
methods.

=head2 table_info

  $sth = $dbh->table_info(undef, $schema, $table, $type, \%attr);

Returns all tables and schemas (databases) as specified in L<DBI/table_info>.
The schema and table arguments will do a C<LIKE> search. You can specify an
ESCAPE character by including an 'Escape' attribute in \%attr. The C<$type>
argument accepts a comma separated list of the following types 'TABLE',
'VIEW', 'LOCAL TEMPORARY' and 'SYSTEM TABLE' (by default all are returned).
Note that a statement handle is returned, and not a direct list of tables.

The following fields are returned:

B<TABLE_CAT>: Always NULL, as SQLcipher does not have the concept of catalogs.

B<TABLE_SCHEM>: The name of the schema (database) that the table or view is
in. The default schema is 'main', temporary tables are in 'temp' and other

lib/DBD/SQLcipher.pm view on Meta::CPAN

comparison functions, known as user-defined "collation sequences" or
"collating functions", to be used for comparing two text values.
L<http://www.sqlite.org/datatype3.html#collation>
explains how collations are used in various SQL expressions.

=head2 Builtin collation sequences

The following collation sequences are builtin within SQLcipher :

=over

=item B<BINARY>

Compares string data using memcmp(), regardless of text encoding.

=item B<NOCASE>

The same as binary, except the 26 upper case characters of ASCII are
folded to their lower case equivalents before the comparison is
performed. Note that only ASCII characters are case folded. SQLcipher
does not attempt to do full UTF case folding due to the size of the
tables required.

=item B<RTRIM>

The same as binary, except that trailing space characters are ignored.

=back

In addition, C<DBD::SQLcipher> automatically installs the
following collation sequences :

=over

=item B<perl>

corresponds to the Perl C<cmp> operator

=item B<perllocale>

Perl C<cmp> operator, in a context where C<use locale> is activated.

=back

=head2 Usage

You can write for example

  CREATE TABLE foo(
      txt1 COLLATE perl,
      txt2 COLLATE perllocale,
      txt3 COLLATE nocase
  )

or

  SELECT * FROM foo ORDER BY name COLLATE perllocale

=head2 Unicode handling

If the attribute C<< $dbh->{sqlite_unicode} >> is set, strings coming from
the database and passed to the collation function will be properly
tagged with the utf8 flag; but this only works if the
C<sqlite_unicode> attribute is set B<before> the first call to
a perl collation sequence . The recommended way to activate unicode
is to set the parameter at connection time :

  my $dbh = DBI->connect(
      "dbi:SQLcipher:dbname=foo", "", "",
      {
          RaiseError     => 1,
          sqlite_unicode => 1,
      }
  );

=head2 Adding user-defined collations

The native SQLcipher API for adding user-defined collations is
exposed through methods L</"sqlite_create_collation"> and
L</"sqlite_collation_needed">.

To avoid calling these functions every time a C<$dbh> handle is
created, C<DBD::SQLcipher> offers a simpler interface through the
C<%DBD::SQLcipher::COLLATION> hash : just insert your own
collation functions in that hash, and whenever an unknown
collation name is encountered in SQL, the appropriate collation
function will be loaded on demand from the hash. For example,
here is a way to sort text values regardless of their accented
characters :

  use DBD::SQLcipher;
  $DBD::SQLcipher::COLLATION{no_accents} = sub {
    my ( $a, $b ) = map lc, @_;
    tr[àâáäåãçðèêéëìîíïñòôóöõøùûúüý]
      [aaaaaacdeeeeiiiinoooooouuuuy] for $a, $b;
    $a cmp $b;
  };
  my $dbh  = DBI->connect("dbi:SQLcipher:dbname=dbfile");
  my $sql  = "SELECT ... FROM ... ORDER BY ... COLLATE no_accents");
  my $rows = $dbh->selectall_arrayref($sql);

The builtin C<perl> or C<perllocale> collations are predefined
in that same hash.

The COLLATION hash is a global registry within the current process;
hence there is a risk of undesired side-effects. Therefore, to
prevent action at distance, the hash is implemented as a "write-only"
hash, that will happily accept new entries, but will raise an
exception if any attempt is made to override or delete a existing
entry (including the builtin C<perl> and C<perllocale>).

If you really, really need to change or delete an entry, you can
always grab the tied object underneath C<%DBD::SQLcipher::COLLATION> ---
but don't do that unless you really know what you are doing. Also
observe that changes in the global hash will not modify existing
collations in existing database handles: it will only affect new
I<requests> for collations. In other words, if you want to change
the behaviour of a collation within an existing C<$dbh>, you
need to call the L</create_collation> method directly.

=head1 FULLTEXT SEARCH

SQLcipher is bundled with an extension module for full-text
indexing. Tables with this feature enabled can be efficiently queried
to find rows that contain one or more instances of some specified
words, in any column, even if the table contains many large documents.

Explanations for using this feature are provided in a separate document:
see L<DBD::SQLcipher::Fulltext_search>.


=head1 R* TREE SUPPORT

( run in 2.137 seconds using v1.01-cache-2.11-cpan-97f6503c9c8 )