view release on metacpan or search on metacpan
1.71_01 2021-12-02
- Upgraded SQLite to 3.37.0
- Add a feature to unregister a created function
- Fix accented characters in POD (GH#90, HaraldJoerg++)
1.70 2021-08-01
- Switched to a production version
1.69_02 2021-07-30
- Fix doc to use the correct attribute with sqlite_ (GH#86, eekboek++)
- Modify the fix to silence the sqlite_unicode warning not to check
the attribute twice
- Fix an encoding issue of naive (GH#83, HaraldJoerg++)
1.69_01 2021-07-30
- Typo (GH#85, grr++)
- Silenced deprecation warning of sqlite_unicode not to break
tests of existing applications
1.68 2021-07-22
- Switched to a production version
1.67_07 2021-06-19
- Upgraded SQLite to 3.36.0
1.67_06 2021-06-14
- Experiment with another quadmath patch to see if it works
1.67_05 2021-06-13
- Made DBD_SQLITE_STRING_MODE constants exportable
1.67_04 2021-05-31
- Upgraded SQLite to 3.35.5
- Stop setting THREADSAFE=0 if perl has pthread (ie. 5.20+)
(Bjoern Hoehrmann++, GH#69, #72)
- Fixed a memory leak in ::VirtualTable
- Introduced "string_mode" handle attribute (Felipe Gasper++)
to fix long-standing issues of sqlite_unicode (GH#78, #68)
- Added a dependency from dbdimp.o to the *.inc files included
into dbdimp.c (Laurent Dami++, GH#74)
- Fixed an offset issue of VirtualTable (Laurent Dami++, GH#75)
1.67_03 2021-03-31
- Upgraded SQLite to 3.35.3
- Enabled math functions introduced in SQLite 3.35
- Fix quadmath issues (Tux++, leont++)
1.67_02 2020-12-06
1.51_01 2016-02-20
*** CHANGES THAT MAY POSSIBLY BREAK YOUR OLD APPLICATIONS ***
- Updated to SQLite 3.11.0.
As upstream disabled two-arg fts3_tokenizer() for security concern,
DBD::SQLite also stopped enabling it by default. If you do need
perl tokenizer, compile/install with SQLITE_ENABLE_FTS3_TOKENIZER
environmental variable.
- Applied a doc patch by Salvatore Bonaccorso
- Enabled (experimental) FTS5
- Fixed REGEXP function to work under sqlite_unicode correctly
(András Farkas++)
1.50 2016-02-11
- Switched to a production version.
1.49_08 2016-01-30
- no significant code changes
- Resolved RT#111558: Virtual table tests depend on enhanced
query syntax availability (vlmarek++)
- Ingore FTS tests if FTS is not available
1.43_06 2014-07-22
- Fixed compile error/warning for older perls (reported by ribasushi)
(ISHIGAKI)
1.43_05 2014-07-21
- No significant code changes; removed unnecessary dependencies.
1.43_04 2014-07-21
*** CHANGES THAT MAY POSSIBLY BREAK YOUR OLD APPLICATIONS ***
- Resolved #96877: sql statements should be converted to utf8 (DAMI)
If you set sqlite_unicode to true, SQL statements will be upgraded
to avoid inconsistency between embedded params and bind params.
- Resolved #96494: [PATCH] add SYSTEM TABLE to table_info() type
list (MJP)
- Supported virtual tables in Perl, and added two sample tables
(DAMI++)
1.43_03 2014-06-12
- Updated to SQLite 3.8.5, which should fix query planner's
issues in SQLite (ISHIGAKI)
1.40 2013-07-28
- NetBSD also doesn't like the _XOPEN_SOURCE hack (ISHIGAKI)
- Resolved #86080: PATCH: statistics_info support (DDICK)
1.39 2013-05-31
- Production release, no changes from 1.38_05
1.38_05 2013-05-31
- OpenBSD doesn't like the previous _XOPEN_SOURCE hack (ISHIGAKI)
- Disabled a unicode-related test for older perls (ISHIGAKI)
1.38_04 2013-05-29
- Tentatively defined _XOPEN_SOURCE under *BSD systems to see
if it solves a compilation issue for threaded perls (ISHIGAKI)
1.38_03 2013-05-20
*** NOTICE ON QUERY OPTIMIZER ENHANCEMENT ***
- As of SQLite 3.7.15, SQLite's query optimizer was enhanced
and the result order of a SELECT statement without an ORDER
BY clause may be different from the one of the previous
it actually broke things. This feature probably will be
enabled by default by the sqlite team in the future, and
eventually you'll need to cope with it, but right now we
agreed with some discussion to give you more time to be
prepared. If you use referential stuff in your schema
(which sqlite ignores now) should do extensive testing
to ensure that they will work when you issue "PRAGMA
foreign_keys = ON". (ISHIGAKI)
- Updated to SQLite 3.6.20 (DUNCAND)
- Resolved #50935: there remained old "unicode" attribute usage
in the pod, spotted by ASHLEY. (ISHIGAKI)
1.26_06 2009-10-28
*** CHANGES THAT MAY POSSIBLY BREAK YOUR OLD APPLICATIONS ***
- Removed undocumented (and most probably unused) reset method
from a statement handle (which was only accessible via func().)
Simply use "$sth->finish" instead. (ISHIGAKI)
- Now DBD::SQLite supports foreign key constraints by default.
Long-ignored foreign keys (typically written for other DB
engines) will start working. If you don't want this feature,
issue a pragma to disable foreign keys. (ISHIGAKI)
- Renamed "unicode" attribute to "sqlite_unicode" for integrity.
Old "unicode" attribute is still accessible but will be
deprecated in the near future. (ISHIGAKI)
- You can see file/line info while tracing even if you compile
with a non-gcc compiler. (ISHIGAKI)
- Major code refactoring. (ISHIGAKI)
- Pod reorganized, and some of the missing bits (including
pragma) are added. (ISHIGAKI)
1.26_05 2009-10-15
- Updated to SQLite 3.6.19 (ISHIGAKI)
- Added access to Online Backup functionality. (TJC)
- Added enable_load_extension pod (ISHIGAKI)
- Now private methods/functions return true after successful
calls (#44871) (ISHIGAKI)
- Removed all of the "croak"s (#44871) (ISHIGAKI)
1.26_01 2009-05-05
- Added ORDINAL_POSITION support for $dbh->column_info (ADAMK)
- Applied several fixes from GFUJI to clean up code (#45578)
(ISHIGAKI)
- Skipped some of the unicode path tests under cygwin (#45166)
(JDHEDDEN)
- Added some explanation and workarounds for a SQL that
compares a return value of a function with a numeric bind
value (ISHIGAKI)
1.25 2009-04-23
- Amalgamation conversion turned out to be quicker than expected.
- Changing to a production release. (ADAMK)
1.24_02 2009-04-22
- Merging various externally-contributed annotations from
annocpan.org (ADAMK)
- Created the beginnings of a DBD::SQLite::Cookbook (ADAMK)
1.24_01 2009-04-22
- Moved getsqlite.pl into util (ADAMK)
- Switching to the RT queue instead of the RT report page that
does nothing and just refers you to email (ADAMK)
- Now DBD::SQLite also uses amalgamated source recommended at sqlite.org (ISHIGAKI)
- Resolved #45166: better unicode path handling under cygwin (ISHIGAKI)
- Resolved #45171: test failure on CentOS 4.6 (ISHIGAKI)
1.23 2009-04-19
- No changes from 1.22_08, just switched to production release (ADAMK)
1.22_08 2009-04-17
- Completed the migration of all tests and deleted lib.pl (ADAMK)
- Prevented a double "commit is innefective" warning (ADAMK)
- Adding a version for the Win32.pm dependency (ADAMK)
1.22_07 2009-04-16
- Improved non-latin unicode filename support/test
on Windows (SZABGAB/ISHIGAKI)
- Removed the table name generator from t/lib.pl,
getting us closer to removing t/lib.pl entirely (ADAMK)
- Increased use of Test::NoWarnings (ADAMK)
- Converted half the remaining lib.pl tests to t::lib::Test (ADAMK)
- Require Win32.pm on Windows (CHORNY)
1.22_06 2009-04-15
- Simplifying various miscellaneous code (ADAMK)
- Adding support for non-latin unicode filenames on Windows (ADAMK)
1.22_05 2009-04-15
- Hopefully the last dev release before the next production release.
- Updated to SQLite 3.6.13 (DUNCAND)
- Setting svn:eol-style to native to prevent EOL issues (ADAMK)
- Resolved #44861: tweaked Makefile.PL to support older HP-UX (ISHIGAKI)
1.22_04 2009-04-11
- Adding support parsing attributes out of the DSN (ADAMK)
- Inserted pTHX_/aTHX_ for better efficiency (suggested in #44884 by TIMB) (ISHIGAKI)
1.22_03 2009-04-10
- Resolved #44876: Patch to fix includes in the SQLITE_LOCATION case by janus (ISHIGAKI)
- Added PERL_NO_GET_CONTEXT for efficiency (suggested in #44884 by TIMB) (ISHIGAKI)
- Refactored error handling (suggested in #44884, #44871 by TIMB) (ISHIGAKI)
1.22_02 2009-04-09
- Added missing documentation bits for 'create_collation'
and 'progress_handler' (DAMI)
- Resolved RT#25924 (Arguments to user-defined functions do not
respect unicode setting) (DAMI)
- Added comments on the return values on error, and fixed another
wrong return value in execute (ISHIGAKI)
- Added SQL_NULLABLE_UNKNOWN; still wonders if the error above
should be ignored or not (ISHIGAKI)
1.22_01 2009-04-09
- Resolved #25371: Calls sv_utf8_upgrade on strings going into
the database to make sure latin-1 strings are not saved as
Malformed UTF-8 character in the SQLite TEXT column (MIYAGAWA)
- Improved error handling
(many MANY thanks to Tim for all these patches!)
- Updated to sqlite 2.8.0
0.24
- Fixed major crash bug affecting Mac OS X
- Removed test.pl from distribution
- Upgraded to sqlite 2.7.6
0.23
- Fixed unicode tests
0.22
- Merge with sqlite 2.7.4
0.21
- Ooops - forgot new opcodes files from MANIFEST
0.20
- Port to SQLite 2.7.2
- Fixed bug in not freeing memory if you re-execute a $sth
- Fixed getsqlite.pl
0.16
- Upgraded to SQLite 2.5.0
0.15
- Upgraded to SQLite 2.4.5
0.14
- Added NoUTF8Flag option, so that returned strings don't get flagged
with SvUTF8_on() - needed when you're storing non-unicode in the database
0.13
- Upgraded to SQLite 2.4.3
- Added script to download sqlite core library when it's upgraded
0.12
- Upgraded to SQLite 2.4.2
0.11
- Upgraded to SQLite 2.4.0, which adds views, subqueries, new builtin
t/02_logon.t
t/03_create_table.t
t/04_insert.t
t/05_select.t
t/06_tran.t
t/07_error.t
t/08_busy.t
t/09_create_function.t
t/10_create_aggregate.t
t/11_get_info.t
t/12_unicode.t
t/13_create_collation.t
t/14_progress_handler.t
t/15_ak_dbd.t
t/16_column_info.t
t/17_createdrop.t
t/18_insertfetch.t
t/19_bindparam.t
t/20_blobs.t
t/21_blobtext.t
t/22_listfields.t
t/65_db_config.t
t/66_get_autocommit.t
t/cookbook_variance.t
t/lib/SQLiteTest.pm
t/rt_106151_outermost_savepoint.t
t/rt_106950_extra_warnings_with_savepoints.t
t/rt_115465_column_info_with_spaces.t
t/rt_124227_index_regression.t
t/rt_15186_prepcached.t
t/rt_21406_auto_finish.t
t/rt_25371_asymmetric_unicode.t
t/rt_25460_numeric_aggregate.t
t/rt_25924_user_defined_func_unicode.t
t/rt_26775_distinct.t
t/rt_27553_prepared_cache_and_analyze.t
t/rt_29058_group_by.t
t/rt_29629_sqlite_where_length.t
t/rt_31324_full_names.t
t/rt_32889_prepare_cached_reexecute.t
t/rt_36836_duplicate_key.t
t/rt_36838_unique_and_bus_error.t
t/rt_40594_nullable.t
t/rt_48393_debug_panic_with_commit.t
t/rt_50503_fts3.t
t/rt_52573_manual_exclusive_lock.t
t/rt_53235_icu_compatibility.t
t/rt_62370_diconnected_handles_operation.t
t/rt_64177_ping_wipes_out_the_errstr.t
t/rt_67581_bind_params_mismatch.t
t/rt_71311_bind_col_and_unicode.t
t/rt_73159_fts_tokenizer_segfault.t
t/rt_73787_exponential_buffer_overflow.t
t/rt_76395_int_overflow.t
t/rt_77724_primary_key_with_a_whitespace.t
t/rt_78833_utf8_flag_for_column_names.t
t/rt_81536_multi_column_primary_key_info.t
t/rt_88228_sqlite_3_8_0_crash.t
t/rt_96050_db_filename_for_a_closed_database.t
t/rt_96877_unicode_statements.t
t/rt_96878_fts_contentless_table.t
t/rt_97598_crash_on_disconnect_with_virtual_tables.t
t/virtual_table/00_base.t
t/virtual_table/01_destroy.t
t/virtual_table/02_find_function.t
t/virtual_table/10_filecontent.t
t/virtual_table/11_filecontent_fulltext.t
t/virtual_table/20_perldata.t
t/virtual_table/21_perldata_charinfo.t
t/virtual_table/rt_124941.t
The above will allocate 800M for DB cache; the default is 2M. Your
sweet spot probably lies somewhere in between.
DRIVER PRIVATE ATTRIBUTES
Database Handle Attributes
sqlite_version
Returns the version of the SQLite library which DBD::SQLite is
using, e.g., "2.8.0". Can only be read.
sqlite_unicode
If set to a true value, DBD::SQLite will turn the UTF-8 flag
on for all text strings coming out of the database (this
feature is currently disabled for perl < 5.8.5). For more
details on the UTF-8 flag see perlunicode. The default is for
the UTF-8 flag to be turned off.
Also note that due to some bizarreness in SQLite's type system
(see <http://www.sqlite.org/datatype3.html>), if you want to
retain blob-style behavior for some columns under
"$dbh->{sqlite_unicode} = 1" (say, to store images in the
database), you have to state so explicitly using the
3-argument form of "bind_param" in DBI when doing updates:
use DBI qw(:sql_types);
$dbh->{sqlite_unicode} = 1;
my $sth = $dbh->prepare("INSERT INTO mytable (blobcolumn) VALUES (?)");
# Binary_data will be stored as is.
$sth->bind_param(1, $binary_data, SQL_BLOB);
Defining the column type as "BLOB" in the DDL is not
sufficient.
This attribute was originally named as "unicode", and renamed
to "sqlite_unicode" for integrity since version 1.26_06. Old
"unicode" attribute is still accessible but will be deprecated
in the near future.
sqlite_allow_multiple_statements
If you set this to true, "do" method will process multiple
statements at one go. This may be handy, but with performance
penalty. See above for details.
sqlite_use_immediate_transaction
If you set this to true, DBD::SQLite tries to issue a "begin
immediate transaction" (instead of "begin transaction") when
txt1 COLLATE perl,
txt2 COLLATE perllocale,
txt3 COLLATE nocase
)
or
SELECT * FROM foo ORDER BY name COLLATE perllocale
Unicode handling
If the attribute "$dbh->{sqlite_unicode}" is set, strings coming
from the database and passed to the collation function will be
properly tagged with the utf8 flag; but this only works if the
"sqlite_unicode" attribute is set before the first call to a perl
collation sequence . The recommended way to activate unicode is to
set the parameter at connection time :
my $dbh = DBI->connect(
"dbi:SQLite:dbname=foo", "", "",
{
RaiseError => 1,
sqlite_unicode => 1,
}
);
Adding user-defined collations
The native SQLite API for adding user-defined collations is
exposed through methods "sqlite_create_collation" and
"sqlite_collation_needed".
To avoid calling these functions every time a $dbh handle is
created, "DBD::SQLite" offers a simpler interface through the
}
}
/* sqlite_string_mode should be detected earlier, to register default functions correctly */
SV** string_mode_svp = hv_fetchs(hv, "sqlite_string_mode", 0);
if (string_mode_svp != NULL && SvOK(*string_mode_svp)) {
string_mode = _extract_sqlite_string_mode_from_sv(aTHX_ *string_mode_svp);
/* Legacy alternatives to sqlite_string_mode: */
} else if (hv_exists(hv, "sqlite_unicode", 14)) {
val = hv_fetch(hv, "sqlite_unicode", 14, 0);
if ( (val && SvOK(*val)) ? SvIV(*val) : 0 ) {
string_mode = DBD_SQLITE_STRING_MODE_UNICODE_NAIVE;
}
} else if (hv_exists(hv, "unicode", 7)) {
val = hv_fetch(hv, "unicode", 7, 0);
if ( (val && SvOK(*val)) ? SvIV(*val) : 0 ) {
string_mode = DBD_SQLITE_STRING_MODE_UNICODE_NAIVE;
}
}
}
rc = sqlite_open2(dbname, &(imp_dbh->db), flag, extended);
if ( rc != SQLITE_OK ) {
return FALSE; /* -> undef in lib/DBD/SQLite.pm */
}
DBIc_IMPSET_on(imp_dbh);
sqlite_trace(dbh, imp_dbh, 3, form("Unicode support is disabled for this version of perl."));
string_mode = DBD_SQLITE_STRING_MODE_PV;
}
#endif
imp_dbh->string_mode = string_mode;
return TRUE;
}
if (strEQ(key, "sqlite_unicode")) {
/* it's too early to warn the deprecation of sqlite_unicode as it's widely used */
#if PERL_UNICODE_DOES_NOT_WORK_WELL
sqlite_trace(dbh, imp_dbh, 3, form("Unicode support is disabled for this version of perl."));
imp_dbh->string_mode = DBD_SQLITE_STRING_MODE_PV;
#else
imp_dbh->string_mode = SvTRUE(valuesv) ? DBD_SQLITE_STRING_MODE_UNICODE_NAIVE : DBD_SQLITE_STRING_MODE_PV;
#endif
return TRUE;
}
if (strEQ(key, "unicode")) {
_warn_deprecated_if_possible(key, "sqlite_string_mode");
#if PERL_UNICODE_DOES_NOT_WORK_WELL
sqlite_trace(dbh, imp_dbh, 3, form("Unicode support is disabled for this version of perl."));
imp_dbh->string_mode = DBD_SQLITE_STRING_MODE_PV;
#else
imp_dbh->string_mode = SvTRUE(valuesv) ? DBD_SQLITE_STRING_MODE_UNICODE_NAIVE : DBD_SQLITE_STRING_MODE_PV;
#endif
return TRUE;
}
return sv_2mortal(newSViv(imp_dbh->extended_result_codes ? 1 : 0));
}
if (strEQ(key, "sqlite_prefer_numeric_type")) {
return sv_2mortal(newSViv(imp_dbh->prefer_numeric_type ? 1 : 0));
}
if (strEQ(key, "sqlite_string_mode")) {
return sv_2mortal(newSVuv(imp_dbh->string_mode));
}
if (strEQ(key, "sqlite_unicode") || strEQ(key, "unicode")) {
_warn_deprecated_if_possible(key, "sqlite_string_mode");
#if PERL_UNICODE_DOES_NOT_WORK_WELL
sqlite_trace(dbh, imp_dbh, 3, "Unicode support is disabled for this version of perl.");
return sv_2mortal(newSViv(0));
#else
return sv_2mortal(newSViv(imp_dbh->string_mode == DBD_SQLITE_STRING_MODE_UNICODE_NAIVE ? 1 : 0));
#endif
}
return NULL;
} else {
sqlite_set_result(aTHX_ context, POPs, 0 );
}
PUTBACK;
FREETMPS;
LEAVE;
}
static void
sqlite_db_func_dispatcher_unicode_naive(sqlite3_context *context, int argc, sqlite3_value **value)
{
sqlite_db_func_dispatcher(DBD_SQLITE_STRING_MODE_UNICODE_NAIVE, context, argc, value);
}
static void
sqlite_db_func_dispatcher_unicode_fallback(sqlite3_context *context, int argc, sqlite3_value **value)
{
sqlite_db_func_dispatcher(DBD_SQLITE_STRING_MODE_UNICODE_FALLBACK, context, argc, value);
}
static void
sqlite_db_func_dispatcher_unicode_strict(sqlite3_context *context, int argc, sqlite3_value **value)
{
sqlite_db_func_dispatcher(DBD_SQLITE_STRING_MODE_UNICODE_STRICT, context, argc, value);
}
static void
sqlite_db_func_dispatcher_bytes(sqlite3_context *context, int argc, sqlite3_value **value)
{
sqlite_db_func_dispatcher(DBD_SQLITE_STRING_MODE_BYTES, context, argc, value);
}
{
sqlite_db_func_dispatcher(DBD_SQLITE_STRING_MODE_PV, context, argc, value);
}
typedef void (*dispatch_func_t)(sqlite3_context*, int, sqlite3_value**);
static dispatch_func_t _FUNC_DISPATCHER[_DBD_SQLITE_STRING_MODE_COUNT] = {
sqlite_db_func_dispatcher_pv,
sqlite_db_func_dispatcher_bytes,
NULL, NULL,
sqlite_db_func_dispatcher_unicode_naive,
sqlite_db_func_dispatcher_unicode_fallback,
sqlite_db_func_dispatcher_unicode_strict,
};
int
sqlite_db_create_function(pTHX_ SV *dbh, const char *name, int argc, SV *func, int flags)
{
D_imp_dbh(dbh);
int rc;
SV *func_sv;
if (!DBIc_ACTIVE(imp_dbh)) {
#include "SQLiteXS.h"
#include "sqlite3.h"
#define MY_CXT_KEY "DBD::SQLite::_guts" XS_VERSION
typedef enum {
DBD_SQLITE_STRING_MODE_PV,
DBD_SQLITE_STRING_MODE_BYTES,
/* Leave space here so that we can use DBD_SQLITE_STRING_MODE_UNICODE_ANY
as a means of checking for any unicode mode. */
DBD_SQLITE_STRING_MODE_UNICODE_NAIVE = 4,
DBD_SQLITE_STRING_MODE_UNICODE_FALLBACK,
DBD_SQLITE_STRING_MODE_UNICODE_STRICT,
_DBD_SQLITE_STRING_MODE_COUNT,
} dbd_sqlite_string_mode_t;
#define DBD_SQLITE_STRING_MODE_UNICODE_ANY DBD_SQLITE_STRING_MODE_UNICODE_NAIVE
lib/DBD/SQLite.pm view on Meta::CPAN
}
}
}
if (my $flags = $attr->{sqlite_open_flags}) {
unless ($flags & (DBD::SQLite::OPEN_READONLY() | DBD::SQLite::OPEN_READWRITE())) {
$attr->{sqlite_open_flags} |= DBD::SQLite::OPEN_READWRITE() | DBD::SQLite::OPEN_CREATE();
}
}
# To avoid unicode and long file name problems on Windows,
# convert to the shortname if the file (or parent directory) exists.
if ( $^O =~ /MSWin32/ and $real ne ':memory:' and $real ne '' and $real !~ /^file:/ and !-f $real ) {
require File::Basename;
my ($file, $dir, $suffix) = File::Basename::fileparse($real);
# We are creating a new file.
# Does the directory it's in at least exist?
if ( -d $dir ) {
require Win32;
$real = join '', grep { defined } Win32::GetShortPathName($dir), $file, $suffix;
} else {
lib/DBD/SQLite.pm view on Meta::CPAN
decode, but it can also B<corrupt> B<Perl> B<itself!>
=item * DBD_SQLITE_STRING_MODE_PV (default, but B<DO> B<NOT> B<USE>): Like
DBD_SQLITE_STRING_MODE_BYTES, but when translating Perl strings to SQLite
the Perl string's internal byte buffer is given to SQLite. B<This> B<is>
B<bad>, but it's been the default for many years, and changing that would
break existing applications.
=back
=item C<sqlite_unicode> or C<unicode> (deprecated)
If truthy, equivalent to setting C<sqlite_string_mode> to
DBD_SQLITE_STRING_MODE_UNICODE_NAIVE; if falsy, equivalent to
DBD_SQLITE_STRING_MODE_PV.
Prefer C<sqlite_string_mode> in all new code.
=item sqlite_allow_multiple_statements
If you set this to true, C<do> method will process multiple
lib/DBD/SQLite.pm view on Meta::CPAN
or
SELECT * FROM foo ORDER BY name COLLATE perllocale
=head2 Unicode handling
Depending on the C<< $dbh->{sqlite_string_mode} >> value, strings coming
from the database and passed to the collation function may be decoded as
UTF-8. This only works, though, if the C<sqlite_string_mode> attribute is
set B<before> the first call to a perl collation sequence. The recommended
way to activate unicode is to set C<sqlite_string_mode> at connection time:
my $dbh = DBI->connect(
"dbi:SQLite:dbname=foo", "", "",
{
RaiseError => 1,
sqlite_string_mode => DBD_SQLITE_STRING_MODE_UNICODE_STRICT,
}
);
=head2 Adding user-defined collations
lib/DBD/SQLite/Fulltext_search.pod view on Meta::CPAN
=item icu
The I<icu> tokenizer uses the ICU library to decide how to
identify word characters in different languages; however, this
requires SQLite to be compiled with the C<SQLITE_ENABLE_ICU>
pre-processor symbol defined. So, to use this tokenizer, you need
edit F<Makefile.PL> to add this flag in C<@CC_DEFINE>, and then
recompile C<DBD::SQLite>; of course, the prerequisite is to have
an ICU library available on your system.
=item unicode61
The I<unicode61> tokenizer works very much like "simple" except that it
does full unicode case folding according to rules in Unicode Version
6.1 and it recognizes unicode space and punctuation characters and
uses those to separate tokens. By contrast, the simple tokenizer only
does case folding of ASCII characters and only recognizes ASCII space
and punctuation characters as token separators.
By default, "unicode61" also removes all diacritics from Latin script
characters. This behaviour can be overridden by adding the tokenizer
argument C<"remove_diacritics=0">. For example:
-- Create tables that remove diacritics from Latin script characters
-- as part of tokenization.
CREATE VIRTUAL TABLE txt1 USING fts4(tokenize=unicode61);
CREATE VIRTUAL TABLE txt2 USING fts4(tokenize=unicode61 "remove_diacritics=1");
-- Create a table that does not remove diacritics from Latin script
-- characters as part of tokenization.
CREATE VIRTUAL TABLE txt3 USING fts4(tokenize=unicode61 "remove_diacritics=0");
Additional options can customize the set of codepoints that unicode61
treats as separator characters or as token characters -- see the
documentation in L<http://www.sqlite.org/fts3.html#unicode61>.
=back
If a more complex tokenizing algorithm is required, for example to
implement stemming, discard punctuation, or to recognize compound words,
use the perl tokenizer to implement your own logic, as explained below.
=head2 Perl tokenizers
=head3 Declaring a perl tokenizer
lib/DBD/SQLite/VirtualTable/PerlData.pm view on Meta::CPAN
USING perl(path, dev, ino, mode, nlink, uid, gid, rdev, size,
atime, mtime, ctime, blksize, blocks,
arrayrefs="main::file_stats");
# search files
my $sth = $dbh->prepare(<<"");
SELECT * FROM file_stats
WHERE mtime BETWEEN ? AND ?
AND uid IN (...)
=head2 Hashref example : unicode characters
Given any unicode character, the L<Unicode::UCD/charinfo> function
returns a hashref with various bits of information about that character.
So this can be exploited in a virtual table :
use Unicode::UCD 'charinfo';
our $chars = [map {charinfo($_)} 0x300..0x400]; # arbitrary subrange
# create a temporary virtual table
$dbh->do(<<"");
CREATE VIRTUAL TABLE charinfo USING perl(
code, name, block, script, category,
parse_arithexpr||5.013008|
parse_barestmt||5.013007|
parse_block||5.013007|
parse_body|||
parse_fullexpr||5.013008|
parse_fullstmt||5.013005|
parse_label||5.013007|
parse_listexpr||5.013008|
parse_stmtseq||5.013006|
parse_termexpr||5.013008|
parse_unicode_opts|||
parser_dup|||
parser_free|||
path_is_absolute|||n
peep|||
pending_Slabs_to_ro|||
perl_alloc_using|||n
perl_alloc|||n
perl_clone_using|||n
perl_clone|||n
perl_construct|||n
#endif
#ifndef PERL_PV_PRETTY_DUMP
# define PERL_PV_PRETTY_DUMP PERL_PV_PRETTY_ELLIPSES|PERL_PV_PRETTY_QUOTE
#endif
#ifndef PERL_PV_PRETTY_REGPROP
# define PERL_PV_PRETTY_REGPROP PERL_PV_PRETTY_ELLIPSES|PERL_PV_PRETTY_LTGT|PERL_PV_ESCAPE_RE
#endif
/* Hint: pv_escape
* Note that unicode functionality is only backported to
* those perl versions that support it. For older perl
* versions, the implementation will fall back to bytes.
*/
#ifndef pv_escape
#if defined(NEED_pv_escape)
static char * DPPP_(my_pv_escape)(pTHX_ SV * dsv, char const * const str, const STRLEN count, const STRLEN max, STRLEN * const escaped, const U32 flags);
static
#else
extern char * DPPP_(my_pv_escape)(pTHX_ SV * dsv, char const * const str, const STRLEN count, const STRLEN max, STRLEN * const escaped, const U32 flags);
**
** ^The third argument is the value to bind to the parameter.
** ^If the third parameter to sqlite3_bind_text() or sqlite3_bind_text16()
** or sqlite3_bind_blob() is a NULL pointer then the fourth parameter
** is ignored and the end result is the same as sqlite3_bind_null().
** ^If the third parameter to sqlite3_bind_text() is not NULL, then
** it should be a pointer to well-formed UTF8 text.
** ^If the third parameter to sqlite3_bind_text16() is not NULL, then
** it should be a pointer to well-formed UTF16 text.
** ^If the third parameter to sqlite3_bind_text64() is not NULL, then
** it should be a pointer to a well-formed unicode string that is
** either UTF8 if the sixth parameter is SQLITE_UTF8, or UTF16
** otherwise.
**
** [[byte-order determination rules]] ^The byte-order of
** UTF16 input text is determined by the byte-order mark (BOM, U+FEFF)
** found in first character, which is removed, or in the absence of a BOM
** the byte order is the native byte order of the host
** machine for sqlite3_bind_text16() or the byte order specified in
** the 6th parameter for sqlite3_bind_text64().)^
** ^If UTF16 input text contains invalid unicode
** characters, then SQLite might change those invalid characters
** into the unicode replacement character: U+FFFD.
**
** ^(In those routines that have a fourth argument, its value is the
** number of bytes in the parameter. To be clear: the value is the
** number of <u>bytes</u> in the value, not the number of characters.)^
** ^If the fourth parameter to sqlite3_bind_text() or sqlite3_bind_text16()
** is negative, then the length of the string is
** the number of bytes up to the first zero terminator.
** If the fourth parameter to sqlite3_bind_blob() is negative, then
** the behavior is undefined.
** If a non-negative fourth parameter is provided to sqlite3_bind_text()
** specified by the interface procedure. ^So, for example, if
** sqlite3_result_text16le() is invoked with text that begins
** with bytes 0xfe, 0xff (a big-endian byte-order mark) then the
** first two bytes of input are skipped and the remaining input
** is interpreted as UTF16BE text.
**
** ^For UTF16 input text to the sqlite3_result_text16(),
** sqlite3_result_text16be(), sqlite3_result_text16le(), and
** sqlite3_result_text64() routines, if the text contains invalid
** UTF16 characters, the invalid characters might be converted
** into the unicode replacement character, U+FFFD.
**
** ^The sqlite3_result_value() interface sets the result of
** the application-defined function to be a copy of the
** [unprotected sqlite3_value] object specified by the 2nd parameter. ^The
** sqlite3_result_value() interface makes a copy of the [sqlite3_value]
** so that the [sqlite3_value] specified in the parameter may change or
** be deallocated after sqlite3_result_value() returns without harm.
** ^A [protected sqlite3_value] object may always be used where an
** [unprotected sqlite3_value] object is required, so either
** kind of [sqlite3_value] object can be used with this interface.
**
** * MEM_Real Real stored in Mem.u.r.
**
** * MEM_IntReal Real stored as an integer in Mem.u.i.
**
** If the MEM_Null flag is set, then the value is an SQL NULL value.
** For a pointer type created using sqlite3_bind_pointer() or
** sqlite3_result_pointer() the MEM_Term and MEM_Subtype flags are also set.
**
** If the MEM_Str flag is set then Mem.z points at a string representation.
** Usually this is encoded in the same unicode encoding as the main
** database (see below for exceptions). If the MEM_Term flag is also
** set, then the string is nul terminated. The MEM_Int and MEM_Real
** flags may coexist with the MEM_Str flag.
*/
#define MEM_Undefined 0x0000 /* Value is undefined */
#define MEM_Null 0x0001 /* Value is NULL (or a pointer) */
#define MEM_Str 0x0002 /* Value is a string */
#define MEM_Int 0x0004 /* Value is an integer */
#define MEM_Real 0x0008 /* Value is a real number */
#define MEM_Blob 0x0010 /* Value is a BLOB */
*zOut++ = (u8)(c&0x00FF); \
}else{ \
*zOut++ = (u8)(0x00D8 + (((c-0x10000)>>18)&0x03)); \
*zOut++ = (u8)(((c>>10)&0x003F) + (((c-0x10000)>>10)&0x00C0)); \
*zOut++ = (u8)(0x00DC + ((c>>8)&0x03)); \
*zOut++ = (u8)(c&0x00FF); \
} \
}
/*
** Translate a single UTF-8 character. Return the unicode value.
**
** During translation, assume that the byte that zTerm points
** is a 0x00.
**
** Write a pointer to the next unread byte back into *pzNext.
**
** Notes On Invalid UTF-8:
**
** * This routine never allows a 7-bit character (0x00 through 0x7f) to
** be encoded as a multi-byte character. Any multi-byte character that
** * This routine never allows a UTF16 surrogate value to be encoded.
** If a multi-byte character attempts to encode a value between
** 0xd800 and 0xe000 then it is rendered as 0xfffd.
**
** * Bytes in the range of 0x80 through 0xbf which occur as the first
** byte of a character are interpreted as single-byte characters
** and rendered as themselves even though they are technically
** invalid characters.
**
** * This routine accepts over-length UTF8 encodings
** for unicode values 0x80 and greater. It does not change over-length
** encodings to 0xfffd as some systems recommend.
*/
#define READ_UTF8(zIn, zTerm, c) \
c = *(zIn++); \
if( c>=0xc0 ){ \
c = sqlite3Utf8Trans1[c-0xc0]; \
while( zIn!=zTerm && (*zIn & 0xc0)==0x80 ){ \
c = (c<<6) + (0x3f & *(zIn++)); \
} \
if( c<0x80 \
pMem->z[pMem->n+1] = '\0';
pMem->flags |= MEM_Term;
pMem->enc = bom;
}
}
return rc;
}
#endif /* SQLITE_OMIT_UTF16 */
/*
** pZ is a UTF-8 encoded unicode string. If nByte is less than zero,
** return the number of unicode characters in pZ up to (but not including)
** the first 0x00 byte. If nByte is not less than zero, return the
** number of unicode characters in the first nByte of pZ (or up to
** the first 0x00, whichever comes first).
*/
SQLITE_PRIVATE int sqlite3Utf8CharLen(const char *zIn, int nByte){
int r = 0;
const u8 *z = (const u8*)zIn;
const u8 *zTerm;
if( nByte>=0 ){
zTerm = &z[nByte];
}else{
zTerm = (const u8*)(-1);
sqlite3VdbeMemRelease(&m);
m.z = 0;
}
assert( (m.flags & MEM_Term)!=0 || db->mallocFailed );
assert( (m.flags & MEM_Str)!=0 || db->mallocFailed );
assert( m.z || db->mallocFailed );
return m.z;
}
/*
** zIn is a UTF-16 encoded unicode string at least nChar characters long.
** Return the number of bytes in the first nChar unicode characters
** in pZ. nChar must be non-negative.
*/
SQLITE_PRIVATE int sqlite3Utf16ByteLen(const void *zIn, int nChar){
int c;
unsigned char const *z = zIn;
int n = 0;
if( SQLITE_UTF16NATIVE==SQLITE_UTF16LE ) z++;
while( n<nChar ){
c = z[0];
return 0;
}
zTextMbcs = winUnicodeToMbcs(zTmpWide, useAnsi);
sqlite3_free(zTmpWide);
return zTextMbcs;
}
/*
** This is a public wrapper for the winUtf8ToUnicode() function.
*/
SQLITE_API LPWSTR sqlite3_win32_utf8_to_unicode(const char *zText){
#ifdef SQLITE_ENABLE_API_ARMOR
if( !zText ){
(void)SQLITE_MISUSE_BKPT;
return 0;
}
#endif
#ifndef SQLITE_OMIT_AUTOINIT
if( sqlite3_initialize() ) return 0;
#endif
return winUtf8ToUnicode(zText);
}
/*
** This is a public wrapper for the winUnicodeToUtf8() function.
*/
SQLITE_API char *sqlite3_win32_unicode_to_utf8(LPCWSTR zWideText){
#ifdef SQLITE_ENABLE_API_ARMOR
if( !zWideText ){
(void)SQLITE_MISUSE_BKPT;
return 0;
}
#endif
#ifndef SQLITE_OMIT_AUTOINIT
if( sqlite3_initialize() ) return 0;
#endif
return winUnicodeToUtf8(zWideText);
** This function is the same as sqlite3_win32_set_directory (below); however,
** it accepts a UTF-16 string.
*/
SQLITE_API int sqlite3_win32_set_directory16(
unsigned long type, /* Identifier for directory being set or reset */
const void *zValue /* New value for directory being set or reset */
){
int rc;
char *zUtf8 = 0;
if( zValue ){
zUtf8 = sqlite3_win32_unicode_to_utf8(zValue);
if( zUtf8==0 ) return SQLITE_NOMEM_BKPT;
}
rc = sqlite3_win32_set_directory8(type, zUtf8);
if( zUtf8 ) sqlite3_free(zUtf8);
return rc;
}
/*
** This function sets the data directory or the temporary directory based on
** the provided arguments. The type argument must be 1 in order to set the
sqlite3QuoteValue(&str,argv[0]);
sqlite3_result_text(context, sqlite3StrAccumFinish(&str), str.nChar,
SQLITE_DYNAMIC);
if( str.accError!=SQLITE_OK ){
sqlite3_result_null(context);
sqlite3_result_error_code(context, str.accError);
}
}
/*
** The unicode() function. Return the integer unicode code-point value
** for the first character of the input string.
*/
static void unicodeFunc(
sqlite3_context *context,
int argc,
sqlite3_value **argv
){
const unsigned char *z = sqlite3_value_text(argv[0]);
(void)argc;
if( z && z[0] ) sqlite3_result_int(context, sqlite3Utf8Read(&z));
}
/*
** The char() function takes zero or more arguments, each of which is
** an integer. It constructs a string where each character of the string
** is the unicode character for the corresponding integer argument.
*/
static void charFunc(
sqlite3_context *context,
int argc,
sqlite3_value **argv
){
unsigned char *z, *zOut;
int i;
zOut = z = sqlite3_malloc64( argc*4+1 );
if( z==0 ){
FUNCTION(max, 0, 1, 1, 0 ),
WAGGREGATE(max, 1, 1, 1, minmaxStep, minMaxFinalize, minMaxValue, 0,
SQLITE_FUNC_MINMAX|SQLITE_FUNC_ANYORDER ),
FUNCTION2(typeof, 1, 0, 0, typeofFunc, SQLITE_FUNC_TYPEOF),
FUNCTION2(subtype, 1, 0, 0, subtypeFunc, SQLITE_FUNC_TYPEOF),
FUNCTION2(length, 1, 0, 0, lengthFunc, SQLITE_FUNC_LENGTH),
FUNCTION2(octet_length, 1, 0, 0, bytelengthFunc,SQLITE_FUNC_BYTELEN),
FUNCTION(instr, 2, 0, 0, instrFunc ),
FUNCTION(printf, -1, 0, 0, printfFunc ),
FUNCTION(format, -1, 0, 0, printfFunc ),
FUNCTION(unicode, 1, 0, 0, unicodeFunc ),
FUNCTION(char, -1, 0, 0, charFunc ),
FUNCTION(abs, 1, 0, 0, absFunc ),
#ifdef SQLITE_DEBUG
FUNCTION(fpdecode, 3, 0, 0, fpdecodeFunc ),
#endif
#ifndef SQLITE_OMIT_FLOATING_POINT
FUNCTION(round, 1, 0, 0, roundFunc ),
FUNCTION(round, 2, 0, 0, roundFunc ),
#endif
FUNCTION(upper, 1, 0, 0, upperFunc ),
nBytes = sz;
}
sqlite3_mutex_enter(db->mutex);
zSql8 = sqlite3Utf16to8(db, zSql, nBytes, SQLITE_UTF16NATIVE);
if( zSql8 ){
rc = sqlite3LockAndPrepare(db, zSql8, -1, prepFlags, 0, ppStmt, &zTail8);
}
if( zTail8 && pzTail ){
/* If sqlite3_prepare returns a tail pointer, we calculate the
** equivalent pointer into the UTF-16 string by counting the unicode
** characters between zSql8 and zTail8, and then returning a pointer
** the same number of characters into the UTF-16 string.
*/
int chars_parsed = sqlite3Utf8CharLen(zSql8, (int)(zTail8-zSql8));
*pzTail = (u8 *)zSql + sqlite3Utf16ByteLen(zSql, chars_parsed);
}
sqlite3DbFree(db, zSql8);
rc = sqlite3ApiExit(db, rc);
sqlite3_mutex_leave(db->mutex);
return rc;
#define CC_LP 17 /* '(' */
#define CC_RP 18 /* ')' */
#define CC_SEMI 19 /* ';' */
#define CC_PLUS 20 /* '+' */
#define CC_STAR 21 /* '*' */
#define CC_PERCENT 22 /* '%' */
#define CC_COMMA 23 /* ',' */
#define CC_AND 24 /* '&' */
#define CC_TILDA 25 /* '~' */
#define CC_DOT 26 /* '.' */
#define CC_ID 27 /* unicode characters usable in IDs */
#define CC_ILLEGAL 28 /* Illegal character */
#define CC_NUL 29 /* 0x00 */
#define CC_BOM 30 /* First byte of UTF8 BOM: 0xEF 0xBB 0xBF */
static const unsigned char aiClass[] = {
#ifdef SQLITE_ASCII
/* x0 x1 x2 x3 x4 x5 x6 x7 x8 x9 xa xb xc xd xe xf */
/* 0x */ 29, 28, 28, 28, 28, 28, 28, 28, 28, 7, 7, 28, 7, 7, 28, 28,
/* 1x */ 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28,
/* 2x */ 7, 15, 8, 5, 4, 22, 24, 8, 17, 18, 21, 20, 23, 11, 26, 16,
Fts3Table*, Fts3MultiSegReader*, int, const char*, int);
SQLITE_PRIVATE int sqlite3Fts3MsrIncrNext(
Fts3Table *, Fts3MultiSegReader *, sqlite3_int64 *, char **, int *);
SQLITE_PRIVATE int sqlite3Fts3EvalPhrasePoslist(Fts3Cursor *, Fts3Expr *, int iCol, char **);
SQLITE_PRIVATE int sqlite3Fts3MsrOvfl(Fts3Cursor *, Fts3MultiSegReader *, int *);
SQLITE_PRIVATE int sqlite3Fts3MsrIncrRestart(Fts3MultiSegReader *pCsr);
/* fts3_tokenize_vtab.c */
SQLITE_PRIVATE int sqlite3Fts3InitTok(sqlite3*, Fts3Hash *, void(*xDestroy)(void*));
/* fts3_unicode2.c (functions generated by parsing unicode text files) */
#ifndef SQLITE_DISABLE_FTS3_UNICODE
SQLITE_PRIVATE int sqlite3FtsUnicodeFold(int, int);
SQLITE_PRIVATE int sqlite3FtsUnicodeIsalnum(int);
SQLITE_PRIVATE int sqlite3FtsUnicodeIsdiacritic(int);
#endif
SQLITE_PRIVATE int sqlite3Fts3ExprIterate(Fts3Expr*, int (*x)(Fts3Expr*,int,void*), void*);
SQLITE_PRIVATE int sqlite3Fts3IntegrityCheck(Fts3Table *p, int *pbOk);
sqlite3Fts3HashInit(&pHash->hash, FTS3_HASH_STRING, 1);
pHash->nRef = 0;
}
/* Load the built-in tokenizers into the hash table */
if( rc==SQLITE_OK ){
if( sqlite3Fts3HashInsert(&pHash->hash, "simple", 7, (void *)pSimple)
|| sqlite3Fts3HashInsert(&pHash->hash, "porter", 7, (void *)pPorter)
#ifndef SQLITE_DISABLE_FTS3_UNICODE
|| sqlite3Fts3HashInsert(&pHash->hash, "unicode61", 10, (void *)pUnicode)
#endif
#ifdef SQLITE_ENABLE_ICU
|| (pIcu && sqlite3Fts3HashInsert(&pHash->hash, "icu", 4, (void *)pIcu))
#endif
){
rc = SQLITE_NOMEM;
}
}
#ifdef SQLITE_TEST
}else{
/* Retrieve matchinfo() data. */
fts3GetMatchinfo(pContext, pCsr, zFormat);
sqlite3Fts3SegmentsClose(pTab);
}
}
#endif
/************** End of fts3_snippet.c ****************************************/
/************** Begin file fts3_unicode.c ************************************/
/*
** 2012 May 24
**
** The author disclaims copyright to this source code. In place of
** a legal notice, here is a blessing:
**
** May you do good and not evil.
** May you find forgiveness for yourself and forgive others.
** May you share freely, never taking more than you give.
**
******************************************************************************
**
** Implementation of the "unicode" full-text-search tokenizer.
*/
#ifndef SQLITE_DISABLE_FTS3_UNICODE
/* #include "fts3Int.h" */
#if !defined(SQLITE_CORE) || defined(SQLITE_ENABLE_FTS3)
/* #include <assert.h> */
/* #include <stdlib.h> */
/* #include <stdio.h> */
}else{ \
*zOut++ = 0xF0 + (u8)((c>>18) & 0x07); \
*zOut++ = 0x80 + (u8)((c>>12) & 0x3F); \
*zOut++ = 0x80 + (u8)((c>>6) & 0x3F); \
*zOut++ = 0x80 + (u8)(c & 0x3F); \
} \
}
#endif /* ifndef SQLITE_AMALGAMATION */
typedef struct unicode_tokenizer unicode_tokenizer;
typedef struct unicode_cursor unicode_cursor;
struct unicode_tokenizer {
sqlite3_tokenizer base;
int eRemoveDiacritic;
int nException;
int *aiException;
};
struct unicode_cursor {
sqlite3_tokenizer_cursor base;
const unsigned char *aInput; /* Input text being tokenized */
int nInput; /* Size of aInput[] in bytes */
int iOff; /* Current offset within aInput[] */
int iToken; /* Index of next token to be returned */
char *zToken; /* storage for current token */
int nAlloc; /* space allocated at zToken */
};
/*
** Destroy a tokenizer allocated by unicodeCreate().
*/
static int unicodeDestroy(sqlite3_tokenizer *pTokenizer){
if( pTokenizer ){
unicode_tokenizer *p = (unicode_tokenizer *)pTokenizer;
sqlite3_free(p->aiException);
sqlite3_free(p);
}
return SQLITE_OK;
}
/*
** As part of a tokenchars= or separators= option, the CREATE VIRTUAL TABLE
** statement has specified that the tokenizer for this table shall consider
** all characters in string zIn/nIn to be separators (if bAlnum==0) or
** token characters (if bAlnum==1).
**
** For each codepoint in the zIn/nIn string, this function checks if the
** sqlite3FtsUnicodeIsalnum() function already returns the desired result.
** If so, no action is taken. Otherwise, the codepoint is added to the
** unicode_tokenizer.aiException[] array. For the purposes of tokenization,
** the return value of sqlite3FtsUnicodeIsalnum() is inverted for all
** codepoints in the aiException[] array.
**
** If a standalone diacritic mark (one that sqlite3FtsUnicodeIsdiacritic()
** identifies as a diacritic) occurs in the zIn/nIn string it is ignored.
** It is not possible to change the behavior of the tokenizer with respect
** to these codepoints.
*/
static int unicodeAddExceptions(
unicode_tokenizer *p, /* Tokenizer to add exceptions to */
int bAlnum, /* Replace Isalnum() return value with this */
const char *zIn, /* Array of characters to make exceptions */
int nIn /* Length of z in bytes */
){
const unsigned char *z = (const unsigned char *)zIn;
const unsigned char *zTerm = &z[nIn];
unsigned int iCode;
int nEntry = 0;
assert( bAlnum==0 || bAlnum==1 );
p->aiException = aNew;
p->nException = nNew;
}
return SQLITE_OK;
}
/*
** Return true if the p->aiException[] array contains the value iCode.
*/
static int unicodeIsException(unicode_tokenizer *p, int iCode){
if( p->nException>0 ){
int *a = p->aiException;
int iLo = 0;
int iHi = p->nException-1;
while( iHi>=iLo ){
int iTest = (iHi + iLo) / 2;
if( iCode==a[iTest] ){
return 1;
}else if( iCode>a[iTest] ){
}
}
return 0;
}
/*
** Return true if, for the purposes of tokenization, codepoint iCode is
** considered a token character (not a separator).
*/
static int unicodeIsAlnum(unicode_tokenizer *p, int iCode){
assert( (sqlite3FtsUnicodeIsalnum(iCode) & 0xFFFFFFFE)==0 );
return sqlite3FtsUnicodeIsalnum(iCode) ^ unicodeIsException(p, iCode);
}
/*
** Create a new tokenizer instance.
*/
static int unicodeCreate(
int nArg, /* Size of array argv[] */
const char * const *azArg, /* Tokenizer creation arguments */
sqlite3_tokenizer **pp /* OUT: New tokenizer handle */
){
unicode_tokenizer *pNew; /* New tokenizer object */
int i;
int rc = SQLITE_OK;
pNew = (unicode_tokenizer *) sqlite3_malloc(sizeof(unicode_tokenizer));
if( pNew==NULL ) return SQLITE_NOMEM;
memset(pNew, 0, sizeof(unicode_tokenizer));
pNew->eRemoveDiacritic = 1;
for(i=0; rc==SQLITE_OK && i<nArg; i++){
const char *z = azArg[i];
int n = (int)strlen(z);
if( n==19 && memcmp("remove_diacritics=1", z, 19)==0 ){
pNew->eRemoveDiacritic = 1;
}
else if( n==19 && memcmp("remove_diacritics=0", z, 19)==0 ){
pNew->eRemoveDiacritic = 0;
}
else if( n==19 && memcmp("remove_diacritics=2", z, 19)==0 ){
pNew->eRemoveDiacritic = 2;
}
else if( n>=11 && memcmp("tokenchars=", z, 11)==0 ){
rc = unicodeAddExceptions(pNew, 1, &z[11], n-11);
}
else if( n>=11 && memcmp("separators=", z, 11)==0 ){
rc = unicodeAddExceptions(pNew, 0, &z[11], n-11);
}
else{
/* Unrecognized argument */
rc = SQLITE_ERROR;
}
}
if( rc!=SQLITE_OK ){
unicodeDestroy((sqlite3_tokenizer *)pNew);
pNew = 0;
}
*pp = (sqlite3_tokenizer *)pNew;
return rc;
}
/*
** Prepare to begin tokenizing a particular string. The input
** string to be tokenized is pInput[0..nBytes-1]. A cursor
** used to incrementally tokenize this string is returned in
** *ppCursor.
*/
static int unicodeOpen(
sqlite3_tokenizer *p, /* The tokenizer */
const char *aInput, /* Input string */
int nInput, /* Size of string aInput in bytes */
sqlite3_tokenizer_cursor **pp /* OUT: New cursor object */
){
unicode_cursor *pCsr;
pCsr = (unicode_cursor *)sqlite3_malloc(sizeof(unicode_cursor));
if( pCsr==0 ){
return SQLITE_NOMEM;
}
memset(pCsr, 0, sizeof(unicode_cursor));
pCsr->aInput = (const unsigned char *)aInput;
if( aInput==0 ){
pCsr->nInput = 0;
pCsr->aInput = (const unsigned char*)"";
}else if( nInput<0 ){
pCsr->nInput = (int)strlen(aInput);
}else{
pCsr->nInput = nInput;
}
*pp = &pCsr->base;
UNUSED_PARAMETER(p);
return SQLITE_OK;
}
/*
** Close a tokenization cursor previously opened by a call to
** simpleOpen() above.
*/
static int unicodeClose(sqlite3_tokenizer_cursor *pCursor){
unicode_cursor *pCsr = (unicode_cursor *) pCursor;
sqlite3_free(pCsr->zToken);
sqlite3_free(pCsr);
return SQLITE_OK;
}
/*
** Extract the next token from a tokenization cursor. The cursor must
** have been opened by a prior call to simpleOpen().
*/
static int unicodeNext(
sqlite3_tokenizer_cursor *pC, /* Cursor returned by simpleOpen */
const char **paToken, /* OUT: Token text */
int *pnToken, /* OUT: Number of bytes at *paToken */
int *piStart, /* OUT: Starting offset of token */
int *piEnd, /* OUT: Ending offset of token */
int *piPos /* OUT: Position integer of token */
){
unicode_cursor *pCsr = (unicode_cursor *)pC;
unicode_tokenizer *p = ((unicode_tokenizer *)pCsr->base.pTokenizer);
unsigned int iCode = 0;
char *zOut;
const unsigned char *z = &pCsr->aInput[pCsr->iOff];
const unsigned char *zStart = z;
const unsigned char *zEnd;
const unsigned char *zTerm = &pCsr->aInput[pCsr->nInput];
/* Scan past any delimiter characters before the start of the next token.
** Return SQLITE_DONE early if this takes us all the way to the end of
** the input. */
while( z<zTerm ){
READ_UTF8(z, zTerm, iCode);
if( unicodeIsAlnum(p, (int)iCode) ) break;
zStart = z;
}
if( zStart>=zTerm ) return SQLITE_DONE;
zOut = pCsr->zToken;
do {
int iOut;
/* Grow the output buffer if required. */
if( (zOut-pCsr->zToken)>=(pCsr->nAlloc-4) ){
/* Write the folded case of the last character read to the output */
zEnd = z;
iOut = sqlite3FtsUnicodeFold((int)iCode, p->eRemoveDiacritic);
if( iOut ){
WRITE_UTF8(zOut, iOut);
}
/* If the cursor is not at EOF, read the next character */
if( z>=zTerm ) break;
READ_UTF8(z, zTerm, iCode);
}while( unicodeIsAlnum(p, (int)iCode)
|| sqlite3FtsUnicodeIsdiacritic((int)iCode)
);
/* Set the output variables and return. */
pCsr->iOff = (int)(z - pCsr->aInput);
*paToken = pCsr->zToken;
*pnToken = (int)(zOut - pCsr->zToken);
*piStart = (int)(zStart - pCsr->aInput);
*piEnd = (int)(zEnd - pCsr->aInput);
*piPos = pCsr->iToken++;
return SQLITE_OK;
}
/*
** Set *ppModule to a pointer to the sqlite3_tokenizer_module
** structure for the unicode tokenizer.
*/
SQLITE_PRIVATE void sqlite3Fts3UnicodeTokenizer(sqlite3_tokenizer_module const **ppModule){
static const sqlite3_tokenizer_module module = {
0,
unicodeCreate,
unicodeDestroy,
unicodeOpen,
unicodeClose,
unicodeNext,
0,
};
*ppModule = &module;
}
#endif /* !defined(SQLITE_CORE) || defined(SQLITE_ENABLE_FTS3) */
#endif /* ifndef SQLITE_DISABLE_FTS3_UNICODE */
/************** End of fts3_unicode.c ****************************************/
/************** Begin file fts3_unicode2.c ***********************************/
/*
** 2012-05-25
**
** The author disclaims copyright to this source code. In place of
** a legal notice, here is a blessing:
**
** May you do good and not evil.
** May you find forgiveness for yourself and forgive others.
** May you share freely, never taking more than you give.
**
/*
** DO NOT EDIT THIS MACHINE GENERATED FILE.
*/
#ifndef SQLITE_DISABLE_FTS3_UNICODE
#if defined(SQLITE_ENABLE_FTS3) || defined(SQLITE_ENABLE_FTS4)
/* #include <assert.h> */
/*
** Return true if the argument corresponds to a unicode codepoint
** classified as either a letter or a number. Otherwise false.
**
** The results are undefined if the value passed to this function
** is less than zero.
*/
SQLITE_PRIVATE int sqlite3FtsUnicodeIsalnum(int c){
/* Each unsigned integer in the following array corresponds to a contiguous
** range of unicode codepoints that are not either letters or numbers (i.e.
** codepoints for which this function should return 0).
**
** The most significant 22 bits in each 32-bit value contain the first
** codepoint in the range. The least significant 10 bits are used to store
** the size of the range (always at least 1). In other words, the value
** ((C<<22) + N) represents a range of N codepoints starting with codepoint
** C. It is not possible to represent a range larger than 1023 codepoints
** using this format.
*/
static const unsigned int aEntry[] = {
iHi = iTest-1;
}
}
assert( key>=aDia[iRes] );
if( bComplex==0 && (aChar[iRes] & 0x80) ) return c;
return (c > (aDia[iRes]>>3) + (aDia[iRes]&0x07)) ? c : ((int)aChar[iRes] & 0x7F);
}
/*
** Return true if the argument interpreted as a unicode codepoint
** is a diacritical modifier character.
*/
SQLITE_PRIVATE int sqlite3FtsUnicodeIsdiacritic(int c){
unsigned int mask0 = 0x08029FDF;
unsigned int mask1 = 0x000361F8;
if( c<768 || c>817 ) return 0;
return (c < 768+32) ?
(mask0 & ((unsigned int)1 << (c-768))) :
(mask1 & ((unsigned int)1 << (c-768-32)));
}
/*
** Interpret the argument as a unicode codepoint. If the codepoint
** is an upper case character that has a lower case equivalent,
** return the codepoint corresponding to the lower case version.
** Otherwise, return a copy of the argument.
**
** The results are undefined if the value passed to this function
** is less than zero.
*/
SQLITE_PRIVATE int sqlite3FtsUnicodeFold(int c, int eRemoveDiacritic){
/* Each entry in the following array defines a rule for folding a range
** of codepoints to lower case. The rule applies to a range of nRange
** to all nRange codepoints (i.e. all nRange codepoints are upper case and
** need to be folded). Or, if it is set, then the rule only applies to
** every second codepoint in the range, starting with codepoint C.
**
** The 7 most significant bits in flags are an index into the aiOff[]
** array. If a specific codepoint C does require folding, then its lower
** case equivalent is ((C + aiOff[flags>>1]) & 0xFFFF).
**
** The contents of this array are generated by parsing the CaseFolding.txt
** file distributed as part of the "Unicode Character Database". See
** http://www.unicode.org for details.
*/
static const struct TableEntry {
unsigned short iCode;
unsigned char flags;
unsigned char nRange;
} aEntry[] = {
{65, 14, 26}, {181, 64, 1}, {192, 14, 23},
{216, 14, 7}, {256, 1, 48}, {306, 1, 6},
{313, 1, 16}, {330, 1, 46}, {376, 116, 1},
{377, 1, 6}, {383, 104, 1}, {385, 50, 1},
else if( c>=66560 && c<66600 ){
ret = c + 40;
}
return ret;
}
#endif /* defined(SQLITE_ENABLE_FTS3) || defined(SQLITE_ENABLE_FTS4) */
#endif /* !defined(SQLITE_DISABLE_FTS3_UNICODE) */
/************** End of fts3_unicode2.c ***************************************/
/************** Begin file json.c ********************************************/
/*
** 2015-08-12
**
** The author disclaims copyright to this source code. In place of
** a legal notice, here is a blessing:
**
** May you do good and not evil.
** May you find forgiveness for yourself and forgive others.
** May you share freely, never taking more than you give.
}else if( v<=0x7ff ){
assert( szEscape>=2 );
zOut[iOut++] = (char)(0xc0 | (v>>6));
zOut[iOut++] = 0x80 | (v&0x3f);
}else if( v<0x10000 ){
assert( szEscape>=3 );
zOut[iOut++] = 0xe0 | (v>>12);
zOut[iOut++] = 0x80 | ((v>>6)&0x3f);
zOut[iOut++] = 0x80 | (v&0x3f);
}else if( v==JSON_INVALID_CHAR ){
/* Silently ignore illegal unicode */
}else{
assert( szEscape>=4 );
zOut[iOut++] = 0xf0 | (v>>18);
zOut[iOut++] = 0x80 | ((v>>12)&0x3f);
zOut[iOut++] = 0x80 | ((v>>6)&0x3f);
zOut[iOut++] = 0x80 | (v&0x3f);
}
iIn += szEscape - 1;
}else{
zOut[iOut++] = c;
**
** May you do good and not evil.
** May you find forgiveness for yourself and forgive others.
** May you share freely, never taking more than you give.
**
*************************************************************************
** $Id: icu.c,v 1.7 2007/12/13 21:54:11 drh Exp $
**
** This file implements an integration between the ICU library
** ("International Components for Unicode", an open-source library
** for handling unicode data) and SQLite. The integration uses
** ICU to provide the following to SQLite:
**
** * An implementation of the SQL regexp() function (and hence REGEXP
** operator) using the ICU uregex_XX() APIs.
**
** * Implementations of the SQL scalar upper() and lower() functions
** for case mapping.
**
** * Integration of ICU and SQLite collation sequences.
**
** * An implementation of the LIKE operator that uses ICU to
** provide case-independent matching.
*/
#if !defined(SQLITE_CORE) \
|| defined(SQLITE_ENABLE_ICU) \
|| defined(SQLITE_ENABLE_ICU_COLLATIONS)
/* Include ICU headers */
#include <unicode/utypes.h>
#include <unicode/uregex.h>
#include <unicode/ustring.h>
#include <unicode/ucol.h>
/* #include <assert.h> */
#ifndef SQLITE_CORE
/* #include "sqlite3ext.h" */
SQLITE_EXTENSION_INIT1
#else
/* #include "sqlite3.h" */
#endif
** This file implements a tokenizer for fts3 based on the ICU library.
*/
/* #include "fts3Int.h" */
#if !defined(SQLITE_CORE) || defined(SQLITE_ENABLE_FTS3)
#ifdef SQLITE_ENABLE_ICU
/* #include <assert.h> */
/* #include <string.h> */
/* #include "fts3_tokenizer.h" */
#include <unicode/ubrk.h>
/* #include <unicode/ucol.h> */
/* #include <unicode/ustring.h> */
#include <unicode/utf16.h>
typedef struct IcuTokenizer IcuTokenizer;
typedef struct IcuCursor IcuCursor;
struct IcuTokenizer {
sqlite3_tokenizer base;
char *zLocale;
};
struct IcuCursor {
*/
static int sqlite3Fts5VocabInit(Fts5Global*, sqlite3*);
/*
** End of interface to code in fts5_vocab.c.
**************************************************************************/
/**************************************************************************
** Interface to automatically generated code in fts5_unicode2.c.
*/
static int sqlite3Fts5UnicodeIsdiacritic(int c);
static int sqlite3Fts5UnicodeFold(int c, int bRemoveDiacritic);
static int sqlite3Fts5UnicodeCatParse(const char*, u8*);
static int sqlite3Fts5UnicodeCategory(u32 iCode);
static void sqlite3Fts5UnicodeAscii(u8*, u8*);
/*
** End of interface to code in fts5_unicode2.c.
**************************************************************************/
#endif
#define FTS5_OR 1
#define FTS5_AND 2
#define FTS5_NOT 3
#define FTS5_TERM 4
#define FTS5_COLON 5
#define FTS5_MINUS 6
/*
** Return true if character 't' may be part of an FTS5 bareword, or false
** otherwise. Characters that may be part of barewords:
**
** * All non-ASCII characters,
** * The 52 upper and lower case ASCII characters, and
** * The 10 integer ASCII characters.
** * The underscore character "_" (0x5F).
** * The unicode "subsitute" character (0x1A).
*/
static int sqlite3Fts5IsBareword(char t){
u8 aBareword[128] = {
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, /* 0x00 .. 0x0F */
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, /* 0x10 .. 0x1F */
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, /* 0x20 .. 0x2F */
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, /* 0x30 .. 0x3F */
0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, /* 0x40 .. 0x4F */
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 1, /* 0x50 .. 0x5F */
0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, /* 0x60 .. 0x6F */
*/
if( rc==SQLITE_OK && pRet->bContentlessDelete && pRet->bColumnsize==0 ){
*pzErr = sqlite3_mprintf(
"contentless_delete=1 is incompatible with columnsize=0"
);
rc = SQLITE_ERROR;
}
/* If a tokenizer= option was successfully parsed, the tokenizer has
** already been allocated. Otherwise, allocate an instance of the default
** tokenizer (unicode61) now. */
if( rc==SQLITE_OK && pRet->pTok==0 ){
rc = fts5ConfigDefaultTokenizer(pGlobal, pRet);
}
/* If no zContent option was specified, fill in the default values. */
if( rc==SQLITE_OK && pRet->zContent==0 ){
const char *zTail = 0;
assert( pRet->eContent==FTS5_CONTENT_NORMAL
|| pRet->eContent==FTS5_CONTENT_NONE
);
sqlite3_context *pCtx, /* Function call context */
int nArg, /* Number of args */
sqlite3_value **apVal /* Function arguments */
){
fts5ExprFunction(pCtx, nArg, apVal, 1);
}
/*
** The implementation of an SQLite user-defined-function that accepts a
** single integer as an argument. If the integer is an alpha-numeric
** unicode code point, 1 is returned. Otherwise 0.
*/
static void fts5ExprIsAlnum(
sqlite3_context *pCtx, /* Function call context */
int nArg, /* Number of args */
sqlite3_value **apVal /* Function arguments */
){
int iCode;
u8 aArr[32];
if( nArg!=1 ){
sqlite3_result_error(pCtx,
return 0;
}
}
}
}
return n;
}
/*
** pIn is a UTF-8 encoded string, nIn bytes in size. Return the number of
** unicode characters in the string.
*/
static int fts5IndexCharlen(const char *pIn, int nIn){
int nChar = 0;
int i = 0;
while( i<nIn ){
if( (unsigned char)pIn[i++]>=0xc0 ){
while( i<nIn && (pIn[i] & 0xc0)==0x80 ) i++;
}
nChar++;
}
*/
/* #include "fts5Int.h" */
/**************************************************************************
** Start of ascii tokenizer implementation.
*/
/*
** For tokenizers with no "unicode" modifier, the set of token characters
** is the same as the set of ASCII range alphanumeric characters.
*/
static unsigned char aAsciiTokenChar[128] = {
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, /* 0x00..0x0F */
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, /* 0x10..0x1F */
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, /* 0x20..0x2F */
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, /* 0x30..0x3F */
0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, /* 0x40..0x4F */
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, /* 0x50..0x5F */
0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, /* 0x60..0x6F */
rc = xToken(pCtx, 0, pFold, nByte, is, ie);
is = ie+1;
}
if( pFold!=aFold ) sqlite3_free(pFold);
if( rc==SQLITE_DONE ) rc = SQLITE_OK;
return rc;
}
/**************************************************************************
** Start of unicode61 tokenizer implementation.
*/
/*
** The following two macros - READ_UTF8 and WRITE_UTF8 - have been copied
** from the sqlite3 source file utf.c. If this file is compiled as part
** of the amalgamation, they are not required.
*/
#ifndef SQLITE_AMALGAMATION
unsigned char aTokenChar[128]; /* ASCII range token characters */
char *aFold; /* Buffer to fold text into */
int nFold; /* Size of aFold[] in bytes */
int eRemoveDiacritic; /* True if remove_diacritics=1 is set */
int nException;
int *aiException;
unsigned char aCategory[32]; /* True for token char categories */
};
/* Values for eRemoveDiacritic (must match internals of fts5_unicode2.c) */
#define FTS5_REMOVE_DIACRITICS_NONE 0
#define FTS5_REMOVE_DIACRITICS_SIMPLE 1
#define FTS5_REMOVE_DIACRITICS_COMPLEX 2
static int fts5UnicodeAddExceptions(
Unicode61Tokenizer *p, /* Tokenizer object */
const char *z, /* Characters to treat as exceptions */
int bTokenChars /* 1 for 'tokenchars', 0 for 'separators' */
){
int rc = SQLITE_OK;
}else{
iHi = iTest-1;
}
}
}
return 0;
}
/*
** Delete a "unicode61" tokenizer.
*/
static void fts5UnicodeDelete(Fts5Tokenizer *pTok){
if( pTok ){
Unicode61Tokenizer *p = (Unicode61Tokenizer*)pTok;
sqlite3_free(p->aiException);
sqlite3_free(p->aFold);
sqlite3_free(p);
}
return;
}
static int unicodeSetCategories(Unicode61Tokenizer *p, const char *zCat){
const char *z = zCat;
while( *z ){
while( *z==' ' || *z=='\t' ) z++;
if( *z && sqlite3Fts5UnicodeCatParse(z, p->aCategory) ){
return SQLITE_ERROR;
}
while( *z!=' ' && *z!='\t' && *z!='\0' ) z++;
}
sqlite3Fts5UnicodeAscii(p->aCategory, p->aTokenChar);
return SQLITE_OK;
}
/*
** Create a "unicode61" tokenizer.
*/
static int fts5UnicodeCreate(
void *pUnused,
const char **azArg, int nArg,
Fts5Tokenizer **ppOut
){
int rc = SQLITE_OK; /* Return code */
Unicode61Tokenizer *p = 0; /* New tokenizer object */
UNUSED_PARAM(pUnused);
rc = SQLITE_NOMEM;
}
/* Search for a "categories" argument */
for(i=0; rc==SQLITE_OK && i<nArg-1; i+=2){
if( 0==sqlite3_stricmp(azArg[i], "categories") ){
zCat = azArg[i+1];
}
}
if( rc==SQLITE_OK ){
rc = unicodeSetCategories(p, zCat);
}
for(i=0; rc==SQLITE_OK && i<nArg-1; i+=2){
const char *zArg = azArg[i+1];
if( 0==sqlite3_stricmp(azArg[i], "remove_diacritics") ){
if( (zArg[0]!='0' && zArg[0]!='1' && zArg[0]!='2') || zArg[1] ){
rc = SQLITE_ERROR;
}else{
p->eRemoveDiacritic = (zArg[0] - '0');
assert( p->eRemoveDiacritic==FTS5_REMOVE_DIACRITICS_NONE
*/
static int fts5PorterCreate(
void *pCtx,
const char **azArg, int nArg,
Fts5Tokenizer **ppOut
){
fts5_api *pApi = (fts5_api*)pCtx;
int rc = SQLITE_OK;
PorterTokenizer *pRet;
void *pUserdata = 0;
const char *zBase = "unicode61";
if( nArg>0 ){
zBase = azArg[0];
}
pRet = (PorterTokenizer*)sqlite3_malloc(sizeof(PorterTokenizer));
if( pRet ){
memset(pRet, 0, sizeof(PorterTokenizer));
rc = pApi->xFindTokenizer(pApi, zBase, &pUserdata, &pRet->tokenizer);
}else{
}
/*
** Register all built-in tokenizers with FTS5.
*/
static int sqlite3Fts5TokenizerInit(fts5_api *pApi){
struct BuiltinTokenizer {
const char *zName;
fts5_tokenizer x;
} aBuiltin[] = {
{ "unicode61", {fts5UnicodeCreate, fts5UnicodeDelete, fts5UnicodeTokenize}},
{ "ascii", {fts5AsciiCreate, fts5AsciiDelete, fts5AsciiTokenize }},
{ "porter", {fts5PorterCreate, fts5PorterDelete, fts5PorterTokenize }},
{ "trigram", {fts5TriCreate, fts5TriDelete, fts5TriTokenize}},
};
int rc = SQLITE_OK; /* Return code */
int i; /* To iterate through builtin functions */
for(i=0; rc==SQLITE_OK && i<ArraySize(aBuiltin); i++){
rc = pApi->xCreateTokenizer(pApi,
iHi = iTest-1;
}
}
assert( key>=aDia[iRes] );
if( bComplex==0 && (aChar[iRes] & 0x80) ) return c;
return (c > (aDia[iRes]>>3) + (aDia[iRes]&0x07)) ? c : ((int)aChar[iRes] & 0x7F);
}
/*
** Return true if the argument interpreted as a unicode codepoint
** is a diacritical modifier character.
*/
static int sqlite3Fts5UnicodeIsdiacritic(int c){
unsigned int mask0 = 0x08029FDF;
unsigned int mask1 = 0x000361F8;
if( c<768 || c>817 ) return 0;
return (c < 768+32) ?
(mask0 & ((unsigned int)1 << (c-768))) :
(mask1 & ((unsigned int)1 << (c-768-32)));
}
/*
** Interpret the argument as a unicode codepoint. If the codepoint
** is an upper case character that has a lower case equivalent,
** return the codepoint corresponding to the lower case version.
** Otherwise, return a copy of the argument.
**
** The results are undefined if the value passed to this function
** is less than zero.
*/
static int sqlite3Fts5UnicodeFold(int c, int eRemoveDiacritic){
/* Each entry in the following array defines a rule for folding a range
** of codepoints to lower case. The rule applies to a range of nRange
** to all nRange codepoints (i.e. all nRange codepoints are upper case and
** need to be folded). Or, if it is set, then the rule only applies to
** every second codepoint in the range, starting with codepoint C.
**
** The 7 most significant bits in flags are an index into the aiOff[]
** array. If a specific codepoint C does require folding, then its lower
** case equivalent is ((C + aiOff[flags>>1]) & 0xFFFF).
**
** The contents of this array are generated by parsing the CaseFolding.txt
** file distributed as part of the "Unicode Character Database". See
** http://www.unicode.org for details.
*/
static const struct TableEntry {
unsigned short iCode;
unsigned char flags;
unsigned char nRange;
} aEntry[] = {
{65, 14, 26}, {181, 64, 1}, {192, 14, 23},
{216, 14, 7}, {256, 1, 48}, {306, 1, 6},
{313, 1, 16}, {330, 1, 46}, {376, 116, 1},
{377, 1, 6}, {383, 104, 1}, {385, 50, 1},
**
** ^The third argument is the value to bind to the parameter.
** ^If the third parameter to sqlite3_bind_text() or sqlite3_bind_text16()
** or sqlite3_bind_blob() is a NULL pointer then the fourth parameter
** is ignored and the end result is the same as sqlite3_bind_null().
** ^If the third parameter to sqlite3_bind_text() is not NULL, then
** it should be a pointer to well-formed UTF8 text.
** ^If the third parameter to sqlite3_bind_text16() is not NULL, then
** it should be a pointer to well-formed UTF16 text.
** ^If the third parameter to sqlite3_bind_text64() is not NULL, then
** it should be a pointer to a well-formed unicode string that is
** either UTF8 if the sixth parameter is SQLITE_UTF8, or UTF16
** otherwise.
**
** [[byte-order determination rules]] ^The byte-order of
** UTF16 input text is determined by the byte-order mark (BOM, U+FEFF)
** found in first character, which is removed, or in the absence of a BOM
** the byte order is the native byte order of the host
** machine for sqlite3_bind_text16() or the byte order specified in
** the 6th parameter for sqlite3_bind_text64().)^
** ^If UTF16 input text contains invalid unicode
** characters, then SQLite might change those invalid characters
** into the unicode replacement character: U+FFFD.
**
** ^(In those routines that have a fourth argument, its value is the
** number of bytes in the parameter. To be clear: the value is the
** number of <u>bytes</u> in the value, not the number of characters.)^
** ^If the fourth parameter to sqlite3_bind_text() or sqlite3_bind_text16()
** is negative, then the length of the string is
** the number of bytes up to the first zero terminator.
** If the fourth parameter to sqlite3_bind_blob() is negative, then
** the behavior is undefined.
** If a non-negative fourth parameter is provided to sqlite3_bind_text()
** specified by the interface procedure. ^So, for example, if
** sqlite3_result_text16le() is invoked with text that begins
** with bytes 0xfe, 0xff (a big-endian byte-order mark) then the
** first two bytes of input are skipped and the remaining input
** is interpreted as UTF16BE text.
**
** ^For UTF16 input text to the sqlite3_result_text16(),
** sqlite3_result_text16be(), sqlite3_result_text16le(), and
** sqlite3_result_text64() routines, if the text contains invalid
** UTF16 characters, the invalid characters might be converted
** into the unicode replacement character, U+FFFD.
**
** ^The sqlite3_result_value() interface sets the result of
** the application-defined function to be a copy of the
** [unprotected sqlite3_value] object specified by the 2nd parameter. ^The
** sqlite3_result_value() interface makes a copy of the [sqlite3_value]
** so that the [sqlite3_value] specified in the parameter may change or
** be deallocated after sqlite3_result_value() returns without harm.
** ^A [protected sqlite3_value] object may always be used where an
** [unprotected sqlite3_value] object is required, so either
** kind of [sqlite3_value] object can be used with this interface.
t/02_logon.t view on Meta::CPAN
use strict;
use warnings;
use lib "t/lib";
use SQLiteTest qw/connect_ok @CALL_FUNCS/;
use Test::More;
use if -d ".git", "Test::FailWarnings";
use DBD::SQLite::Constants ':dbd_sqlite_string_mode';
my $unicode_opt = DBD_SQLITE_STRING_MODE_UNICODE_STRICT;
my $show_diag = 0;
foreach my $call_func (@CALL_FUNCS) {
# Ordinary connect
SCOPE: {
my $dbh = connect_ok();
ok( $dbh->{sqlite_version}, '->{sqlite_version} ok' );
is( $dbh->{AutoCommit}, 1, 'AutoCommit is on by default' );
diag("sqlite_version=$dbh->{sqlite_version}") unless $show_diag++;
t/02_logon.t view on Meta::CPAN
ok( defined $dbh->$call_func(0, 'busy_timeout') );
is( $dbh->$call_func('busy_timeout'), 0, 'Set busy_timeout to 0' );
}
# Attributes in the connect string
SKIP: {
unless ( $] >= 5.008005 ) {
skip( 'Unicode is not supported before 5.8.5', 2 );
}
my $file = 'foo'.$$;
my $dbh = DBI->connect( "dbi:SQLite:dbname=$file;sqlite_string_mode=$unicode_opt", '', '' );
isa_ok( $dbh, 'DBI::db' );
is( $dbh->{sqlite_string_mode}, $unicode_opt, 'Unicode is on' );
$dbh->disconnect;
unlink $file;
}
# dbname, db, database
SCOPE: {
for my $key (qw/database db dbname/) {
my $file = 'foo'.$$;
unlink $file if -f $file;
ok !-f $file, 'database file does not exist';
t/12_unicode.t view on Meta::CPAN
# This is a test for correct handling of the "unicode" database
# handle parameter.
use strict;
use warnings;
use lib "t/lib";
use SQLiteTest;
use Test::More;
use if -d ".git", "Test::FailWarnings";
use DBD::SQLite::Constants ':dbd_sqlite_string_mode';
my $unicode_opt = DBD_SQLITE_STRING_MODE_UNICODE_STRICT;
BEGIN { requires_unicode_support() }
#
# Include std stuff
#
use Carp;
use DBI qw(:sql_types);
# Unintuitively, still has the effect of loading bytes.pm :-)
no bytes;
t/12_unicode.t view on Meta::CPAN
ok(
! is_utf8($textback),
"Reading text gives binary too (for now)",
);
is($bytesback, $bytestring, "No blob corruption");
is($textback, $bytestring, "Same text, different encoding");
}
# Start over but now activate Unicode support.
SCOPE: {
my $dbh = connect_ok( dbfile => 'foo', sqlite_string_mode => $unicode_opt );
is( $dbh->{sqlite_string_mode}, $unicode_opt, 'Unicode is on' );
($textback, $bytesback) = database_roundtrip($dbh, $utfstring, $bytestring);
ok(! is_utf8($bytesback), "Reading blob still gives binary");
ok(is_utf8($textback), "Reading text returns UTF-8");
ok($bytesback eq $bytestring, "Still no blob corruption");
ok($textback eq $utfstring, "Same text");
my $lengths = $dbh->selectall_arrayref(
"SELECT length(a), length(b) FROM table1"
t/13_create_collation.t view on Meta::CPAN
use warnings;
no if $] >= 5.022, "warnings", "locale";
use lib "t/lib";
use SQLiteTest;
use Test::More;
use if -d ".git", "Test::FailWarnings";
use Encode qw/decode/;
use DBD::SQLite;
use DBD::SQLite::Constants ':dbd_sqlite_string_mode';
my $unicode_opt = DBD_SQLITE_STRING_MODE_UNICODE_STRICT;
BEGIN { requires_unicode_support(); }
BEGIN {
# Sadly perl for windows (and probably sqlite, too) may hang
# if the system locale doesn't support european languages.
# en-us should be a safe default. if it doesn't work, use 'C'.
if ( $^O eq 'MSWin32') {
use POSIX 'locale_h';
setlocale(LC_COLLATE, 'en-us');
}
}
t/13_create_collation.t view on Meta::CPAN
"can't override registered collation");
my $tied = tied %DBD::SQLite::COLLATION;
delete $tied->{foo};
$DBD::SQLite::COLLATION{foo} = \&by_num_desc; # override, no longer dies
is($DBD::SQLite::COLLATION{foo}, \&by_num_desc, "overridden collation");
# now really test the collation functions
foreach my $call_func (@CALL_FUNCS) {
for my $unicode_opt (DBD_SQLITE_STRING_MODE_BYTES, DBD_SQLITE_STRING_MODE_UNICODE_STRICT) {
# connect
my $dbh = connect_ok( RaiseError => 1, sqlite_string_mode => $unicode_opt );
# populate test data
my @words = qw{
berger Bergèòe bergèòe Bergere
HOT hôôe
héôéòoclite héôaïòe hêôre héòaut
HAT hâôer
féôu fêôe fèöe ferme
};
if ($unicode_opt != DBD_SQLITE_STRING_MODE_BYTES) {
utf8::upgrade($_) foreach @words;
}
$dbh->do( 'CREATE TEMP TABLE collate_test ( txt )' );
$dbh->do( "INSERT INTO collate_test VALUES ( '$_' )" ) foreach @words;
# test builtin collation "perl"
my @sorted = sort @words;
my $db_sorted = $dbh->selectcol_arrayref("$sql COLLATE perl");
is_deeply(\@sorted, $db_sorted, "collate perl (@sorted // @$db_sorted)");
t/33_non_latin_path.t view on Meta::CPAN
use warnings;
use lib "t/lib";
use SQLiteTest;
use Test::More;
use if -d ".git", "Test::FailWarnings";
use File::Temp ();
use File::Spec::Functions ':ALL';
use DBD::SQLite::Constants ':dbd_sqlite_string_mode';
my $unicode_opt = DBD_SQLITE_STRING_MODE_UNICODE_STRICT;
BEGIN { requires_unicode_support() }
my $dir = File::Temp::tempdir( CLEANUP => 1 );
foreach my $subdir ( 'longascii', 'adatbázis', 'name with spaces', '¿¿¿ ¿¿¿¿¿¿') {
if ($^O eq 'cygwin') {
next if (($subdir eq 'adatbázis') || ($subdir eq '¿¿¿ ¿¿¿¿¿¿'));
}
# rt48048: don't need to "use utf8" nor "require utf8"
utf8::upgrade($subdir);
ok(
mkdir(catdir($dir, $subdir)),
t/33_non_latin_path.t view on Meta::CPAN
RaiseError => 1,
PrintError => 0,
} );
isa_ok( $dbh, 'DBI::db' );
};
is( $@, '', "Could connect to database in $subdir" );
diag( $@ ) if $@;
unlink(_path($dbfile)) if -e _path($dbfile);
# Repeat with the unicode flag on
my $ufile = $dbfile;
eval {
my $dbh = DBI->connect("dbi:SQLite:dbname=$dbfile", undef, undef, {
RaiseError => 1,
PrintError => 0,
sqlite_string_mode => $unicode_opt,
} );
isa_ok( $dbh, 'DBI::db' );
};
is( $@, '', "Could connect to database in $subdir" );
diag( $@ ) if $@;
# Reopen the database
eval {
my $dbh = DBI->connect("dbi:SQLite:dbname=$dbfile", undef, undef, {
RaiseError => 1,
PrintError => 0,
sqlite_string_mode => $unicode_opt,
} );
isa_ok( $dbh, 'DBI::db' );
};
is( $@, '', "Could connect to database in $subdir" );
diag( $@ ) if $@;
unlink(_path($ufile)) if -e _path($ufile);
# when the name of the database file has non-latin characters
my $dbfilex = catfile($dir, "$subdir.db");
t/37_regexp.t view on Meta::CPAN
my @words = qw{
berger Bergère bergère Bergere
HOT hôte
hétéroclite hétaïre hêtre héraut
HAT hâter
fétu fête fève ferme
};
my @regexes = qw( ^b\\w+ (?i:^b\\w+) );
BEGIN { requires_unicode_support() }
BEGIN {
# Sadly perl for windows (and probably sqlite, too) may hang
# if the system locale doesn't support european languages.
# en-us should be a safe default. if it doesn't work, use 'C'.
if ( $^O eq 'MSWin32') {
use POSIX 'locale_h';
setlocale(LC_COLLATE, 'en-us');
}
}
use locale;
t/43_fts3.t view on Meta::CPAN
['"qui gardait"' => 1 ],
["moutons NOT lait" => 1 ],
["il était" => 0 ],
["(il OR elle) AND un*" => 0, 2 ],
);
my $ix_une_native = index($texts[0], "une");
my $ix_une_utf8 = do {use bytes; utf8::upgrade(my $bergere_utf8 = $texts[0]); index($bergere_utf8, "une");};
BEGIN {
requires_unicode_support();
if (!has_fts()) {
plan skip_all => 'FTS is disabled for this DBD::SQLite';
}
if ($DBD::SQLite::sqlite_version_number >= 3011000 and $DBD::SQLite::sqlite_version_number < 3012000 and !has_compile_option('ENABLE_FTS3_TOKENIZER')) {
plan skip_all => 'FTS3 tokenizer is disabled for this DBD::SQLite';
}
}
t/62_regexp_multibyte_char_class.t view on Meta::CPAN
use strict;
use warnings;
use lib "t/lib";
use SQLiteTest;
use Test::More;
#use if -d ".git", "Test::FailWarnings"; # see RT#112220
BEGIN { requires_unicode_support() }
use DBD::SQLite::Constants ':dbd_sqlite_string_mode';
# special case for multibyte (non-ASCII) character class,
# which only works correctly under the unicode mode
my @words = ("\x{e3}\x{83}\x{86}\x{e3}\x{82}\x{b9}\x{e3}\x{83}\x{88}", "\x{e3}\x{83}\x{86}\x{e3}\x{83}\x{b3}\x{e3}\x{83}\x{88}"); # ãã¹ã ãã³ã
my $regex = "\x{e3}\x{83}\x{86}[\x{e3}\x{82}\x{b9}\x{e3}\x{83}\x{b3}]\x{e3}\x{83}\x{88}"; # ã[ã¹ã³]ã
foreach my $call_func (@CALL_FUNCS) {
for my $string_mode (DBD_SQLITE_STRING_MODE_PV, DBD_SQLITE_STRING_MODE_UNICODE_STRICT) {
# connect
my $dbh = connect_ok( RaiseError => 1, sqlite_string_mode => $string_mode );
t/lib/SQLiteTest.pm view on Meta::CPAN
# Support code for DBD::SQLite tests
use strict;
use Exporter ();
use File::Spec ();
use Test::More ();
our @ISA = 'Exporter';
our @EXPORT = qw/
connect_ok dies dbfile @CALL_FUNCS $sqlite_call
has_sqlite requires_sqlite requires_unicode_support
allow_warnings has_compile_option has_fts
/;
our @CALL_FUNCS;
our $sqlite_call;
my $parent;
my %dbfiles;
BEGIN {
# Allow tests to load modules bundled in /inc
t/lib/SQLiteTest.pm view on Meta::CPAN
=cut
sub requires_sqlite {
my $version = shift;
unless (has_sqlite($version)) {
Test::More::plan skip_all => "this test requires SQLite $version and newer";
exit;
}
}
=head2 requires_unicode_support
BEGIN { requires_unicode_support(); }
skips all the tests if Perl does not have sane Unicode support.
=cut
sub requires_unicode_support {
unless ($] >= 5.008005) {
Test::More::plan skip_all => "Unicode is not supported before 5.8.5";
exit;
}
}
=head2 allow_warnings
allow_warnings { eval {...} };
t/rt_25371_asymmetric_unicode.t view on Meta::CPAN
use strict;
use warnings;
use lib "t/lib";
use SQLiteTest;
use Test::More;
use if -d ".git", "Test::FailWarnings";
use DBD::SQLite::Constants ':dbd_sqlite_string_mode';
my $unicode_opt = DBD_SQLITE_STRING_MODE_UNICODE_STRICT;
BEGIN { requires_unicode_support(); }
my $dbh = connect_ok( sqlite_string_mode => $unicode_opt );
is( $dbh->{sqlite_string_mode}, $unicode_opt, 'string mode is unicode/strict' );
ok( $dbh->do(<<'END_SQL'), 'CREATE TABLE' );
CREATE TABLE foo (
bar varchar(255)
)
END_SQL
foreach ( "\0", "A", "\xe9", "\x{20ac}" ) {
ok( $dbh->do("INSERT INTO foo VALUES ( ? )", {}, $_), 'INSERT' );
my $foo = $dbh->selectall_arrayref("SELECT bar FROM foo");
t/rt_25924_user_defined_func_unicode.t view on Meta::CPAN
use strict;
use warnings;
use lib "t/lib";
use SQLiteTest;
use Test::More;
use if -d ".git", "Test::FailWarnings";
use DBD::SQLite::Constants ':dbd_sqlite_string_mode';
my $unicode_opt = DBD_SQLITE_STRING_MODE_UNICODE_STRICT;
BEGIN { requires_unicode_support() }
foreach my $call_func (@CALL_FUNCS) {
my $dbh = connect_ok( sqlite_string_mode => $unicode_opt );
ok($dbh->$call_func( "perl_uc", 1, \&perl_uc, "create_function" ));
ok( $dbh->do(<<'END_SQL'), 'CREATE TABLE' );
CREATE TABLE foo (
bar varchar(255)
)
END_SQL
my @words = qw{Bergère hôte hétaïre hêtre};
foreach my $word (@words) {
# rt48048: don't need to "use utf8" nor "require utf8"
utf8::upgrade($word);
ok( $dbh->do("INSERT INTO foo VALUES ( ? )", {}, $word), 'INSERT' );
my $foo = $dbh->selectall_arrayref("SELECT perl_uc(bar) FROM foo");
is_deeply( $foo, [ [ perl_uc($word) ] ], 'unicode upcase ok' );
ok( $dbh->do("DELETE FROM foo"), 'DELETE ok' );
}
$dbh->disconnect;
}
sub perl_uc {
my $string = shift;
return uc($string);
}
t/rt_71311_bind_col_and_unicode.t view on Meta::CPAN
use strict;
use warnings;
use lib "t/lib";
use SQLiteTest;
use Test::More;
use if -d ".git", "Test::FailWarnings";
use DBI qw/:sql_types/;
use DBD::SQLite::Constants ':dbd_sqlite_string_mode';
BEGIN{ requires_unicode_support(); }
my $dbh = connect_ok(sqlite_string_mode => DBD_SQLITE_STRING_MODE_UNICODE_STRICT);
$dbh->do('create table test1 (id integer, b blob)');
my $blob = "\x{82}\x{A0}";
my $str = "\x{20ac}";
{
my $sth = $dbh->prepare('insert into test1 values (?, ?)');
t/rt_71311_bind_col_and_unicode.t view on Meta::CPAN
my $sth = $dbh->prepare('select * from test1 order by id');
$sth->execute;
my $expected = [undef, 1, 0, 0, 1, 1, 1];
for (1..6) {
my $row = $sth->fetch;
ok $row && $row->[0] == $_;
ok $row && utf8::is_utf8($row->[1]) == $expected->[$_],
"row $_ is ".($expected->[$_] ? "unicode" : "not unicode");
}
$sth->finish;
}
{
my $sth = $dbh->prepare('select * from test1 order by id');
$sth->bind_col(1, \my $col1);
$sth->bind_col(2, \my $col2);
$sth->execute;
my $expected = [undef, 1, 0, 0, 1, 1, 1];
for (1..6) {
$sth->fetch;
ok $col1 && $col1 == $_;
ok $col1 && utf8::is_utf8($col2) == $expected->[$_],
"row $_ is ".($expected->[$_] ? "unicode" : "not unicode");
}
$sth->finish;
}
{
my $sth = $dbh->prepare('select * from test1 order by id');
$sth->bind_col(1, \my $col1);
$sth->bind_col(2, \my $col2, SQL_BLOB);
$sth->execute;
my $expected = [undef, 0, 0, 0, 0, 0, 0];
for (1..6) {
$sth->fetch;
ok $col1 && $col1 == $_;
ok $col2 && utf8::is_utf8($col2) == $expected->[$_],
"row $_ is ".($expected->[$_] ? "unicode" : "not unicode");
}
$sth->finish;
}
{
my $sth = $dbh->prepare('select * from test1 order by id');
$sth->bind_col(1, \my $col1);
$sth->bind_col(2, \my $col2, {TYPE => SQL_BLOB});
$sth->execute;
my $expected = [undef, 0, 0, 0, 0, 0, 0];
for (1..6) {
$sth->fetch;
ok $col1 && $col1 == $_;
ok $col2 && utf8::is_utf8($col2) == $expected->[$_],
"row $_ is ".($expected->[$_] ? "unicode" : "not unicode");
}
$sth->finish;
}
done_testing;
t/rt_78833_utf8_flag_for_column_names.t view on Meta::CPAN
use strict;
use warnings;
use lib "t/lib";
use SQLiteTest;
use Test::More;
use if -d ".git", "Test::FailWarnings";
use Encode;
use DBD::SQLite::Constants ':dbd_sqlite_string_mode';
BEGIN { requires_unicode_support() }
unicode_test("\x{263A}"); # (decoded) smiley character
unicode_test("\x{0100}"); # (decoded) capital A with macron
sub unicode_test {
my $unicode = shift;
ok Encode::is_utf8($unicode), "correctly decoded";
my $unicode_encoded = encode_utf8($unicode);
{ # tests for an environment where everything is encoded
my $dbh = connect_ok(sqlite_string_mode => DBD_SQLITE_STRING_MODE_BYTES);
$dbh->do("pragma foreign_keys = on");
my $unicode_quoted = $dbh->quote_identifier($unicode_encoded);
$dbh->do("create table $unicode_quoted (id, $unicode_quoted primary key)");
$dbh->do("create table bar (id, ref references $unicode_quoted ($unicode_encoded))");
ok $dbh->do("insert into $unicode_quoted values (?, ?)", undef, 1, "text"), "insert successfully";
ok $dbh->do("insert into $unicode_quoted (id, $unicode_quoted) values (?, ?)", undef, 2, "text2"), "insert with unicode name successfully";
{
my $sth = $dbh->prepare("insert into $unicode_quoted (id) values (:$unicode_encoded)");
$sth->bind_param(":$unicode_encoded", 5);
$sth->execute;
my ($id) = $dbh->selectrow_array("select id from $unicode_quoted where id = :$unicode_encoded", undef, 5);
is $id => 5, "unicode placeholders";
}
{
my $sth = $dbh->prepare("select * from $unicode_quoted where id = ?");
$sth->execute(1);
my $row = $sth->fetchrow_hashref;
is $row->{id} => 1, "got correct row";
is $row->{$unicode_encoded} => "text", "got correct (encoded) unicode column data";
ok !exists $row->{$unicode}, "(decoded) unicode column does not exist";
}
{
my $sth = $dbh->prepare("select $unicode_quoted from $unicode_quoted where id = ?");
$sth->execute(1);
my $row = $sth->fetchrow_hashref;
is $row->{$unicode_encoded} => "text", "got correct (encoded) unicode column data";
ok !exists $row->{$unicode}, "(decoded) unicode column does not exist";
}
{
my $sth = $dbh->prepare("select id from $unicode_quoted where $unicode_quoted = ?");
$sth->execute("text");
my ($id) = $sth->fetchrow_array;
is $id => 1, "got correct id by the (encoded) unicode column value";
}
{
my $sth = $dbh->column_info(undef, undef, $unicode_encoded, $unicode_encoded);
my $column_info = $sth->fetchrow_hashref;
is $column_info->{COLUMN_NAME} => $unicode_encoded, "column_info returns the correctly encoded column name";
}
{
my $sth = $dbh->primary_key_info(undef, undef, $unicode_encoded);
my $primary_key_info = $sth->fetchrow_hashref;
is $primary_key_info->{COLUMN_NAME} => $unicode_encoded, "primary_key_info returns the correctly encoded primary key name";
}
if (has_sqlite('3.6.14')) {
my $sth = $dbh->foreign_key_info(undef, undef, $unicode_encoded, undef, undef, 'bar');
my $foreign_key_info = $sth->fetchrow_hashref;
is $foreign_key_info->{PKCOLUMN_NAME} => $unicode_encoded, "foreign_key_info returns the correctly encoded foreign key name";
}
{
my $sth = $dbh->table_info(undef, undef, $unicode_encoded);
my $table_info = $sth->fetchrow_hashref;
is $table_info->{TABLE_NAME} => $unicode_encoded, "table_info returns the correctly encoded table name";
}
}
{ # tests for an environment where everything is decoded
my $dbh = connect_ok(sqlite_string_mode => DBD_SQLITE_STRING_MODE_UNICODE_STRICT);
$dbh->do("pragma foreign_keys = on");
my $unicode_quoted = $dbh->quote_identifier($unicode);
$dbh->do("create table $unicode_quoted (id, $unicode_quoted primary key)");
$dbh->do("create table bar (id, ref references $unicode_quoted ($unicode_quoted))");
ok $dbh->do("insert into $unicode_quoted values (?, ?)", undef, 1, "text"), "insert successfully";
ok $dbh->do("insert into $unicode_quoted (id, $unicode_quoted) values (?, ?)", undef, 2, "text2"), "insert with unicode name successfully";
{
my $sth = $dbh->prepare("insert into $unicode_quoted (id) values (:$unicode)");
$sth->bind_param(":$unicode", 5);
$sth->execute;
my ($id) = $dbh->selectrow_array("select id from $unicode_quoted where id = :$unicode", undef, 5);
is $id => 5, "unicode placeholders";
}
{
my $sth = $dbh->prepare("select * from $unicode_quoted where id = ?");
$sth->execute(1);
my $row = $sth->fetchrow_hashref;
is $row->{id} => 1, "got correct row";
is $row->{$unicode} => "text", "got correct (decoded) unicode column data";
ok !exists $row->{$unicode_encoded}, "(encoded) unicode column does not exist";
}
{
my $sth = $dbh->prepare("select $unicode_quoted from $unicode_quoted where id = ?");
$sth->execute(1);
my $row = $sth->fetchrow_hashref;
is $row->{$unicode} => "text", "got correct (decoded) unicode column data";
ok !exists $row->{$unicode_encoded}, "(encoded) unicode column does not exist";
}
{
my $sth = $dbh->prepare("select id from $unicode_quoted where $unicode_quoted = ?");
$sth->execute("text2");
my ($id) = $sth->fetchrow_array;
is $id => 2, "got correct id by the (decoded) unicode column value";
}
{
my $sth = $dbh->column_info(undef, undef, $unicode, $unicode);
my $column_info = $sth->fetchrow_hashref;
is $column_info->{COLUMN_NAME} => $unicode, "column_info returns the correctly decoded column name";
}
{
my $sth = $dbh->primary_key_info(undef, undef, $unicode);
my $primary_key_info = $sth->fetchrow_hashref;
is $primary_key_info->{COLUMN_NAME} => $unicode, "primary_key_info returns the correctly decoded primary key name";
}
if (has_sqlite('3.6.14')) {
my $sth = $dbh->foreign_key_info(undef, undef, $unicode, undef, undef, 'bar');
my $foreign_key_info = $sth->fetchrow_hashref;
is $foreign_key_info->{PKCOLUMN_NAME} => $unicode, "foreign_key_info returns the correctly decoded foreign key name";
}
{
my $sth = $dbh->table_info(undef, undef, $unicode);
my $table_info = $sth->fetchrow_hashref;
is $table_info->{TABLE_NAME} => $unicode, "table_info returns the correctly decoded table name";
}
}
}
done_testing;
t/rt_96877_unicode_statements.t view on Meta::CPAN
# According to the sqlite doc, the SQL argument to sqlite3_prepare_v2
# should be in utf8, but DBD::SQLite does not ensure this (even with
# sqlite_unicode => 1). Only bind values are properly converted.
use strict;
use warnings;
use lib "t/lib";
use SQLiteTest;
use Test::More;
use if -d ".git", "Test::FailWarnings";
BEGIN { requires_unicode_support() }
use DBD::SQLite::Constants ':dbd_sqlite_string_mode';
my $dbh = connect_ok( sqlite_string_mode => DBD_SQLITE_STRING_MODE_UNICODE_NAIVE );
is( $dbh->{sqlite_string_mode}, DBD_SQLITE_STRING_MODE_UNICODE_NAIVE, 'Unicode is on' );
ok( $dbh->do(<<'END_SQL'), 'CREATE TABLE' );
CREATE TABLE foo (
bar varchar(255)
)
t/virtual_table/21_perldata_charinfo.t view on Meta::CPAN
use strict;
use warnings;
# test the example described in
# L<DBD::SQLite::VirtualTable::PerlData/"Hashref example : unicode characters">
use lib "t/lib";
use SQLiteTest qw/connect_ok $sqlite_call/;
use Test::More;
BEGIN {
# check for old Perls which did not have Unicode::UCD in core
unless (eval "use Unicode::UCD 'charinfo'; 1") {
plan skip_all => "Unicode::UCD does not seem to be installed";
}