DiaColloDB
v0.10.008 Wed, 24 Aug 2016 14:12:21 +0200 moocow
* merged in debugging changes from v0.10.004 debugging branch (_v0.10.004_0[123])
- conditionally enabled by new dcdb-create.perl -debug option
* added Utils::fh_flush() and Utils::fh_reopen() methods
- fh_reopen() should simulate flush() even on systems which don't support flush()
* updated Persistent subclasses to call fh_reopen() from their flush() methods:
- EnumFile(+FixedLen +FixedMap +MMap), MultiMapFile(+MMap), PackedFile(+MMap)
v0.10.007 Wed, 24 Aug 2016 08:59:10 +0200 moocow
* removed "hard" pdl dependencies, moved to 'recommends'
* fixed default option inheritance for dcdb-create.perl hash-valued options -tdf-option, -option
* added use_ok(DiaColloDB::Upgrade) test: weird errors w/ DDC::Any
* fixed import DiaColloDB::Utils::packsize() in Relation.pm
* fixed native query-parsing for TDF, DDC relations
- direct ddc-parsing can be forced with "[QUERY]" or "(QUERY)"
v0.10.006 Mon, 11 Jul 2016 11:03:01 +0200 moocow
* better version dependency for v5.10.0
v0.10.005 Thu, 07 Jul 2016 14:40:04 +0200 moocow
* replaced DDC::XS query-parsing and -manipulation with DDC::Any from DDC::Concordance >= v0.25
- obviates troublesome Alien::DDC::Concordance dependency
- still only expected to run correctly on *NIX systems due to runtime calls to sort etc.
* commented out DiaColloDB::create() debugging code from v0.10.004_01
v0.10.004_03 Thu, 21 Jul 2016 13:43:49 +0200 moocow
* added dcdb-create -nommap option: see if mmap use in VirtualBox/MacOS is causing errors
v0.10.004_02 2017-07-15 moocow
* debugging test for PackedFile::MMap -- no joy
v0.10.004_01 Tue, 05 Jul 2016 09:27:52 +0200 moocow
* debugging release for un-reproducible 'undefined value' errors on Birmingham data
v0.10.004 Tue, 28 Jun 2016 09:30:42 +0100 moocow
* updated -nofilters option to dcdb-create.perl (alias -use-all-the-data, a la Mark Lauersdorf)
* added DDCTabs 'foreign' option (-dO=foreign=1)
* added (p|w|l)(good|bad)file options to DiaColloDB::create (stoplist files)
v0.10.003 Tue, 21 Jun 2016 15:37:23 +0200 moocow
* added -subclient-option to dcdb-query.perl (common options for list:// sub-clients)
* fixed stringification bug for ddc-diff queries introduced in v0.09.002
'Can't use string ("l") as a HASH ref while "strict refs" in use at DiaColloDB/Relation.pm line 281.'
v0.10.002 Mon, 13 Jun 2016 15:51:42 +0200 moocow
* native query syntax fix: identify CQOr queries and throw an error
v0.10.001 Thu, 12 May 2016 16:57:56 +0200 moocow
* added -log-level option to dcdb-info.perl
* removed dates from generic term-tuple vocabulary ("x-tuples" -> "t-tuples"), a la tdf relation
* changed db structure for more efficient 2-pass Cofreqs queries (f2 bug-fix)
- Cofreqs now 3-level (id1 -> (date -> (id2->f)))
- Unigrams now 2-level (id1 -> (date -> f))
- Relation::subprofile1() and subprofile2() calling conventions changed
- changed temporary file format for "tokens.dat" used by DiaColloDB::create(): added dates
* changed export text file formats
- Unigrams: added dates
- Cofreqs: added dates and un-collocated f1 lines
- "x-tuple" exports replaced by corresponding "t-tuple" exports xenum->tenum, ATTR_2x.*->ATTR_2t, etc.
* added upgrade package v0_10_x2t
- added compatibility wrappers Compat::v0_09::* for transparent use of old indices
* added auto-backup of changed files to upgrade framework
- upgraders are now instantiated as objects, not just packages: cache header & options
* added DiaColloDB::Upgrade::Base::revert() method and -revert option to dcdb-upgrade.perl
- default implementation relies on subclass revert_created() and revert_updated() methods
* added dcdb-upgrade.perl options -keep, -[no]backup
* added DiaColloDB::Utils functions copyto(), moveto(), copyto_a(), cp_a()
* added DiaColloDB::Persistent method-wrappers copyto(), moveto(), copyto_a()
* added optimized PackedFile::MMap::bsearch() method
- for faster v0.10.x Cofreqs 'onepass' mode; still not as fast as v0.09.x 1-pass but it's incorrect anyways
* removed unused methods Cofreqs::f1(), Cofreqs::f12()
* removed obsolete method DiaColloDB::xidsByDate()
* re-factored compatibility wrappers into DiaColloDB::Compat::vX_Y_Z::*
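The nesting described for v0.10.001 (Cofreqs as id1 -> (date -> (id2 -> f)), Unigrams as id1 -> (date -> f)) can be pictured with plain nested dictionaries. The following is only a rough sketch of that shape, not DiaColloDB's on-disk format or API: the variable names, the helper functions f12_in_slice() and f1_in_slice(), and all counts are hypothetical and invented for illustration.

```python
# Hypothetical in-memory picture of the v0.10.x nesting; all IDs and counts are invented.
# Cofreqs: id1 -> (date -> (id2 -> f))
cofreqs = {
    0: {
        1900: {1: 3, 2: 1},
        1910: {1: 2},
    },
}

# Unigrams: id1 -> (date -> f)
unigrams = {
    0: {1900: 7, 1910: 4},
}


def f12_in_slice(cofreqs, id1, lo, hi):
    """Sum co-frequencies of id1 with each id2 over dates in [lo, hi]."""
    totals = {}
    for date, row in cofreqs.get(id1, {}).items():
        if lo <= date <= hi:
            for id2, f in row.items():
                totals[id2] = totals.get(id2, 0) + f
    return totals


def f1_in_slice(unigrams, id1, lo, hi):
    """Sum the independent frequency of id1 over dates in [lo, hi]."""
    return sum(f for date, f in unigrams.get(id1, {}).items() if lo <= date <= hi)


if __name__ == "__main__":
    print(f12_in_slice(cofreqs, 0, 1900, 1910))
    print(f1_in_slice(unigrams, 0, 1900, 1910))
```

Keeping the date as the second level means a date-slice query only has to walk the rows for one id1, which is presumably the kind of access the 2-pass Cofreqs queries benefit from.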
v0.09.004 Tue, 03 May 2016 14:03:13 +0200 moocow
* devel only, no CPAN release
* cofreqs (load|save)TextFh() idempotency tweaks for un-collocated f1
* mmap optimization for Cofreqs::subprofile2(): ca. 26% improvement
* PackedFile dump tweaks: better handling of non-singleton pack formats
* added Utils::packsingle(): better check for singleton pack formats
v0.09.003 Wed, 27 Apr 2016 09:55:14 +0200 moocow
* fixed 'undefined value in vec' warning in DiaColloDB/Relation.pm
v0.09.002 Tue, 26 Apr 2016 15:46:17 +0200 moocow
* fixed comparison profile stringification for new pack()-encoded profiles,
regression for v0.09.001 "f2 bug" fix
v0.09.001 Tue, 26 Apr 2016 14:49:29 +0200 moocow
* fixed double-counting f2 for multiple item1 targets with shared item2 collocates in Cofreqs::subprofile1() 1-pass mode
* added auto-upgrade framework
- DiaColloDB::Upgrade - top-level API
- DiaColloDB::Upgrade::Base - subclass API & defaults
- added subclass ::v0_08_to_v0_09_multimap for v0.09.x multimap format change
- dcdb-upgrade.perl : top-level auto-upgrade script
* added compatibility mode for multimaps as DiaColloDB::MultiMapFile::v0_08
* fixed -nokeep option to dcdb-create.perl
* TDF union: avoid storage of non-persistent object keys qw(docmeta wdmfile logas reusedir)
* TDF union: fixed 'bus error' resulting from attempt to mmap() temporary data beyond EOF
- arose in dta+dwds trying to include 'pnd' metadata only indexed in dta
- temporary PackedFile tdf.d/mvals_pnd.pf had no entries for dwds data (pnd not indexed)
- readPdlFile(...,Dims=>[$NC]) choked with 'bus error'
* Client::list overhaul
- new default fudge=>10 should be safe (but rather expensive)
- re-factored Client::list::profile() and compare() methods
* improved Client and Client::list documentation
- added "incorrect independent collocate frequencies" section to Client::list documentation
- milder form of this bug applies even to single native CoFreqs indices ("f2 bug", see below)
* workaround for incorrect independent collocate frequency acquisition code in Cofreqs ("f2 bug")
- f2 were computed as marginals only over those (x1,x2,date) triples with f(x1,x2,date) > 0,
rather than over all (*,x2,date \in slice)
- results were in general underestimates of f2
- fix uses 2-pass acquisition strategy, ca. 10x slower for frequent targets (e.g. 'Mann')
~ old subprofile() method refactored into subprofile1() and subprofile2()
- todo: possibly re-factor db structure to use tdf-style {tenum} rather than {xenum},
minimize group-key lookup & optimize for serial cofreqs dba2 file access
- added 'onepass' query option for fast, old, incorrect f2 frequency acquisition (Cofreqs only)
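The "f2 bug" described in the v0.09.001 entry can be illustrated with a toy example. This is a minimal sketch under one reading of the note above (f2 summed only over triples belonging to the query targets, versus over all (*, x2, date) in the slice); it is not the Cofreqs implementation. The table COFREQS, its counts, and the functions f2_onepass() and f2_twopass() are hypothetical.

```python
from collections import defaultdict

# Hypothetical toy table: (x1, x2, date) -> f(x1, x2, date). Counts are invented.
COFREQS = {
    ("Mann", "alt", 1900): 3,
    ("Mann", "jung", 1900): 1,
    ("Frau", "alt", 1900): 2,
    ("Frau", "jung", 1900): 4,
    ("Kind", "jung", 1900): 5,
}


def f2_onepass(cofreqs, targets, dates):
    """Old-style marginal: sum only over triples whose x1 is a query target."""
    f2 = defaultdict(int)
    for (x1, x2, date), f in cofreqs.items():
        if x1 in targets and date in dates:
            f2[x2] += f
    return dict(f2)


def f2_twopass(cofreqs, targets, dates):
    """Pass 1 collects the targets' collocates; pass 2 sums over all (*, x2, date)."""
    collocates = {
        x2 for (x1, x2, date) in cofreqs if x1 in targets and date in dates
    }
    f2 = defaultdict(int)
    for (x1, x2, date), f in cofreqs.items():
        if x2 in collocates and date in dates:
            f2[x2] += f
    return dict(f2)


if __name__ == "__main__":
    print(f2_onepass(COFREQS, {"Mann"}, {1900}))  # restricted marginals
    print(f2_twopass(COFREQS, {"Mann"}, {1900}))  # marginals over all x1
```

In this toy data the one-pass values can never exceed the two-pass values, which matches the changelog's remark that the old results were in general underestimates; the second scan over the whole table also hints at why the fix is slower.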
v0.08.006 Thu, 10 Mar 2016 16:52:19 +0100 moocow
* added dbexport() support for TDF relations
* allow option pass-through for Profile::Multi::compile()