DTA-CAB
+ it seems though that we can't rely on these, since they don't survive e.g. XML-RPC coding
+ also we need some hook besides analysis class, for parsers (data doesn't yet have a class)
+ we also appear to have solved the 'generic access' problems with closures, so we don't need analysis classes there
+ upshot: lose analysis classes in next checkin
* more fixes for CAB::Chain
+ dta-cab-analyze output for new chain now identical to old version (services) for test-kant-8k
+ TODO: format updates, documentation updates, ...
* started re-factorization for abstract Chain analyzers
- current conundrum: how to handle flexible {src}, {dst} as previously passed in via %opts, e.g. for Automaton?
- idea: abstract Analysis class, API
* fixed buglet (no "return $tok" in MorphSafe analysis sub) --
- maybe re-think that API (e.g. analyze() is always destructive?)
- next steps: re-factor CAB hacks into analyzers, get old CAB working as Analyzer::Chain
- benchmark old closure-style analyzeToken() vs. new force-document analyzeTypes() [via XML-RPC? in-memory?]
- add default control options to chain (e.g. doAnalyzeWhatever=>BOOL), add {name} convention for all analyzers
- re-work I/O Formats -- better flexibility & handling of new fields
* fixed bug 'no start without stop on dta-cab.sh restart'
v0.18 2009-12-01 14:46 moocow
* Format/XmlNative.pm safety fixes
v0.17 2009-11-12 10:30 moocow
* DocClassify dummy document fixes
v0.16 2009-11-12 10:18 moocow
* updated pid files
v0.15 2009-10-16 09:45 moocow
* added configs tweet-server-[1234].(rc|plm) for round-robin
* use 'funconly-nofeatures' morph variant by default
* add @NEW tag to DHMM
* added tag @NEW to negra-yy.123
* use corpus as target language for tweet rewrite (also re-build ../automata/tweeted)
* added tweet-server.rc to dta-cab.sh
v0.14 2009-09-23 12:15 moocow
* tweet stuff
* added negra-yy.123
* added dta-words.tf
* added basic PoS-tagger CAB::Analyzer::Moot
* added dta-words.de.lex.latin1.tf.t
* re-routed word-list
* removed word-list dta-words.lex.tf: now built by 'make -C words/'
* words/: build from /home/dta/dta_tokenized_xml
* added words/ make build-system for word-lists
* updated CAB: use FSTs for eqpho, eqrw
- only get latin-1 forms (xlit/unicruft) on output side, but this is exactly what we need for DDC
v0.13 2009-08-28 13:36 moocow
* updated eqrw rules (use FST instead of dict)
* added EqRW.pm, EqRW/Dict.pm
* moved Dict::EqRW -> EqRW::Dict
* fixed latin-1/utf-8 bug in CAB::Analyzer::Automaton
v0.12 2009-08-06 11:29 moocow
* equiv-expander work
- TODO: get eqrw working via FST
v0.11 2009-08-03 14:26 moocow
* removed eqpho-dict
- TODO: get eqrw working with 1-sided FST (explicit cascade direct from token-stored rw output)
* added EqPho/FST.pm
- updated Analyzer::Automaton for non-deterministic analysis
- e.g. split Text->Pho and Pho->EqText into 2 FST analyzers
* updated dta-eqrw.dict (after additional punishments for 'hülfe' in target lg)
* more rewrite-equivalence class testing
+ got integrated in DTA::CAB class, server config, etc.
+ got dictionary building
+ found some more data-type bugs (tagh, rewrite, msafe, ...):
- hülfe -> helf~en ... [subjII] : see misc/notes/*
+ found more tokenizer problems/bugs: see misc/notes/tokenizer.txt
+ added XmlRpc server config arg 'aos=>\%name2options' to allow server to set default options on a per-analyzer basis
- useful for e.g. always requiring 'xlit' to run without shamelessly wasting memory by duplicating $cab
v0.10 2009-07-24 14:37 moocow
* added dta-cab-compile.perl: compile analyzer configs to binary
* added binary I/O routines for analyzers in DTA::CAB::Persistent
* re-worked Dict::EqClass to use non-deterministic kernel (so now any relation can be used to induce the equivalence class)
* added system/resources/Makefile rules to generate rewrite-equivalence dictionary for use with Dict::EqClass
* initial tests seem to work well
v0.09 2009-07-24 14:34 moocow
* dictionary/cache updates
v0.08 2009-07-23 14:34 moocow
* removed stale old-format cache files
* added cache-generation to resources Makefile
* moved EqClass, LatinDict to Dict:: namespace
* added EqPho analyzer via Gfsm::XL cascade
- loads quicker, runs slower, still maybe some buglets
* updated rewrite dict with better upper/lower case heuristics
v0.07 2009-07-03 13:42 moocow
* added linear-function max_weight computation for Gfsm::XL (rewrite) cascades
v0.0602 2009-07-03 13:39 moocow
* updated system/cab.plm to use new rewrite FST, dict
* updated dta-rw.dict
* added -log-config option to dta-cab-analyze.perl
* added cab-server-nodict.plm: useful for testing e.g. rewrite cascade w/o exception lexicon
* MorphSafe back-changes: ITJ is unsafe
* minor MorphSafe changes, new rw dict
v0.0601 2009-06-26 14:28 moocow
* added dta-rw.dict, updated MorphSafe
* added dta-rw.dict: extracted from grimm/wm-eval data
* updated resource makefile
* added symlink taxi-resources
* Morph/Latin uses tolower=>1
v0.06 2009-06-25 18:48 moocow
* Morph/Latin: set tolower=>1 by default
* minor server log format and config updates
* added magic bless() to cab.plm
* added latin resource to cab.plm
* got latin recognizer working via Gfsm subclass Analyzer::Morph::Latin
v0.05 2009-06-17 14:49 moocow
* more dta-cab link-up stuff
* more attribute pass-through for dta-tokwrap sentence & document attributes
* added dta-tokwrap pass-through token attributes {other}{xmlid}, {other}{chars}
v0.04 2009-06-11 12:21 moocow
* added Unicruft to Makefile.PL PREREQ_PM
* replaced Transliterator with Unicruft (using libunicruft)
v0.03 2009-06-09 14:26 moocow
* more encoding hell, started replacing Transliterator with Unicruft-based version
* added parsing and pass-through of '$tok->{other}' attributes for Format::XmlNative
* updated Text, TT Formats
* updated log4perl config to use 24-hour time
* updated init script
* minor doc fix
* added -verbose options to perl scripts
* more doc updates
* doc update
* updated docs, incremented version to v0.03
v0.02 2009-06-05 12:58 moocow
* added Format/XmlTW.pm: dta-tokwrap interface format (1st stab)
* doc fix
* added test-word 'oede' to dta-lts.china.dict
* added analyzer aliases to cab-server.plm, system/cab-server.plm
* moved dta-cab-multi.sh to dta-cab.sh
* changed default xml-rpc port to 8088
* moved Protocol.pod to XmlRpcProtocol.pod
v0.01 2009-05-08 20:51 moocow
* updated cab.plm
* added system/ directory: system-wide installation stuff
* added client-request-level logging to Server::XmlRpc (used RPC::XML::Procedure subclass)
* added server options: -daemon , -pidfile=FILE
* MorphSafe fixes (for changed analysis structure)
* updated program --version behavior: report some SVN keywords
* added svn:keywords
* more documentation
* documented (Client|Server)/XmlRpc.pm
* documented Analyzers
* documented, documented, documented
* moved *.POD to *.rpod (avoid auto-installation)