ALBD
NAME
ALBD README
SYNOPSIS
This package consists of Perl modules and supporting Perl programs
that perform Literature Based Discovery (LBD). LBD is performed on
co-occurrence matrices generated from UMLS::Association. ALBD is
based on the ABC co-occurrence model. Many options can be specified,
and many ranking methods are available, including novel ranking
methods that use association measures as well as frequency-based
ranking methods. See samples/lbd for more information. ALBD can
perform open and closed LBD as well as time slicing evaluation.
ALBD requires UMLS::Association both to compute the co-occurrence
database from which the co-occurrence matrix is derived and to rank
the generated C terms.
UMLS::Association requires the UMLS::Interface module to access
the Unified Medical Language System (UMLS) for semantic type filtering
and to determine if CUIs are valid.
The following sections describe the organization of this software
package and how to use it. A few typical examples are given to help
clearly understand the usage of the modules and the supporting
utilities.
INSTALL
To install the module, run the following magic commands:
perl Makefile.PL
make
make test
make install
This will install the module in the standard location. You will, most
probably, require root privileges to install in standard system
directories. To install in a non-standard directory, specify a prefix
during the 'perl Makefile.PL' stage as:
perl Makefile.PL PREFIX=/home/programs
It is possible to modify other parameters during installation. The
details of these can be found in the ExtUtils::MakeMaker documentation.
However, it is highly recommended that you leave the other parameters
unchanged unless you know what you are doing.
CO-OCCURRENCE MATRIX SETUP
ALBD requires a co-occurrence matrix of CUIs. This matrix is stored as
a flat file in a sparse matrix format: each line contains three
tab-separated values, cui_1, cui_2, and n_11, where n_11 is the count of
their co-occurrences. Any matrix in that format is acceptable; however,
the intended method of matrix generation is to convert a
UMLS::Association database into a flat matrix file. These databases are
created by running the CUICollector tool of UMLS::Association over the
MetaMapped Medline baseline. Once the database exists, run
utils/datasetCreator/fromMySQL/dbToTab.pl to convert the desired
database into a matrix file. Note that the code in dbToTab.pl is only a
sample mysql command; if the input database was created by another
method, a different command may be needed. As long as the resulting
co-occurrence matrix is in the correct format, LBD may be run on it,
which allows flexibility in where co-occurrence information comes from.
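Because any file in the three-column format is accepted, it can help to
check a matrix file before running LBD on it. The following is a minimal
sketch, not part of ALBD: the function name check_matrix is hypothetical,
and it assumes that n_11 is written as a non-negative integer. It only
checks the field layout and does not verify that the CUIs are valid.

```shell
# Hypothetical helper (not shipped with ALBD): report every line that
# does not have exactly three tab-separated fields, or whose third
# field (n_11) is not a non-negative integer. Returns non-zero if any
# malformed line is found.
check_matrix() {
    awk -F '\t' '
        NF != 3 || $3 !~ /^[0-9]+$/ {
            printf "line %d: malformed: %s\n", NR, $0
            bad = 1
        }
        END { exit (bad + 0) }
    ' "$1"
}
```

For example, "check_matrix matrix.tsv && echo ok" prints "ok" only when
every line of the (hypothetically named) file matrix.tsv passes the check.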
Note: utils/datasetCreator/fromMySQL/removeQuotes.pl may need to be run
on the resulting tab-separated file if quotes are included in the
resulting co-occurrence matrix file.
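To illustrate the kind of cleanup involved, here is a rough stand-in
written as a shell function. It is not the removeQuotes.pl script itself,
whose exact behavior is not described here. The name strip_quotes is
hypothetical, and the sketch assumes the unwanted quotes are double-quote
characters wrapped around the exported fields.

```shell
# Hypothetical stand-in (not removeQuotes.pl): delete every double-quote
# character from a tab-separated matrix read on standard input and
# write the cleaned lines to standard output.
strip_quotes() {
    tr -d '"'
}
```

It could be used as "strip_quotes < quoted_matrix.tsv > matrix.tsv",
where both file names are placeholders.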
Set Up Dummy UMLS::Association Database
UMLS::Association requires a connection to a database that is in the
correct format. Although this database is not required for ALBD
(since co-occurrence data is loaded from a co-occurrence matrix), it is
required to run UMLS::Association. If you ran UMLS::Association to
generate a co-occurrence matrix, you should be fine. Otherwise you will