above, provided that you also meet all of these conditions:
a) You must cause the modified files to carry prominent notices
stating that you changed the files and the date of any change.
b) You must cause any work that you distribute or publish, that in
whole or in part contains or is derived from the Program or any
part thereof, to be licensed as a whole at no charge to all third
parties under the terms of this License.
c) If the modified program normally reads commands interactively
when run, you must cause it, when started running for such
interactive use in the most ordinary way, to print or display an
announcement including an appropriate copyright notice and a
notice that there is no warranty (or else, saying that you provide
a warranty) and that users may redistribute the program under
these conditions, and telling the user how to view a copy of this
License. (Exception: if the Program itself is interactive but
does not normally print such an announcement, your work based on
the Program is not required to print an announcement.)
Also add information on how to contact you by electronic and paper mail.
If the program is interactive, make it output a short notice like this
when it starts in an interactive mode:
Gnomovision version 69, Copyright (C) year name of author
Gnomovision comes with ABSOLUTELY NO WARRANTY; for details type `show w'.
This is free software, and you are welcome to redistribute it
under certain conditions; type `show c' for details.
The hypothetical commands `show w' and `show c' should show the appropriate
parts of the General Public License. Of course, the commands you use may
be called something other than `show w' and `show c'; they could even be
mouse-clicks or menu items--whatever suits your program.
You should also get your employer (if you work as a programmer) or your
school, if any, to sign a "copyright disclaimer" for the program, if
necessary. Here is a sample; alter the names:
Yoyodyne, Inc., hereby disclaims all copyright interest in the program
`Gnomovision' (which makes passes at compilers) written by James Hacker.
<http://search.cpan.org/dist/UMLS-Interface/>
UMLS::Association
Used to calculate the association scores used by most of the ranking methods.
The package is freely available at:
<http://search.cpan.org/dist/UMLS-Association/>
Stage 3: Install ALBD package
The usual way to install the package is to run the following commands:
perl Makefile.PL
make
make test
make install
You will often need root access/superuser privileges to run make
install. The module can also be installed locally. To do a local
install, you need to specify a PREFIX option when you run 'perl
Makefile.PL'. For example,
ALBD requires that a co-occurrence matrix of CUIs has been created. This
matrix is stored as a flat file, in a sparse matrix format such that
each line contains three tab-separated values: cui_1, cui_2, and n_11
(the count of their co-occurrences). Any matrix with that format is
acceptable; however, the intended method of matrix generation is to
convert a UMLS::Association database into a flat matrix file. These
databases are created using the CUICollector tool of UMLS::Association,
run over the MetaMapped Medline baseline. With that database in place, run
utils/datasetCreator/fromMySQL/dbToTab.pl to convert the desired
database into a matrix file. Note that the code in dbToTab.pl is just a
sample mysql command. If the input database was created by another
method, a different command may be needed. As long as the resulting
co-occurrence matrix is in the correct format, LBD may be run on it. This
allows flexibility in where co-occurrence information comes from.
Note: utils/datasetCreator/fromMySQL/removeQuotes.pl may need to be run
on the resulting tab-separated file if quotes are included in the
resulting co-occurrence matrix file.
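As a rough illustration of the matrix format described above, the following sketch reads tab-separated lines into a hash of hashes and strips surrounding double quotes of the kind a MySQL export may leave behind. The sub name parse_matrix_lines and the example CUIs are hypothetical; this is not the loader ALBD itself uses, nor a copy of removeQuotes.pl.

```perl
use strict;
use warnings;

# Hypothetical helper: parses lines of the form "cui_1<TAB>cui_2<TAB>n_11"
# into a sparse matrix stored as a hash of hashes.
sub parse_matrix_lines {
    my @lines = @_;
    my %matrix;
    foreach my $line (@lines) {
        chomp $line;
        next if $line =~ /^\s*$/;    # skip blank lines
        my ($cui1, $cui2, $count) = split /\t/, $line;
        # strip one leading and one trailing double quote from each field
        s/^"|"$//g for ($cui1, $cui2, $count);
        $matrix{$cui1}{$cui2} = $count;
    }
    return \%matrix;
}

# Example with made-up CUIs; the second line is quoted as an export might be
my $matrix = parse_matrix_lines(
    "C0000001\tC0000002\t5\n",
    qq{"C0000001"\t"C0000003"\t2\n},
);
print "$_\n" for sort keys %{ $matrix->{C0000001} };
```

A hash of hashes keeps the representation sparse: only pairs that actually co-occur take up space.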
Stage 5: Set up Dummy UMLS::Association Database
UMLS::Association requires a connection to a database that is
in the correct format. Although this database is not required for ALBD
UMLS::Association requires the UMLS::Interface module to access
the Unified Medical Language System (UMLS) for semantic type filtering
and to determine if CUIs are valid.
The following sections describe the organization of this software
package and how to use it. A few typical examples are given to help
clearly understand the usage of the modules and the supporting
utilities.
INSTALL
To install the module, run the following magic commands:
perl Makefile.PL
make
make test
make install
This will install the module in the standard location. You will, most
probably, require root privileges to install in standard system
directories. To install in a non-standard directory, specify a prefix
during the 'perl Makefile.PL' stage as:
ALBD requires that a co-occurrence matrix of CUIs has been created. This
matrix is stored as a flat file, in a sparse matrix format such that
each line contains three tab-separated values: cui_1, cui_2, and n_11
(the count of their co-occurrences). Any matrix with that format is
acceptable; however, the intended method of matrix generation is to
convert a UMLS::Association database into a flat matrix file. These
databases are created using the CUICollector tool of UMLS::Association,
run over the MetaMapped Medline baseline. With that database in place, run
utils/datasetCreator/fromMySQL/dbToTab.pl to convert the desired
database into a matrix file. Note that the code in dbToTab.pl is just a
sample mysql command. If the input database was created by another
method, a different command may be needed. As long as the resulting
co-occurrence matrix is in the correct format, LBD may be run on it. This
allows flexibility in where co-occurrence information comes from.
Note: utils/datasetCreator/fromMySQL/removeQuotes.pl may need to be run
on the resulting tab-separated file if quotes are included in the
resulting co-occurrence matrix file.
Set Up Dummy UMLS::Association Database
UMLS::Association requires a connection to a database that is
in the correct format. Although this database is not required for ALBD
lib/ALBD.pm view on Meta::CPAN
data from which LBD is performed are co-occurrence matrices
generated from UMLS::Association. ALBD is based on the ABC
co-occurrence model. Many options can be specified, and many
ranking methods are available: novel ranking methods that use
association measures, as well as frequency-based
ranking methods. See samples/lbd for more info. ALBD can perform open and
closed LBD as well as time slicing evaluation.
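To make the ABC co-occurrence model more concrete, here is a minimal sketch of open discovery over a sparse matrix stored as a hash of hashes. Starting (A) terms lead to linking (B) terms, which in turn lead to target (C) terms that do not co-occur directly with any A term. The sub name abc_open_discovery, the toy CUIs, and the use of the linking-term count as a score are assumptions made for illustration; this is not the ALBD implementation.

```perl
use strict;
use warnings;

# Hypothetical sketch of open discovery under the ABC model (A -> B -> C).
# $matrixRef    : hash of hashes, $matrixRef->{cui1}{cui2} = co-occurrence count
# $startCuisRef : array ref of starting (A) CUIs
# Returns a hash ref mapping each candidate C term to the number of
# distinct B terms that link it to the starting terms.
sub abc_open_discovery {
    my ($matrixRef, $startCuisRef) = @_;
    my %isStart = map { $_ => 1 } @{$startCuisRef};

    # B terms: everything that co-occurs with a starting term
    my %bTerms;
    foreach my $cuiA (@{$startCuisRef}) {
        next unless exists $matrixRef->{$cuiA};
        $bTerms{$_} = 1 for keys %{ $matrixRef->{$cuiA} };
    }

    # C terms: co-occur with some B term, but are neither A terms nor
    # terms already directly connected to an A term (the B terms)
    my %cTerms;
    foreach my $cuiB (keys %bTerms) {
        next unless exists $matrixRef->{$cuiB};
        foreach my $cuiC (keys %{ $matrixRef->{$cuiB} }) {
            next if $isStart{$cuiC} || $bTerms{$cuiC};
            $cTerms{$cuiC}++;
        }
    }
    return \%cTerms;
}

# Toy matrix with made-up identifiers
my %matrix = (
    A1 => { B1 => 3, B2 => 1 },
    B1 => { C1 => 2, A1 => 3 },
    B2 => { C1 => 4, C2 => 1, B1 => 1 },
);
my $targets = abc_open_discovery(\%matrix, ['A1']);
print "$_ linked by $targets->{$_} B term(s)\n" for sort keys %{$targets};
```

The sketch follows the matrix in the direction it is stored; a symmetric matrix would need both orderings of each pair present.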
=head1 INSTALL
To install the module, run the following magic commands:
perl Makefile.PL
make
make test
make install
This will install the module in the standard location. You will, most
probably, require root privileges to install in standard system
directories. To install in a non-standard directory, specify a prefix
during the 'perl Makefile.PL' stage as:
lib/ALBD.pm view on Meta::CPAN
my $self = {};
my $className = shift;
my $optionsHashRef = shift;
bless($self, $className);
$self->_initialize($optionsHashRef);
return $self;
}
# Initializes everything needed for Literature Based Discovery
# input: $optionsHashRef <- reference to LBD options hash (command line input)
# output: none, but global parameters are set
sub _initialize {
my $self = shift;
my $optionsHashRef = shift;
#initialize UMLS::Interface
my %tHash = ();
$tHash{'t'} = 1; #default hash values are with t=1 (silence module output)
my $componentOptions = \%tHash;
if (${$optionsHashRef}{'interfaceConfig'} ne '') {
lib/LiteratureBasedDiscovery/Rank.pm view on Meta::CPAN
$scores{$cKey} = $score;
}
}
return \%scores;
}
# gets a list of A->C pairs, and sets the value as the implicit matrix value
# input: $startingMatrixRef <- ref to the starting matrix
# $implicitMatrixRef <- ref to the implicit matrix
# output: a hash ref where keys are comma-separated cui pairs hash{'C000,C111'}
# and values are set to the value at that index in the implicit matrix
sub _getACPairs {
my $startingMatrixRef = shift;
my $implicitMatrixRef = shift;
#generate a list of ac pairs
my %acPairs = ();
foreach my $keyA (keys %{$implicitMatrixRef}) {
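# iterate over keys only; a bare hash in list context also yields its values
foreach my $keyC (keys %{${$implicitMatrixRef}{$keyA}}) {
# build the documented 'cuiA,cuiC' key; $acPairs{$keyA,$keyC} joins with $;
$acPairs{"$keyA,$keyC"} = ${${$implicitMatrixRef}{$keyA}}{$keyC};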
lib/LiteratureBasedDiscovery/TimeSlicing.pm view on Meta::CPAN
}
# generates a hash of all association scores from the matrix
# the hash keys are $rowKey,$colKey. Hash values are the association scores
# between the $rowKey and $colKey. All co-occurring cui pairs from the matrix
# are calculated
# input: $matrixRef <- a reference to a sparse matrix
# $rankingMeasure <- a string specifying the ranking measure to use
# $umls_association <- an instance of UMLS::Association
# output: \%cuiPairs <- a ref to a hash of CUI pairs and their association
# each key of the hash is a comma-separated string
# containing cui1 and cui2 of the pair
# (e.g. 'cui1,cui2'), and each value is their association
# score using the specified association measure
sub getAssociationScores {
my $matrixRef = shift;
my $rankingMeasure = shift;
my $umls_association = shift;
print " getting Association Scores, rankingMeasure = $rankingMeasure\n";
#generate a list of cui pairs in the matrix
lib/LiteratureBasedDiscovery/TimeSlicing.pm view on Meta::CPAN
$max = $val;
}
}
return ($min,$max);
}
# Applies a threshold to a matrix using a corresponding association scores
# hash. Any pairs whose score is less than the threshold are not copied to
# the new matrix
# input: $threshold <- a scalar threshold
# $assocScoresRef <- a reference to a cui pair hash of association
# scores. Each key is a comma-separated cui pair
# (e.g. 'cui1,cui2'), values are their association
# scores.
# $matrixRef <- a reference to a co-occurrence sparse matrix that
# corresponds to the assocScoresRef
# output: \%thresholdedMatrix <- a ref to a new matrix, built from the
# $matrixRef after applying the $threshold
sub applyThreshold {
my $threshold = shift;
my $assocScoresRef = shift;
my $matrixRef = shift;
lib/LiteratureBasedDiscovery/TimeSlicing.pm view on Meta::CPAN
}
#return the thresholded matrix
return \%thresholdedMatrix;
}
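The thresholding step documented for applyThreshold can be sketched as follows. This is a hypothetical reimplementation written from the comments alone (the original body is not shown here), so the sub name apply_threshold_sketch and the decision to silently skip pairs missing from the matrix are assumptions.

```perl
use strict;
use warnings;

# Hypothetical sketch: copy into a new sparse matrix only those pairs
# whose association score is at least $threshold.
sub apply_threshold_sketch {
    my ($threshold, $assocScoresRef, $matrixRef) = @_;
    my %thresholded;
    foreach my $pair (keys %{$assocScoresRef}) {
        next if $assocScoresRef->{$pair} < $threshold;
        my ($cui1, $cui2) = split /,/, $pair;    # keys look like 'cui1,cui2'
        # assumption: pairs absent from the matrix are skipped
        next unless exists $matrixRef->{$cui1}
            && exists $matrixRef->{$cui1}{$cui2};
        $thresholded{$cui1}{$cui2} = $matrixRef->{$cui1}{$cui2};
    }
    return \%thresholded;
}

my $kept = apply_threshold_sketch(
    0.5,
    { 'C1,C2' => 0.9, 'C1,C3' => 0.1 },
    { C1 => { C2 => 5, C3 => 2 } },
);
print join(',', sort keys %{ $kept->{C1} }), "\n";
```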
# Grabs the K highest ranked samples. This is for thresholding based on the
# number of samples. Used in explicit timeslicing
# input: $k <- the number of samples to get
# $assocScoresRef <- a reference to a cui pair hash of association
# scores. Each key is a comma-separated cui pair
# (e.g. 'cui1,cui2'), values are their association
# scores.
# $matrixRef <- a reference to a co-occurrence sparse matrix that
# corresponds to the assocScoresRef
# output: \%thresholdedMatrix <- a ref to a sparse matrix containing only the
# $k ranked samples (cui pairs)
sub grabKHighestRankedSamples {
my $k = shift;
my $assocScoresRef = shift;
my $matrixRef = shift;
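The listing cuts off before the body of grabKHighestRankedSamples. Going only by its header comment, selecting the K highest-scoring pairs might look like the sketch below. The sub name grab_top_k_sketch and the tie-breaking rule (alphabetical by key, added for determinism) are assumptions, not the original behaviour.

```perl
use strict;
use warnings;

# Hypothetical sketch: keep only the $k highest-scoring cui pairs.
sub grab_top_k_sketch {
    my ($k, $assocScoresRef, $matrixRef) = @_;

    # sort pairs by descending score; ties broken by key (an assumption)
    my @ranked = sort {
        $assocScoresRef->{$b} <=> $assocScoresRef->{$a} or $a cmp $b
    } keys %{$assocScoresRef};

    # index of the last pair to keep; -1 yields an empty slice
    my $last = $k < @ranked ? $k - 1 : $#ranked;

    my %top;
    foreach my $pair (@ranked[0 .. $last]) {
        my ($cui1, $cui2) = split /,/, $pair;
        next unless exists $matrixRef->{$cui1};    # avoid autovivifying input
        $top{$cui1}{$cui2} = $matrixRef->{$cui1}{$cui2};
    }
    return \%top;
}

my $top = grab_top_k_sketch(
    2,
    { 'A,B' => 3, 'A,C' => 1, 'D,E' => 2 },
    { A => { B => 10, C => 20 }, D => { E => 30 } },
);
print join(' ', sort keys %{$top}), "\n";
```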
samples/configFileSamples/UMLSAssociationConfig view on Meta::CPAN
# UMLS::Association
<database>CUI_Bigram
# If the UMLS::Association Database is not installed on the local machine
# The following parameters may be needed to connect to the server
<hostname>192.168.00.00
<username>username
<password>password
<socket>/var/run/mysqld.sock
# makes the UMLS::Association not print to the command line
<t>
samples/configFileSamples/UMLSInterfaceConfig view on Meta::CPAN
# are in <>'s, and values follow directly after with no space. As an example,
# the line "<database>umls" will pass the 'database' parameter with a value
# of 'umls' to the UMLS::Interface options hash for its initialization.
#
# For parameters where no value is needed, just write the name of the
# parameter in '<>' (e.g. '<debug>')
#
#
# See UMLS::Interface for more detail
# makes the UMLS::Interface not print to the command line
<t>
# Put the full pathname of the UMLS::Interface Config File
<config>/home/sam/assocLBD-0.01/config/interfaceConfig
# If the UMLS::Interface Database is not installed on the local machine
# The following parameters may be needed to connect to the server
<hostname>192.168.00.00
<username>username
<password>password
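The configuration format used by these sample files (a parameter name in <>'s followed directly by its value, with '#' starting a comment) can be read with a few lines of Perl. The sketch below is an illustration only: the sub name parse_config_lines is hypothetical, and storing value-less parameters such as <t> as 1 is an assumption rather than documented behaviour.

```perl
use strict;
use warnings;

# Hypothetical sketch: parse "<name>value" lines into an options hash.
sub parse_config_lines {
    my @lines = @_;
    my %options;
    foreach my $line (@lines) {
        chomp $line;
        # skip comment lines and blank lines
        next if $line =~ /^\s*#/ || $line =~ /^\s*$/;
        if ($line =~ /^<([^>]+)>(.*)$/) {
            my ($name, $value) = ($1, $2);
            # assumption: a parameter with no value acts as a flag set to 1
            $options{$name} = $value eq '' ? 1 : $value;
        }
    }
    return \%options;
}

my $options = parse_config_lines(
    "# makes the module not print to the command line\n",
    "<t>\n",
    "<database>umls\n",
);
print "$_ = $options->{$_}\n" for sort keys %{$options};
```

Because commented-out parameters such as '#<targetCuis>' begin with '#', they are skipped along with ordinary comments.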
samples/lbdConfig view on Meta::CPAN
# pmi - Pointwise Mutual Information
# phi - Phi Coefficient
# chi - Pearson's Chi Squared Test
# ps - Poisson Stirling Measure
# tscore - T-score
<rankingMeasure>ll
# The output path of the results of lbd
<implicitOutputFile>sampleOutput
# a comma-separated list of linking (B) term accept semantic groups, which
# restricts the linking terms to the semantic groups specified. Group names
# come directly from the UMLS.
# See https://metamap.nlm.nih.gov/Docs/SemGroups_2013.txt for a list
#<linkingAcceptGroups>CHEM,DISO,GENE,PHYS,ANAT
# similar to linking accept groups, this restricts the acceptable linking (B)
# terms to terms within the semantic types listed
# See https://metamap.nlm.nih.gov/Docs/SemanticTypes_2013AA.txt for a list
#<linkingAcceptGroups>clnd,chem
# a comma-separated list of target (C) term accept semantic groups, which
# restricts the target terms to the semantic groups specified. Group names
# come directly from the UMLS.
# See https://metamap.nlm.nih.gov/Docs/SemGroups_2013.txt for a list
#<targetAcceptGroups>CHEM,GENE
# similar to target accept groups, this restricts the acceptable target (C)
# terms to terms within the semantic types listed
# See https://metamap.nlm.nih.gov/Docs/SemanticTypes_2013AA.txt for a list
#<linkingAcceptGroups>clnd,chem
# Input file path for the explicit co-occurrence matrix used in LBD
<explicitInputFile>sampleExplicitMatrix
# A comma-separated list of starting (A) cuis used in LBD
<startCuis>C0001554,C1961131
# A comma-separated list of target (C) cuis. If specified, the system enters closed
# discovery mode
#<targetCuis>
samples/timeSlicingConfig view on Meta::CPAN
# ll - Log-likelihood ratio
# tmi - Mutual Information
# odds - Odds Ratio
# pmi - Pointwise Mutual Information
# phi - Phi Coefficient
# chi - Pearson's Chi Squared Test
# ps - Poisson Stirling Measure
# tscore - T-score
<rankingMeasure>ll
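To give a feel for what one of the measures listed above computes, here is a minimal sketch of pointwise mutual information (pmi) from co-occurrence counts. The names n1p (total count for the first CUI), np1 (total count for the second CUI) and npp (total count of all pairs) are notation assumed here for illustration; only n_11 appears in the ALBD documentation. In ALBD itself the scores come from UMLS::Association, not from this code.

```perl
use strict;
use warnings;

# Hypothetical sketch of pointwise mutual information, log base 2:
#   PMI = log2( (n11 * npp) / (n1p * np1) )
# Returns undef when any count is zero or negative.
sub pmi {
    my ($n11, $n1p, $np1, $npp) = @_;
    return if $n11 <= 0 || $n1p <= 0 || $np1 <= 0 || $npp <= 0;
    # Perl's log() is the natural log, so divide by log(2)
    return log(($n11 * $npp) / ($n1p * $np1)) / log(2);
}

# A pair that co-occurs exactly as often as chance predicts scores 0
printf "%.3f\n", pmi(1, 2, 2, 4);
```

A positive score means the pair co-occurs more often than independence would predict; a negative score means less often.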
# a comma-separated list of linking (B) term accept semantic groups, which
# restricts the linking terms to the semantic groups specified. Group names
# come directly from the UMLS.
# See https://metamap.nlm.nih.gov/Docs/SemGroups_2013.txt for a list
#<linkingAcceptGroups>CHEM,DISO,GENE,PHYS,ANAT
# similar to linking accept groups, this restricts the acceptable linking (B)
# terms to terms within the semantic types listed
# See https://metamap.nlm.nih.gov/Docs/SemanticTypes_2013AA.txt for a list
#<linkingAcceptGroups>clnd,chem
# a comma-separated list of target (C) term accept semantic groups, which
# restricts the target terms to the semantic groups specified. Group names
# come directly from the UMLS.
# See https://metamap.nlm.nih.gov/Docs/SemGroups_2013.txt for a list
#<targetAcceptGroups>CHEM,GENE
# similar to target accept groups, this restricts the acceptable target (C)
# terms to terms within the semantic types listed
# See https://metamap.nlm.nih.gov/Docs/SemanticTypes_2013AA.txt for a list
#<linkingAcceptGroups>clnd,chem
#check that the correct number of values are read for all the
# time slicing metrics
ok (scalar @{$newAPScoresRef} == 11, "Correct Count of Average Precisions");
ok (scalar @{$newPAtKScoresRef} == 19, "Correct Count of Precision at K's");
ok (scalar @{$newFAtKScoresRef} == 19, "Correct Count of Freq at K's");
#check that each of the AP scores match the gold (within error tolerance)
my $apSame = 1;
for (my $i = 0; $i < scalar @{$goldAPScoresRef}; $i++) {
#check both comma-separated values (precision and recall)
my @goldScores = split(',',${$goldAPScoresRef}[$i]);
my @newScores = split(',',${$newAPScoresRef}[$i]);
#fail if either value differs from the gold by more than the tolerance
if ((abs($goldScores[0]-$newScores[0]) > $precRecallErrorTol)
|| (abs($goldScores[1]-$newScores[1]) > $precRecallErrorTol)) {
$apSame = 0;
last;
}
}
ok($apSame == 1, "Average Precisions Match");
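The comparison against gold values can be factored into a small helper that checks every comma-separated component of every score string. The following is a sketch under the assumption that both lists hold strings such as 'precision,recall'; the name scores_match is hypothetical and does not appear in the test suite.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if every comma-separated component of every
# entry in $newRef is within $tol of the matching component in $goldRef.
sub scores_match {
    my ($goldRef, $newRef, $tol) = @_;
    return 0 unless @{$goldRef} == @{$newRef};
    for my $i (0 .. $#{$goldRef}) {
        my @gold = split /,/, $goldRef->[$i];
        my @new  = split /,/, $newRef->[$i];
        return 0 unless @gold == @new;
        for my $j (0 .. $#gold) {
            # a single component outside the tolerance is a mismatch
            return 0 if abs($gold[$j] - $new[$j]) > $tol;
        }
    }
    return 1;
}

print scores_match(['0.5,0.25'], ['0.5001,0.2502'], 0.001) ? "match\n" : "differ\n";
```

Returning on the first mismatch means a difference in recall alone is enough to fail, even when precision agrees.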
utils/datasetCreator/fromMySQL/dbToTab.pl view on Meta::CPAN
#converts a mysql database to a tab-separated file readable by LBD
#command is of the form:
#`mysql <DB_NAME> -e "SELECT * FROM N_11 INTO OUTFILE '<OUTPUT_FILE>' FIELDS TERMINATED BY '\t' OPTIONALLY ENCLOSED BY '\"' LINES TERMINATED BY '\n';"`
#
# the following line is an example using a database with cui co-occurrence
# counts from 1980 to 1984 with a window size of 1. The mysql database is
# called 1980_1984_window1, and the output matrix file is called
# 1980_1984_window1_data.txt
`mysql 1980_1984_window1 -e "SELECT * FROM N_11 INTO OUTFILE '1980_1984_window1_data.txt' FIELDS TERMINATED BY '\t' OPTIONALLY ENCLOSED BY '\"' LINES TERMINATED BY '\n';"`;
utils/runDiscovery.pl view on Meta::CPAN
=head1 INPUT
=head2 LBD_CONFIG_FILE
Configuration file specifying the parameters of LBD.
See '../config/lbd' for an example
=head1 OPTIONS
Optional command line arguments
=head2 General Options:
=head3 --help
displays help, a quick summary of program options
=head3 --assocConfig
path to a UMLS::Association configuration file. Default location is
utils/runDiscovery.pl view on Meta::CPAN
." --help Print this help screen.\n"
." --version Print the version number\n"
."\nOPTIONS\n"
." --assocConfig path to the UMLS::Association Config File\n"
." --interfaceConfig path to the UMLS::Interface Config File\n"
."\nUSAGE EXAMPLES\n"
." runDiscovery lbdConfigFile\n";
;
#############################################################################
# Parse command line options
#############################################################################
my $DEBUG = 0; # Prints EVERYTHING. Use with small testing files.
my $HELP = ''; # Prints usage and exits if true.
my $VERSION;
#set default param values
my %options = ();
$options{'assocConfig'} = '';
$options{'interfaceConfig'} = '';
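The listing ends before the options are actually parsed. One common way to fill an options hash like this is Getopt::Long, sketched below with the option names taken from the help text above. Whether runDiscovery.pl parses its arguments exactly this way is an assumption, and the sub name parse_lbd_args is hypothetical.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Hypothetical sketch: parse the documented options from a list of
# arguments, then take the remaining positional argument as the LBD
# configuration file.
sub parse_lbd_args {
    my @args = @_;
    # same defaults as in the listing above
    my %options = (assocConfig => '', interfaceConfig => '');
    GetOptionsFromArray(
        \@args, \%options,
        'assocConfig=s', 'interfaceConfig=s', 'help', 'version',
    ) or die "Could not parse command line options\n";
    my $configFile = shift @args;    # what is left after option parsing
    return (\%options, $configFile);
}

my ($options, $configFile) =
    parse_lbd_args('--assocConfig', 'assoc.cfg', 'lbdConfigFile');
print "assocConfig = $options->{assocConfig}, config = $configFile\n";
```

Parsing from an array rather than from @ARGV directly keeps the sub easy to exercise in tests; a script would simply pass @ARGV.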