These requirements apply to the modified work as a whole. If
identifiable sections of that work are not derived from the Program,
and can be reasonably considered independent and separate works in
themselves, then this License, and its terms, do not apply to those
sections when you distribute them as separate works. But when you
distribute the same sections as part of a whole which is a work based
on the Program, the distribution of the whole must be on the terms of
this License, whose permissions for other licensees extend to the
entire whole, and thus to each and every part regardless of who wrote it.
Thus, it is not the intent of this section to claim rights or contest
your rights to work written entirely by you; rather, the intent is to
exercise the right to control the distribution of derivative or
collective works based on the Program.
In addition, mere aggregation of another work not based on the Program
with the Program (or with a work based on the Program) on a volume of
a storage or distribution medium does not bring the other work under
the scope of this License.
3. You may copy and distribute the Program (or a work based on it,
all those who receive copies directly or indirectly through you, then
the only way you could satisfy both it and this License would be to
refrain entirely from distribution of the Program.
If any portion of this section is held invalid or unenforceable under
any particular circumstance, the balance of the section is intended to
apply and the section as a whole is intended to apply in other
circumstances.
It is not the purpose of this section to induce you to infringe any
patents or other property right claims or to contest validity of any
such claims; this section has the sole purpose of protecting the
integrity of the free software distribution system, which is
implemented by public license practices. Many people have made
generous contributions to the wide range of software distributed
through that system in reliance on consistent application of that
system; it is up to the author/donor to decide if he or she is willing
to distribute software through any other system and a licensee cannot
impose that choice.
This section is intended to make thoroughly clear what is believed to
OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED
TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY
YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER
PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE
POSSIBILITY OF SUCH DAMAGES.
END OF TERMS AND CONDITIONS
How to Apply These Terms to Your New Programs
If you develop a new program, and you want it to be of the greatest
possible use to the public, the best way to achieve this is to make it
free software which everyone can redistribute and change under these terms.
To do so, attach the following notices to the program. It is safest
to attach them to the start of each source file to most effectively
convey the exclusion of warranty; and each file should have at least
the "copyright" line and a pointer to where the full notice is found.
<one line to give the program's name and a brief idea of what it does.>
Copyright (C) <year> <name of author>
NAME
ALBD Installation Guide
TESTING PLATFORMS
ALBD has been developed and tested on Linux, primarily using Perl.
SYNOPSIS
perl Makefile.PL
make
make test
make install
DESCRIPTION
ALBD provides a system for performing ABC co-occurrence literature based
discovery (LBD) using a variety of options and association-based ranking
methods.
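To make the ABC model concrete: an A term co-occurs with B terms, those B terms co-occur with C terms, and any C term that never co-occurs directly with A is a candidate discovery. The following is a minimal conceptual sketch with made-up CUIs and counts; it is illustrative only and not ALBD's actual code path.
use strict;
use warnings;

#hypothetical explicit co-occurrence counts: $cooc{$cui1}{$cui2} = n_11
my %cooc = (
    'C0000001' => { 'C0000002' => 12 },
    'C0000002' => { 'C0000001' => 12, 'C0000003' => 7 },
);

my $aTerm = 'C0000001';
my %candidates = ();
foreach my $bTerm (keys %{$cooc{$aTerm}}) {
    foreach my $cTerm (keys %{$cooc{$bTerm}}) {
        #skip the A term itself and any C term already explicitly linked to A
        next if ($cTerm eq $aTerm || exists $cooc{$aTerm}{$cTerm});
        $candidates{$cTerm} = 1;
    }
}
#prints C0000003, the implicit A->C candidate
print "candidate: $_\n" foreach (sort keys %candidates);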
REQUIREMENTS
ALBD requires the following software packages and data:
The package is freely available at:
<http://search.cpan.org/dist/UMLS-Association/>
Stage 3: Install ALBD package
The usual way to install the package is to run the following commands:
perl Makefile.PL
make
make test
make install
You will often need root access/superuser privileges to run make
install. The module can also be installed locally. To do a local
install, you need to specify a PREFIX option when you run 'perl
Makefile.PL'. For example,
perl Makefile.PL PREFIX=/home
or
Of course, you could also add the 'use lib' line to the top of the
program yourself, but you might not want to do that. You will need to
replace 5.8.3 with whatever version of Perl you are using. The preceding
instructions should be sufficient for standard and slightly non-standard
installations. However, if you need to modify other makefile options you
should look at the ExtUtils::MakeMaker documentation. Modifying other
makefile options is not recommended unless you really, absolutely, and
completely know what you're doing!
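As a hypothetical illustration of the 'use lib' line mentioned above, assuming a PREFIX=/home install and Perl 5.8.3 (the exact path depends on your platform and Perl version, so adjust it to your own setup):
use lib '/home/lib/perl5/site_perl/5.8.3';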
NOTE: If one (or more) of the tests run by 'make test' fails, you will
see a summary of the tests that failed, followed by a message of the
form "make: *** [test_dynamic] Error Y" where Y is a number between 1
and 255 (inclusive). If the number is less than 255, then it indicates
how many tests failed (if more than 254 tests failed, then 254 will still
be shown). If one or more tests died, then 255 will be shown. For more
details, see:
<http://search.cpan.org/dist/Test-Simple/lib/Test/Builder.pm#EXIT_CODES>
Stage 4: Create a co-occurrence matrix
ALBD requires that a co-occurrence matrix of CUIs has been created. This
matrix is stored as a flat file in a sparse matrix format, such that
each line contains three tab-separated values: cui_1, cui_2, and n_11
(the count of their co-occurrences). Any matrix with that format is
acceptable; however, the intended method of matrix generation is to
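As an illustration of the expected flat file format, the short script below echoes the contents of a tiny made-up matrix (the CUIs and counts are fabricated); each data line is cui_1, cui_2, and n_11 separated by tab characters.
#reads lines of the form cui_1<tab>cui_2<tab>n_11 and echoes them
use strict;
use warnings;

while (my $line = <DATA>) {
    chomp $line;
    my ($cui1, $cui2, $n11) = split /\t/, $line;
    print "$cui1 and $cui2 co-occur $n11 times\n";
}

__DATA__
C0000001	C0000002	12
C0000001	C0000003	4
C0000002	C0000003	7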
samples/lbdConfig
samples/postCutoffMatrix
samples/runSample.pl
samples/sampleExplicitMatrix
samples/sampleGoldMatrix
samples/timeSliceCuiList
samples/timeSlicingConfig
samples/configFileSamples/UMLSAssociationConfig
samples/configFileSamples/UMLSInterfaceConfig
samples/configFileSamples/UMLSInterfaceInternalConfig
t/test.t
t/goldSampleOutput
t/goldSampleTimeSliceOutput
utils/runDiscovery.pl
utils/datasetCreator/applyMaxThreshold.pl
utils/datasetCreator/applyMinThreshold.pl
utils/datasetCreator/applySemanticFilter.pl
utils/datasetCreator/combineCooccurrenceMatrices.pl
utils/datasetCreator/makeOrderNotMatter.pl
utils/datasetCreator/removeCUIPair.pl
utils/datasetCreator/removeExplicit.pl
utils/datasetCreator/testMatrixEquality.pl
utils/datasetCreator/dataStats/getCUICooccurrences.pl
utils/datasetCreator/dataStats/getMatrixStats.pl
utils/datasetCreator/dataStats/metaAnalysis.pl
utils/datasetCreator/fromMySQL/dbToTab.pl
utils/datasetCreator/fromMySQL/removeQuotes.pl
utils/datasetCreator/squaring/convertForSquaring_MATLAB.pl
utils/datasetCreator/squaring/squareMatrix.m
utils/datasetCreator/squaring/squareMatrix_partial.m
utils/datasetCreator/squaring/squareMatrix_perl.pl
META.yml Module YAML meta-data (added by MakeMaker)
The following sections describe the organization of this software
package and how to use it. A few typical examples are given to help
clearly understand the usage of the modules and the supporting
utilities.
INSTALL
To install the module, run the following magic commands:
perl Makefile.PL
make
make test
make install
This will install the module in the standard location. You will, most
probably, require root privileges to install in standard system
directories. To install in a non-standard directory, specify a prefix
during the 'perl Makefile.PL' stage as:
perl Makefile.PL PREFIX=/home/programs
It is possible to modify other parameters during installation. The
removeCUIPair.pl -- removes all occurrences of the specified CUI pair
from the co-occurrence matrix
removeExplicit.pl -- removes any keys that occur in an explicit
co-occurrence matrix from another co-occurrence matrix (typically the
squared explicit co-occurrence matrix itself, which generates a
prediction matrix, or the post cutoff matrix used in time slicing to
generate a gold standard file)
testMatrixEquality.pl -- checks to see if two co-occurrence matrix files
contain the same data
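As a rough sketch of the filtering that removeExplicit.pl performs, assuming the matrices have been read into hash-of-hash structures keyed by CUI (this is illustrative and not the distributed script's code):
use strict;
use warnings;

#delete every CUI pair found in the explicit matrix from a second matrix
#(e.g. the squared explicit matrix), leaving only implicit/predicted pairs
sub removeExplicitPairs {
    my ($explicitRef, $otherRef) = @_;
    foreach my $cui1 (keys %{$explicitRef}) {
        next unless (exists $otherRef->{$cui1});
        foreach my $cui2 (keys %{$explicitRef->{$cui1}}) {
            delete $otherRef->{$cui1}{$cui2};
        }
        #drop the row entirely if nothing is left in it
        delete $otherRef->{$cui1} unless (%{$otherRef->{$cui1}});
    }
    return $otherRef;
}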
Also included are several subfolders with more specific purposes. Within
the dataStats subfolder are scripts to collect various statistics about
the co-occurrence matrices used in LBD. These scripts include:
getCUICooccurrences.pl -- a data statistics file that gets the number of
co-occurrences, and number of unique co-occurrences for every CUI in the
dataset
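A sketch of the kind of tally getCUICooccurrences.pl reports, assuming a hash-of-hash matrix in which each unordered CUI pair is stored once (illustrative only, not the script itself):
use strict;
use warnings;

#for each CUI, sum its total co-occurrence count and count its distinct
#co-occurring partners
sub cuiCooccurrenceStats {
    my $matrixRef = shift;
    my (%total, %unique);
    foreach my $cui1 (keys %{$matrixRef}) {
        foreach my $cui2 (keys %{$matrixRef->{$cui1}}) {
            my $n11 = $matrixRef->{$cui1}{$cui2};
            $total{$cui1} += $n11;
            $total{$cui2} += $n11;
            $unique{$cui1}++;
            $unique{$cui2}++;
        }
    }
    return (\%total, \%unique);
}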
lib/ALBD.pm
association measure are available as well as frequency based
ranking methods. See samples/lbd for more info. Can perform open and
closed LBD as well as time slicing evaluation.
=head1 INSTALL
To install the module, run the following magic commands:
perl Makefile.PL
make
make test
make install
This will install the module in the standard location. You will, most
probably, require root privileges to install in standard system
directories. To install in a non-standard directory, specify a prefix
during the 'perl Makefile.PL' stage as:
perl Makefile.PL PREFIX=/home/sid
It is possible to modify other parameters during installation. The
lib/ALBD.pm
#Done
print "DONE!\n\n";
}
=cut
##################################################
################ Time Slicing ####################
##################################################
#NOTE: This function isn't really tested, and is really slow right now
# Generates precision and recall values by varying the threshold
# of the A->B ranking measure.
# input: none
# output: none, but precision and recall values are printed to STDOUT
sub timeSlicing_generatePrecisionAndRecall_explicit {
my $NUM_SAMPLES = 100; #TODO: read from file the number of samples to average over for timeslicing
my $self = shift;
print "In timeSlicing_generatePrecisionAndRecall\n";
my $numIntervals = 10;
lib/LiteratureBasedDiscovery/Rank.pm
foreach my $cuiPair (sort {$tiedAMWScores{$b} <=> $tiedAMWScores{$a}} keys %tiedAMWScores) {
$ltcAMWScores{$cuiPair} = $currentRank;
$currentRank--;
}
}
#return the scores
return \%ltcAMWScores;
}
#TODO this is an untested method
# gets the max cosine distance score between all a terms and each cTerm
# input: $startingMatrixRef <- ref to the starting matrix
# $explicitMatrixRef <- ref to the explicit matrix
# $implicitMatrixRef <- ref to the implicit matrix
# output: a hash ref of scores for each implicit key. (hash{cui} = score)
sub score_cosineDistance {
#LBD Info
my $startingMatrixRef = shift;
my $explicitMatrixRef = shift;
my $implicitMatrixRef = shift;
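For reference, the cosine score between two sparse CUI vectors (hash refs of co-occurrence counts) can be computed along these lines; this is a generic sketch, not the module's implementation:
use strict;
use warnings;

#cosine similarity between two sparse vectors stored as hash refs
sub cosine {
    my ($aRef, $bRef) = @_;
    my ($dot, $normA, $normB) = (0, 0, 0);
    foreach my $key (keys %{$aRef}) {
        $normA += $aRef->{$key} ** 2;
        $dot += $aRef->{$key} * $bRef->{$key} if (exists $bRef->{$key});
    }
    $normB += $_ ** 2 foreach (values %{$bRef});
    return 0 if ($normA == 0 || $normB == 0);
    return $dot / (sqrt($normA) * sqrt($normB));
}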
lib/LiteratureBasedDiscovery/TimeSlicing.pm
# predictions are cuis are ordered in descending order
# based on association. (from Rank::RankDescending)
# $numIntervals <- the number of recall intervals to generate
# output: (\%precision, \%recall) <- refs to hashes of precision and recall.
# Each hash key is the interval number, and
# the value is the precision and recall
# respectively
sub calculatePrecisionAndRecall_implicit {
my $trueMatrixRef = shift; #a ref to the true matrix
my $rowRanksRef = shift; #a ref to ranked predictions, each hash element are the predictions for a single cui, at each element is an array of cuis ordered by their rank
my $numIntervals = shift; #the recall intervals to test at
#find precision and recall curves for each cui that is being predicted
# take the sum of precisions, then average after the loop
my %precision = ();
my %recall = ();
foreach my $rowKey (keys %{$trueMatrixRef}) {
my $trueRef = ${$trueMatrixRef}{$rowKey}; #a list of true discoveries
my $rankedPredictionsRef = ${$rowRanksRef}{$rowKey}; #an array ref of ranked predictions
#get the number of predicted discoveries and true discoveries
lib/LiteratureBasedDiscovery/TimeSlicing.pm
if ($numTrue == 0) {
next;
}
#skip if there are NO predictions for this start term
if ($numPredictions == 0) {
next;
}
#determine precision and recall at 10% intervals of the number of
#predicted true values. This is done by simulating a threshold being
#applied, so the top $numToTest ranked terms are tested at 10% intervals
my $interval = $numPredictions/$numIntervals;
for (my $i = 0; $i <= 1; $i+=(1/$numIntervals)) {
#determine the number true to grab
my $numTrueForInterval = 1; #at $i = 0, grab just the first term that is true
if ($i > 0) {
$numTrueForInterval = $numTrue*$i;
}
#grab true discoveries until the recall rate is exceeded
lib/LiteratureBasedDiscovery/TimeSlicing.pm
#sum precision at this interval, average over number of rows is
# taken outside of the loop
$precision{$i} += ($truePositive / $numChecked); #number that are selected that are true
$recall{$i} += ($truePositive / $numTrue); #number of true that are selected
}
}
#calculate the average precision at each interval
foreach my $i (keys %precision) {
#divide by the number of rows in the true matrix ref
# because those are the number of cuis we are testing
# it is possible that the predictions has rows that are
# not in the true, and those should be ignored.
$precision{$i} /= (scalar keys %{$trueMatrixRef});
$recall{$i} /= (scalar keys %{$trueMatrixRef});
}
#return the precision and recall at 10% intervals
return (\%precision, \%recall);
}
lib/LiteratureBasedDiscovery/TimeSlicing.pm
my $trueMatrixRef = shift; # a matrix of true discoveries
my $rowRanksRef = shift; # a hash of ranked predicted discoveries
#generate mean cooccurrences at k at intervals of 10 for k = 10-100
my %meanCooccurrenceCount = (); #count of the number of co-occurrences for each k
my $interval = 1;
for (my $k = 1; $k <= 100; $k+=$interval) {
$meanCooccurrenceCount{$k} = 0;
#average the mean co-occurrences over all terms
# the true matrix contains only rows for the cuis being tested
# or in time slicing
foreach my $rowKey (keys %{$trueMatrixRef}) {
my $rankedPredictionsRef = ${$rowRanksRef}{$rowKey}; #an array ref of ranked predictions
#skip for rows that have no predictions
if (!defined $rankedPredictionsRef) {
next;
}
my $trueRef = ${$trueMatrixRef}{$rowKey}; #a list of true discoveries
samples/lbdConfig
#
# *all procedures require a measure to be specified except LTC and freq
<rankingProcedure>averageMinimumWeight
# The association measure to use as a value in the ranking procedure.
# The string is passed directly to UMLS::Association, so as that gets
# updated, new association measures will work automatically.
# At the time of this writing, valid arguments are:
# freq - Frequency
# dice - Dice Coefficient
# left - Fishers exact test - left sided
# right - Fishers exact test - right sided
# twotailed - Fishers twotailed test
# jaccard - Jaccard Coefficient
# ll - Log-likelihood ratio
# tmi - Mutual Information
# odds - Odds Ratio
# pmi - Pointwise Mutual Information
# phi - Phi Coefficient
# chi - Pearson's Chi Squared Test
# ps - Poisson Stirling Measure
# tscore - T-score
<rankingMeasure>ll
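Once a configuration file like the one above has been prepared, it is passed to the discovery driver. Based on the usage text in utils/runDiscovery.pl, an invocation would look roughly like the following (the paths refer to the sample files shipped with the package and should be adjusted to your setup):
perl utils/runDiscovery.pl samples/lbdConfig --assocConfig samples/configFileSamples/UMLSAssociationConfig --interfaceConfig samples/configFileSamples/UMLSInterfaceConfig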
samples/timeSlicingConfig
#
# *all procedures require a measure to be specified except LTC and freq
<rankingProcedure>averageMinimumWeight
# The association measure to use as a value in the ranking procedure.
# The string is passed directly to UMLS::Association, so as that gets
# updated, new association measures will work automatically.
# At the time of this writing, valid arguments are:
# freq - Frequency
# dice - Dice Coefficient
# left - Fishers exact test - left sided
# right - Fishers exact test - right sided
# twotailed - Fishers twotailed test
# jaccard - Jaccard Coefficient
# ll - Log-likelihood ratio
# tmi - Mutual Information
# odds - Odds Ratio
# pmi - Pointwise Mutual Information
# phi - Phi Coefficient
# chi - Pearson's Chi Squared Test
# ps - Poisson Stirling Measure
# tscore - T-score
<rankingMeasure>ll
#!/usr/local/bin/perl -w
# Before `make install' is performed this script should be runnable with
# `make test'. After `make install' it should work as `perl t/lch.t'
use strict;
use warnings;
use Test::Simple tests => 10;
#error tolerance for exact numerical matches due to precision issues
# and sort issues (again due to precision) there may be small
# differences between runs. The precision at K difference is
# larger due to small differences in ranking making big differences
# in scores when the K < 10. See Rank::rankDescending for more
# details as to why the ranking imprecision occurs
my $precRecallErrorTol = 0.0001;
my $atKErrorTol = 1.0;
#######################################################
# test script to run the sample code and compare its
# output to the expected output. This tests both the
# open and closed discovery code portions
#########################################################
#Test that the demo file can run correctly
`(cd ./samples/; perl runSample.pl) &`;
#######################################################
#test that the demo output matches the expected demo output
#########################################################
print "Performing Open Discovery Tests:\n";
#read in the gold scores from the open discovery gold
my %goldScores = ();
open IN, './t/goldSampleOutput'
or die ("Error: Cannot open gold sample output\n");
while (my $line = <IN>) {
if ($line =~ /\d+\t(\d+\.\d+)\t(C\d+)/) {
$goldScores{$2} = $1;
}
}
ok ($allExist == 1, "All CUIs exist in the output"); #all cuis exist in the new output file
ok ($allMatch == 1, "All Scores are the same in the output"); #all scores are the same in the new output file
print "Done with Open Discovery Tests\n\n";
#######################################################
#test that time slicing is computed correctly
#########################################################
print "Performing Time Slicing Tests\n";
#read in gold time slicing output
(my $goldAPScoresRef, my $goldMAP, my $goldPAtKScoresRef, my $goldFAtKScoresRef)
= &readTimeSlicingData('./t/goldSampleTimeSliceOutput');
#read in new time slicing output
(my $newAPScoresRef, my $newMAP, my $newPAtKScoresRef, my $newFAtKScoresRef)
= &readTimeSlicingData('./samples/sampleTimeSliceOutput');
utils/datasetCreator/fromMySQL/removeQuotes.pl
#removes quotes from a db-to-tab file
my $inFile = '1980_1984_window1_retest_data.txt';
my $outFile = '1980_1984_window1_restest_DELETEME';
open IN, $inFile or die ("unable to open inFile: $inFile\n");
open OUT, '>'.$outFile or die ("unable to open outFile: $outFile\n");
while (my $line = <IN>) {
$line =~ s/"//g;
#print $line;
print OUT $line;
}
utils/runDiscovery.pl
."\nOPTIONS\n"
." --assocConfig path to the UMLS::Association Config File\n"
." --interfaceConfig path to the UMLS::Interface Config File\n"
."\nUSAGE EXAMPLES\n"
." runDiscovery lbdConfigFile\n";
;
#############################################################################
# Parse command line options
#############################################################################
my $DEBUG = 0; # Prints EVERYTHING. Use with small testing files.
my $HELP = ''; # Prints usage and exits if true.
my $VERSION;
#set default param values
my %options = ();
$options{'assocConfig'} = '';
$options{'interfaceConfig'} = '';
#grab all the options and set values
GetOptions( 'debug' => \$DEBUG,