ALBD

GPL.txt  view on Meta::CPAN

These requirements apply to the modified work as a whole.  If
identifiable sections of that work are not derived from the Program,
and can be reasonably considered independent and separate works in
themselves, then this License, and its terms, do not apply to those
sections when you distribute them as separate works.  But when you
distribute the same sections as part of a whole which is a work based
on the Program, the distribution of the whole must be on the terms of
this License, whose permissions for other licensees extend to the
entire whole, and thus to each and every part regardless of who wrote it.
 
Thus, it is not the intent of this section to claim rights or contest
your rights to work written entirely by you; rather, the intent is to
exercise the right to control the distribution of derivative or
collective works based on the Program.
 
In addition, mere aggregation of another work not based on the Program
with the Program (or with a work based on the Program) on a volume of
a storage or distribution medium does not bring the other work under
the scope of this License.
 
  3. You may copy and distribute the Program (or a work based on it,

GPL.txt  view on Meta::CPAN

all those who receive copies directly or indirectly through you, then
the only way you could satisfy both it and this License would be to
refrain entirely from distribution of the Program.
 
If any portion of this section is held invalid or unenforceable under
any particular circumstance, the balance of the section is intended to
apply and the section as a whole is intended to apply in other
circumstances.
 
It is not the purpose of this section to induce you to infringe any
patents or other property right claims or to contest validity of any
such claims; this section has the sole purpose of protecting the
integrity of the free software distribution system, which is
implemented by public license practices.  Many people have made
generous contributions to the wide range of software distributed
through that system in reliance on consistent application of that
system; it is up to the author/donor to decide if he or she is willing
to distribute software through any other system and a licensee cannot
impose that choice.
 
This section is intended to make thoroughly clear what is believed to

GPL.txt  view on Meta::CPAN

OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED
TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY
YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER
PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE
POSSIBILITY OF SUCH DAMAGES.
 
                     END OF TERMS AND CONDITIONS
            How to Apply These Terms to Your New Programs
 
  If you develop a new program, and you want it to be of the greatest
possible use to the public, the best way to achieve this is to make it
free software which everyone can redistribute and change under these terms.
 
  To do so, attach the following notices to the program.  It is safest
to attach them to the start of each source file to most effectively
convey the exclusion of warranty; and each file should have at least
the "copyright" line and a pointer to where the full notice is found.
 
    <one line to give the program's name and a brief idea of what it does.>
    Copyright (C) <year>  <name of author>

INSTALL  view on Meta::CPAN

NAME
    ALBD Installation Guide
 
TESTING PLATFORMS
    ALBD has been developed and tested on Linux primarily using Perl.
 
SYNOPSIS
     perl Makefile.PL
 
     make
 
     make test
 
     make install
 
DESCRIPTION
    ALBD provides a system for performing ABC co-occurrence literature-based
    discovery using a variety of options and association-based ranking
    methods.
 
REQUIREMENTS
    ALBD REQUIRES the following software packages and data:

INSTALL  view on Meta::CPAN

  The package is freely available at:
 
 
Stage 3: Install ALBD package
  The usual way to install the package is to run the following commands:
 
      perl Makefile.PL
      make
      make test
      make install
 
  You will often need root access/superuser privileges to run make
  install. The module can also be installed locally. To do a local
  install, you need to specify a PREFIX option when you run 'perl
  Makefile.PL'. For example,
 
      perl Makefile.PL PREFIX=/home
 
      or

INSTALL  view on Meta::CPAN

  Of course, you could also add the 'use lib' line to the top of the
  program yourself, but you might not want to do that. You will need to
  replace 5.8.3 with whatever version of Perl you are using. The preceding
  instructions should be sufficient for standard and slightly non-standard
  installations. However, if you need to modify other makefile options you
  should look at the ExtUtils::MakeMaker documentation. Modifying other
  makefile options is not recommended unless you really, absolutely, and
  completely know what you're doing!
 
  NOTE: If one (or more) of the tests run by 'make test' fails, you will
  see a summary of the tests that failed, followed by a message of the
  form "make: *** [test_dynamic] Error Y" where Y is a number between 1
  and 255 (inclusive). If the number is less than 255, then it indicates
  how many tests failed (if more than 254 tests failed, then 254 will still
  be shown). If one or more tests died, then 255 will be shown. For more
  details, see:
 
 
Stage 4: Create a co-occurrence matrix
  ALBD requires that a co-occurrence matrix of CUIs has been created. This
  matrix is stored as a flat file, in a sparse matrix format such that
  each line contains three tab-separated values: cui_1, cui_2, and n_11
  (the count of their co-occurrences). Any matrix with that format is
  acceptable; however, the intended method of matrix generation is to
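
  As an illustration of the sparse format described in Stage 4, the
  following is a minimal sketch of reading such a file into a hash of
  hashes. The helper name read_cooccurrence_matrix, the sample CUIs, and
  the counts are all assumptions made up for this example; this is not
  the reader that ALBD itself uses.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of ALBD): reads a sparse co-occurrence
# matrix where each line is "cui_1<TAB>cui_2<TAB>n_11".
sub read_cooccurrence_matrix {
    my ($fh) = @_;
    my %matrix;
    while (my $line = <$fh>) {
        chomp $line;
        next if $line =~ /^\s*$/;    # skip blank lines
        my ($cui1, $cui2, $count) = split /\t/, $line;
        $matrix{$cui1}{$cui2} += $count;    # sum counts of repeated pairs
    }
    return \%matrix;
}

# Made-up sample data, read through an in-memory filehandle
my $data = "C0000001\tC0000002\t5\n"
         . "C0000001\tC0000003\t2\n"
         . "C0000002\tC0000003\t7\n";
open my $fh, '<', \$data or die "cannot open in-memory data: $!";
my $matrixRef = read_cooccurrence_matrix($fh);
close $fh;

print "C0000001 -> C0000002: $matrixRef->{C0000001}{C0000002}\n";
```

  Keying the outer hash by cui_1 and the inner hash by cui_2 keeps the
  matrix sparse: only pairs that appear in the file take up memory.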

MANIFEST  view on Meta::CPAN

samples/lbdConfig
samples/postCutoffMatrix
samples/runSample.pl
samples/sampleExplicitMatrix
samples/sampleGoldMatrix
samples/timeSliceCuiList
samples/timeSlicingConfig
samples/configFileSamples/UMLSAssociationConfig
samples/configFileSamples/UMLSInterfaceConfig
samples/configFileSamples/UMLSInterfaceInternalConfig
t/test.t
t/goldSampleOutput
t/goldSampleTimeSliceOutput
utils/runDiscovery.pl
utils/datasetCreator/applyMaxThreshold.pl
utils/datasetCreator/applyMinThreshold.pl
utils/datasetCreator/applySemanticFilter.pl
utils/datasetCreator/combineCooccurrenceMatrices.pl
utils/datasetCreator/makeOrderNotMatter.pl
utils/datasetCreator/removeCUIPair.pl
utils/datasetCreator/removeExplicit.pl
utils/datasetCreator/testMatrixEquality.pl
utils/datasetCreator/dataStats/getCUICooccurrences.pl
utils/datasetCreator/dataStats/getMatrixStats.pl
utils/datasetCreator/dataStats/metaAnalysis.pl
utils/datasetCreator/fromMySQL/dbToTab.pl
utils/datasetCreator/fromMySQL/removeQuotes.pl
utils/datasetCreator/squaring/convertForSquaring_MATLAB.pl
utils/datasetCreator/squaring/squareMatrix.m
utils/datasetCreator/squaring/squareMatrix_partial.m
utils/datasetCreator/squaring/squareMatrix_perl.pl
META.yml                                 Module YAML meta-data (added by MakeMaker)

README  view on Meta::CPAN

      The following sections describe the organization of this software
      package and how to use it. A few typical examples are given to help
      clearly understand the usage of the modules and the supporting
      utilities.
 
INSTALL
      To install the module, run the following magic commands:
 
        perl Makefile.PL
        make
        make test
        make install
 
      This will install the module in the standard location. You will, most
      probably, require root privileges to install in standard system
      directories. To install in a non-standard directory, specify a prefix
      during the 'perl Makefile.PL' stage as:
 
        perl Makefile.PL PREFIX=/home/programs
 
      It is possible to modify other parameters during installation. The

README  view on Meta::CPAN

removeCUIPair.pl -- removes all occurrences of the specified CUI pair
from the co-occurrence matrix
 
removeExplicit.pl -- removes any keys that occur in an explicit
co-occurrence matrix from another co-occurrence matrix (typically the
squared explicit co-occurrence matrix itself, which generates a
prediction matrix, or the post cutoff matrix used in time slicing to
generate a gold standard file)
 
testMatrixEquality.pl -- checks to see if two co-occurrence matrix files
contain the same data
 
Also included are several subfolders with more specific purposes. Within
the dataStats subfolder are scripts to collect various statistics about
the co-occurrence matrices used in LBD. These scripts include:
 
getCUICooccurrences.pl -- a data statistics file that gets the number of
co-occurrences, and number of unique co-occurrences for every CUI in the
dataset

lib/ALBD.pm  view on Meta::CPAN

      association measures are available as well as frequency-based
      ranking methods. See samples/lbd for more info. Can perform open and
      closed LBD as well as time slicing evaluation.
 
=head1 INSTALL
 
To install the module, run the following magic commands:
 
  perl Makefile.PL
  make
  make test
  make install
 
This will install the module in the standard location. You will, most
probably, require root privileges to install in standard system
directories. To install in a non-standard directory, specify a prefix
during the 'perl Makefile.PL' stage as:
 
  perl Makefile.PL PREFIX=/home/sid
 
It is possible to modify other parameters during installation. The

lib/ALBD.pm  view on Meta::CPAN

#Done
    print "DONE!\n\n";
}
=cut
 
 
##################################################
################ Time Slicing ####################
##################################################
 
#NOTE: This function isn't really tested, and is really slow right now
# Generates precision and recall values by varying the threshold
# of the A->B ranking measure.
# input:  none
# output: none, but precision and recall values are printed to STDOUT
sub timeSlicing_generatePrecisionAndRecall_explicit {
    my $NUM_SAMPLES = 100; #TODO: read from file the number of samples to average over for time slicing
    my $self = shift;
    print "In timeSlicing_generatePrecisionAndRecall\n";
 
    my $numIntervals = 10;

lib/LiteratureBasedDiscovery/Rank.pm  view on Meta::CPAN

        foreach my $cuiPair (sort {$tiedAMWScores{$b} <=> $tiedAMWScores{$a}} keys %tiedAMWScores) {
            $ltcAMWScores{$cuiPair} = $currentRank;
            $currentRank--;
        }
    }
 
    #return the scores
    return \%ltcAMWScores;
}
 
#TODO this is an untested method
# gets the max cosine distance score between all a terms and each cTerm
# input:  $startingMatrixRef <- ref to the starting matrix
#         $explicitMatrixRef <- ref to the explicit matrix
#         $implicitMatrixRef <- ref to the implicit matrix
# output: a hash ref of scores for each implicit key. (hash{cui} = score)
sub score_cosineDistance {
    #LBD Info
    my $startingMatrixRef = shift;
    my $explicitMatrixRef = shift;
    my $implicitMatrixRef = shift;

lib/LiteratureBasedDiscovery/TimeSlicing.pm  view on Meta::CPAN

#                         predictions are cuis ordered in descending order
#                         based on association. (from Rank::RankDescending)
#         $numIntervals <- the number of recall intervals to generate
# output: (\%precision, \%recall) <- refs to hashes of precision and recall.
#                                    Each hash key is the interval number, and
#                                    the value is the precision and recall
#                                    respectively
sub calculatePrecisionAndRecall_implicit {
    my $trueMatrixRef = shift; #a ref to the true matrix
    my $rowRanksRef = shift; #a ref to ranked predictions: each hash element holds the predictions for a single cui, as an array of cuis ordered by rank
    my $numIntervals = shift; #the number of recall intervals to test at
 
    #find precision and recall curves for each cui that is being predicted
    #  take the sum of precisions, then average after the loop
    my %precision = ();
    my %recall = ();
    foreach my $rowKey (keys %{$trueMatrixRef}) {
        my $trueRef = ${$trueMatrixRef}{$rowKey}; #a list of true discoveries
        my $rankedPredictionsRef = ${$rowRanksRef}{$rowKey}; #an array ref of ranked predictions
         
        #get the number of predicted discoveries and true discoveries

lib/LiteratureBasedDiscovery/TimeSlicing.pm  view on Meta::CPAN

if ($numTrue == 0) {
    next;
}
#skip if there are NO predictions for this start term
if ($numPredictions == 0) {
    next;
}
 
#determine precision and recall at 10% intervals of the number of
#predicted true values. This is done by simulating a threshold being
#applied, so the top $numToTest ranked terms are tested at 10% intervals
my $interval = $numPredictions/$numIntervals;
for (my $i = 0; $i <= 1; $i+=(1/$numIntervals)) {
     
    #determine the number true to grab
    my $numTrueForInterval = 1; #at $i = 0, grab just the first term that is true
    if ($i > 0) {
        $numTrueForInterval = $numTrue*$i;
    }
 
    #grab true discoveries until the recall rate is exceeded

lib/LiteratureBasedDiscovery/TimeSlicing.pm  view on Meta::CPAN

            #sum precision at this interval, average over number of rows is
            # taken outside of the loop
            $precision{$i} += ($truePositive / $numChecked); #number that are selected that are true
            $recall{$i} += ($truePositive / $numTrue); #number of true that are selected       
        }
    }
 
    #calculate the average precision at each interval
    foreach my $i (keys %precision) {
        #divide by the number of rows in the true matrix ref
        # because those are the number of cuis we are testing
        # it is possible that the predictions has rows that are
        # not in the true, and those should be ignored.
        $precision{$i} /= (scalar keys %{$trueMatrixRef});
        $recall{$i} /= (scalar keys %{$trueMatrixRef});
    }
 
    #return the precision and recall at 10% intervals
    return (\%precision, \%recall);
}
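
The precision and recall terms summed in the loop above can be illustrated
for a single row and a single cutoff. The following is a minimal sketch,
assuming a hash of true discoveries and an array of ranked predictions; the
helper name precision_recall_at_k and the sample CUIs are hypothetical, and
the code is a simplified stand-in rather than the module's own interval
logic.

```perl
use strict;
use warnings;

# Hypothetical helper (not ALBD's implementation): precision and recall
# of the top $k ranked predictions against a set of true discoveries.
sub precision_recall_at_k {
    my ($trueRef, $rankedRef, $k) = @_;
    my $numTrue    = scalar keys %{$trueRef};
    my $numChecked = $k < @{$rankedRef} ? $k : scalar @{$rankedRef};

    # mirror the "skip" cases: nothing true, or nothing predicted
    return (0, 0) if $numTrue == 0 || $numChecked == 0;

    my $truePositive = 0;
    for my $cui (@{$rankedRef}[0 .. $numChecked - 1]) {
        $truePositive++ if exists $trueRef->{$cui};
    }

    # precision: fraction of selected terms that are true
    # recall:    fraction of true terms that are selected
    return ($truePositive / $numChecked, $truePositive / $numTrue);
}

# Made-up sample data
my %true   = (C0000002 => 1, C0000004 => 1);
my @ranked = qw(C0000002 C0000003 C0000004 C0000005);

my ($precision, $recall) = precision_recall_at_k(\%true, \@ranked, 2);
printf "precision=%.2f recall=%.2f\n", $precision, $recall;
```

With the sample data, the top two predictions contain one of the two true
discoveries, so both values come out as 0.50.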

lib/LiteratureBasedDiscovery/TimeSlicing.pm  view on Meta::CPAN

my $trueMatrixRef = shift; # a matrix of true discoveries
my $rowRanksRef = shift; # a hash of ranked predicted discoveries
 
#generate mean co-occurrences at k at intervals of 1 for k = 1-100
my %meanCooccurrenceCount = (); #count of the number of co-occurrences for each k
my $interval = 1;
for (my $k = 1; $k <= 100; $k+=$interval) {
    $meanCooccurrenceCount{$k} = 0;
 
    #average the mean co-occurrences over all terms
    #  the true matrix contains only rows for the cuis being tested
    #  or in time slicing
    foreach my $rowKey (keys %{$trueMatrixRef}) {
        my $rankedPredictionsRef = ${$rowRanksRef}{$rowKey}; #an array ref of ranked predictions
         
        #skip for rows that have no predictions
        if (!defined $rankedPredictionsRef) {
            next;
        }
        my $trueRef = ${$trueMatrixRef}{$rowKey}; #a list of true discoveries

samples/lbdConfig  view on Meta::CPAN

#
#   *all procedures require a measure to be specified except LTC and freq
<rankingProcedure>averageMinimumWeight
 
# The association measure to use as a value in the ranking procedure.
# The string is passed directly to UMLS::Association, so as that gets
# updated, new association measures will work automatically.
# At the time of this writing, valid arguments are:
# freq - Frequency
# dice - Dice Coefficient
# left - Fisher's exact test - left sided
# right - Fisher's exact test - right sided
# twotailed - Fisher's two-tailed test
# jaccard - Jaccard Coefficient
# ll - Log-likelihood ratio
# tmi - Mutual Information
# odds - Odds Ratio
# pmi - Pointwise Mutual Information
# phi - Phi Coefficient
# chi - Pearson's Chi Squared Test
# ps - Poisson Stirling Measure
# tscore - T-score
<rankingMeasure>ll
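
# The "<key>value" lines above are simple to read programmatically. The
# following is a minimal sketch, assuming that blank lines and lines
# starting with "#" are ignored. The function name parse_config is
# hypothetical, and this is not the parser that ALBD itself uses.

```perl
use strict;
use warnings;

# Hypothetical parser (not ALBD's own): collects "<key>value" lines into
# a hash, skipping blank lines and "#" comments.
sub parse_config {
    my ($text) = @_;
    my %options;
    for my $line (split /\n/, $text) {
        next if $line =~ /^\s*(#|$)/;         # comment or blank line
        if ($line =~ /^<([^>]+)>(.*)$/) {
            $options{$1} = $2;                # key between <>, value after
        }
    }
    return \%options;
}

# Settings copied from the sample config
my $config = <<'END';
# ranking settings
<rankingProcedure>averageMinimumWeight

<rankingMeasure>ll
END

my $optionsRef = parse_config($config);
print "$optionsRef->{rankingProcedure} using $optionsRef->{rankingMeasure}\n";
```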

samples/timeSlicingConfig  view on Meta::CPAN

#
#   *all procedures require a measure to be specified except LTC and freq
<rankingProcedure>averageMinimumWeight
 
# The association measure to use as a value in the ranking procedure.
# The string is passed directly to UMLS::Association, so as that gets
# updated, new association measures will work automatically.
# At the time of this writing, valid arguments are:
# freq - Frequency
# dice - Dice Coefficient
# left - Fisher's exact test - left sided
# right - Fisher's exact test - right sided
# twotailed - Fisher's two-tailed test
# jaccard - Jaccard Coefficient
# ll - Log-likelihood ratio
# tmi - Mutual Information
# odds - Odds Ratio
# pmi - Pointwise Mutual Information
# phi - Phi Coefficient
# chi - Pearson's Chi Squared Test
# ps - Poisson Stirling Measure
# tscore - T-score
<rankingMeasure>ll

t/test.t  view on Meta::CPAN

#!/usr/local/bin/perl -w
 
# Before `make install' is performed this script should be runnable with
# `make test'. After `make install' it should work as `perl t/test.t'
 
use strict;
use Test::Simple tests => 10;
 
#error tolerance for exact numerical matches due to precision issues
# and sort issues (again due to precision) there may be small
# differences between runs. The precision at K difference is
# larger due to small differences in ranking making big differences
# in scores when the K < 10. See Rank::rankDescending for more
# details as to why the ranking imprecision occurs
my $precRecallErrorTol = 0.0001;
my $atKErrorTol = 1.0;
 
#######################################################
# test script to run the sample code and compare its
# output to the expected output. This tests both the
# open and closed discovery code portions
#########################################################
 
 
#Test that the demo file can run correctly
`(cd ./samples/; perl runSample.pl) &`;
 
#######################################################
#test that the demo output matches the expected demo output
#########################################################
print "Performing Open Discovery Tests:\n";
 
#read in the gold scores from the open discovery gold
my %goldScores = ();
open IN, './t/goldSampleOutput'
    or die ("Error: Cannot open gold sample output\n");
while (my $line = <IN>) {
    if ($line =~ /\d+\t(\d+\.\d+)\t(C\d+)/) {
        $goldScores{$2} = $1;

t/test.t  view on Meta::CPAN

    }
}
ok ($allExist == 1, "All CUIs exist in the output");  #all cuis exist in the new output file
ok ($allMatch == 1, "All Scores are the same in the output");  #all scores are the same in the new output file
 
print "Done with Open Discovery Tests\n\n";
 
 
 
#######################################################
#test that time slicing is computed correctly
#########################################################
print "Performing Time Slicing Tests\n";
 
#read in gold time slicing output
(my $goldAPScoresRef, my $goldMAP, my $goldPAtKScoresRef, my $goldFAtKScoresRef)
    = &readTimeSlicingData('./t/goldSampleTimeSliceOutput');
 
#read in new time slicing output
(my $newAPScoresRef, my $newMAP, my $newPAtKScoresRef, my $newFAtKScoresRef)
    = &readTimeSlicingData('./samples/sampleTimeSliceOutput');
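
# The tolerances declared near the top of this test ($precRecallErrorTol
# and $atKErrorTol) imply a comparison of the following general shape.
# This is a minimal sketch: the helper name within_tolerance and the sample
# scores are hypothetical, and the actual comparison code in t/test.t is
# not shown in these excerpts.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 when two scores differ by no more than
# $tol, and 0 otherwise.
sub within_tolerance {
    my ($expected, $actual, $tol) = @_;
    return abs($expected - $actual) <= $tol ? 1 : 0;
}

my $precRecallErrorTol = 0.0001;    # same value as in t/test.t

# Made-up (gold, new) score pairs
my @pairs = ([0.52314, 0.52316], [0.5231, 0.5241]);
for my $pair (@pairs) {
    my ($gold, $new) = @{$pair};
    my $status = within_tolerance($gold, $new, $precRecallErrorTol)
        ? 'match' : 'mismatch';
    print "$gold vs $new: $status\n";
}
```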

utils/datasetCreator/fromMySQL/removeQuotes.pl  view on Meta::CPAN

#removes quotes from a db to tab file
 
my $inFile = '1980_1984_window1_retest_data.txt';
my $outFile = '1980_1984_window1_restest_DELETEME';
 
 
open IN, $inFile or die ("unable to open inFile: $inFile\n");
open OUT, '>'.$outFile or die ("unable to open outFile: $outFile\n");
 
while (my $line  = <IN>) {
    $line =~ s/"//g;
    #print $line;
    print OUT $line;
}

utils/runDiscovery.pl  view on Meta::CPAN

."\nOPTIONS\n"
."   --assocConfig        path to the UMLS::Association Config File\n"
."   --interfaceConfig    path to the UMLS::Interface Config File\n"
."\nUSAGE EXAMPLES\n"
."   runDiscovery lbdConfigFile\n";
;
 
#############################################################################
#                       Parse command line options
#############################################################################
my $DEBUG = 0;      # Prints EVERYTHING. Use with small testing files.       
my $HELP = '';      # Prints usage and exits if true.
my $VERSION;
 
#set default param values
my %options = ();
$options{'assocConfig'}  = '';
$options{'interfaceConfig'} = '';
 
#grab all the options and set values
GetOptions( 'debug'             => \$DEBUG,


