These requirements apply to the modified work as a whole. If
identifiable sections of that work are not derived from the Program,
and can be reasonably considered independent and separate works in
themselves, then this License, and its terms, do not apply to those
sections when you distribute them as separate works. But when you
distribute the same sections as part of a whole which is a work based
on the Program, the distribution of the whole must be on the terms of
this License, whose permissions for other licensees extend to the
entire whole, and thus to each and every part regardless of who wrote it.
Thus, it is not the intent of this section to claim rights or contest
your rights to work written entirely by you; rather, the intent is to
exercise the right to control the distribution of derivative or
collective works based on the Program.
In addition, mere aggregation of another work not based on the Program
with the Program (or with a work based on the Program) on a volume of
a storage or distribution medium does not bring the other work under
the scope of this License.
3. You may copy and distribute the Program (or a work based on it,
all those who receive copies directly or indirectly through you, then
the only way you could satisfy both it and this License would be to
refrain entirely from distribution of the Program.
If any portion of this section is held invalid or unenforceable under
any particular circumstance, the balance of the section is intended to
apply and the section as a whole is intended to apply in other
circumstances.
It is not the purpose of this section to induce you to infringe any
patents or other property right claims or to contest validity of any
such claims; this section has the sole purpose of protecting the
integrity of the free software distribution system, which is
implemented by public license practices. Many people have made
generous contributions to the wide range of software distributed
through that system in reliance on consistent application of that
system; it is up to the author/donor to decide if he or she is willing
to distribute software through any other system and a licensee cannot
impose that choice.
This section is intended to make thoroughly clear what is believed to
OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED
TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY
YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER
PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE
POSSIBILITY OF SUCH DAMAGES.
END OF TERMS AND CONDITIONS
How to Apply These Terms to Your New Programs
If you develop a new program, and you want it to be of the greatest
possible use to the public, the best way to achieve this is to make it
free software which everyone can redistribute and change under these terms.
To do so, attach the following notices to the program. It is safest
to attach them to the start of each source file to most effectively
convey the exclusion of warranty; and each file should have at least
the "copyright" line and a pointer to where the full notice is found.
<one line to give the program's name and a brief idea of what it does.>
Copyright (C) <year> <name of author>
NAME
ALBD Installation Guide
TESTING PLATFORMS
ALBD has been developed and tested on Linux, primarily using Perl.
SYNOPSIS
perl Makefile.PL
make
make test
make install
DESCRIPTION
ALBD provides a system for performing ABC co-occurrence literature based
discovery (LBD) using a variety of options and association-based ranking
methods.
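To make the ABC model concrete: an A term co-occurs with B terms, those B terms co-occur with C terms, and any C term that never co-occurs directly with A is a candidate discovery. The following is a minimal conceptual sketch with made-up CUIs and counts; it is illustrative only and not ALBD's actual code path.
use strict;
use warnings;

#hypothetical explicit co-occurrence counts: $cooc{$cui1}{$cui2} = n_11
my %cooc = (
    'C0000001' => { 'C0000002' => 12 },
    'C0000002' => { 'C0000001' => 12, 'C0000003' => 7 },
);

my $aTerm = 'C0000001';
my %candidates = ();
foreach my $bTerm (keys %{$cooc{$aTerm}}) {
    foreach my $cTerm (keys %{$cooc{$bTerm}}) {
        #skip the A term itself and any C term already explicitly linked to A
        next if ($cTerm eq $aTerm || exists $cooc{$aTerm}{$cTerm});
        $candidates{$cTerm} = 1;
    }
}
#prints C0000003, the implicit A->C candidate
print "candidate: $_\n" foreach (sort keys %candidates);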
REQUIREMENTS
ALBD requires the following software packages and data:
The package is freely available at:
<http://search.cpan.org/dist/UMLS-Association/>
Stage 3: Install ALBD package
The usual way to install the package is to run the following commands:
perl Makefile.PL
make
make test
make install
You will often need root access/superuser privileges to run make
install. The module can also be installed locally. To do a local
install, you need to specify a PREFIX option when you run 'perl
Makefile.PL'. For example,
perl Makefile.PL PREFIX=/home
or
Of course, you could also add the 'use lib' line to the top of the
program yourself, but you might not want to do that. You will need to
replace 5.8.3 with whatever version of Perl you are using. The preceding
instructions should be sufficient for standard and slightly non-standard
installations. However, if you need to modify other makefile options you
should look at the ExtUtils::MakeMaker documentation. Modifying other
makefile options is not recommended unless you really, absolutely, and
completely know what you're doing!
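As a hypothetical illustration of the 'use lib' line mentioned above, assuming a PREFIX=/home install and Perl 5.8.3 (the exact path depends on your platform and Perl version, so adjust it to your own setup):
use lib '/home/lib/perl5/site_perl/5.8.3';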
NOTE: If one (or more) of the tests run by 'make test' fails, you will
see a summary of the tests that failed, followed by a message of the
form "make: *** [test_dynamic] Error Y" where Y is a number between 1
and 255 (inclusive). If the number is less than 255, then it indicates
how many tests failed (if more than 254 tests failed, then 254 will still
be shown). If one or more tests died, then 255 will be shown. For more
details, see:
<http://search.cpan.org/dist/Test-Simple/lib/Test/Builder.pm#EXIT_CODES>
Stage 4: Create a co-occurrence matrix
ALBD requires that a co-occurrence matrix of CUIs has been created. This
matrix is stored as a flat file in a sparse matrix format, such that
each line contains three tab-separated values: cui_1, cui_2, and n_11
(the count of their co-occurrences). Any matrix with that format is
acceptable; however, the intended method of matrix generation is to
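As an illustration of the expected flat file format, the short script below echoes the contents of a tiny made-up matrix (the CUIs and counts are fabricated); each data line is cui_1, cui_2, and n_11 separated by tab characters.
#reads lines of the form cui_1<tab>cui_2<tab>n_11 and echoes them
use strict;
use warnings;

while (my $line = <DATA>) {
    chomp $line;
    my ($cui1, $cui2, $n11) = split /\t/, $line;
    print "$cui1 and $cui2 co-occur $n11 times\n";
}

__DATA__
C0000001	C0000002	12
C0000001	C0000003	4
C0000002	C0000003	7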
samples/lbdConfig
samples/postCutoffMatrix
samples/runSample.pl
samples/sampleExplicitMatrix
samples/sampleGoldMatrix
samples/timeSliceCuiList
samples/timeSlicingConfig
samples/configFileSamples/UMLSAssociationConfig
samples/configFileSamples/UMLSInterfaceConfig
samples/configFileSamples/UMLSInterfaceInternalConfig
t/test.t
t/goldSampleOutput
t/goldSampleTimeSliceOutput
utils/runDiscovery.pl
utils/datasetCreator/applyMaxThreshold.pl
utils/datasetCreator/applyMinThreshold.pl
utils/datasetCreator/applySemanticFilter.pl
utils/datasetCreator/combineCooccurrenceMatrices.pl
utils/datasetCreator/makeOrderNotMatter.pl
utils/datasetCreator/removeCUIPair.pl
utils/datasetCreator/removeExplicit.pl
utils/datasetCreator/testMatrixEquality.pl
utils/datasetCreator/dataStats/getCUICooccurrences.pl
utils/datasetCreator/dataStats/getMatrixStats.pl
utils/datasetCreator/dataStats/metaAnalysis.pl
utils/datasetCreator/fromMySQL/dbToTab.pl
utils/datasetCreator/fromMySQL/removeQuotes.pl
utils/datasetCreator/squaring/convertForSquaring_MATLAB.pl
utils/datasetCreator/squaring/squareMatrix.m
utils/datasetCreator/squaring/squareMatrix_partial.m
utils/datasetCreator/squaring/squareMatrix_perl.pl
META.yml Module YAML meta-data (added by MakeMaker)
The following sections describe the organization of this software
package and how to use it. A few typical examples are given to help
clearly understand the usage of the modules and the supporting
utilities.
INSTALL
To install the module, run the following magic commands:
perl Makefile.PL
make
make test
make install
This will install the module in the standard location. You will, most
probably, require root privileges to install in standard system
directories. To install in a non-standard directory, specify a prefix
during the 'perl Makefile.PL' stage as:
perl Makefile.PL PREFIX=/home/programs
It is possible to modify other parameters during installation. The
removeCUIPair.pl -- removes all occurrences of the specified CUI pair
from the co-occurrence matrix
removeExplicit.pl -- removes any keys that occur in an explicit
co-occurrence matrix from another co-occurrence matrix (typically the
squared explicit co-occurrence matrix itself, which generates a
prediction matrix, or the post cutoff matrix used in time slicing to
generate a gold standard file)
testMatrixEquality.pl -- checks to see if two co-occurrence matrix files
contain the same data
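As a rough sketch of the filtering that removeExplicit.pl performs, assuming the matrices have been read into hash-of-hash structures keyed by CUI (this is illustrative and not the distributed script's code):
use strict;
use warnings;

#delete every CUI pair found in the explicit matrix from a second matrix
#(e.g. the squared explicit matrix), leaving only implicit/predicted pairs
sub removeExplicitPairs {
    my ($explicitRef, $otherRef) = @_;
    foreach my $cui1 (keys %{$explicitRef}) {
        next unless (exists $otherRef->{$cui1});
        foreach my $cui2 (keys %{$explicitRef->{$cui1}}) {
            delete $otherRef->{$cui1}{$cui2};
        }
        #drop the row entirely if nothing is left in it
        delete $otherRef->{$cui1} unless (%{$otherRef->{$cui1}});
    }
    return $otherRef;
}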
Also included are several subfolders with more specific purposes. Within
the dataStats subfolder are scripts to collect various statistics about
the co-occurrence matrices used in LBD. These scripts include:
getCUICooccurrences.pl -- a data statistics file that gets the number of
co-occurrences, and number of unique co-occurrences for every CUI in the
dataset
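A sketch of the kind of tally getCUICooccurrences.pl reports, assuming a hash-of-hash matrix in which each unordered CUI pair is stored once (illustrative only, not the script itself):
use strict;
use warnings;

#for each CUI, sum its total co-occurrence count and count its distinct
#co-occurring partners
sub cuiCooccurrenceStats {
    my $matrixRef = shift;
    my (%total, %unique);
    foreach my $cui1 (keys %{$matrixRef}) {
        foreach my $cui2 (keys %{$matrixRef->{$cui1}}) {
            my $n11 = $matrixRef->{$cui1}{$cui2};
            $total{$cui1} += $n11;
            $total{$cui2} += $n11;
            $unique{$cui1}++;
            $unique{$cui2}++;
        }
    }
    return (\%total, \%unique);
}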
lib/ALBD.pm
association measure are available as well as frequency based
ranking methods. See samples/lbd for more info. Can perform open and
closed LBD as well as time slicing evaluation.
=head1 INSTALL
To install the module, run the following magic commands:
perl Makefile.PL
make
make test
make install
This will install the module in the standard location. You will, most
probably, require root privileges to install in standard system
directories. To install in a non-standard directory, specify a prefix
during the 'perl Makefile.PL' stage as:
perl Makefile.PL PREFIX=/home/sid
It is possible to modify other parameters during installation. The
lib/ALBD.pm
#Done
print "DONE!\n\n";
}
=cut
##################################################
################ Time Slicing ####################
##################################################
#NOTE: This function isn't really tested, and is really slow right now
# Generates precision and recall values by varying the threshold
# of the A->B ranking measure.
# input: none
# output: none, but precision and recall values are printed to STDOUT
sub timeSlicing_generatePrecisionAndRecall_explicit {
my $NUM_SAMPLES = 100; #TODO: read from file the number of samples to average over for timeslicing
my $self = shift;
print "In timeSlicing_generatePrecisionAndRecall\n";
my $numIntervals = 10;
lib/LiteratureBasedDiscovery/Rank.pm
foreach my $cuiPair (sort {$tiedAMWScores{$b} <=> $tiedAMWScores{$a}} keys %tiedAMWScores) {
$ltcAMWScores{$cuiPair} = $currentRank;
$currentRank--;
}
}
#return the scores
return \%ltcAMWScores;
}
#TODO this is an untested method
# gets the max cosine distance score between all a terms and each cTerm
# input: $startingMatrixRef <- ref to the starting matrix
# $explicitMatrixRef <- ref to the explicit matrix
# $implicitMatrixRef <- ref to the implicit matrix
# output: a hash ref of scores for each implicit key. (hash{cui} = score)
sub score_cosineDistance {
#LBD Info
my $startingMatrixRef = shift;
my $explicitMatrixRef = shift;
my $implicitMatrixRef = shift;
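For reference, the cosine score between two sparse CUI vectors (hash refs of co-occurrence counts) can be computed along these lines; this is a generic sketch, not the module's implementation:
use strict;
use warnings;

#cosine similarity between two sparse vectors stored as hash refs
sub cosine {
    my ($aRef, $bRef) = @_;
    my ($dot, $normA, $normB) = (0, 0, 0);
    foreach my $key (keys %{$aRef}) {
        $normA += $aRef->{$key} ** 2;
        $dot += $aRef->{$key} * $bRef->{$key} if (exists $bRef->{$key});
    }
    $normB += $_ ** 2 foreach (values %{$bRef});
    return 0 if ($normA == 0 || $normB == 0);
    return $dot / (sqrt($normA) * sqrt($normB));
}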
lib/LiteratureBasedDiscovery/TimeSlicing.pm
# predictions are cuis are ordered in descending order
# based on association. (from Rank::RankDescending)
# $numIntervals <- the number of recall intervals to generate
# output: (\%precision, \%recall) <- refs to hashes of precision and recall.
# Each hash key is the interval number, and
# the value is the precision and recall
# respectively
sub calculatePrecisionAndRecall_implicit {
my $trueMatrixRef = shift; #a ref to the true matrix
my $rowRanksRef = shift; #a ref to ranked predictions, each hash element are the predictions for a single cui, at each element is an array of cuis ordered by their rank
my $numIntervals = shift; #the recall intervals to test at
#find precision and recall curves for each cui that is being predicted
# take the sum of precisions, then average after the loop
my %precision = ();
my %recall = ();
foreach my $rowKey (keys %{$trueMatrixRef}) {
my $trueRef = ${$trueMatrixRef}{$rowKey}; #a list of true discoveries
my $rankedPredictionsRef = ${$rowRanksRef}{$rowKey}; #an array ref of ranked predictions
#get the number of predicted discoveries and true discoveries
lib/LiteratureBasedDiscovery/TimeSlicing.pm
if ($numTrue == 0) {
next;
}
#skip if there are NO predictions for this start term
if ($numPredictions == 0) {
next;
}
#determine precision and recall at 10% intervals of the number of
#predicted true values. This is done by simulating a threshold being
#applied, so the top $numToTest ranked terms are tested at 10% intervals
my $interval = $numPredictions/$numIntervals;
for (my $i = 0; $i <= 1; $i+=(1/$numIntervals)) {
#determine the number true to grab
my $numTrueForInterval = 1; #at $i = 0, grab just the first term that is true
if ($i > 0) {
$numTrueForInterval = $numTrue*$i;
}
#grab true discoveries until the recall rate is exceeded
lib/LiteratureBasedDiscovery/TimeSlicing.pm
#sum precision at this interval, average over number of rows is
# taken outside of the loop
$precision{$i} += ($truePositive / $numChecked); #number that are selected that are true
$recall{$i} += ($truePositive / $numTrue); #number of true that are selected
}
}
#calculate the average precision at each interval
foreach my $i (keys %precision) {
#divide by the number of rows in the true matrix ref
# because those are the number of cuis we are testing
# it is possible that the predictions has rows that are
# not in the true, and those should be ignored.
$precision{$i} /= (scalar keys %{$trueMatrixRef});
$recall{$i} /= (scalar keys %{$trueMatrixRef});
}
#return the precision and recall at 10% intervals
return (\%precision, \%recall);
}
lib/LiteratureBasedDiscovery/TimeSlicing.pm
my $trueMatrixRef = shift; # a matrix of true discoveries
my $rowRanksRef = shift; # a hash of ranked predicted discoveries
#generate mean cooccurrences at k at intervals of 10 for k = 10-100
my %meanCooccurrenceCount = (); #count of the number of co-occurrences for each k
my $interval = 1;
for (my $k = 1; $k <= 100; $k+=$interval) {
$meanCooccurrenceCount{$k} = 0;
#average the mean co-occurrences over all terms
# the true matrix contains only rows for the cuis being tested
# or in time slicing
foreach my $rowKey (keys %{$trueMatrixRef}) {
my $rankedPredictionsRef = ${$rowRanksRef}{$rowKey}; #an array ref of ranked predictions
#skip for rows that have no predictions
if (!defined $rankedPredictionsRef) {
next;
}
my $trueRef = ${$trueMatrixRef}{$rowKey}; #a list of true discoveries
samples/lbdConfig
#
# *all procedures require a measure to be specified except LTC and freq
<rankingProcedure>averageMinimumWeight
# The association measure to use as a value in the ranking procedure.
# The string is passed directly to UMLS::Association, so as that gets
# updated, new association measures will work automatically.
# At the time of this writing, valid arguments are:
# freq - Frequency
# dice - Dice Coefficient
# left - Fishers exact test - left sided
# right - Fishers exact test - right sided
# twotailed - Fishers twotailed test
# jaccard - Jaccard Coefficient
# ll - Log-likelihood ratio
# tmi - Mutual Information
# odds - Odds Ratio
# pmi - Pointwise Mutual Information
# phi - Phi Coefficient
# chi - Pearson's Chi Squared Test
# ps - Poisson Stirling Measure
# tscore - T-score
<rankingMeasure>ll
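Once a configuration file like the one above has been prepared, it is passed to the discovery driver. Based on the usage text in utils/runDiscovery.pl, an invocation would look roughly like the following (the paths refer to the sample files shipped with the package and should be adjusted to your setup):
perl utils/runDiscovery.pl samples/lbdConfig --assocConfig samples/configFileSamples/UMLSAssociationConfig --interfaceConfig samples/configFileSamples/UMLSInterfaceConfig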
samples/timeSlicingConfig
#
# *all procedures require a measure to be specified except LTC and freq
<rankingProcedure>averageMinimumWeight
# The association measure to use as a value in the ranking procedure.
# The string is passed directly to UMLS::Association, so as that gets
# updated, new association measures will work automatically.
# At the time of this writing, valid arguments are:
# freq - Frequency
# dice - Dice Coefficient
# left - Fishers exact test - left sided
# right - Fishers exact test - right sided
# twotailed - Fishers twotailed test
# jaccard - Jaccard Coefficient
# ll - Log-likelihood ratio
# tmi - Mutual Information
# odds - Odds Ratio
# pmi - Pointwise Mutual Information
# phi - Phi Coefficient
# chi - Pearson's Chi Squared Test
# ps - Poisson Stirling Measure
# tscore - T-score
<rankingMeasure>ll
#!/usr/local/bin/perl -w
# Before `make install' is performed this script should be runnable with
# `make test'. After `make install' it should work as `perl t/lch.t'
use strict;
use warnings;
use Test::Simple tests => 10;
#error tolerance for exact numerical matches due to precision issues
# and sort issues (again due to precision) there may be small
# differences between runs. The precision at K difference is
# larger due to small differences in ranking making big differences
# in scores when the K < 10. See Rank::rankDescending for more
# details as to why the ranking imprecision occurs
my $precRecallErrorTol = 0.0001;
my $atKErrorTol = 1.0;
#######################################################
# test script to run the sample code and compare its
# output to the expected output. This tests both the
# open and closed discovery code portions
#########################################################
#Test that the demo file can run correctly
`(cd ./samples/; perl runSample.pl) &`;
#######################################################
#test that the demo output matches the expected demo output
#########################################################
print "Performing Open Discovery Tests:\n";
#read in the gold scores from the open discovery gold
my %goldScores = ();
open IN, './t/goldSampleOutput'
or die ("Error: Cannot open gold sample output\n");
while (my $line = <IN>) {
if ($line =~ /\d+\t(\d+\.\d+)\t(C\d+)/) {
$goldScores{$2} = $1;
}
}
ok ($allExist == 1, "All CUIs exist in the output"); #all cuis exist in the new output file
ok ($allMatch == 1, "All Scores are the same in the output"); #all scores are the same in the new output file
print "Done with Open Discovery Tests\n\n";
#######################################################
#test that time slicing is computed correctly
#########################################################
print "Performing Time Slicing Tests\n";
#read in gold time slicing output
(my $goldAPScoresRef, my $goldMAP, my $goldPAtKScoresRef, my $goldFAtKScoresRef)
= &readTimeSlicingData('./t/goldSampleTimeSliceOutput');
#read in new time slicing output
(my $newAPScoresRef, my $newMAP, my $newPAtKScoresRef, my $newFAtKScoresRef)
= &readTimeSlicingData('./samples/sampleTimeSliceOutput');
utils/datasetCreator/fromMySQL/removeQuotes.pl
#removes quotes from a db-to-tab file
my $inFile = '1980_1984_window1_retest_data.txt';
my $outFile = '1980_1984_window1_restest_DELETEME';
open IN, $inFile or die ("unable to open inFile: $inFile\n");
open OUT, '>'.$outFile or die ("unable to open outFile: $outFile\n");
while (my $line = <IN>) {
$line =~ s/"//g;
#print $line;
print OUT $line;
}
utils/runDiscovery.pl
."\nOPTIONS\n"
." --assocConfig path to the UMLS::Association Config File\n"
." --interfaceConfig path to the UMLS::Interface Config File\n"
."\nUSAGE EXAMPLES\n"
." runDiscovery lbdConfigFile\n";
;
#############################################################################
# Parse command line options
#############################################################################
my $DEBUG = 0; # Prints EVERYTHING. Use with small testing files.
my $HELP = ''; # Prints usage and exits if true.
my $VERSION;
#set default param values
my %options = ();
$options{'assocConfig'} = '';
$options{'interfaceConfig'} = '';
#grab all the options and set values
GetOptions( 'debug' => \$DEBUG,