ALBD


GPL.txt

above, provided that you also meet all of these conditions:

    a) You must cause the modified files to carry prominent notices
    stating that you changed the files and the date of any change.

    b) You must cause any work that you distribute or publish, that in
    whole or in part contains or is derived from the Program or any
    part thereof, to be licensed as a whole at no charge to all third
    parties under the terms of this License.

    c) If the modified program normally reads commands interactively
    when run, you must cause it, when started running for such
    interactive use in the most ordinary way, to print or display an
    announcement including an appropriate copyright notice and a
    notice that there is no warranty (or else, saying that you provide
    a warranty) and that users may redistribute the program under
    these conditions, and telling the user how to view a copy of this
    License.  (Exception: if the Program itself is interactive but
    does not normally print such an announcement, your work based on
    the Program is not required to print an announcement.)


GPL.txt

Also add information on how to contact you by electronic and paper mail.

If the program is interactive, make it output a short notice like this
when it starts in an interactive mode:

    Gnomovision version 69, Copyright (C) year name of author
    Gnomovision comes with ABSOLUTELY NO WARRANTY; for details type `show w'.
    This is free software, and you are welcome to redistribute it
    under certain conditions; type `show c' for details.

The hypothetical commands `show w' and `show c' should show the appropriate
parts of the General Public License.  Of course, the commands you use may
be called something other than `show w' and `show c'; they could even be
mouse-clicks or menu items--whatever suits your program.

You should also get your employer (if you work as a programmer) or your
school, if any, to sign a "copyright disclaimer" for the program, if
necessary.  Here is a sample; alter the names:

  Yoyodyne, Inc., hereby disclaims all copyright interest in the program
  `Gnomovision' (which makes passes at compilers) written by James Hacker.

INSTALL

    <http://search.cpan.org/dist/UMLS-Interface/>

   UMLS::Association
    Used to calculate the association scores that most of the ranking
    methods rely on.

    The package is freely available at:

        <http://search.cpan.org/dist/UMLS-Association/>

  Stage 3: Install ALBD package
    The usual way to install the package is to run the following commands:

        perl Makefile.PL
        make
        make test
        make install

    You will often need root access/superuser privileges to run make
    install. The module can also be installed locally. To do a local
    install, you need to specify a PREFIX option when you run 'perl
    Makefile.PL'. For example,

INSTALL

    ALBD requires that a co-occurrence matrix of CUIs has been created. This
    matrix is stored as a flat file in a sparse matrix format in which
    each line contains three tab-separated values: cui_1, cui_2, and n_11
    (the count of their co-occurrences). Any matrix in that format is
    acceptable; however, the intended method of matrix generation is to
    convert a UMLS::Association database into a flat matrix file. These
    databases are created using the CUICollector tool of UMLS::Association,
    which is run over the MetaMapped Medline baseline. With such a database,
    run utils/datasetCreator/fromMySQL/dbToTab.pl to convert the desired
    database into a matrix file. Note that the code in dbToTab.pl is just a
    sample mysql command. If the input database was created by another
    method, a different command may be needed. As long as the resulting
    co-occurrence matrix is in the correct format, LBD may be run on it. This
    allows flexibility in where co-occurrence information comes from.

    Note: utils/datasetCreator/fromMySQL/removeQuotes.pl may need to be run
    on the resulting tab-separated file if quotes are included in the
    resulting co-occurrence matrix file.
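
    The matrix format described above can be illustrated with a short Perl
    sketch. This is not code from the ALBD distribution: the subroutine
    name read_sparse_matrix and the CUI C0000001/C0000002 values are made up
    for the example, and the quote stripping is only an assumption about
    the kind of clean-up removeQuotes.pl is meant for.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of ALBD): reads a sparse co-occurrence
# matrix from a filehandle. Each line is expected to hold
#   cui_1 <TAB> cui_2 <TAB> n_11
# Returns a hash ref of hash refs: $matrix->{cui_1}{cui_2} = n_11
sub read_sparse_matrix {
    my ($fh) = @_;
    my %matrix;
    while (my $line = <$fh>) {
        chomp $line;
        next if $line =~ /^\s*$/;               # skip blank lines
        my ($cui1, $cui2, $count) = split /\t/, $line;
        next unless defined $count;             # skip malformed lines
        # Assumption: fields may be wrapped in double quotes by the
        # database export, so strip one leading and one trailing quote.
        for ($cui1, $cui2, $count) { s/^"//; s/"$//; }
        $matrix{$cui1}{$cui2} = $count;
    }
    return \%matrix;
}

# Usage with an in-memory file; the second line carries quoted fields.
my $data = "C0001554\tC0000001\t3\n\"C0001554\"\t\"C0000002\"\t5\n";
open(my $fh, '<', \$data) or die "cannot open in-memory file: $!";
my $matrix = read_sparse_matrix($fh);
close $fh;

foreach my $cui2 (sort keys %{ $matrix->{'C0001554'} }) {
    print "C0001554\t$cui2\t$matrix->{'C0001554'}{$cui2}\n";
}
```

    A hash of hashes keeps the sparse structure: only pairs that actually
    co-occur take up memory.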

  Stage 5: Set up Dummy UMLS::Association Database
    UMLS::Association requires a connection to a database that is
    in the correct format. Although this database is not required for ALBD

README

        UMLS::Association requires the UMLS::Interface module to access 
        the Unified Medical Language System (UMLS) for semantic type filtering
        and to determine if CUIs are valid.

        The following sections describe the organization of this software
        package and how to use it. A few typical examples are given to help
        clearly understand the usage of the modules and the supporting
        utilities.

  INSTALL
        To install the module, run the following magic commands:

          perl Makefile.PL
          make
          make test
          make install

        This will install the module in the standard location. You will, most
        probably, require root privileges to install in standard system
        directories. To install in a non-standard directory, specify a prefix
        during the 'perl Makefile.PL' stage as:

README

    ALBD requires that a co-occurrence matrix of CUIs has been created. This
    matrix is stored as a flat file in a sparse matrix format in which
    each line contains three tab-separated values: cui_1, cui_2, and n_11
    (the count of their co-occurrences). Any matrix in that format is
    acceptable; however, the intended method of matrix generation is to
    convert a UMLS::Association database into a flat matrix file. These
    databases are created using the CUICollector tool of UMLS::Association,
    which is run over the MetaMapped Medline baseline. With such a database,
    run utils/datasetCreator/fromMySQL/dbToTab.pl to convert the desired
    database into a matrix file. Note that the code in dbToTab.pl is just a
    sample mysql command. If the input database was created by another
    method, a different command may be needed. As long as the resulting
    co-occurrence matrix is in the correct format, LBD may be run on it. This
    allows flexibility in where co-occurrence information comes from.

    Note: utils/datasetCreator/fromMySQL/removeQuotes.pl may need to be run
    on the resulting tab-separated file if quotes are included in the
    resulting co-occurrence matrix file.

  Set Up Dummy UMLS::Association Database
    UMLS::Association requires a connection to a database that is
    in the correct format. Although this database is not required for ALBD

lib/ALBD.pm

      data from which LBD is performed are co-occurrence matrices
      generated from UMLS::Association. ALBD is based on the ABC
      co-occurrence model. Many options can be specified, and many
      ranking methods are available: the novel ranking methods that use
      association measures as well as frequency-based ranking methods.
      See samples/lbd for more info. ALBD can perform open and
      closed LBD as well as time slicing evaluation.

=head1 INSTALL

To install the module, run the following magic commands:

  perl Makefile.PL
  make
  make test
  make install

This will install the module in the standard location. You will, most
probably, require root privileges to install in standard system
directories. To install in a non-standard directory, specify a prefix
during the 'perl Makefile.PL' stage as:

lib/ALBD.pm

    my $self = {};
    my $className = shift;
    my $optionsHashRef = shift;
    bless($self, $className);

    $self->_initialize($optionsHashRef);
    return $self;
}

# Initializes everything needed for Literature Based Discovery
# input: $optionsHashRef <- reference to LBD options hash (command line input)
# output: none, but global parameters are set
sub _initialize {
    my $self = shift;
    my $optionsHashRef = shift; 

    #initialize UMLS::Interface
    my %tHash = ();
    $tHash{'t'} = 1; #default hash values are with t=1 (silence module output)
    my $componentOptions = \%tHash;
    if (${$optionsHashRef}{'interfaceConfig'} ne '') {

lib/LiteratureBasedDiscovery/Rank.pm

	    $scores{$cKey} = $score;
	}	
    }
    
    return \%scores;
}

# gets a list of A->C pairs, and sets the value as the implicit matrix value
# input:  $startingMatrixRef <- ref to the starting matrix
#         $implicitMatrixRef <- ref to the implicit matrix
# output: a hash ref where keys are comma separated cui pairs hash{'C000,C111'}
#         and values are set to the value at that index in the implicit matrix
sub _getACPairs {
    my $startingMatrixRef = shift;
    my $implicitMatrixRef = shift;

    #generate a list of ac pairs
    my %acPairs = ();
    foreach my $keyA (keys %{$implicitMatrixRef}) {
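	# iterate over the column keys only (without 'keys' the loop would also
	# visit the hash values), and build a real comma separated key
	foreach my $keyC (keys %{${$implicitMatrixRef}{$keyA}}) {
	    $acPairs{"$keyA,$keyC"} = ${${$implicitMatrixRef}{$keyA}}{$keyC};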

lib/LiteratureBasedDiscovery/TimeSlicing.pm

}

# generates a hash of all association scores from the matrix
# the hash keys are $rowKey,$colKey. Hash values are the association scores
# between the $rowKey and $colKey. Scores are calculated for all co-occurring
# cui pairs in the matrix
# input:  $matrixRef <- a reference to a sparse matrix
#         $rankingMeasure <- a string specifying the ranking measure to use
#         $umls_association <- an instance of UMLS::Association
# output: \%cuiPairs <- a ref to a hash of CUI pairs and their association
#                       each key of the hash is a comma separated string 
#                       containing cui1, and cui2 of the pair 
#                       (e.g. 'cui1,cui2'), and each value is their association
#                       score using the specified association measure
sub getAssociationScores {
    my $matrixRef = shift;
    my $rankingMeasure = shift;
    my $umls_association = shift;
    print "   getting Association Scores, rankingMeasure = $rankingMeasure\n";
    
    #generate a list of cui pairs in the matrix

lib/LiteratureBasedDiscovery/TimeSlicing.pm

	    $max = $val;
	}
    }
    return ($min,$max);
}

# Applies a threshold to a matrix using a corresponding association scores
# hash. Any cui pairs whose score is less than the threshold are not copied
# to the new matrix
# input:  $threshold <- a scalar threshold
#         $assocScoresRef <- a reference to a cui pair hash of association
#                            scores. Each key is a comma separated cui pair
#                            (e.g. 'cui1,cui2'), values are their association
#                            scores.
#         $matrixRef <- a reference to a co-occurrence sparse matrix that 
#                       corresponds to the assocScoresRef
# output: \%thresholdedMatrix <- a ref to a new matrix, built from the 
#         $matrixRef after applying the $threshold
sub applyThreshold {
    my $threshold = shift;
    my $assocScoresRef = shift;
    my $matrixRef = shift;

lib/LiteratureBasedDiscovery/TimeSlicing.pm

    }

    #return the thresholded matrix
    return \%thresholdedMatrix;
}
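
The thresholding step documented for applyThreshold can be sketched as a
stand-alone subroutine. This is an illustration only, not the body of the
module's function (which is not shown in full here); the name
apply_threshold_sketch and the sample CUI keys are hypothetical, and the
sketch assumes scores are keyed by 'rowKey,colKey' as the comments describe.

```perl
use strict;
use warnings;

# Hypothetical sketch: copy a matrix cell only when the association score
# of its cui pair is not less than the threshold.
sub apply_threshold_sketch {
    my ($threshold, $assocScoresRef, $matrixRef) = @_;
    my %thresholded;
    foreach my $rowKey (keys %{$matrixRef}) {
        foreach my $colKey (keys %{ $matrixRef->{$rowKey} }) {
            # scores are looked up by a comma separated pair key
            my $score = $assocScoresRef->{"$rowKey,$colKey"};
            next unless defined $score && $score >= $threshold;
            $thresholded{$rowKey}{$colKey} = $matrixRef->{$rowKey}{$colKey};
        }
    }
    return \%thresholded;
}

# Usage with made-up keys and scores
my %matrix = ('C1' => { 'C2' => 4, 'C3' => 1 });
my %scores = ('C1,C2' => 0.9, 'C1,C3' => 0.1);
my $kept   = apply_threshold_sketch(0.5, \%scores, \%matrix);
print join(',', sort keys %{ $kept->{'C1'} }), "\n";
```

Building a new hash rather than deleting from the input keeps the original
matrix intact, which matches the documented output of a new matrix.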

# Grabs the K highest ranked samples. This is for thresholding based on the
# number of samples. Used in explicit timeslicing
# input:  $k <- the number of samples to get
#         $assocScoresRef <- a reference to a cui pair hash of association
#                            scores. Each key is a comma separated cui pair
#                            (e.g. 'cui1,cui2'), values are their association
#                            scores.
#         $matrixRef <- a reference to a co-occurrence sparse matrix that 
#                       corresponds to the assocScoresRef
# output: \%thresholdedMatrix <- a ref to a sparse matrix containing only the 
#                                $k ranked samples (cui pairs)
sub grabKHighestRankedSamples {
    my $k = shift;
    my $assocScoresRef = shift;
    my $matrixRef = shift;
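
The selection described in the comments for grabKHighestRankedSamples can be
sketched as follows. This is a hedged illustration, not the module's
implementation (the body is truncated in this listing): the name
grab_k_highest_sketch is hypothetical, the tie-breaking by key is an
assumption added for deterministic output, and the CUI keys are made up.

```perl
use strict;
use warnings;

# Hypothetical sketch: keep only the $k cui pairs with the highest
# association scores and copy their cells into a new sparse matrix.
sub grab_k_highest_sketch {
    my ($k, $assocScoresRef, $matrixRef) = @_;

    # rank pair keys by descending score; ties are broken alphabetically
    # (an assumption, the source does not say how ties are handled)
    my @ranked = sort {
        $assocScoresRef->{$b} <=> $assocScoresRef->{$a} || $a cmp $b
    } keys %{$assocScoresRef};

    my $last = $k < @ranked ? $k : scalar @ranked;
    my %top;
    foreach my $pair (@ranked[0 .. $last - 1]) {
        my ($rowKey, $colKey) = split /,/, $pair;
        # guard against pairs missing from the matrix without autovivifying
        next unless exists $matrixRef->{$rowKey}
                 && exists $matrixRef->{$rowKey}{$colKey};
        $top{$rowKey}{$colKey} = $matrixRef->{$rowKey}{$colKey};
    }
    return \%top;
}

# Usage with made-up keys and scores
my %matrix = ('C1' => { 'C2' => 7, 'C3' => 2 }, 'C4' => { 'C5' => 9 });
my %scores = ('C1,C2' => 0.8, 'C1,C3' => 0.2, 'C4,C5' => 0.5);
my $top2   = grab_k_highest_sketch(2, \%scores, \%matrix);
print join(' ', map { "$_:" . join(',', sort keys %{ $top2->{$_} }) }
                sort keys %{$top2}), "\n";
```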

samples/configFileSamples/UMLSAssociationConfig

# UMLS::Association
<database>CUI_Bigram

# If the UMLS::Association Database is not installed on the local machine
# The following parameters may be needed to connect to the server
<hostname>192.168.00.00
<username>username
<password>password
<socket>/var/run/mysqld.sock

# makes the UMLS::Association not print to the command line
<t>

samples/configFileSamples/UMLSInterfaceConfig

# are in <>'s, and values follow directly after with no space. As an example, 
# the line "<database>umls" will pass the 'database' parameter with a value 
# of 'umls' to the UMLS::Interface options hash for its initialization.
#
# For parameters where no value is needed, just write the name of the
# parameter in '<>' (e.g. '<debug>')
#
#
# See UMLS::Interface for more detail

# makes the UMLS::Interface not print to the command line
<t>

# Put the full pathname of the UMLS::Interface Config File
<config>/home/sam/assocLBD-0.01/config/interfaceConfig

# If the UMLS::Interface Database is not installed on the local machine
# The following parameters may be needed to connect to the server
<hostname>192.168.00.00
<username>username
<password>password
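
The "<name>value" convention described in the comments of this sample file
can be illustrated with a small Perl sketch. It is not the parser used by
ALBD; the name parse_config_lines is hypothetical, and storing a valueless
parameter such as '<t>' as 1 is an assumption made for the example.

```perl
use strict;
use warnings;

# Hypothetical sketch: turn config lines of the form "<name>value" or
# "<name>" into an options hash. Comment lines start with '#'.
sub parse_config_lines {
    my (@lines) = @_;
    my %options;
    foreach my $line (@lines) {
        $line =~ s/^\s+|\s+$//g;                 # trim surrounding whitespace
        next if $line eq '' || $line =~ /^#/;    # skip blanks and comments
        if ($line =~ /^<([^>]+)>(.*)$/) {
            my ($name, $value) = ($1, $2);
            # assumption: a parameter without a value is stored as 1
            $options{$name} = ($value eq '') ? 1 : $value;
        }
    }
    return \%options;
}

# Usage with lines modelled on the sample file
my $opts = parse_config_lines(
    '# makes the module not print to the command line',
    '<t>',
    '<database>umls',
    '<username>username',
);
print "$_=$opts->{$_}\n" for sort keys %{$opts};
```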

samples/lbdConfig

# pmi - Pointwise Mutual Information
# phi - Phi Coefficient
# chi - Pearson's Chi Squared Test
# ps - Poisson Stirling Measure 
# tscore - T-score
<rankingMeasure>ll

# The output path of the results of lbd
<implicitOutputFile>sampleOutput

# a comma separated list of linking (B) term accept semantic groups, which 
# restricts the linking terms to the semantic groups specified. Group names
# come directly from the UMLS. 
# See https://metamap.nlm.nih.gov/Docs/SemGroups_2013.txt for a list
#<linkingAcceptGroups>CHEM,DISO,GENE,PHYS,ANAT

# similar to linking accept groups, this restricts the acceptable linking (B) 
# terms to terms within the semantic types listed
# See https://metamap.nlm.nih.gov/Docs/SemanticTypes_2013AA.txt for a list
#<linkingAcceptGroups>clnd,chem

# a comma separated list of target (C) term accept semantic groups, which 
# restricts the target terms to the semantic groups specified. Group names
# come directly from the UMLS. 
# See https://metamap.nlm.nih.gov/Docs/SemGroups_2013.txt for a list
#<targetAcceptGroups>CHEM,GENE

# similar to target accept groups, this restricts the acceptable target (C) 
# terms to terms within the semantic types listed
# See https://metamap.nlm.nih.gov/Docs/SemanticTypes_2013AA.txt for a list
#<linkingAcceptGroups>clnd,chem

# Input file path for the explicit co-occurrence matrix used in LBD
<explicitInputFile>sampleExplicitMatrix

# A comma separated list of starting (A) cuis used in LBD
<startCuis>C0001554,C1961131

# A comma separated list of target (C) cuis. If specified, the system enters
# closed discovery mode
#<targetCuis>

samples/timeSlicingConfig

# ll - Log-likelihood ratio
# tmi - Mutual Information
# odds - Odds Ratio
# pmi - Pointwise Mutual Information
# phi - Phi Coefficient
# chi - Pearson's Chi Squared Test
# ps - Poisson Stirling Measure 
# tscore - T-score
<rankingMeasure>ll

# a comma separated list of linking (B) term accept semantic groups, which 
# restricts the linking terms to the semantic groups specified. Group names
# come directly from the UMLS. 
# See https://metamap.nlm.nih.gov/Docs/SemGroups_2013.txt for a list
#<linkingAcceptGroups>CHEM,DISO,GENE,PHYS,ANAT

# similar to linking accept groups, this restricts the acceptable linking (B) 
# terms to terms within the semantic types listed
# See https://metamap.nlm.nih.gov/Docs/SemanticTypes_2013AA.txt for a list
#<linkingAcceptGroups>clnd,chem

# a comma separated list of target (C) term accept semantic groups, which 
# restricts the target terms to the semantic groups specified. Group names
# come directly from the UMLS. 
# See https://metamap.nlm.nih.gov/Docs/SemGroups_2013.txt for a list
#<targetAcceptGroups>CHEM,GENE

# similar to target accept groups, this restricts the acceptable target (C) 
# terms to terms within the semantic types listed
# See https://metamap.nlm.nih.gov/Docs/SemanticTypes_2013AA.txt for a list
#<linkingAcceptGroups>clnd,chem

t/test.t

#check that the correct number of values are read for all the 
# time slicing metrics
ok (scalar @{$newAPScoresRef} == 11, "Correct Count of Average Precisions");
ok (scalar @{$newPAtKScoresRef} == 19, "Correct Count of Precision at K's");
ok (scalar @{$newFAtKScoresRef} == 19, "Correct Count of Freq at K's");

#check that each of the AP scores match the gold (within error tolerance)
my $apSame = 1;
for (my $i = 0; $i < scalar @{$goldAPScoresRef}; $i++) {
    
    #check both comma separated values (precision and recall)
    my @goldScores = split(',',${$goldAPScoresRef}[$i]);
    my @newScores = split(',',${$newAPScoresRef}[$i]);

    #fail if either value differs from the gold by more than the tolerance
    if ((abs($goldScores[0]-$newScores[0]) > $precRecallErrorTol)
	|| (abs($goldScores[1]-$newScores[1]) > $precRecallErrorTol)) {
	$apSame = 0;
	last;
    }
}
ok($apSame == 1, "Average Precisions Match");

utils/datasetCreator/fromMySQL/dbToTab.pl

#converts a mysql database to a tab separated file readable by LBD
#command is of the form:
#`mysql <DB_NAME> -e "SELECT * FROM N_11 INTO OUTFILE '<OUTPUT_FILE>' FIELDS TERMINATED BY '\t' OPTIONALLY ENCLOSED BY '\"' LINES TERMINATED BY '\n';"`
#
# the following line is an example using a database with cui co-occurrence 
# counts from 1980 to 1984 with a window size of 1. The mysql database is 
# called 1980_1984_window1, and the output matrix file is called 
# 1980_1984_window1_data.txt
`mysql 1980_1984_window1 -e "SELECT * FROM N_11 INTO OUTFILE '1980_1984_window1_data.txt' FIELDS TERMINATED BY '\t' OPTIONALLY ENCLOSED BY '\"' LINES TERMINATED BY '\n';"`;

utils/runDiscovery.pl


=head1 INPUT

=head2 LBD_CONFIG_FILE

Configuration file specifying the parameters of LBD. 
See '../config/lbd' for an example

=head1 OPTIONS

Optional command line arguments

=head2 General Options:

=head3 --help

displays help, a quick summary of program options

=head3 --assocConfig

path to a UMLS::Association configuration file. Default location is 

utils/runDiscovery.pl

."   --help        Print this help screen.\n"
."   --version     Print the version number\n"
."\nOPTIONS\n"
."   --assocConfig        path to the UMLS::Association Config File\n"
."   --interfaceConfig    path to the UMLS::Interface Config File\n"
."\nUSAGE EXAMPLES\n"
."   runDiscovery lbdConfigFile\n";
;

#############################################################################
#                       Parse command line options 
#############################################################################
my $DEBUG = 0;      # Prints EVERYTHING. Use with small testing files.        
my $HELP = '';      # Prints usage and exits if true.
my $VERSION;

#set default param values
my %options = ();
$options{'assocConfig'}  = '';
$options{'interfaceConfig'} = '';


