ALBD

 view release on metacpan or  search on metacpan

lib/ALBD.pm  view on Meta::CPAN

# ALBD::ALBD
#
# Primary module
# This module contains only the top level functions. So a step by step methods
# Filtering, everything else is in different modules. 
#
# Copyright (c) 2017
#
# Sam Henry
# henryst at vcu.edu
#
# This program is free software; you can redistribute it and/or
# modify it under the terms of the GNU General Public License
# as published by the Free Software Foundation; either version 2
# of the License, or (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to
#
# The Free Software Foundation, Inc.,
# 59 Temple Place - Suite 330,
# Boston, MA  02111-1307, USA.


=head1 NAME

ALBD - a perl implementation of Literature Based Discovery

=head1 SYNOPSIS
    
    use ALBD;
    %options = ();
    $options{'lbdConfig'} = 'configFile'
    my $lbd = LiteratureBasedDiscovery->new(\%options);
    $lbd->performLBD();

=head1 ABSTRACT

      This package consists of Perl modules along with supporting Perl
      programs that perform Literature Based Discovery (LBD). The core 
      data from which LBD is performed are co-occurrences matrices 
      generated from UMLS::Association. ALBD is based on the ABC
      co-occurrence model. Many options can be specified, and many
      ranking methods are available. The novel ranking methods that use
      association measure are available as well as frequency based
      ranking methods. See samples/lbd for more info. Can perform open and
      closed LBD as well as time slicing evaluation.

=head1 INSTALL

To install the module, run the following magic commands:

  perl Makefile.PL
  make
  make test
  make install

This will install the module in the standard location. You will, most
probably, require root privileges to install in standard system
directories. To install in a non-standard directory, specify a prefix
during the 'perl Makefile.PL' stage as:

  perl Makefile.PL PREFIX=/home/sid

It is possible to modify other parameters during installation. The
details of these can be found in the ExtUtils::MakeMaker
documentation. However, it is highly recommended not messing around
with other parameters, unless you know what you're doing.

=head1 CONFIGURATION FILE

There are many parameters that can be specified, both for open and
close discovery as well as time slicing evaluation. Please see the 
samples folder for info and sample configuration files.

=cut


######################################################################
#                          Description
######################################################################
#
# This is a description heared more towards understanding or modifying
# the code, rather than using the program.
#
# LiteratureBasedDiscovery.pm - provides functionality to perform LBD
#
# Matrix Representation:
# LBD is performed using Matrix and Vector operations. The major components 
# are an explicit knowledge matrix, which is squared to find the implicit 
# knowledge matrix.
#
# The explicit knowledge is read from UMLS::Association N11 matrix. This 
# matrix contains the co-occurrence counts for all CUI pairs. The 
# UMLS::Association database is completely independent from 
# implementation, so any dataset, window size, or anything else may be used. 
# Data is read in as a sparse matrix using the Discovery::tableToSparseMatrix 
# function. This returns the primary data structures and variables used 
# throughtout LBD.
#
# Matrix representation: 
# This module uses a matrix representation for LBD. All operations are 
# performed either as matrix or vector operations. The core data structure
# are the co-occurrence matrices explicitMatrix and implicitMatrix. These
# matrices have dimensions vocabulary size by vocabulary size. Each row 
# corresponds to the all co-occurrences for a single CUI. Each column of that 
# row corresponding to a co-occurrence with a single CUI. Since the matrices 
# tend to be sparse, they are stored as hashes of hashes, where the the first 
# key is for a row, and the second key is for a column. The keys of each hash 
# are the indeces within the matrix. The hash values are the number of 
# co-ocurrences for that CUI pair (e.g. ${${$explicit{C0000000}}{C1111111} = 10 
# means that CUI C0000000 and C1111111 co-occurred 10 times).
#
# Now with an understanding of the data strucutres, below is a breif 
# description of each: 
#
# startingMatrix <- A matrix containing the explicit matrix rows for all of the
#                   start terms. This makes it easy to have multiple start terms
#                   and using this matrix as opposed to the entire explicit 
#                   matrix drastically improves performance.
# explicitMatrix <- A matrix containing explicit connections (known connections)
#                   for every CUI in the dataset.            
# implicitMatrix <- A matrix containing implicit connections (discovered 
#                   connections) for every CUI in the datast


package ALBD;

use strict;
use warnings;

use LiteratureBasedDiscovery::Discovery;
use LiteratureBasedDiscovery::Evaluation;



( run in 1.336 second using v1.01-cache-2.11-cpan-ceb78f64989 )