type results from the CPAN

ALBD


    along with this program; if not, write to the Free Software
    Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA


Also add information on how to contact you by electronic and paper mail.

If the program is interactive, make it output a short notice like this
when it starts in an interactive mode:

    Gnomovision version 69, Copyright (C) year name of author
    Gnomovision comes with ABSOLUTELY NO WARRANTY; for details type `show w'.
    This is free software, and you are welcome to redistribute it
    under certain conditions; type `show c' for details.

The hypothetical commands `show w' and `show c' should show the appropriate
parts of the General Public License.  Of course, the commands you use may
be called something other than `show w' and `show c'; they could even be
mouse-clicks or menu items--whatever suits your program.

You should also get your employer (if you work as a programmer) or your
school, if any, to sign a "copyright disclaimer" for the program, if
necessary.  Here is a sample; alter the names:

INSTALL


  Stage 5: Set up Dummy UMLS::Association Database
    UMLS::Association requires a connection to a database in the correct
    format. ALBD does not need this database because co-occurrence data is
    loaded from a co-occurrence matrix, but UMLS::Association cannot run
    without it. If you ran UMLS::Association to generate a co-occurrence
    matrix, the database already exists. Otherwise, create a dummy database
    for it to connect to in a few steps:

    1) Open MySQL: type mysql at the terminal.

    2) Create the default database in the correct format by typing:

        CREATE DATABASE cuicounts;
        USE cuicounts;
        CREATE TABLE N_11(cui_1 CHAR(10), cui_2 CHAR(10), n_11 BIGINT(20));

CONTACT US
    If you have any trouble installing or using ALBD, please contact us
    directly:

        Sam Henry: henryst at vcu.edu

        Bridget McInnes: btmcinnes at vcu.edu

README

        ranking methods are available. Novel ranking methods that use
        association measures are available, as well as frequency-based
        ranking methods. See samples/lbd for more info. ALBD can perform
        open and closed LBD, as well as time-slicing evaluation.

        ALBD requires UMLS::Association both to compute the co-occurrence
        database from which the co-occurrence matrix is derived and to rank
        the generated C terms.

        UMLS::Association requires the UMLS::Interface module to access 
        the Unified Medical Language System (UMLS) for semantic type filtering
        and to determine if CUIs are valid.

        The following sections describe the organization of this software
        package and how to use it. A few typical examples are given to
        clarify the usage of the modules and the supporting utilities.

  INSTALL
        To install the module, run the following magic commands:

README


  Set Up Dummy UMLS::Association Database
    UMLS::Association requires a connection to a database in the correct
    format. ALBD does not need this database because co-occurrence data is
    loaded from a co-occurrence matrix, but UMLS::Association cannot run
    without it. If you ran UMLS::Association to generate a co-occurrence
    matrix, the database already exists. Otherwise, create a dummy database
    for it to connect to in a few steps:

    1) Open MySQL: type mysql at the terminal.

    2) Create the default database in the correct format by typing:

        CREATE DATABASE cuicounts;
        USE cuicounts;
        CREATE TABLE N_11(cui_1 CHAR(10), cui_2 CHAR(10), n_11 BIGINT(20));

  INITIALIZING THE MODULE
    To create an instance of the ALBD object, using default values for all
    configuration options:

        my %options = ();
        $options{'lbdConfig'} = 'configFile';
        my $lbd = LiteratureBasedDiscovery->new(\%options);
        $lbd->performLBD();

    The following configuration options are also provided:

README

    Co-occurrence matrices in the utils/datasetCreator/ directory. These are
    short scripts that generally require modifying the code at the top with
    user input parameters specific to each run. These scripts include:

    applyMaxThreshold.pl -- applies a maximum co-occurrence threshold to the
    co-occurrence matrix

    applyMinThreshold.pl -- applies a minimum co-occurrence threshold to the
    co-occurrence matrix

    applySemanticFilter.pl -- applies a semantic type and/or group filter to
    the co-occurrence matrix.

    combineCooccurrenceMatrices.pl -- combines the co-occurrence counts of
    multiple co-occurrence matrices

    makeOrderNotMatter.pl -- makes the order of CUI co-occurrences not
    matter by updating the co-occurrence matrix file. (UMLS::Association
    generates co-occurrence files where order does matter, so the sentence
    'cui1 cui2' will only mark a co-occurrence between cui1 and cui2, but
    not between cui2 and cui1).
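The thresholding and order-merging scripts above all operate on a sparse co-occurrence matrix. Below is a minimal Perl sketch, not the scripts' actual code. It assumes the matrix is held as a hash of hashes ($matrix{$cui1}{$cui2} = count), the same shape the ALBD modules index. The helper names are hypothetical, and two behaviors are assumptions: the maximum threshold is taken to drop counts above the limit, and merging the two orders is taken to sum their counts.

```perl
use strict;
use warnings;

# Sparse co-occurrence matrix: $matrix{$cui1}{$cui2} = count.

# Hypothetical helper: keep only entries for which $keep->($count) is true,
# and drop rows that end up empty. Modifies the matrix in place.
sub filter_entries {
    my ($matrixRef, $keep) = @_;
    foreach my $row (keys %{$matrixRef}) {
        foreach my $col (keys %{ $matrixRef->{$row} }) {
            delete $matrixRef->{$row}{$col}
                unless $keep->($matrixRef->{$row}{$col});
        }
        delete $matrixRef->{$row} unless %{ $matrixRef->{$row} };
    }
}

# Minimum threshold: drop co-occurrences seen fewer than $min times.
sub apply_min_threshold {
    my ($matrixRef, $min) = @_;
    filter_entries($matrixRef, sub { $_[0] >= $min });
}

# Maximum threshold (assumed behavior): drop counts above $max.
sub apply_max_threshold {
    my ($matrixRef, $max) = @_;
    filter_entries($matrixRef, sub { $_[0] <= $max });
}

# Order-insensitive copy: (cui1, cui2) and (cui2, cui1) both receive the
# sum of the counts recorded in either order (assumed merge rule).
sub make_order_not_matter {
    my ($matrixRef) = @_;
    my %symmetric;
    foreach my $c1 (keys %{$matrixRef}) {
        foreach my $c2 (keys %{ $matrixRef->{$c1} }) {
            my $n = $matrixRef->{$c1}{$c2};
            $symmetric{$c1}{$c2} += $n;
            $symmetric{$c2}{$c1} += $n unless $c1 eq $c2;
        }
    }
    return \%symmetric;
}

# Example: 'C1 C2' seen twice and 'C2 C1' seen three times.
my %matrix = (C1 => { C2 => 2, C3 => 1 }, C2 => { C1 => 3 });
my $symmetricRef = make_order_not_matter(\%matrix);
print "C1,C2 = $symmetricRef->{C1}{C2}\n";   # 5
apply_min_threshold(\%matrix, 2);              # removes C1 -> C3 (count 1)
```

The filtering functions modify the matrix in place, in the same way the Filters.pm subroutines do. The merge returns a new matrix so that the original counts remain available.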

lib/ALBD.pm

    $start = time();
    my $startingMatrixRef = 
	Discovery::getRows($startCuisRef, $explicitMatrixRef);
    print "Got Starting Matrix in ".(time() - $start)."\n";

    #if using average minimum weight, grab the a->b scores
    my %abPairsWithScores = ();
    if ($lbdOptions{'rankingProcedure'} eq 'averageMinimumWeight' 
	|| $lbdOptions{'rankingProcedure'} eq 'ltc_amw') {

	#apply semantic type filter to columns only
	if ((scalar keys %{$linkingAcceptTypesRef}) > 0) {
	    Filters::semanticTypeFilter_columns(
		$explicitMatrixRef, $linkingAcceptTypesRef, $umls_interface);
	}
	#initialize the abPairs to frequency of co-occurrence
	foreach my $row (keys %{$startingMatrixRef}) {
	    foreach my $col (keys %{${$startingMatrixRef}{$row}}) {
		$abPairsWithScores{"$row,$col"} = ${${$startingMatrixRef}{$row}}{$col};
	    }
	}

lib/ALBD.pm

    #get association scores for the starting matrix
    my $assocScoresRef = TimeSlicing::getAssociationScores(
	$startingMatrixRef, $lbdOptions{'rankingMeasure'}, $umls_association);
    my ($min, $max) = TimeSlicing::getMinMax($assocScoresRef);
    my $range = $max-$min;

    #load the post cutoff matrix for the necessary rows
    my $postCutoffMatrixRef 
	= TimeSlicing::loadPostCutOffMatrix($startingMatrixRef, $explicitMatrixRef, $lbdOptions{'postCutoffFileName'});

    #apply a semantic type filter to the post cutoff matrix
    if ((scalar keys %{$targetAcceptTypesRef}) > 0) {
	    Filters::semanticTypeFilter_columns(
		$postCutoffMatrixRef, $targetAcceptTypesRef, $umls_interface);
    }

    #apply a threshold at $numIntervals% intervals to generate an 11 point
    # interpolated precision/recall curve for linking term ranking/thresholding
    #stats for collecting info about predicted vs. true
    my $predictedAverage = 0;
    my $trueAverage = 0;
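The comment above refers to an 11-point interpolated precision/recall curve. Below is a minimal Perl sketch of the standard interpolation rule. It is not ALBD's implementation, and the function name and input format (a list of [recall, precision] pairs, one per threshold) are assumptions. At each recall level 0.0, 0.1, ..., 1.0, the interpolated precision is the highest precision measured at any recall greater than or equal to that level.

```perl
use strict;
use warnings;
use List::Util qw(max);

# Hypothetical helper: 11-point interpolated precision.
# Input: measured points as [recall, precision] array refs (one per threshold).
# Output: 11 precision values for recall levels 0.0, 0.1, ..., 1.0, where each
# value is the maximum precision at any measured recall >= that level
# (0 if no point reaches that recall).
sub eleven_point_interpolated {
    my @points = @_;
    my @curve;
    foreach my $i (0 .. 10) {
        my $level = $i / 10;
        # small tolerance so that e.g. a measured recall of 0.3 counts for level 0.3
        my @precisions = map { $_->[1] } grep { $_->[0] >= $level - 1e-9 } @points;
        push @curve, @precisions ? max(@precisions) : 0;
    }
    return @curve;
}

my @curve = eleven_point_interpolated([0.2, 0.8], [0.5, 0.6], [1.0, 0.3]);
print join(' ', @curve), "\n";   # 0.8 0.8 0.8 0.6 0.6 0.6 0.3 0.3 0.3 0.3 0.3
```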

lib/ALBD.pm

	#apply a threshold (number of samples)
	my $thresholdedStartingMatrixRef = TimeSlicing::grabKHighestRankedSamples($numSamples, $assocScoresRef, $startingMatrixRef);

	#generate implicit knowledge
	my $implicitMatrixRef = Discovery::findImplicit($explicitMatrixRef, $thresholdedStartingMatrixRef);

	#Remove Known Connections
	$implicitMatrixRef 
	    = Discovery::removeExplicit($startingMatrixRef, $implicitMatrixRef);

	#apply a semantic type filter to the implicit matrix
	if ((scalar keys %{$targetAcceptTypesRef}) > 0) {
	    Filters::semanticTypeFilter_columns(
		$implicitMatrixRef, $targetAcceptTypesRef, $umls_interface);
	}

	#calculate precision and recall
	my ($precision, $recall) = TimeSlicing::calculatePrecisionRecall(
	    $implicitMatrixRef, $postCutoffMatrixRef);
	print "precision = $precision, recall = $recall\n";
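TimeSlicing::calculatePrecisionRecall compares the predicted implicit matrix against the post-cutoff (gold) matrix. Its exact counting is not shown here, so the Perl sketch below is a hypothetical stand-in that pools all start terms. Every (row, column) pair in a matrix counts as one link. Precision is the share of predicted links that appear in the gold matrix, and recall is the share of gold links that were predicted. ALBD may instead average per start term.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a precision/recall check between two sparse
# matrices (hash of hashes). Each (row, col) pair counts as one link;
# counts are pooled over all rows (start terms).
sub precision_recall {
    my ($predictedRef, $goldRef) = @_;
    my ($truePositives, $numPredicted, $numGold) = (0, 0, 0);
    foreach my $row (keys %{$predictedRef}) {
        foreach my $col (keys %{ $predictedRef->{$row} }) {
            $numPredicted++;
            # check the row first to avoid autovivifying rows in the gold matrix
            $truePositives++
                if exists $goldRef->{$row} && exists $goldRef->{$row}{$col};
        }
    }
    $numGold += scalar keys %{ $goldRef->{$_} } for keys %{$goldRef};
    my $precision = $numPredicted ? $truePositives / $numPredicted : 0;
    my $recall    = $numGold      ? $truePositives / $numGold      : 0;
    return ($precision, $recall);
}

my %predicted = (A => { X => 1, Y => 1 });
my %gold      = (A => { X => 5, Z => 2 }, B => { Y => 1 });
my ($precision, $recall) = precision_recall(\%predicted, \%gold);
print "precision = $precision, recall = $recall\n";
```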

lib/ALBD.pm

	print "inputting gold\n";
	$goldMatrixRef = Discovery::fileToSparseMatrix($lbdOptions{'goldInputFile'});
    }
    else {
	print "loading post cutoff\n";
	$goldMatrixRef = TimeSlicing::loadPostCutOffMatrix($startingMatrixRef, $explicitMatrixRef, $lbdOptions{'postCutoffFileName'});

	#remove explicit knowledge from the post cutoff matrix
	$goldMatrixRef = Discovery::removeExplicit($startingMatrixRef, $goldMatrixRef);

	#apply a semantic type filter to the post cutoff matrix
	print "applying semantic filter to post-cutoff matrix\n";
	if ((scalar keys %{$targetAcceptTypesRef}) > 0) {
	    Filters::semanticTypeFilter_columns(
		$goldMatrixRef, $targetAcceptTypesRef, $umls_interface);
	}

	#TODO why is the gold matrix outputting with an extra line between samples?
	#output the gold matrix
	if (exists $lbdOptions{'goldOutputFile'}) {
	    print "outputting gold\n";

lib/ALBD.pm

  
#-------
# AB Scoring (if needed)
#-------
    #if using average minimum weight, grab the a->b scores, #TODO this is sloppy here, but it has to be here...how to make it fit better?
    my %abPairsWithScores = ();
    if ($lbdOptions{'rankingProcedure'} eq 'averageMinimumWeight'
		|| $lbdOptions{'rankingProcedure'} eq 'ltc_amw') {
	print "getting AB scores\n";

	#apply semantic type filter to columns only
	if ((scalar keys %{$linkingAcceptTypesRef}) > 0) {
	    Filters::semanticTypeFilter_columns(
		$explicitMatrixRef, $linkingAcceptTypesRef, $umls_interface);
	}
	#initialize the abPairs to the frequency of co-occurrence
	foreach my $row (keys %{$startingMatrixRef}) {
	    foreach my $col (keys %{${$startingMatrixRef}{$row}}) {
		$abPairsWithScores{"$row,$col"} = ${${$startingMatrixRef}{$row}}{$col}; 
	    }
	}

lib/ALBD.pm

	#generate implicit knowledge
	print "Squaring Matrix\n";
	$predictionsMatrixRef = Discovery::findImplicit(
	    $explicitMatrixRef, $startingMatrixRef);

	#Remove Known Connections
	print "Removing Known from Predictions\n";
	$predictionsMatrixRef 
	    = Discovery::removeExplicit($startingMatrixRef, $predictionsMatrixRef);

	#apply a semantic type filter to the predictions matrix
	print "Applying Semantic Filter to Predictions\n";
	if ((scalar keys %{$targetAcceptTypesRef}) > 0) {
	    Filters::semanticTypeFilter_columns(
		$predictionsMatrixRef, $targetAcceptTypesRef, $umls_interface);
	}

	#save the implicit knowledge matrix to file
	if (exists ($lbdOptions{'predictionsOutFile'})) {
	    print "outputting predictions\n";
	    Discovery::outputMatrixToFile($lbdOptions{'predictionsOutFile'}, $predictionsMatrixRef);
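The "Squaring Matrix" step combines the starting rows (A to B links) with the explicit matrix (B to C links). Known A to C links are then removed. Below is a minimal Perl sketch of that idea as a sparse matrix product over hashes of hashes. It is hypothetical: find_implicit and remove_explicit are illustrative names, and weighting each path by the product of its two counts is an assumption about how Discovery::findImplicit combines values.

```perl
use strict;
use warnings;

# Hypothetical sketch of implicit-link discovery on sparse matrices
# (hash of hashes). $startRef holds A -> B counts for the start terms,
# $explicitRef holds B -> C counts. Each A -> C score sums, over every
# linking term B, the product of the A -> B and B -> C values (assumed).
sub find_implicit {
    my ($explicitRef, $startRef) = @_;
    my %implicit;
    foreach my $aCui (keys %{$startRef}) {
        foreach my $bCui (keys %{ $startRef->{$aCui} }) {
            next unless exists $explicitRef->{$bCui};
            my $abValue = $startRef->{$aCui}{$bCui};
            foreach my $cCui (keys %{ $explicitRef->{$bCui} }) {
                $implicit{$aCui}{$cCui} += $abValue * $explicitRef->{$bCui}{$cCui};
            }
        }
    }
    return \%implicit;
}

# Remove A -> C pairs that are already explicit, i.e. present in the start rows.
sub remove_explicit {
    my ($startRef, $implicitRef) = @_;
    foreach my $aCui (keys %{$implicitRef}) {
        next unless exists $startRef->{$aCui};
        delete $implicitRef->{$aCui}{$_} for keys %{ $startRef->{$aCui} };
    }
    return $implicitRef;
}

my %explicit = (
    A  => { B1 => 2, B2 => 1 },
    B1 => { C => 3, B2 => 1 },
    B2 => { C => 4 },
);
my %start = (A => $explicit{A});   # the start term's row of the explicit matrix
my $predictions = remove_explicit(\%start, find_implicit(\%explicit, \%start));
print "A -> C = $predictions->{A}{C}\n";   # 10 (2*3 + 1*4); A -> B2 was removed
```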

lib/ALBD.pm


# transforms the string of target cuis to an array
# input:  none
# output: an array ref of CUIs
sub _getTargetCuis {
    my $self = shift;
    my @targetCuis = split(',',$lbdOptions{'targetCuis'});
    return \@targetCuis;
}

# transforms the string of accept types or groups into a hash of accept TUIs
# input:  a string specifying whether linking or target types are being defined
# output: a ref to a hash of acceptable TUIs
sub _getAcceptTypes {
    my $self = shift;
    my $stepString = shift; #either 'linking' or 'target'

    #get the accept types 
    my %acceptTypes = ();

    #add all types for groups specified
    my $string = $stepString.'AcceptGroups';
    if (defined $lbdOptions{$string}) {
	#accept groups were specified
	my @acceptGroups = split(',',$lbdOptions{$string});

	#add all the types of each group
	foreach my $group(@acceptGroups) {
	    my $typesRef = Filters::getTypesOfGroup($group, $umls_interface);
	    foreach my $key(keys %{$typesRef}) {
		$acceptTypes{$key} = 1;
	    }
	}
    }

    #add all types specified
    $string = $stepString.'AcceptTypes';
    if (defined $lbdOptions{$string}) {
	#convert each type to a tui and add
	my $tui;
	my @acceptTypes = split(',',$lbdOptions{$string});
	foreach my $abr(@acceptTypes) {
	    $tui = uc $umls_interface->getStTui($abr);
	    $acceptTypes{$tui} = 1;
	}
    }
    
    return \%acceptTypes;
}

lib/LiteratureBasedDiscovery/Filters.pm

# 59 Temple Place - Suite 330,
# Boston, MA  02111-1307, USA.

package Filters;
use strict;
use warnings;

use UMLS::Interface;

# applies a semantic type filter to the matrix by removing keys that 
# are not of an allowed semantic type. Eliminates both rows and columns,
# so is applied to the full explicit matrix
# input:  $matrixRef <- ref to a sparse matrix to be filtered
#         $acceptTypesRef <- a ref to a hash of accept types (TUIs)
#         $umls <- an instance of UMLS::Interface
# output: None, but $matrixRef is updated 
sub semanticTypeFilter_rowsAndColumns {
    my $matrixRef = shift;
    my $acceptTypesRef = shift;
    my $umls = shift;
 
=comment   
    #Count the number of keys before and after filtering (for debugging)
    my %termsHash = ();
    foreach my $key1 (keys %{$matrixRef}) {
	foreach my $key2 (keys %{${$matrixRef}{$key1}}) {
	    $termsHash{$key2} = 1;
	}
    }
    print "   number of keys before filtering = ".(scalar keys %termsHash)."\n";
=cut

    #eliminate CUIs that are not of an accepted semantic type,
    #first from the rows and then from the columns
    my %cuisChecked = ();
    #cuisChecked keeps track of cuis that have been checked 
    # for elimination. If the cui has been checked its key
    # will exist in the hash. Values of -1 indicate it should
    # be eliminated, values of 1 indicate it should stay.

    #eliminate cuis from rows
    foreach my $cui (keys %{$matrixRef}) {
	#update cui checked hash
	if (!exists $cuisChecked{$cui}) {
	    $cuisChecked{$cui} = -1;

	    my $typesRef = $umls->getSt($cui);
	    foreach my $type(@{$typesRef}) {
		my $abr = $umls->getStAbr($type);

		#check the cui for removal
		if (exists ${$acceptTypesRef}{$type}) {
		    $cuisChecked{$cui} = 1;
		    last;
		}
	    }
	}

	#eliminate if needed
	if ($cuisChecked{$cui} < 0) {
	    delete ${$matrixRef}{$cui};
	}
    }

    #eliminate cuis from columns
    foreach my $cui1 (keys %{$matrixRef}) {
	foreach my $cui2 (keys %{${$matrixRef}{$cui1}}) {
	    #update cui checked hash
	    if (!exists $cuisChecked{$cui2}) {
		$cuisChecked{$cui2} = -1;

		my $typesRef = $umls->getSt($cui2);
		foreach my $type(@{$typesRef}) {
		    my $abr = $umls->getStAbr($type);

		    #check the cui for removal
		    if (exists ${$acceptTypesRef}{$type}) {
			$cuisChecked{$cui2} = 1;
			last;
		    }
		}
	    }

	    #eliminate if needed
	    if ($cuisChecked{$cui2} < 0) {
		delete ${${$matrixRef}{$cui1}}{$cui2};
	    }

lib/LiteratureBasedDiscovery/Filters.pm

	foreach my $key2 (keys %{${$matrixRef}{$key1}}) {
	    $termsHash{$key2} = 1;
	}
    }
    print "   number of keys after filtering = ".(scalar keys %termsHash)."\n";
=cut
}


# applies a semantic type filter to the matrix by removing keys that 
# are not of an allowed semantic type. Only removes rows, so it is
# applied for time slicing, before randomly selecting terms of 
# one semantic type
# input:  $matrixRef <- ref to a sparse matrix to be filtered
#         $acceptTypesRef <- a ref to a hash of accept types (TUIs)
#         $umls <- an instance of UMLS::Interface
# output: None, but $matrixRef is updated 
sub semanticTypeFilter_rows {
    my $matrixRef = shift;
    my $acceptTypesRef = shift;
    my $umls = shift;
    
=comment
    #Count the number of keys before and after filtering (for debugging)
    my %termsHash = ();
    foreach my $key1 (keys %{$matrixRef}) {
	foreach my $key2 (keys %{${$matrixRef}{$key1}}) {
	    $termsHash{$key2} = 1;
	}
    }
    print "   number of keys before filtering = ".(scalar keys %termsHash)."\n";
=cut

    #eliminate rows whose CUI is not of an accepted semantic type.
    #$keep records whether the current row should stay: a value of -1
    # indicates it should be eliminated, a value of 1 indicates it
    # should stay. It is reset to -1 after each row.
    my $keep = -1;
    #eliminate cuis from rows
    foreach my $cui1 (keys %{$matrixRef}) {
	my $typesRef = $umls->getSt($cui1);
	foreach my $type(@{$typesRef}) {
	    my $abr = $umls->getStAbr($type);

	    #check the cui for removal
	    if (exists ${$acceptTypesRef}{$type}) {
		$keep = 1;
		last;
	    }
	}

	#eliminate if needed
	if ($keep < 0) {
	    delete ${$matrixRef}{$cui1};
	}
	$keep = -1;

lib/LiteratureBasedDiscovery/Filters.pm

	foreach my $key2 (keys %{${$matrixRef}{$key1}}) {
	    $termsHash{$key2} = 1;
	}
    }
    print "   number of keys after filtering = ".(scalar keys %termsHash)."\n";
=cut
}


# applies a semantic type filter to the matrix by removing keys that 
# are not of an allowed semantic type. Only removes columns, 
# so is applied to the implicit matrix (starting term rows with implicit
# columns).
# input:  $matrixRef <- ref to a sparse matrix to be filtered
#         $acceptTypesRef <- a ref to a hash of accept types (TUIs)
#         $umls <- an instance of UMLS::Interface
# output: None, but $matrixRef is updated 
sub semanticTypeFilter_columns {
    my $matrixRef = shift;
    my $acceptTypesRef = shift;
    my $umls = shift;
 
=comment   
    #Count the number of keys before and after filtering (for debugging)
    my %termsHash = ();
    foreach my $key1 (keys %{$matrixRef}) {
	foreach my $key2 (keys %{${$matrixRef}{$key1}}) {
	    $termsHash{$key2} = 1;
	}
    }
    print "   number of keys before filtering = ".(scalar keys %termsHash)."\n";
=cut

    #eliminate column values that are not of an accepted semantic
    #type, one row at a time
    my %cuisChecked = ();
    #cuisChecked keeps track of cuis that have been checked 
    # for elimination. If the cui has been checked its key
    # will exist in the hash. Values of -1 indicate it should
    # be eliminated, values of 1 indicate it should stay.
    #eliminate cuis from columns
    foreach my $cui1 (keys %{$matrixRef}) {
	foreach my $cui2 (keys %{${$matrixRef}{$cui1}}) {
	    #update cui checked hash
	    if (!exists $cuisChecked{$cui2}) {
		$cuisChecked{$cui2} = -1;

		my $typesRef = $umls->getSt($cui2);
		foreach my $type(@{$typesRef}) {
		    my $abr = $umls->getStAbr($type);

		    #check the cui for removal
		    if (exists ${$acceptTypesRef}{$type}) {
			$cuisChecked{$cui2} = 1;
			last;
		    }
		}
	    }

	    #eliminate if needed
	    if ($cuisChecked{$cui2} < 0) {
		delete ${${$matrixRef}{$cui1}}{$cui2};
	    }

lib/LiteratureBasedDiscovery/Filters.pm

    foreach my $key1 (keys %{$matrixRef}) {
	foreach my $key2 (keys %{${$matrixRef}{$key1}}) {
	    $termsHash{$key2} = 1;
	}
    }
    print "   number of keys after filtering = ".(scalar keys %termsHash)."\n";
=cut

}

# gets the semantic types of the group
# input:  $group <- a string specifying a semantic group
#         $umls <- an instance of UMLS::Interface
# output: a ref to a hash of TUIs
sub getTypesOfGroup {
    my $group = shift;
    my $umls = shift;

    #add each type of the group to the set of accept types
    my %acceptTuis = ();
    my @groupTypes = @{ $umls->getStsFromSg($group) };
    foreach my $abr(@groupTypes) {
	#check that it is defined (types that are no longer in 
	#the UMLS may be returned as part of the group)
	if (defined $abr) {
	    my $tui = uc $umls->getStTui($abr);
	    $acceptTuis{$tui} = 1;
	}
    }

    return \%acceptTuis;
}

# gets all semantic types of the UMLS
# input:  $umls <- an instance of UMLS::Interface
# output: a ref to an array of TUIs
sub getAllTypes {
    my $umls = shift;

    my $abrRef = $umls->getAllSts();
    my @tuis = ();
    foreach my $abr(@{$abrRef}) {
	push @tuis, uc $umls->getStTui($abr);
    }
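The three filters follow the same pattern. Each one looks up the semantic types (TUIs) of a CUI and keeps the CUI if any type is in the accept hash. The column and row-and-column variants also cache that decision, so each CUI is looked up only once. The Perl sketch below shows the column variant without a UMLS connection. The type lookup is passed in as a code reference, standing in for $umls->getSt($cui), and the CUIs in the example are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical type lookup standing in for UMLS::Interface's getSt():
# maps a CUI to an array ref of its semantic types (TUIs).
my %typesOfCui = (
    C0000001 => ['T047'],            # hypothetical CUIs
    C0000002 => ['T121', 'T109'],
    C0000003 => ['T023'],
);
my $getTypes = sub { $typesOfCui{ $_[0] } || [] };

# Remove every column CUI that has none of the accepted TUIs. The keep/drop
# decision for each CUI is cached so the lookup runs only once per CUI.
sub filter_columns_by_type {
    my ($matrixRef, $acceptTypesRef, $getTypesRef) = @_;
    my %keep;   # CUI => 1 (keep) or 0 (drop)
    foreach my $row (keys %{$matrixRef}) {
        foreach my $col (keys %{ $matrixRef->{$row} }) {
            if (!exists $keep{$col}) {
                $keep{$col} = 0;
                foreach my $tui (@{ $getTypesRef->($col) }) {
                    if (exists $acceptTypesRef->{$tui}) {
                        $keep{$col} = 1;
                        last;
                    }
                }
            }
            delete $matrixRef->{$row}{$col} unless $keep{$col};
        }
    }
}

my %matrix = (C0000001 => { C0000002 => 4, C0000003 => 1 });
filter_columns_by_type(\%matrix, { T121 => 1 }, $getTypes);
print join(',', sort keys %{ $matrix{C0000001} }), "\n";   # C0000002
```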

lib/LiteratureBasedDiscovery/TimeSlicing.pm


    #return the post cutoff matrix
    return \%postCutoffMatrix;
}

#TODO numRows should be read from file and sent with the lbdOptionsRef
# generates a starting matrix of numRows randomly selected terms
# input:  $explicitMatrixRef <- a ref to the explicit sparse matrix
#         $lbdOptionsRef <- the LBD options
#         $startTermAcceptTypesRef <- a ref to a hash of accept 
#                                     types for start terms (TUIs)
#         $numRows <- the number of random rows to load (if random)
#         $umls_interface <- an instance of the UMLS::Interface
# output: \%startingMatrix <- a ref to the starting sparse matrix
sub generateStartingMatrix {
    my $explicitMatrixRef = shift;
    my $lbdOptionsRef = shift;
    my $startTermAcceptTypesRef = shift;
    my $numRows = shift;
    my $umls_interface = shift;

lib/LiteratureBasedDiscovery/TimeSlicing.pm

	    }
	    close OUT;
	}
    }

    #return the starting matrix
    return \%startingMatrix;
}


# gets and returns a ref to a hash of row keys of the specified semantic types
# input:  $matrixRef <- a ref to a sparse matrix
#         $acceptTypesRef <- a ref to a hash of accept types (TUIs)
#         $umls <- an instance of UMLS::Interface
# output: \%rowsToKeep <- a ref to hash of rows to keep, each key is 
#                         a CUI, and values are 1. All CUIs specify rows
#                         of acceptable semantic types
sub getRowsOfSemanticTypes {
    my $matrixRef = shift;
    my $acceptTypesRef = shift;
    my $umls = shift;
    
    #loop through the matrix and keep the rows that are of the 
    # desired semantic types
    my %rowsToKeep = ();
    foreach my $cui1 (keys %{$matrixRef}) {
	my $typesRef = $umls->getSt($cui1);
	foreach my $type(@{$typesRef}) {
	    my $abr = $umls->getStAbr($type);

	    #keep the cui if it is of an accepted type
	    if (exists ${$acceptTypesRef}{$type}) {
		$rowsToKeep{$cui1} = 1;
		last;
	    }
	}
    }

    #return the rowsToKeep
    return \%rowsToKeep;
}
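generateStartingMatrix builds a starting matrix from $numRows randomly selected terms of the accepted start types. Below is a rough Perl sketch of that selection, using shuffle from the core List::Util module. The function names are hypothetical, and the type lookup is again a code reference standing in for UMLS::Interface. Taking the selected rows directly from the explicit matrix is an assumption.

```perl
use strict;
use warnings;
use List::Util qw(shuffle);

# True if the CUI has at least one accepted semantic type (TUI).
# $getTypesRef stands in for UMLS::Interface's getSt().
sub has_accepted_type {
    my ($cui, $acceptTypesRef, $getTypesRef) = @_;
    foreach my $tui (@{ $getTypesRef->($cui) }) {
        return 1 if exists $acceptTypesRef->{$tui};
    }
    return 0;
}

# Hypothetical sketch: choose up to $numRows random rows of accepted types
# and return them as a starting matrix (rows copied by reference).
sub random_starting_matrix {
    my ($explicitRef, $acceptTypesRef, $getTypesRef, $numRows) = @_;
    my @candidates = grep { has_accepted_type($_, $acceptTypesRef, $getTypesRef) }
                     keys %{$explicitRef};
    my @chosen = shuffle(@candidates);
    splice(@chosen, $numRows) if @chosen > $numRows;   # keep the first $numRows
    my %startingMatrix = map { ($_ => $explicitRef->{$_}) } @chosen;
    return \%startingMatrix;
}

my %explicit = (C1 => { C2 => 1 }, C2 => { C1 => 1 }, C3 => { C1 => 2 });
my %types    = (C1 => ['T047'], C2 => ['T121'], C3 => ['T047']);
my $starting = random_starting_matrix(\%explicit, { T047 => 1 },
                                      sub { $types{ $_[0] } || [] }, 1);
print join(',', keys %{$starting}), "\n";   # C1 or C3, chosen at random
```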

samples/lbdConfig

# The output path of the results of lbd
<implicitOutputFile>sampleOutput

# a comma-separated list of linking (B) term accept semantic groups, which 
# restricts the linking terms to the semantic groups specified. Group names
# come directly from the UMLS. 
# See https://metamap.nlm.nih.gov/Docs/SemGroups_2013.txt for a list
#<linkingAcceptGroups>CHEM,DISO,GENE,PHYS,ANAT

# similar to linking accept groups, this restricts the acceptable linking (B) 
# terms to terms within the semantic types listed
# See https://metamap.nlm.nih.gov/Docs/SemanticTypes_2013AA.txt for a list
#<linkingAcceptTypes>clnd,chem

# a comma-separated list of target (C) term accept semantic groups, which 
# restricts the target terms to the semantic groups specified. Group names
# come directly from the UMLS. 
# See https://metamap.nlm.nih.gov/Docs/SemGroups_2013.txt for a list
#<targetAcceptGroups>CHEM,GENE

# similar to target accept groups, this restricts the acceptable target (C) 
# terms to terms within the semantic types listed
# See https://metamap.nlm.nih.gov/Docs/SemanticTypes_2013AA.txt for a list
#<targetAcceptTypes>clnd,chem

# Input file path for the explicit co-occurrence matrix used in LBD
<explicitInputFile>sampleExplicitMatrix

# A comma-separated list of starting (A) CUIs used in LBD
<startCuis>C0001554,C1961131

# A comma-separated list of target (C) CUIs. If specified, system enters closed

samples/timeSlicingConfig

#----- Time Slicing Specific Parameters ------------------------

#Tell LBD to enter precision and recall mode (time slicing)
<precisionAndRecall_implicit>

# name of the file that contains a newline-separated list of CUIs.
# Each CUI serves as a start term. The average over all CUIs in this
# file is what is reported for precision and recall
<cuiListFileName>timeSliceCuiList

# A list of starting accept types. This is used to randomly generate 100
# starting terms if a cuiListFileName is not specified. All starting terms
# will be of the types listed
<startAcceptTypes>dsyn


#--------------------------------------

# The ranking procedure to use for LBD
# valid ranking procedures are:
#   allPairs (maxBC) - maximum B to C term value
#   averageMinimumWeight (AMW) - average of minimum A to B and B to C values
#   linkingTermCount* (LTC) - count of shared linking terms
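The one-line descriptions above can be made concrete for a single start term A. The Perl sketch below is an illustration under stated assumptions, not ALBD's code. It assumes that $ab holds A to B scores, that $bc holds B to C scores as a hash of hashes, and that only B terms linked to A are considered. maxBC takes the largest B to C value. AMW averages min(A to B, B to C) over the shared linking terms. LTC counts those linking terms.

```perl
use strict;
use warnings;
use List::Util qw(min sum);

# Illustrative scoring for one start term A (assumed data layout):
#   $ab = { B => score(A,B) }            linking terms of A
#   $bc = { B => { C => score(B,C) } }   explicit B -> C scores
# Each function returns { C => score }.

# maxBC: maximum B -> C value over the linking terms of A
sub rank_max_bc {
    my ($ab, $bc) = @_;
    my %score;
    foreach my $bTerm (keys %{$ab}) {
        foreach my $cTerm (keys %{ $bc->{$bTerm} || {} }) {
            my $value = $bc->{$bTerm}{$cTerm};
            $score{$cTerm} = $value
                if !defined $score{$cTerm} || $value > $score{$cTerm};
        }
    }
    return \%score;
}

# AMW: average over shared linking terms of min(A -> B, B -> C)
sub rank_amw {
    my ($ab, $bc) = @_;
    my %minima;   # C => list of per-linking-term minima
    foreach my $bTerm (keys %{$ab}) {
        foreach my $cTerm (keys %{ $bc->{$bTerm} || {} }) {
            push @{ $minima{$cTerm} }, min($ab->{$bTerm}, $bc->{$bTerm}{$cTerm});
        }
    }
    my %score = map { ($_ => sum(@{ $minima{$_} }) / @{ $minima{$_} }) } keys %minima;
    return \%score;
}

# LTC: number of linking terms shared by A and C
sub rank_ltc {
    my ($ab, $bc) = @_;
    my %score;
    foreach my $bTerm (keys %{$ab}) {
        $score{$_}++ for keys %{ $bc->{$bTerm} || {} };
    }
    return \%score;
}

my $ab = { B1 => 0.9, B2 => 0.2 };
my $bc = { B1 => { C1 => 0.4, C2 => 0.7 }, B2 => { C1 => 0.8 } };
printf "C1: maxBC=%.2f AMW=%.2f LTC=%d\n",
    rank_max_bc($ab, $bc)->{C1}, rank_amw($ab, $bc)->{C1}, rank_ltc($ab, $bc)->{C1};
# prints "C1: maxBC=0.80 AMW=0.30 LTC=2"
```

Note that the three procedures can order the same candidates differently. In the example, C1 scores higher than C2 under maxBC and LTC but lower under AMW, because the weak A to B2 link pulls its average down.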

samples/timeSlicingConfig

# tscore - T-score
<rankingMeasure>ll

# a comma-separated list of linking (B) term accept semantic groups, which 
# restricts the linking terms to the semantic groups specified. Group names
# come directly from the UMLS. 
# See https://metamap.nlm.nih.gov/Docs/SemGroups_2013.txt for a list
#<linkingAcceptGroups>CHEM,DISO,GENE,PHYS,ANAT

# similar to linking accept groups, this restricts the acceptable linking (B) 
# terms to terms within the semantic types listed
# See https://metamap.nlm.nih.gov/Docs/SemanticTypes_2013AA.txt for a list
#<linkingAcceptTypes>clnd,chem

# a comma-separated list of target (C) term accept semantic groups, which 
# restricts the target terms to the semantic groups specified. Group names
# come directly from the UMLS. 
# See https://metamap.nlm.nih.gov/Docs/SemGroups_2013.txt for a list
#<targetAcceptGroups>CHEM,GENE

# similar to target accept groups, this restricts the acceptable target (C) 
# terms to terms within the semantic types listed
# See https://metamap.nlm.nih.gov/Docs/SemanticTypes_2013AA.txt for a list
#<targetAcceptTypes>clnd,chem

# Input file path for the explicit co-occurrence matrix used in LBD
<explicitInputFile>sampleExplicitMatrix

# Input file path for the gold standard matrix (matrix of true predictions)
# See utils/datasetCreator on how to make this
<goldInputFile>sampleGoldMatrix
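Both sample files use the same line format: <key>value. A leading # starts a comment, so commented-out options such as #<targetAcceptGroups> are ignored, and flags such as <precisionAndRecall_implicit> carry no value. Below is a minimal Perl sketch that reads this format into an options hash. The name parse_lbd_config is hypothetical, and ALBD's own loader may handle more cases. List-valued options can then be split on commas, as _getTargetCuis does with split(',', ...).

```perl
use strict;
use warnings;

# Hypothetical reader for the <key>value format used by the sample configs.
# Blank lines and lines starting with '#' are skipped; a key with nothing
# after it (a flag) is stored with an empty-string value.
sub parse_lbd_config {
    my ($text) = @_;
    my %options;
    foreach my $line (split /\n/, $text) {
        next if $line =~ /^\s*#/ || $line !~ /\S/;
        if ($line =~ /^\s*<([^>]+)>(.*?)\s*$/) {
            $options{$1} = $2;
        }
    }
    return \%options;
}

my $config = <<'END';
# Tell LBD to enter precision and recall mode (time slicing)
<precisionAndRecall_implicit>
<rankingMeasure>ll
#<targetAcceptGroups>CHEM,GENE
<startCuis>C0001554,C1961131
END

my $options = parse_lbd_config($config);
my @startCuis = split(',', $options->{'startCuis'});
print scalar(@startCuis), " start CUIs, measure = $options->{'rankingMeasure'}\n";
# prints "2 start CUIs, measure = ll"
```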

utils/datasetCreator/applySemanticFilter.pl

my $columnsOnly = 0; #apply to columns only, or rows and columns

&applySemanticFilter($matrixFileName, $outputFileName, 
		     $acceptTypesString, $acceptGroupsString,



###################################################################
###################################################################

# Applies the semantic type filter
sub applySemanticFilter {
    #grab the input
    my $matrixFileName = shift;
    my $outputFileName = shift;
    my $acceptTypesString = shift;
    my $acceptGroupsString = shift;
    my $interfaceConfig = shift;
    my $columnsOnly = shift;

    print STDERR "Applying Semantic Filter to $matrixFileName\n";

utils/datasetCreator/applySemanticFilter.pl


    #output the matrix
    Discovery::outputMatrixToFile($outputFileName, $matrixRef);

    #TODO re-enable this and then try to run again
    #disconnect from the database and return
    #$umls_interface->disconnect();
}


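# transforms the strings of accept types and groups into a hash of accept TUIs
# input:  $umls_interface <- an instance of UMLS::Interface
#         $acceptTypesString <- a comma-separated string of semantic type abbreviations
#         $acceptGroupsString <- a comma-separated string of semantic group names
# output: a ref to a hash of acceptable TUIs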
sub getAcceptTypes {
    my $umls_interface = shift;
    my $acceptTypesString = shift;
    my $acceptGroupsString = shift;

    #get the accept types 
    my %acceptTypes = ();

    #add all types for groups specified
    #accept groups were specified
    my @acceptGroups = split(',',$acceptGroupsString);

    #add all the types of each group
    foreach my $group(@acceptGroups) {
	my $typesRef = Filters::getTypesOfGroup($group, $umls_interface);
	foreach my $key(keys %{$typesRef}) {
	    $acceptTypes{$key} = 1;
	}
    }

    #add all types specified
    #convert each type to a tui and add
    my $tui;
    my @acceptTypes = split(',',$acceptTypesString);
    foreach my $abr(@acceptTypes) {
	$tui = uc $umls_interface->getStTui($abr);
	$acceptTypes{$tui} = 1;
    }
    
    return \%acceptTypes;
}
