ALBD
The licenses for most software are designed to take away your
freedom to share and change it. By contrast, the GNU General Public
License is intended to guarantee your freedom to share and change free
software--to make sure the software is free for all its users. This
General Public License applies to most of the Free Software
Foundation's software and to any other program whose authors commit to
using it. (Some other Free Software Foundation software is covered by
the GNU Library General Public License instead.) You can apply it to
your programs, too.
When we speak of free software, we are referring to freedom, not
price. Our General Public Licenses are designed to make sure that you
have the freedom to distribute copies of free software (and charge for
this service if you wish), that you receive source code or can get it
if you want it, that you can change the software or use pieces of it
in new free programs; and that you know you can do these things.
To protect your rights, we need to make restrictions that forbid
anyone to deny you these rights or to ask you to surrender the rights.
These restrictions translate to certain responsibilities for you if you
distribute copies of the software, or if you modify it.
Whether that is true depends on what the Program does.
1. You may copy and distribute verbatim copies of the Program's
source code as you receive it, in any medium, provided that you
conspicuously and appropriately publish on each copy an appropriate
copyright notice and disclaimer of warranty; keep intact all the
notices that refer to this License and to the absence of any warranty;
and give any other recipients of the Program a copy of this License
along with the Program.
You may charge a fee for the physical act of transferring a copy, and
you may at your option offer warranty protection in exchange for a fee.
2. You may modify your copy or copies of the Program or any portion
of it, thus forming a work based on the Program, and copy and
distribute such modifications or work under the terms of Section 1
above, provided that you also meet all of these conditions:
a) You must cause the modified files to carry prominent notices
stating that you changed the files and the date of any change.
machine-readable copy of the corresponding source code, to be
distributed under the terms of Sections 1 and 2 above on a medium
customarily used for software interchange; or,
c) Accompany it with the information you received as to the offer
to distribute corresponding source code. (This alternative is
allowed only for noncommercial distribution and only if you
received the program in object code or executable form with such
an offer, in accord with Subsection b above.)
The source code for a work means the preferred form of the work for
making modifications to it. For an executable work, complete source
code means all the source code for all modules it contains, plus any
associated interface definition files, plus the scripts used to
control compilation and installation of the executable. However, as a
special exception, the source code distributed need not include
anything that is normally distributed (in either source or binary
form) with the major components (compiler, kernel, and so on) of the
operating system on which the executable runs, unless that component
itself accompanies the executable.
of co-occurrence files
There is another folder containing scripts to square co-occurrence
matrices. Squaring an explicit (A to B) co-occurrence matrix results in
a co-occurrence matrix containing all implicit (A to C) connections.
This is useful for time slicing and other analyses. Removing the
original explicit connections is an additional step, required if you
wish to create a predictions matrix file for every CUI; this can be done
with the removeExplicit.pl script. Squaring a co-occurrence matrix can
be very computationally expensive in both RAM and CPU, so the MATLAB
scripts are preferred over the Perl scripts. Even with MATLAB, RAM can
become an issue; squaring sections of a matrix and combining them into
a single output matrix may then be necessary, but takes much longer.
Scripts in the squaring folder include:
convertForSquaring_MATLAB.pl -- functions to convert to and from ALBD
and MATLAB sparse matrix formats
squareMatrix.m -- MATLAB script to square a matrix while holding
everything in RAM. Faster, but requires more RAM.
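As a rough illustration of the two steps described above (squaring, then removing the original explicit connections), here is a minimal Perl sketch. It assumes a hypothetical in-memory representation where the matrix is a hash of hashes (`$m->{$cuiA}{$cuiB} = count`); this is not necessarily how ALBD, squareMatrix.m, or removeExplicit.pl store or process data, and the subroutine names `square_sparse` and `remove_explicit` are illustrative only.

```perl
use strict;
use warnings;

# Hypothetical sparse representation: $m->{$row}{$col} = value.
# Computes M * M by visiting only non-zero entries:
#   (M^2)[A][C] = sum over B of M[A][B] * M[B][C]
sub square_sparse {
    my ($m) = @_;
    my %sq;
    foreach my $cuiA (keys %{$m}) {
        foreach my $cuiB (keys %{ $m->{$cuiA} }) {
            next unless exists $m->{$cuiB};    # B has no outgoing connections
            my $ab = $m->{$cuiA}{$cuiB};
            foreach my $cuiC (keys %{ $m->{$cuiB} }) {
                $sq{$cuiA}{$cuiC} += $ab * $m->{$cuiB}{$cuiC};
            }
        }
    }
    return \%sq;
}

# Deletes every (A, C) pair that already appears in the explicit matrix,
# leaving only implicit connections. The diagonal (A to A) is not treated
# specially by this sketch.
sub remove_explicit {
    my ($squared, $explicit) = @_;
    foreach my $cuiA (keys %{$squared}) {
        foreach my $cuiC (keys %{ $squared->{$cuiA} }) {
            # check the outer key first to avoid autovivifying $explicit
            delete $squared->{$cuiA}{$cuiC}
                if exists $explicit->{$cuiA} && exists $explicit->{$cuiA}{$cuiC};
        }
        delete $squared->{$cuiA} unless %{ $squared->{$cuiA} };
    }
    return $squared;
}
```

For example, with explicit connections A to B (count 2) and B to C (count 3), `remove_explicit(square_sparse(\%explicit), \%explicit)` returns a matrix containing only A to C, with value 6. Every non-zero entry of the input is visited once per matching row, which is why memory and CPU grow quickly on dense co-occurrence data.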
lib/ALBD.pm
##############################################################################
# function to produce output
##############################################################################
# outputs the implicit terms to string
# input: $scoresRef <- a reference to a hash of scores (hash{CUI}=score)
# $ranksRef <- a reference to an array of CUIs ranked by their score
# $printTo <- optional, outputs the $printTo top ranked terms. If not
# specified, all terms are output
# output: a newline-separated string containing, for each ranked term, its
# rank, score, CUI, and preferred term
sub _rankedTermsToString {
my $self = shift;
my $scoresRef = shift;
my $ranksRef = shift;
my $printTo = shift;
#set printTo
if (!$printTo) {
$printTo = scalar @{$ranksRef};
}
lib/ALBD.pm (continued)
my $index;
for (my $i = 0; $i < $printTo; $i++) {
#add the rank
$index = $i+1;
$string .= "$index\t";
#add the score
$string .= sprintf "%.5f\t", ${$scoresRef}{${$ranksRef}[$i]};
#add the CUI
$string .= "${$ranksRef}[$i]\t";
#add the name
my $name = $umls_interface->getPreferredTerm(${$ranksRef}[$i]);
#if no preferred name, get any term for this CUI (marked with a '.**' prefix)
if (!defined $name || $name eq '') {
my $termListRef = $umls_interface->getTermList(${$ranksRef}[$i]);
if (scalar @{$termListRef} > 0) {
$name = '.**'.${$termListRef}[0];
}
}
#avoid printing an undefined value if no term was found
$name = '' if (!defined $name);
$string .= "$name\n";
}
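The loop above writes one tab-separated line per ranked CUI: rank, score (five decimal places), CUI, and name. Below is a minimal self-contained sketch of the same output format. It is hypothetical: `ranked_terms_to_string` replaces the UMLS::Interface lookup with a plain hash of names, omits the `'.**'` fallback, and sorts the scores itself, whereas the real method receives the ranking already computed in `$ranksRef`.

```perl
use strict;
use warnings;

# Hypothetical stand-in for _rankedTermsToString: $names maps CUI -> name
# instead of querying UMLS::Interface.
sub ranked_terms_to_string {
    my ($scores, $names, $printTo) = @_;
    # rank CUIs by descending score (ties broken by CUI for a stable order)
    my @ranks = sort { $scores->{$b} <=> $scores->{$a} || $a cmp $b } keys %{$scores};
    # default to all terms, and never print past the end of the ranking
    $printTo = scalar @ranks if !$printTo || $printTo > @ranks;
    my $string = '';
    for my $i (0 .. $printTo - 1) {
        my $cui  = $ranks[$i];
        my $name = defined $names->{$cui} ? $names->{$cui} : '';
        # rank, score, CUI, name, matching the columns built above
        $string .= sprintf "%d\t%.5f\t%s\t%s\n", $i + 1, $scores->{$cui}, $cui, $name;
    }
    return $string;
}
```

Passing the score itself (not an interpolated string with a trailing tab) to `%.5f` keeps the output clean and avoids "isn't numeric" warnings under `use warnings`.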
#!/usr/local/bin/perl -w
# Before `make install' is performed this script should be runnable with
# `make test'. After `make install' it should work as `perl t/lch.t'
use strict;
use warnings;
use Test::Simple tests => 10;
#error tolerance for exact numerical matches. Due to precision issues
# and sort issues (again due to precision), there may be small
# differences between runs. The precision at K tolerance is
# larger because small differences in ranking can make big differences
# in scores when K < 10. See Rank::rankDescending for more
# details on why the ranking imprecision occurs
my $precRecallErrorTol = 0.0001;
my $atKErrorTol = 1.0;
#######################################################
# test script to run the sample code and compare its
#read in new time slicing output
(my $newAPScoresRef, my $newMAP, my $newPAtKScoresRef, my $newFAtKScoresRef)
= &readTimeSlicingData('./samples/sampleTimeSliceOutput');
#check that the correct number of values are read for all the
# time slicing metrics
ok (scalar @{$newAPScoresRef} == 11, "Correct Count of Average Precisions");
ok (scalar @{$newPAtKScoresRef} == 19, "Correct Count of Precision at K's");
ok (scalar @{$newFAtKScoresRef} == 19, "Correct Count of Freq at K's");
#check that each of the AP scores match the gold (within error tolerance)
my $apSame = 1;
for (my $i = 0; $i < scalar @{$goldAPScoresRef}; $i++) {
#check both comma separated values (precision and recall); a mismatch
# in either one fails the test
my @goldScores = split(',',${$goldAPScoresRef}[$i]);
my @newScores = split(',',${$newAPScoresRef}[$i]);
if ((abs($goldScores[0]-$newScores[0]) > $precRecallErrorTol)
|| (abs($goldScores[1]-$newScores[1]) > $precRecallErrorTol)) {
$apSame = 0;
last;
}
}
ok($apSame == 1, "Average Precisions Match");
#check MAP is the same (within error tolerance)
ok (abs($goldMAP - $newMAP) <= $precRecallErrorTol, "Mean Average Precision Matches");
#check that each of Precision at K scores match the gold
# (within error tolerance)
my $pAtKSame = 1;
for (my $i = 0; $i < scalar @{$goldPAtKScoresRef}; $i++) {
if (abs(${$goldPAtKScoresRef}[$i] - ${$newPAtKScoresRef}[$i]) > $atKErrorTol) {
$pAtKSame = 0;
last;
}
}
ok($pAtKSame == 1, "Precision at K Matches");
#check that each of the Freq at K scores match the gold
# (within error tolerance)
my $fAtKSame = 1;
for (my $i = 0; $i < scalar @{$goldFAtKScoresRef}; $i++) {
if (abs(${$goldFAtKScoresRef}[$i] - ${$newFAtKScoresRef}[$i]) > $atKErrorTol) {
$fAtKSame = 0;
last;
}
}
ok($fAtKSame == 1, "Frequency at K Matches");
print "Done with Time Slicing Tests\n";
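The comparisons above share one pattern: every new value must lie within a tolerance of its gold counterpart, and for the average precision entries both comma-separated values (precision and recall) must match. A minimal sketch of that check as a reusable helper follows; the name `all_within_tolerance` is hypothetical and not part of the test script, and unlike the loops above it also fails when the two lists differ in length.

```perl
use strict;
use warnings;

# Returns 1 if both lists have the same length and every pair of entries
# is within $tol, else 0. Entries may be comma-separated tuples such as
# "precision,recall"; each component must then be within $tol.
sub all_within_tolerance {
    my ($gold, $new, $tol) = @_;
    return 0 unless scalar @{$gold} == scalar @{$new};
    for my $i (0 .. $#{$gold}) {
        my @g = split /,/, $gold->[$i];
        my @n = split /,/, $new->[$i];
        return 0 unless @g == @n;
        for my $j (0 .. $#g) {
            return 0 if abs($g[$j] - $n[$j]) > $tol;
        }
    }
    return 1;
}
```

With such a helper, each check becomes a single line, e.g. `ok(all_within_tolerance($goldAPScoresRef, $newAPScoresRef, $precRecallErrorTol), "Average Precisions Match");`.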
utils/datasetCreator/squaring/squareMatrix.m
%nohup matlab -nojvm -r squareMatrix -logfile squareMatrix_matlab.out </dev/null &
clear all;
close all;
sparseSquare('/home/henryst/lbdData/squaring/1975_1999_window8_noOrder','/home/henryst/lbdData/squaring/1975_1999_window8_noOrder_squared');
error('DONE!');
function sparseSquare(fileIn, fileOut)
%load the data
data = load(fileIn);
disp(' loaded data');
%convert to sparse
vals = max(data);
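The fragment above loads the matrix file and begins converting it to a sparse matrix; in MATLAB, `max(data)` returns column-wise maxima, which for triplet data gives the largest row index, largest column index, and largest value. The Perl sketch below shows an analogous load. It assumes (this is not confirmed by the excerpt) that each input line holds a numeric `row col value` triplet separated by whitespace; `load_triplets` is a hypothetical name.

```perl
use strict;
use warnings;

# Assumed input: one "row col value" triplet per line (numeric, whitespace
# separated). Returns a hash-of-hashes sparse matrix and the dimension n,
# taken as the largest row or column index seen.
sub load_triplets {
    my ($fh) = @_;
    my %m;
    my $n = 0;
    while (my $line = <$fh>) {
        next if $line =~ /^\s*$/;               # skip blank lines
        my ($row, $col, $val) = split ' ', $line;
        $m{$row}{$col} += $val;                 # repeated (row, col) pairs are summed
        $n = $row if $row > $n;
        $n = $col if $col > $n;
    }
    return (\%m, $n);
}
```

Taking the larger of the maximum row and column index as the dimension keeps the matrix square, which squaring requires.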
utils/datasetCreator/squaring/squareMatrix_partial.m
%nohup matlab -nojvm -r squareMatrix_partial -logfile squareMatrix_matlab.out </dev/null &
clear all;
close all;
%The number of rows/cols to square at a time. Larger numbers will run faster
% but will require more RAM. Lower numbers will run slower but require less RAM.
% You want this number to be as high as possible without MATLAB crashing
increment = 40000;
sparseSquare_sectioned('/home/henryst/lbdData/squaring/1975_1999_window8_noOrder','/home/henryst/lbdData/squaring/1975_1999_window8_noOrder_squared_secondTry',increment);
error('DONE!');
function sparseSquare_sectioned(fileIn, fileOut, increment)
disp(fileIn);
%open, close, and clear the output file
fid = fopen(fileOut,'w');
fclose(fid);
%load the data
data = load(fileIn);
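squareMatrix_partial.m trades speed for memory by squaring `increment` rows at a time. This works because each block of rows of M^2 equals the corresponding rows of M multiplied by the whole of M, so blocks can be computed independently and written out one after another. A minimal Perl sketch of the same idea is below; `square_sectioned`, the hash-of-hashes input, numeric indices, and the tab-separated output are all assumptions for illustration, not taken from the MATLAB code.

```perl
use strict;
use warnings;

# Squares a hash-of-hashes sparse matrix $m in sections of $increment rows.
# Only one section of the result is held in memory at a time; each section
# is written to $out as "row\tcol\tvalue" lines before the next one starts.
sub square_sectioned {
    my ($m, $out, $increment) = @_;
    die "increment must be positive\n" unless $increment && $increment > 0;
    my @rows = sort { $a <=> $b } keys %{$m};    # assumes numeric row indices
    for (my $start = 0; $start < @rows; $start += $increment) {
        my $end = $start + $increment - 1;
        $end = $#rows if $end > $#rows;
        my %section;                              # result rows for this block only
        for my $r (@rows[$start .. $end]) {
            for my $k (keys %{ $m->{$r} }) {
                next unless exists $m->{$k};
                for my $c (keys %{ $m->{$k} }) {
                    $section{$r}{$c} += $m->{$r}{$k} * $m->{$k}{$c};
                }
            }
        }
        # flush this block before computing the next one
        for my $r (sort { $a <=> $b } keys %section) {
            for my $c (sort { $a <=> $b } keys %{ $section{$r} }) {
                print {$out} "$r\t$c\t$section{$r}{$c}\n";
            }
        }
    }
    return;
}
```

The output is identical for any positive `$increment`; only the peak size of `%section` changes. As in the MATLAB script, the input matrix itself is still held fully in memory.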