Bio-KBase
lib/Bio/KBase/CDMI/CDMI_APIImpl.pm
Bio::KBase::Exceptions::ArgumentValidationError->throw(error => $msg,
method_name => 'protein_families_to_functions');
}
return($return);
}
=head2 protein_families_to_co_occurring_families
$return = $obj->protein_families_to_co_occurring_families($protein_families)
=over 4
=item Parameter and return types
=begin html
<pre>
$protein_families is a protein_families
$return is a reference to a hash where the key is a protein_family and the value is a fc_protein_families
protein_families is a reference to a list where each element is a protein_family
protein_family is a string
fc_protein_families is a reference to a list where each element is a fc_protein_family
fc_protein_family is a reference to a list containing 3 items:
0: a protein_family
1: a score
2: a function
score is a float
function is a string
</pre>
=end html
=begin text
$protein_families is a protein_families
$return is a reference to a hash where the key is a protein_family and the value is a fc_protein_families
protein_families is a reference to a list where each element is a protein_family
protein_family is a string
fc_protein_families is a reference to a list where each element is a fc_protein_family
fc_protein_family is a reference to a list containing 3 items:
0: a protein_family
1: a score
2: a function
score is a float
function is a string
=end text
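As an illustration of the return type above, the following minimal sketch walks a hash shaped like the documented value and lists one family's co-occurring partners by descending score. The family IDs, scores, and function strings are made-up placeholders, and the helper partners_by_score is hypothetical, not part of the CDMI API.

```perl
use strict;
use warnings;

# Hypothetical return value shaped like the documented type:
# protein_family => [ [protein_family, score, function], ... ]
my $return = {
    'FAM_A' => [ [ 'FAM_B', 12.5, 'Function of B' ],
                 [ 'FAM_C', 30,   'Function of C' ] ],
};

# Hypothetical helper: list a family's co-occurring partners, highest score first.
sub partners_by_score {
    my ($result, $family) = @_;
    my $partners = $result->{$family} || [];
    return map { $_->[0] } sort { $b->[1] <=> $a->[1] } @$partners;
}

print join(",", partners_by_score($return, 'FAM_A')), "\n";   # prints "FAM_C,FAM_B"
```

A family with no entry in the hash simply yields an empty list.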
=item Description
Since we accumulate data on the co-occurrence (i.e., chromosomal clustering) of genes
in prokaryotic genomes, we can note which pairs of genes tend to co-occur. From this data,
one can compute the protein families that tend to co-occur (i.e., tend to cluster on the
chromosome). This allows one to formulate conjectures about unclustered pairs based on
clustered pairs from the same protein families. For each input family, the method returns
its coupled families (excluding the family itself) whose co-occurrence score is at least 10.
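The filtering step in the implementation can be sketched without a database: a family paired with itself is discarded, and only pairs scoring at least 10 are kept, grouped by the source family. This is a sketch under the assumption that each input row is [from_family, to_family, score, function], the column order the method requests from GetAll; the helper name group_co_occurring and the sample rows are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: group coupling rows into the documented return shape.
# Each row is assumed to be [from_family, to_family, score, function].
sub group_co_occurring {
    my ($rows, $min_score) = @_;
    my %result;
    for my $row (@$rows) {
        my ($from, $to, $score, $function) = @$row;
        next if $from eq $to;           # ignore a family paired with itself
        next if $score < $min_score;    # keep only well-supported pairs
        push @{ $result{$from} }, [ $to, $score, $function ];
    }
    return \%result;
}

my $rows = [
    [ 'FAM_A', 'FAM_B', 12, 'Function of B' ],
    [ 'FAM_A', 'FAM_A', 50, 'Function of A' ],   # self-pair, dropped
    [ 'FAM_A', 'FAM_C',  3, 'Function of C' ],   # below threshold, dropped
];
my $grouped = group_co_occurring($rows, 10);
```

Passing the threshold as a parameter makes the cutoff explicit; the method itself hard-codes 10.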
=back
=cut
sub protein_families_to_co_occurring_families
{
my $self = shift;
my($protein_families) = @_;
my @_bad_arguments;
(ref($protein_families) eq 'ARRAY') or push(@_bad_arguments, "Invalid type for argument \"protein_families\" (value was \"$protein_families\")");
if (@_bad_arguments) {
my $msg = "Invalid arguments passed to protein_families_to_co_occurring_families:\n" . join("", map { "\t$_\n" } @_bad_arguments);
Bio::KBase::Exceptions::ArgumentValidationError->throw(error => $msg,
method_name => 'protein_families_to_co_occurring_families');
}
my $ctx = $Bio::KBase::CDMI::Service::CallContext;
my($return);
#BEGIN protein_families_to_co_occurring_families
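my $kb = $self->{db};
$return = {};
# Skip the query for an empty list: the original placeholder code produced ")"
# here, and even "IN ()" is rejected by most SQL engines.
if (@$protein_families)
{
    # One "?" placeholder per requested family, e.g. "(?,?,?)".
    my $targets = "(" . join(",", ("?") x @$protein_families) . ")";
    my $constraint = "IsCoupledTo(from_link) IN $targets";
    my @res = $kb->GetAll('IsCoupledTo Family',
                          $constraint,
                          $protein_families,
                          'IsCoupledTo(from_link) IsCoupledTo(to_link) IsCoupledTo(co_occurrence_evidence) Family(family_function)');
    # Drop self-pairs; keep only pairs with a co-occurrence score of at least 10.
    foreach my $tuple (grep { $_->[0] ne $_->[1] } @res)
    {
        my($from,$to,$sc,$func) = @$tuple;
        if ($sc >= 10)
        {
            push(@{$return->{$from}},[$to,$sc,$func]);
        }
    }
}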
#END protein_families_to_co_occurring_families
my @_bad_returns;
(ref($return) eq 'HASH') or push(@_bad_returns, "Invalid type for return variable \"return\" (value was \"$return\")");
if (@_bad_returns) {
my $msg = "Invalid returns passed to protein_families_to_co_occurring_families:\n" . join("", map { "\t$_\n" } @_bad_returns);
Bio::KBase::Exceptions::ArgumentValidationError->throw(error => $msg,
method_name => 'protein_families_to_co_occurring_families');
}
return($return);
}
=head2 co_occurrence_evidence
$return = $obj->co_occurrence_evidence($pairs_of_fids)
=over 4
=head2 role
=over 4
=item Description
A "role" (or "functional role") is an atomic unit of function.
The "function of a protein" is made up of one or more roles. That is, a bifunctional protein
with an assigned function of
5-Enolpyruvylshikimate-3-phosphate synthase (EC 2.5.1.19) / Cytidylate kinase (EC 2.7.4.14)
would implement two distinct roles (the "function1 / function2" notation is intended to assert
that the initial part of the protein implements function1, and the terminal part of the protein
implements function2). Note also that a protein often implements multiple roles because
of broad specificity. In this case, we suggest describing the protein function as
function1 @ function2
That is, the ' / ' separator represents multiple roles implemented by distinct
domains of the protein, while ' @ ' represents multiple roles implemented by
a single domain (broad specificity).
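The two separator conventions can be illustrated with a small splitter. This minimal sketch assumes role names never contain a space-delimited " / " or " @ " themselves; the helper roles_of_function is hypothetical and does not distinguish which separator was used.

```perl
use strict;
use warnings;

# Hypothetical helper: split an assigned function into its roles, treating
# " / " (distinct domains) and " @ " (one domain, broad specificity) as separators.
sub roles_of_function {
    my ($function) = @_;
    return split / (?:\/|\@) /, $function;
}

my @roles = roles_of_function(
    '5-Enolpyruvylshikimate-3-phosphate synthase (EC 2.5.1.19) / Cytidylate kinase (EC 2.7.4.14)');
# @roles now holds the two role names.
```

Requiring a space on both sides keeps names such as "Glu/Asp" intact.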
=item Definition
=begin html
<pre>
a string
</pre>
=end html
=begin text
a string
=end text
=back
=head2 subsystem
=over 4
=item Description
A subsystem is composed of two components: a set of roles that are annotated together,
and a spreadsheet showing the proteins within each genome that implement the roles.
The set of roles may correspond to a pathway, a complex, an inventory (say, "transporters"),
or whatever other principle an annotator used to formulate the subsystem.
The subsystem spreadsheet is a list of "rows", each representing the subsystem in a specific genome.
Each row includes a variant code (indicating what version of the molecular machine exists in the
genome) and cells. Each cell is a 2-tuple:
[role,protein-encoding genes that implement the role in the genome]
Annotators construct subsystems, and in the process impose a controlled vocabulary
for roles and functions.
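To make the spreadsheet layout concrete, here is a minimal sketch of one subsystem held in memory as plain Perl data, plus a lookup of the genes implementing a role in a genome. The hash keys (roles, rows, genome, variant, cells), genome names, variant codes, and gene IDs are all hypothetical placeholders; this is not how CDMI stores or returns subsystems.

```perl
use strict;
use warnings;

# Hypothetical layout: a role list plus a spreadsheet with one row per genome.
# Each row has a variant code and cells of the form [role, [genes implementing it]].
my $subsystem = {
    roles => [ 'Role X', 'Role Y' ],
    rows  => [
        { genome  => 'genome_1',
          variant => 'v1',                          # placeholder variant code
          cells   => [ [ 'Role X', [ 'fid_1' ] ],
                       [ 'Role Y', [ 'fid_2', 'fid_3' ] ] ] },
        { genome  => 'genome_2',
          variant => 'v2',                          # placeholder variant code
          cells   => [ [ 'Role X', [] ],
                       [ 'Role Y', [] ] ] },
    ],
};

# Hypothetical helper: genes that implement $role in $genome (empty list if none).
sub genes_for_role {
    my ($ss, $genome, $role) = @_;
    for my $row (@{ $ss->{rows} }) {
        next unless $row->{genome} eq $genome;
        for my $cell (@{ $row->{cells} }) {
            return @{ $cell->[1] } if $cell->[0] eq $role;
        }
    }
    return;
}
```

Keeping the rows as a list mirrors the description of the spreadsheet as a list of rows.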
=item Definition
=begin html
<pre>
a string
</pre>
=end html
=begin text
a string
=end text
=back
=head2 variant
=over 4
=item Definition
=begin html
<pre>
a string
</pre>
=end html
=begin text
a string
=end text
=back
=head2 variant_of_subsystem
=over 4