lib/Bio/EBI/RNAseqAPI.pm view on Meta::CPAN
=head1 NAME
Bio::EBI::RNAseqAPI - A Perl interface to the EMBL-EBI RNA-seq analysis API.
=head1 DESCRIPTION
This module provides a Perl-based interface to the L<EMBL-EBI|http://www.ebi.ac.uk> L<RNA-seq analysis API|http://www.ebi.ac.uk/fg/rnaseq/api/>.
The RNA-seq Analysis API enables access to analysis results for thousands of
publicly available gene expression datasets. This module provides functions to
access each endpoint provided by the API.
For more information about the API, see its L<documentation|http://www.ebi.ac.uk/fg/rnaseq/api/doc>.
=head1 SYNOPSIS
    use 5.10.0;
    use Bio::EBI::RNAseqAPI;
    my $rnaseqAPI = Bio::EBI::RNAseqAPI->new;
Analysis information for each run is returned in an anonymous hash. Some
functions return anonymous arrays with one anonymous hash per run found. See
below for examples and more information about the results.
=over 2
=item B<get_run>
Accesses the API's C<getRun> JSON endpoint and returns analysis information for
a single run, passed in the arguments.
Arguments should be passed as a hash containing values for "run" and
"minimum_mapped_reads", e.g.:
    my $runInfo = $rnaseqAPI->get_run(
        run => "ERR030885",
        minimum_mapped_reads => 0
    );
);
return;
}
}
=item B<get_runs_by_list>
This function takes an anonymous array of run accessions and sequentially
accesses the API's C<getRun> JSON endpoint to collect the analysis information
for each run in the list provided.
    my $runInfo = $rnaseqAPI->get_runs_by_list(
        runs => [ "ERR030885", "ERR030886" ],
        minimum_mapped_reads => 0
    );
Run analysis information is returned as an anonymous array containing one
anonymous hash per run (see L</get_run> documentation for an example of what the
anonymous hash looks like). Returns C<undef> (and logs errors) if errors are
encountered.
return;
}
return \@allRunInfo;
}
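The sequential strategy described for L</get_runs_by_list> can be sketched in plain Perl as follows. This is a hypothetical illustration, not the module's own code: C<collect_runs_sequentially> and the C<$fetch_run> code reference are assumptions that stand in for a per-run lookup such as L</get_run>. The sketch stops at the first failed lookup, so the caller receives either a complete list or C<undef>. The module itself may log and handle errors differently.

```perl
use strict;
use warnings;

# Hypothetical sketch: $fetch_run stands in for a single-run lookup such as
# get_run. It must return a hash ref for a run, or undef on failure.
sub collect_runs_sequentially {
    my ( $fetch_run, @run_accessions ) = @_;

    my @all_run_info;

    # Query one accession at a time, preserving the input order.
    foreach my $run_acc ( @run_accessions ) {
        my $run_info = $fetch_run->( $run_acc );

        # Give up on the first failure so the caller gets undef rather
        # than a partial list.
        return unless defined $run_info;

        push @all_run_info, $run_info;
    }

    return \@all_run_info;
}

# Usage with a stub in place of a real API call (hypothetical data).
my $stub_runs = collect_runs_sequentially(
    sub { return +{ run => $_[0] } },
    "ERR030885", "ERR030886"
);
print scalar @{ $stub_runs }, " runs collected\n";    # prints "2 runs collected"
```

Passing the lookup as a code reference keeps the loop testable without network access.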
=item B<get_runs_by_study>
Accesses the API's C<getRunsByStudy> JSON endpoint, and returns an anonymous array
containing an anonymous hash for each run found (see L</get_run> docs for an example).
    my $runInfo = $rnaseqAPI->get_runs_by_study(
        study => "E-MTAB-513",
        minimum_mapped_reads => 0
    );
Study accession can be either an L<ENA|http://www.ebi.ac.uk/ena>,
L<SRA|http://www.ncbi.nlm.nih.gov/sra>, L<DDBJ|http://www.ddbj.nig.ac.jp/> or
L<ArrayExpress|http://www.ebi.ac.uk/arrayexpress> study accession. The example
$logger->error(
"Problem retrieving runs for ",
$args{ "study" }
);
}
}
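Study accessions from the different archives follow recognisable prefixes. Knowing the source can help you decide whether to fall back from an ArrayExpress accession to the corresponding sequencing archive accession. C<study_accession_source> is a hypothetical helper and is not part of Bio::EBI::RNAseqAPI. It is a sketch based on the conventional prefixes: ERP for ENA, SRP for SRA, DRP for DDBJ, and E-XXXX-n for ArrayExpress. It only checks the format, not whether the accession exists.

```perl
use strict;
use warnings;

# Hypothetical helper: guess which archive a study accession comes from,
# using conventional accession prefixes only (no lookup is performed).
sub study_accession_source {
    my ( $accession ) = @_;

    # INSDC study accessions: ERP (ENA), SRP (SRA) or DRP (DDBJ), then digits.
    my %insdc_source = ( E => 'ENA', S => 'SRA', D => 'DDBJ' );
    if ( $accession =~ /^([ESD])RP\d+$/ ) {
        return $insdc_source{ $1 };
    }

    # ArrayExpress accessions look like E-MTAB-513: "E-", four letters, "-", digits.
    return 'ArrayExpress' if $accession =~ /^E-[A-Z]{4}-\d+$/;

    # Unrecognised format.
    return;
}

foreach my $acc ( "E-MTAB-513", "SRP033494", "DRP000391" ) {
    my $source = study_accession_source( $acc ) // 'unknown';
    print "$acc: $source\n";
}
```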
=item B<get_runs_by_organism>
Accesses the API's C<getRunsByOrganism> JSON endpoint, and returns an anonymous
array containing an anonymous hash for each run found.
    my $runInfo = $rnaseqAPI->get_runs_by_organism(
        organism => "homo_sapiens",
        minimum_mapped_reads => 70
    );
The value for the "organism" argument is the scientific name of a species, in
lower case, with underscores instead of spaces, e.g. "homo_sapiens",
"canis_lupus_familiaris", "oryza_sativa_japonica_group". To ensure your
$logger->error(
"Problem retrieving runs for ",
$args{ "organism" }
);
}
}
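The API expects organism names in a lower-case, underscore-separated form. If you start from a scientific name written in the conventional way, a small conversion step can produce that form. C<to_api_organism_name> is a hypothetical helper and is not part of the module. You should still check the result against the names in the L</run_organism_list> attribute.

```perl
use strict;
use warnings;

# Hypothetical helper: convert a scientific name written in the usual way
# ("Homo sapiens") to the lower-case, underscore-separated form used by the API.
sub to_api_organism_name {
    my ( $scientific_name ) = @_;

    my $name = lc $scientific_name;

    # Trim leading and trailing whitespace.
    $name =~ s/^\s+|\s+$//g;

    # Replace each run of internal whitespace with a single underscore.
    $name =~ s/\s+/_/g;

    return $name;
}

print to_api_organism_name( "Oryza sativa Japonica Group" ), "\n";    # prints "oryza_sativa_japonica_group"
```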
=item B<get_runs_by_organism_condition>
Accesses the API's C<getRunsByOrganismCondition> JSON endpoint, and returns an
anonymous array containing an anonymous hash for each run found. The arguments
include an organism name and a "condition" (that is, a sample attribute). The
condition must exist in the L<Experimental Factor Ontology (EFO)|http://www.ebi.ac.uk/efo>; this can
be checked via the EFO website or via the L<Ontology Lookup Service (OLS) API|http://www.ebi.ac.uk/ols/docs/api>.
    my $runInfo = $rnaseqAPI->get_runs_by_organism_condition(
        organism => "homo_sapiens",
        condition => "central nervous system",
        minimum_mapped_reads => 70
    );
=back
=head2 Analysis results per study
These functions take an L<ENA|http://www.ebi.ac.uk/ena>, L<SRA|http://www.ncbi.nlm.nih.gov/sra>, L<DDBJ|http://www.ddbj.nig.ac.jp/> or L<ArrayExpress|http://www.ebi.ac.uk/arrayexpress> accession or species name and return information about the corres...
=over 2
=item B<get_study>
Accesses the API's C<getStudy> JSON endpoint. Single argument is a study
accession (L<ENA|http://www.ebi.ac.uk/ena>, L<SRA|http://www.ncbi.nlm.nih.gov/sra>, L<DDBJ|http://www.ddbj.nig.ac.jp/> or L<ArrayExpress|http://www.ebi.ac.uk/arrayexpress>). Returns an anonymous hash
containing the results for the matching study. Returns C<undef> (and logs
errors) if errors are encountered. If you try an ArrayExpress accession and it
doesn't work, try the corresponding sequencing archive study accession instead.
    my $studyInfo = $rnaseqAPI->get_study( "SRP033494" );
An example of the anonymous hash returned is as follows:
{
$logger->error(
"Problem retrieving study ",
$studyAcc
);
}
}
=item B<get_studies_by_organism>
Accesses the API's C<getStudiesByOrganism> JSON endpoint. Single argument is the
name of an organism (see the L</run_organism_list> attribute for allowed names).
Returns an anonymous array containing one anonymous hash per study found. See
L</get_study> docs for an example of an anonymous hash.
    my $studies = $rnaseqAPI->get_studies_by_organism( "arabidopsis_thaliana" );
=cut
sub get_studies_by_organism {
These functions return information about the sample attributes that runs are
annotated with. Sample attributes have a "type", e.g. "organism", and a
"value", e.g. "Homo sapiens". Where possible, the URI of the matching ontology term
is also annotated.
=over 2
=item B<get_sample_attributes_by_run>
Accesses the API's C<getSampleAttributesByRun> JSON endpoint. Single argument is
the accession of the run. Returns an anonymous array containing one anonymous
hash per sample attribute found.
    my $sampleAttributes = $rnaseqAPI->get_sample_attributes_by_run( "SRR805786" );
An example of the results returned is as follows:
[
{
'VALUE' => 'peripheral blood mononuclear cells (PBMCs)',
$logger->error(
"Problem retrieving sample attributes for run ",
$runAcc
);
}
}
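Each sample attribute pairs a "type" with a "value", so it is often convenient to index the results of L</get_sample_attributes_by_run> by type. The sketch below assumes that each anonymous hash has C<TYPE> and C<VALUE> keys. Only C<VALUE> appears in the example above, so the C<TYPE> key name is an assumption you should verify against real output. C<attributes_by_type> is a hypothetical helper and is not part of the module.

```perl
use strict;
use warnings;

# Hypothetical helper: group sample-attribute values by their type.
# ASSUMPTION: each attribute hash has 'TYPE' and 'VALUE' keys.
sub attributes_by_type {
    my ( $sample_attributes ) = @_;

    my %values_by_type;

    foreach my $attribute ( @{ $sample_attributes } ) {
        my $type = $attribute->{ 'TYPE' };

        # Skip entries without a type rather than filing them under "".
        next unless defined $type;

        # A run can have several values for the same type, so keep a list.
        push @{ $values_by_type{ $type } }, $attribute->{ 'VALUE' };
    }

    return \%values_by_type;
}
```

For example, C<attributes_by_type( $sampleAttributes )> would return a hash reference that maps each type to an array reference of its values.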
=item B<get_sample_attributes_per_run_by_study>
Accesses the API's C<getSampleAttributesPerRunByStudy> JSON endpoint. Single
argument is a study accession. Returns an anonymous array containing one anonymous
hash per sample attribute. See L</get_sample_attributes_by_run> docs for an
example. Returns C<undef> (and logs errors) if errors are encountered.
    my $sampleAttributes = $rnaseqAPI->get_sample_attributes_per_run_by_study( "DRP000391" );
=cut
sub get_sample_attributes_per_run_by_study {
$logger->error(
"Problem retrieving sample attributes for ",
$studyAcc
);
}
}
=item B<get_sample_attributes_coverage_by_study>
Accesses the API's C<getSampleAttributesCoverageByStudy> endpoint. Single argument
is a study accession. Returns an anonymous array containing one anonymous hash
per sample attribute. Returns C<undef> (and logs errors) if errors are
encountered.
    my $sampleAttributeCoverage = $rnaseqAPI->get_sample_attributes_coverage_by_study( "DRP000391" );
An example of the results is as follows:
[
{
}
=back
=head2 Baseline gene expression per tissue, cell type, developmental stage, sex, and strain
=over 2
=item B<get_expression_by_organism_genesymbol>
Accesses the API's C<getExpression> endpoint. Provide arguments as a hash,
passing an organism name and a gene symbol, as well as a value for the minimum
percentage of mapped reads to allow:
    my $geneExpressionInfo = $rnaseqAPI->get_expression_by_organism_genesymbol(
        minimum_mapped_reads => 0,
        organism => "oryza_sativa",
        gene_symbol => "BURP7"
    );
Results are returned as an anonymous array of anonymous hashes, with one
"\" in organism \"",
$args{ "organism" }
);
}
}
=item B<get_expression_by_gene_id>
Accesses the API's C<getExpression> endpoint, but instead of querying by
organism and gene symbol (see L</get_expression_by_organism_genesymbol>), this
function queries by gene identifier. Also expects a value for the minimum
percentage of mapped reads to allow.
    my $geneExpressionInfo = $rnaseqAPI->get_expression_by_gene_id(
        gene_identifier => "ENSG00000172023",
        minimum_mapped_reads => 0
    );
Results are returned as an anonymous array of anonymous hashes, with one
# User agent builder. env_proxy() configures proxies from environment
# variables such as http_proxy and no_proxy.
sub _build_user_agent {
    my $userAgent = LWP::UserAgent->new;
    $userAgent->env_proxy;
    return $userAgent;
}
# Run organisms list builder. This is built by accessing the API's endpoints for
# the various genome reference resources it uses. Most of these are the
# divisions of Ensembl (http://www.ensembl.org and http://ensemblgenomes.org)
# -- core, plants, fungi, metazoa, and protists, as well as WormBase ParaSite
# (http://parasite.wormbase.org/). The endpoint for each resource provides
# key-value pairs of sample organism and reference organism, the reference
# organism being the name of the reference genome that was used in the
# alignment of RNA-seq reads, and the sample organism being the species the RNA
# sample was taken from. Here we collect all the sample organisms, add them as
# keys in an anonymous hash (pointing at 1), and return the anonymous hash.
sub _build_run_organism_list {
my ( $self ) = @_;
my $logger = $self->get_log_writer;