Bio-Grep
To start the back-end with the specified settings, simply call
$sbe->search();
This method also accepts a hash reference with settings. In this case, all
previously defined options except the paths and the database are reset to
their default values.
$sbe->search({ mismatches => 2,
reverse_complement => 0,
query => 'AGAGCCCT' });
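The reset rule above is easy to trip over: an option you set earlier (for example C<mismatches>) falls back to its default unless you pass it again. The following minimal sketch models that rule with plain hashes. It is not Bio::Grep's code; the sub C<merge_search_settings>, the C<%DEFAULTS> values, and the preserved key names C<execpath>, C<tmppath>, C<datapath> and C<database> are assumptions for illustration.

```perl
use strict;
use warnings;

# Hypothetical default values; the real Bio::Grep defaults may differ.
my %DEFAULTS = (
    mismatches         => 0,
    reverse_complement => 0,
    query              => undef,
);

# Assumed names of the settings that survive a reset (paths and database).
my @KEPT = qw(execpath tmppath datapath database);

# Model of the documented rule: start from the defaults, carry over only
# the paths and the database, then apply the options passed in $new.
sub merge_search_settings {
    my ( $current, $new ) = @_;
    my %merged = %DEFAULTS;
    for my $key (@KEPT) {
        $merged{$key} = $current->{$key} if exists $current->{$key};
    }
    @merged{ keys %{$new} } = values %{$new};
    return \%merged;
}

my $old = {
    mismatches => 3, reverse_complement => 1, query => 'ACGT',
    datapath   => 'data', database => 'Test.fasta',
};
my $settings = merge_search_settings( $old, { query => 'AGAGCCCT' } );
# mismatches and reverse_complement are back to 0; datapath and database are kept
```

Because only the paths and the database carry over, pass every option you rely on in each call to C<search()>.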
=head2 ANALYZE SEARCH RESULTS
Use a L<Bio::Perl>-like while loop to analyze the search results:
while ( my $res = $sbe->next_res ) {
print $res->sequence->id . "\n";
print $res->alignment_string() . "\n\n";
}
See L<Bio::Grep::SearchResult> for all available information.
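The loop relies on C<next_res> returning a false value once all results have been consumed. The sketch below mimics that contract with a hypothetical C<MockBackend> class whose results are plain hash references rather than L<Bio::Grep::SearchResult> objects; it only illustrates the iterator pattern, not the real back-end.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a search back-end. next_res() hands out one
# result per call and returns undef when none are left, ending the loop.
package MockBackend;

sub new {
    my ( $class, @results ) = @_;
    return bless { results => [@results] }, $class;
}

sub next_res {
    my ($self) = @_;
    return shift @{ $self->{results} };
}

package main;

# Results are plain hash references here, not Bio::Grep::SearchResult objects.
my $sbe = MockBackend->new(
    { id => 'seq1', alignment => 'AGAGCCCT' },
    { id => 'seq2', alignment => 'AGAGCACT' },
);

my @ids;
while ( my $res = $sbe->next_res ) {
    push @ids, $res->{id};
}
print "@ids\n";    # prints "seq1 seq2"
```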
=head1 BGREP
This distribution comes with a sample script called L<bgrep>.
=head1 WHICH BACK-END?
We support these external back-ends:
=over
=item C<Vmatch>
L<http://vmatch.de/>
=item C<Agrep>
L<ftp://ftp.cs.arizona.edu/agrep/> (original Wu-Manber 1992 implementation for
UNIX),
L<http://www.tgries.de/agrep/> (DOS, Windows, OS/2),
L<http://webglimpse.net/download.php> (Agrep binary of C<Glimpse>) and
L<http://laurikari.net/tre/download.html> (TRE implementation).
=item C<GUUGle>
L<http://bibiserv.techfak.uni-bielefeld.de/guugle/>
=back
=head2 FEATURE COMPARISON
=begin html
<table><tr><th>Feature</th><th>Agrep</th><th>GUUGle</th><th>RE</th><th>Vmatch</th></tr><tr><td>Suffix Arrays/Trees</td>
<td style="text-align:center;background-color: #ffe0e0;">no</td>
<td style="font-weight: bold;text-align: center;background-color: #00ff00;">yes</td>
<td style="text-align:center;background-color: #ffe0e0;">no</td>
<td style="font-weight: bold;text-align: center;background-color: #00ff00;">yes</td>
</tr>
<tr><td>Sliding Window</td>
<td style="font-weight: bold;text-align: center;background-color: #00ff00;">yes</td>
<td style="text-align:center;background-color: #ffe0e0;">no</td>
<td style="font-weight: bold;text-align: center;background-color: #00ff00;">yes</td>
<td style="text-align:center;background-color: #ffe0e0;">no</td>
</tr>
<tr><td>Persistent Index<sup>1</sup></td>
<td style="text-align:center;background-color: #ffe0e0;">no</td>
<td style="text-align:center;background-color: #ffe0e0;">no</td>
<td style="text-align:center;background-color: #ffe0e0;">no</td>
<td style="font-weight: bold;text-align: center;background-color: #00ff00;">yes</td>
</tr>
<tr><td>Mismatches</td>
<td style="font-weight: bold;text-align: center;background-color: #00ff00;">yes</td>
<td style="text-align:center;background-color: #ffe0e0;">no</td>
<td style="text-align:center;background-color: #ffe0e0;">no</td>
<td style="font-weight: bold;text-align: center;background-color: #00ff00;">yes</td>
</tr>
<tr><td>Edit Distance</td>
<td style="font-weight: bold;text-align: center;background-color: #00ff00;">yes</td>
<td style="text-align:center;background-color: #ffe0e0;">no</td>
<td style="text-align:center;background-color: #ffe0e0;">no</td>
<td style="font-weight: bold;text-align: center;background-color: #00ff00;">yes</td>
</tr>
<tr><td>Insertions</td>
<td style="text-align:center;background-color: #ffe0e0;">no</td>
<td style="text-align:center;background-color: #ffe0e0;">no</td>
<td style="text-align:center;background-color: #ffe0e0;">no</td>
<td style="text-align:center;background-color: #ffe0e0;">no</td>
</tr>
<tr><td>Deletions</td>
<td style="text-align:center;background-color: #ffe0e0;">no</td>
<td style="text-align:center;background-color: #ffe0e0;">no</td>
<td style="text-align:center;background-color: #ffe0e0;">no</td>
<td style="text-align:center;background-color: #ffe0e0;">no</td>
</tr>
<tr><td>Multiple Queries<sup>2</sup></td>
<td style="text-align:center;background-color: #ffe0e0;">no</td>
<td style="font-weight: bold;text-align: center;background-color: #00ff00;">yes</td>
<td style="text-align:center;background-color: #ffe0e0;">no</td>
<td style="font-weight: bold;text-align: center;background-color: #00ff00;">yes</td>
</tr>
<tr><td>GU</td>
<td style="text-align:center;background-color: #ffe0e0;">no</td>
<td style="font-weight: bold;text-align: center;background-color: #00ff00;">yes</td>
<td style="text-align:center;background-color: #ffe0e0;">no</td>
<td style="text-align:center;background-color: #ffe0e0;">no</td>
</tr>
<tr><td>DNA/RNA</td>
<td style="font-weight: bold;text-align: center;background-color: #00ff00;">yes</td>
<td style="font-weight: bold;text-align: center;background-color: #00ff00;">yes</td>
<td style="font-weight: bold;text-align: center;background-color: #00ff00;">yes</td>
<td style="font-weight: bold;text-align: center;background-color: #00ff00;">yes</td>
</tr>
<tr><td>Protein</td>
<td style="font-weight: bold;text-align: center;background-color: #00ff00;">yes</td>
<td style="text-align:center;background-color: #ffe0e0;">no</td>
<td style="font-weight: bold;text-align: center;background-color: #00ff00;">yes</td>
<td style="font-weight: bold;text-align: center;background-color: #00ff00;">yes</td>
</tr>
<tr><td>Direct and Revcom</td>
<td style="text-align:center;background-color: #ffe0e0;">no</td>
<td style="font-weight: bold;text-align: center;background-color: #00ff00;">yes</td>
<td style="font-weight: bold;text-align: center;background-color: #00ff00;">yes</td>
<td style="font-weight: bold;text-align: center;background-color: #00ff00;">yes</td>
</tr>
<tr><td>Reverse Complement</td>
<td style="font-weight: bold;text-align: center;background-color: #00ff00;">yes</td>
<td style="font-weight: bold;text-align: center;background-color: #00ff00;">yes</td>
<td style="font-weight: bold;text-align: center;background-color: #00ff00;">yes</td>
<td style="font-weight: bold;text-align: center;background-color: #00ff00;">yes</td>
</tr>
<tr><td>Upstream/Downstream</td>
<td style="text-align:center;background-color: #ffe0e0;">no</td>
<td style="font-weight: bold;text-align: center;background-color: #00ff00;">yes</td>
<td style="font-weight: bold;text-align: center;background-color: #00ff00;">yes</td>
<td style="font-weight: bold;text-align: center;background-color: #00ff00;">yes</td>
</tr>
<tr><td>Filters</td>
<td style="text-align:center;background-color: #ffe0e0;">no</td>
<td style="font-weight: bold;text-align: center;background-color: #00ff00;">yes</td>
<td style="font-weight: bold;text-align: center;background-color: #00ff00;">yes</td>
<td style="font-weight: bold;text-align: center;background-color: #00ff00;">yes</td>
</tr>
<tr><td>Query Length<sup>3</sup></td>
<td style="text-align:center;background-color: #ffe0e0;">no</td>
<td style="font-weight: bold;text-align: center;background-color: #00ff00;">yes</td>
<td style="text-align:center;background-color: #ffe0e0;">no</td>
<td style="font-weight: bold;text-align: center;background-color: #00ff00;">yes</td>
</tr>
<tr><td>Regular Expressions<sup>4</sup></td>
<td style="text-align:center;background-color: #ffe0e0;">no</td>
<td style="text-align:center;background-color: #ffe0e0;">no</td>
<td style="font-weight: bold;text-align: center;background-color: #00ff00;">yes</td>
<td style="text-align:center;background-color: #ffe0e0;">no</td>
</tr>
</table><br/><div style="font-size: smaller"><hr width="300"
align="left"><sup>1</sup>Requires pre-calculation and (much) more memory, but queries are generally faster<br/><sup>2</sup>With query_file<br/><sup>3</sup>Matches if a substring of the query of length n or longer matches<br/><sup>4</sup>Planned for Agrep</div>
=end html
=begin man
Features || Agrep | GUUGle | RE | Vmatch
Suffix Arrays/Trees || no | yes | no | yes
Sliding Window || yes | no | yes | no
Persistent Index 1 || no | no | no | yes
Mismatches || yes | no | no | yes
Edit Distance || yes | no | no | yes
Insertions || no | no | no | no
Deletions || no | no | no | no
Multiple Queries 2 || no | yes | no | yes
GU || no | yes | no | no
DNA/RNA || yes | yes | yes | yes
Protein || yes | no | yes | yes
Direct and Revcom || no | yes | yes | yes
Reverse Complement || yes | yes | yes | yes
Upstream/Downstream || no | yes | yes | yes
Filters || no | yes | yes | yes
Query Length 3 || no | yes | no | yes
Regular Expressions 4 || no | no | yes | no
--
1 Requires pre-calculation and (much) more memory, but queries are generally faster
2 With query_file
3 Matches if a substring of the query of length n or longer matches
4 Planned for Agrep
=end man
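Footnote 3 can be read as the following rule. This naive sketch (the sub C<matches_min_length> is hypothetical) checks only substrings of length exactly n: any matching substring of length n or longer contains a matching substring of length n, so nothing is missed. The indexed back-ends answer the same question with suffix arrays rather than this brute-force scan.

```perl
use strict;
use warnings;

# True if some substring of $query with length >= $n occurs exactly in
# $target. A match of length >= $n always contains one of length exactly
# $n, so only the length-$n substrings of the query need to be checked.
sub matches_min_length {
    my ( $query, $target, $n ) = @_;
    return 0 if $n < 1 || $n > length $query;
    for my $i ( 0 .. length($query) - $n ) {
        return 1 if index( $target, substr( $query, $i, $n ) ) >= 0;
    }
    return 0;
}

print matches_min_length( 'ACGTAC', 'TTCGTATT', 4 ), "\n";    # prints 1 (CGTA)
print matches_min_length( 'ACGTAC', 'TTCGTATT', 5 ), "\n";    # prints 0
```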
C<Vmatch> is fast but needs a lot of memory. C<Agrep> is the best choice if
you allow many mismatches in short sequences, if you search Fasta files with
relatively short sequences (e.g. cDNA or protein databases), and if you only
need to know in which sequences an approximate match was found. In this case
its performance is excellent. If you want the exact positions of a match in
the sequence, choose C<Vmatch>. If you want nice alignments, choose C<Vmatch>
as well (C<EMBOSS> can automatically align the sequence and the query in the
C<Agrep> back-end, but C<Vmatch> is still faster). Filters require exact
positions, so you cannot use them with C<Agrep>. This may or may not change in
a future version. The C<Agrep> implementation of the C<TRE> library
(L<http://laurikari.net/tre/>) is also supported. This implementation has
fewer limitations and more features (e.g. you get the exact hit positions) but
is much slower. See L<Bio::Grep::Benchmarks>.
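To make "sliding window", "mismatches" and "exact positions" concrete, here is a brute-force sketch that slides a window of the query's length along one target sequence and reports each 0-based start position with at most C<$max_mm> mismatches (Hamming distance). It handles substitutions only, not edit distance, and it is neither Agrep's algorithm nor Vmatch's suffix-array search; the sub names are hypothetical.

```perl
use strict;
use warnings;

# Number of positions at which two equal-length strings differ.
sub hamming {
    my ( $s, $t ) = @_;
    my $d = 0;
    for my $i ( 0 .. length($s) - 1 ) {
        $d++ if substr( $s, $i, 1 ) ne substr( $t, $i, 1 );
    }
    return $d;
}

# Slide a window of the query's length along the target and report every
# 0-based start position whose window has at most $max_mm mismatches.
sub sliding_window_hits {
    my ( $query, $target, $max_mm ) = @_;
    my $len = length $query;
    my @hits;
    for my $pos ( 0 .. length($target) - $len ) {
        my $mm = hamming( $query, substr( $target, $pos, $len ) );
        push @hits, [ $pos, $mm ] if $mm <= $max_mm;
    }
    return @hits;
}

my @hits = sliding_window_hits( 'ACGT', 'TTACGATTACGT', 1 );
# @hits is ([2, 1], [8, 0]): start position and mismatch count
```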
C<GUUGle> may be the best choice if you have RNA queries (it counts GU pairs
as matches, not mismatches) and you are only interested in exact matches.
Alternatively, you can use C<Vmatch> with a filter (see next section) that
allows only GU mismatches. This is only an option if you can bound the maximum
number of GU mismatches (C<$sbe-E<gt>settings-E<gt>mismatches()>). C<Vmatch>
with its pre-calculated suffix arrays is really fast, so this option is worth
considering.
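The following sketch shows what "counts GU as no mismatch" means for an RNA query pairing with a target site. The pair table (Watson-Crick pairs plus the G-U wobble) is standard; the sub C<pairing_mismatches> is hypothetical and is not the GUUGle or Bio::Grep filter API. Both strings are read 5' to 3', so query base i pairs with the target base counted from the other end; C<T> is treated as C<U>.

```perl
use strict;
use warnings;

# Pairs counted as matches: Watson-Crick (A-U, G-C) plus the G-U wobble.
my %PAIRS = map { $_ => 1 } qw(AU UA GC CG GU UG);

# Count query positions that cannot pair with the target site. Both
# strings are 5'->3', so query base $i pairs with site base $n - 1 - $i.
sub pairing_mismatches {
    my ( $query, $site ) = map { uc $_ } @_;
    tr/T/U/ for $query, $site;    # accept DNA input by mapping T to U
    die "query and site differ in length\n"
        unless length $query == length $site;
    my $n  = length $query;
    my $mm = 0;
    for my $i ( 0 .. $n - 1 ) {
        my $pair = substr( $query, $i, 1 ) . substr( $site, $n - 1 - $i, 1 );
        $mm++ unless $PAIRS{$pair};
    }
    return $mm;
}

print pairing_mismatches( 'GGAU', 'AUCC' ), "\n";    # 0: all Watson-Crick
print pairing_mismatches( 'GGAU', 'AUUC' ), "\n";    # 0: one G-U wobble
print pairing_mismatches( 'GGAU', 'AUCA' ), "\n";    # 1: G cannot pair with A
```

A C<Vmatch> hit whose site returns 0 here differs from the query's reverse complement only by GU wobble pairs, so a filter that allows only GU mismatches would keep exactly those hits.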
Perl regular expressions are available in the C<RE> back-end. It is a very