Boulder

Boulder/Blast.pm

	Matrix		the matrix in use, e.g. BLOSUM62
	Ctxfactor	the value of the -ctxfactor argument
	Gapall		the value of the -gapall argument

=back

=head2 Information about the query sequence and subject database

These top-level tags give information about the query sequence and the
database that was searched.

=over 4

=item Blast_query

The identifier for the search sequence, as defined by the
FASTA format.  This will be the first set of non-whitespace
characters following the ">" character.  In other words, the search
sequence "name".

=item Blast_query_length

The length of the query sequence, in base pairs (or in residues for a
protein query).

=item Blast_db

The Unix filesystem path to the subject database.

=item Blast_db_title

The title of the subject database.

=back
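
The I<Blast_query> rule above can be sketched in a few lines of Perl.
This is only an illustration of the FASTA naming convention: the helper
name C<fasta_id> and the sample header line are made up for the example,
and nothing here is part of the Boulder::Blast interface.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Boulder::Blast): return the identifier
# from a FASTA header line, i.e. the first run of non-whitespace
# characters following the ">" character.
sub fasta_id {
    my ($header) = @_;
    return $header =~ /^>\s*(\S+)/ ? $1 : undef;
}

# The header text after the identifier is invented for this example.
print fasta_id(">BCD207R example description"), "\n";   # prints "BCD207R"
```

A line that does not start with ">" yields C<undef>, so callers can tell
a header line from a sequence line.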

=head2 The search results: the I<Blast_hits> tag

Each BLAST hit is represented by the tag I<Blast_hits>.  There may be
zero, one, or many such tags.  They are presented in decreasing order
of significance, i.e. most significant hit first.

Each I<Blast_hits> tag is a Stone subrecord containing the following
subtags:

=over 4

=item Name

The name/identifier of the sequence that was hit.

=item Length

The total length of the sequence that was hit.

=item Signif

The significance of the hit.  If there are multiple HSPs in the hit,
this will be the most significant (smallest) value.

=item Identity

The percent identity of the hit.  If there are multiple HSPs, this
will be the highest percent identity among them.

=item Expect

The expectation value for the hit.  If there are multiple HSPs, this
will be the lowest expectation value in the set.

=item Hsps

One or more sub-sub-tags, pointing to a nested record containing
information about each high-scoring segment pair (HSP).  See the next
section for details.

=back
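
The way the hit-level I<Signif>, I<Expect> and I<Identity> values
summarize several HSPs can be sketched as follows.  This is a minimal
illustration using plain Perl hashes rather than Stone objects; the
helper name C<summarize_hit> and the values of the second HSP are
invented for the example, and the module's own implementation may
differ.

```perl
use strict;
use warnings;
use List::Util qw(min max);

# Hypothetical helper: derive hit-level values from a list of HSP hashes.
# Signif and Expect take the smallest value, Identity takes the largest.
sub summarize_hit {
    my @hsps = @_;
    # Strip the trailing "%" so that identities compare numerically.
    my @identity = map { my $v = $_->{Identity}; $v =~ s/%$//; $v } @hsps;
    return {
        Signif   => min(map { $_->{Signif} } @hsps),
        Expect   => min(map { $_->{Expect} } @hsps),
        Identity => max(@identity) . '%',
    };
}

# The second HSP is made-up data, used only to show the selection rule.
my $hit = summarize_hit(
    { Signif => 3.5e-74, Expect => 3.5e-74, Identity => '100%' },
    { Signif => 1.2e-10, Expect => 1.2e-10, Identity => '87%'  },
);
printf "Signif=%g Expect=%g Identity=%s\n",
    @{$hit}{qw(Signif Expect Identity)};
```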

=head2 The Hsp records: the I<Hsps> tag

Each I<Blast_hits> tag will have at least one, and possibly several,
I<Hsps> tags, each one corresponding to a high-scoring segment pair
(HSP).  These records contain detailed information about the hit,
including the alignments.  Tags are as follows:

=over 4

=item Signif

The significance (P value) of this HSP.

=item Bits

The number of bits of significance.

=item Expect

Expectation value for this HSP.

=item Identity

Percent identity.

=item Positives

Percent positive matches.

=item Score

The Smith-Waterman alignment score.

=item Orientation

The word "plus" or "minus".  This tag is only present for nucleotide
searches, when the reverse complement match may be present.

=item Strand

Depending on the algorithm used, indicates the complementarity of the
match and possibly the reading frame.  This is copied out of the BLAST
report.  Possibilities include:

 "Plus / Minus" "Plus / Plus" -- blastn algorithm
 "+1 / -2" "+2 / -2"	     -- blastx, tblastx

=item Query_start

Position at which the HSP starts in the query sequence (1-based
indexing).

=item Query_end

Position at which the HSP stops in the query sequence.

=item Subject_start

Position at which the HSP starts in the subject (target) sequence.

=item Subject_end

Position at which the HSP stops in the subject (target) sequence.

=item Query, Subject, Alignment

These three tags contain strings which, together, create the gapped
alignment of the query sequence with the subject sequence.

For example, to print the alignment of the first HSP of the first
match, you might say:

  my $hsp = $blast->Blast_hits->Hsps;
  print join("\n",$hsp->Query,$hsp->Alignment,$hsp->Subject),"\n";

=back
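
Because the I<Query>, I<Alignment> and I<Subject> strings can be long,
it may be convenient to print them in fixed-width blocks.  The sketch
below assumes that the three strings have equal length; the helper name
C<format_alignment>, the line labels and the short sample strings are
all made up for the illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: lay out three equal-length alignment strings
# in blocks of $width columns (60 by default).
sub format_alignment {
    my ($query, $align, $subject, $width) = @_;
    $width ||= 60;
    my $out = '';
    for (my $pos = 0; $pos < length $query; $pos += $width) {
        $out .= join("\n",
            'Query:   ' . substr($query,   $pos, $width),
            '         ' . substr($align,   $pos, $width),
            'Sbjct:   ' . substr($subject, $pos, $width),
        ) . "\n\n";
    }
    return $out;
}

# Invented sample strings containing one gap in the subject.
print format_alignment('GTGCTTTCAA', '||||| ||||', 'GTGCT-TCAA', 5);
```

With a parsed record, the three string arguments would be
C<< $hsp->Query >>, C<< $hsp->Alignment >> and C<< $hsp->Subject >>.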

See the bottom of this manual page for an example BLAST run.

=head1 CAVEATS

This module has been extensively tested with WU-BLAST, but very little
with NCBI BLAST.  It probably will not work with PSI-BLAST or other
variants.

The author plans to adapt this module to parse other BLAST formats, as
well as non-BLAST formats such as the output of Fastn.

=head1 SEE ALSO

L<Boulder>, L<Boulder::GenBank>

=head1 AUTHOR

Lincoln Stein <lstein@cshl.org>.

Copyright (c) 1998-1999 Cold Spring Harbor Laboratory

This library is free software; you can redistribute it and/or modify
it under the same terms as Perl itself.  See DISCLAIMER.txt for
disclaimers of warranty.

=head1 EXAMPLE BLASTN RUN

This output was generated by the I<quickblast.pl> program, which is
located in the F<eg/> subdirectory of the I<Boulder> distribution
directory.  It is a typical I<blastn> (nucleotide->nucleotide) run;
however, long lines (usually DNA sequences) have been truncated.  Also
note that, per the Boulder protocol, the percent sign (%) is escaped
as C<%25>.  It is unescaped when the stream is read back in.

 Blast_run_date=Fri Nov  6 14:40:41 1998
 Blast_db_date=2:40 PM EST Nov 6, 1998
 Blast_parms={
   Hspmax=10
   Expectation=10
   Matrix=+5,-4
   Ctxfactor=2.00
 }
 Blast_program_date=05-Feb-1998
 Blast_db= /usr/tmp/quickblast18202aaaa
 Blast_version=2.0a19-WashU
 Blast_query=BCD207R
 Blast_db_title= test.fasta
 Blast_query_length=332
 Blast_program=blastn
 Blast_hits={
   Signif=3.5e-74
   Expect=3.5e-74,
   Name=BCD207R
   Identity=100%25
   Length=332
   Hsps={
     Subject=GTGCTTTCAAACATTGATGGATTCCTCCCCTTGACATATATATATACTTTGGGTTCCCGCAA...
     Signif=3.5e-74
     Length=332
     Bits=249.1
     Query_start=1
     Subject_end=332
     Query=GTGCTTTCAAACATTGATGGATTCCTCCCCTTGACATATATATATACTTTGGGTTCCCGCAA...
     Positives=100%25
     Expect=3.5e-74,
     Identity=100%25
     Query_end=332
     Orientation=plus
     Score=1660
     Strand=Plus / Plus
     Subject_start=1
     Alignment=||||||||||||||||||||||||||||||||||||||||||||||||||||||||||...
   }
 }
 =
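
To make the layout of the record above concrete, here is a minimal
reader for this text format, written only from what the example shows:
C<tag=value> lines, C<tag={> to open a subrecord, C<}> to close it, and
a lone C<=> to end the record.  It is a sketch, not the parser used by
the I<Boulder> library.  The function name C<read_record>, the choice of
nested hashes of arrays, and the assumption that escapes have the form
C<%> plus two hexadecimal digits are all this example's own.

```perl
use strict;
use warnings;

# Minimal reader for the layout shown above (not the library's parser).
# Every tag maps to an array reference, because tags such as Blast_hits
# may occur more than once.
sub read_record {
    my ($text) = @_;
    my $record = {};
    my @stack  = ($record);
    for my $line (split /\n/, $text) {
        $line =~ s/^\s+|\s+$//g;
        next if $line eq '';
        last if $line eq '=';                  # end of record
        if ($line eq '}') {                    # close the current subrecord
            pop @stack if @stack > 1;
            next;
        }
        my ($tag, $value) = split /=/, $line, 2;
        next unless defined $value;
        $value =~ s/^\s+//;
        if ($value eq '{') {                   # open a new subrecord
            my $sub = {};
            push @{ $stack[-1]{$tag} }, $sub;
            push @stack, $sub;
        }
        else {
            # Assumed escape form: "%" followed by two hex digits.
            $value =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
            push @{ $stack[-1]{$tag} }, $value;
        }
    }
    return $record;
}

# A shortened copy of the record shown above.
my $rec = read_record(<<'END');
Blast_query=BCD207R
Blast_hits={
  Name=BCD207R
  Identity=100%25
  Hsps={
    Score=1660
    Strand=Plus / Plus
  }
}
=
END

my $hit = $rec->{Blast_hits}[0];
print "$hit->{Name}[0] $hit->{Identity}[0] $hit->{Hsps}[0]{Score}[0]\n";
# prints "BCD207R 100% 1660"
```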

=head1 EXAMPLE BLASTP RUN

Here is the output from a typical I<blastp> (protein->protein) run.
Long lines have again been truncated.

 Blast_run_date=Fri Nov  6 14:37:23 1998
 Blast_db_date=2:36 PM EST Nov 6, 1998
 Blast_parms={
   Hspmax=10
   Expectation=10
   Matrix=BLOSUM62
   Ctxfactor=1.00
 }
 Blast_program_date=05-Feb-1998
 Blast_db= /usr/tmp/quickblast18141aaaa


