0.0.12 2017-02-22T17:56:27Z
- `anchr orient`
- Use tempdir as basename of dazz db
0.0.11 2017-02-21T08:54:07Z
- `anchr overlap2`: calling `daligner` to detect overlaps between two files
- Document paralogs
- Switch to k-unitigs
- Add --start for dazzname
- Use `faops replace` to rename headers in FA files
0.0.10 2017-02-17T08:38:42Z
- Discard any reads with subs
- Use superReadSequences.fasta or k_unitigs.fasta
0.0.9 2017-02-14T20:59:08Z
- Install quorum and picard via Linuxbrew
- Move docs from sra
- Add quast in QA section
- `--noscythe` for trim
changed that file, and provided that you do at least ONE of the following:
a) place your modifications in the Public Domain or otherwise make them
Freely Available, such as by posting said modifications to Usenet or an
equivalent medium, or placing the modifications on a major archive site
such as ftp.uu.net, or by allowing the Copyright Holder to include your
modifications in the Standard Version of the Package.
b) use the modified Package only within your corporation or organization.
c) rename any non-standard executables so the names do not conflict with
standard executables, which must also be provided, and provide a separate
manual page for each non-standard executable that clearly documents how it
differs from the Standard Version.
d) make other distribution arrangements with the Copyright Holder.
4. You may distribute the programs of this Package in object code or executable
form, provided that you do at least ONE of the following:
a) distribute a Standard Version of the executables and library files,
t/17-break.t
t/18-quorum.t
t/19-kunitigs.t
t/1_4.2.restrict.tsv
t/1_4.anchor.fasta
t/1_4.ovlp.tsv
t/1_4.pac.fasta
t/1_4.pac.ovlp.tsv
t/1_4.pac.paf
t/1_4.pac.paf.ovlp.tsv
t/1_4.renamed.fasta
t/1_4.replace.tsv
t/1_4.restrict.tsv
t/1_4.show.txt
t/20-anchors.t
t/21-scaffold.t
t/22-paf2ovlp.t
t/23-trimlong.t
t/24_4.anchor.ovlp.tsv
t/24_4.ovlp.tsv
t/24_4.relation.tsv
Available commands:
commands: list the application's commands
help: display a command's help screen
anchors: select anchors from k-unitigs or superreads
break: break long reads by anchors
contained: discard contained super-reads, k-unitigs, or anchors
cover: trusted regions in the first file covered by the second
dazzname: rename FASTA reads for dazz_db
dep: check or install dependencies
group: group anchors by long reads
kunitigs: create k-unitigs from corrected reads
layout: layout anchor group
merge: merge overlapped super-reads, k-unitigs, or anchors
orient: orient overlapped sequences to the same strand
overlap: detect overlaps by daligner
overlap2: detect overlaps between two (large) files by daligner
quorum: Run quorum to discard bad reads
replace: replace IDs in .ovlp.tsv
doc/bacteria_2_3.md view on Meta::CPAN
Clear intermediate files.
```bash
BASE_DIR=$HOME/data/anchr/Sfle
cd ${BASE_DIR}
find . -type f -name "quorum_mer_db.jf" | xargs rm
find . -type f -name "k_u_hash_0" | xargs rm
find . -type f -name "readPositionsInSuperReads" | xargs rm
find . -type f -name "*.tmp" | xargs rm
find . -type f -name "pe.renamed.fastq" | xargs rm
find . -type f -name "pe.cor.sub.fa" | xargs rm
```
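The `find ... | xargs rm` pipelines above print an error when a pattern matches nothing, and they split file names on whitespace. A more defensive variant is sketched below, assuming a `find` that supports `-delete` (GNU and BSD both do). The function name `clean_intermediates` is hypothetical and not part of anchr.

```bash
# Hypothetical helper (not part of anchr): remove intermediate files under a directory.
clean_intermediates() {
    local dir=$1
    local pattern
    for pattern in \
        "quorum_mer_db.jf" "k_u_hash_0" "readPositionsInSuperReads" \
        "*.tmp" "pe.renamed.fastq" "pe.cor.sub.fa"; do
        # -delete does nothing when no file matches and copes with spaces in names
        find "$dir" -type f -name "$pattern" -delete
    done
}

# Usage: clean_intermediates "$HOME/data/anchr/Sfle"
```

Keeping the patterns in one list also makes it easier to add `se.renamed.fastq` for libraries that include single-end reads.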
## Sfle: create anchors
```bash
BASE_DIR=$HOME/data/anchr/Sfle
cd ${BASE_DIR}
perl -e '
doc/bacteria_2_3.md view on Meta::CPAN
Clear intermediate files.
```bash
BASE_DIR=$HOME/data/anchr/Vpar
cd ${BASE_DIR}
find . -type f -name "quorum_mer_db.jf" | xargs rm
find . -type f -name "k_u_hash_0" | xargs rm
find . -type f -name "readPositionsInSuperReads" | xargs rm
find . -type f -name "*.tmp" | xargs rm
find . -type f -name "pe.renamed.fastq" | xargs rm
find . -type f -name "pe.cor.sub.fa" | xargs rm
```
## Vpar: create anchors
```bash
BASE_DIR=$HOME/data/anchr/Vpar
cd ${BASE_DIR}
perl -e '
doc/bacteria_2_3.md view on Meta::CPAN
Clear intermediate files.
```bash
BASE_DIR=$HOME/data/anchr/Lpne
cd ${BASE_DIR}
find . -type f -name "quorum_mer_db.jf" | xargs rm
find . -type f -name "k_u_hash_0" | xargs rm
find . -type f -name "readPositionsInSuperReads" | xargs rm
find . -type f -name "*.tmp" | xargs rm
find . -type f -name "pe.renamed.fastq" | xargs rm
find . -type f -name "pe.cor.sub.fa" | xargs rm
```
## Lpne: create anchors
```bash
BASE_DIR=$HOME/data/anchr/Lpne
cd ${BASE_DIR}
perl -e '
doc/bacteria_2_3.md view on Meta::CPAN
Clear intermediate files.
```bash
BASE_DIR=$HOME/data/anchr/Ngon
cd ${BASE_DIR}
find . -type f -name "quorum_mer_db.jf" | xargs rm
find . -type f -name "k_u_hash_0" | xargs rm
find . -type f -name "readPositionsInSuperReads" | xargs rm
find . -type f -name "*.tmp" | xargs rm
find . -type f -name "pe.renamed.fastq" | xargs rm
find . -type f -name "pe.cor.sub.fa" | xargs rm
```
## Ngon: create anchors
```bash
BASE_DIR=$HOME/data/anchr/Ngon
cd ${BASE_DIR}
perl -e '
doc/bacteria_2_3.md view on Meta::CPAN
Clear intermediate files.
```bash
BASE_NAME=Nmen
cd $HOME/data/anchr/${BASE_NAME}
find 2_illumina -type f -name "quorum_mer_db.jf" | xargs rm
find 2_illumina -type f -name "k_u_hash_0" | xargs rm
find 2_illumina -type f -name "*.tmp" | xargs rm
find 2_illumina -type f -name "pe.renamed.fastq" | xargs rm
find 2_illumina -type f -name "se.renamed.fastq" | xargs rm
find 2_illumina -type f -name "pe.cor.sub.fa" | xargs rm
```
* Stats of processed reads
```bash
BASE_NAME=Nmen
cd ${HOME}/data/anchr/${BASE_NAME}
REAL_G=2272360
doc/bacteria_2_3.md view on Meta::CPAN
Clear intermediate files.
```bash
BASE_DIR=$HOME/data/anchr/Bper
cd ${BASE_DIR}
find . -type f -name "quorum_mer_db.jf" | xargs rm
find . -type f -name "k_u_hash_0" | xargs rm
find . -type f -name "readPositionsInSuperReads" | xargs rm
find . -type f -name "*.tmp" | xargs rm
find . -type f -name "pe.renamed.fastq" | xargs rm
find . -type f -name "pe.cor.sub.fa" | xargs rm
```
## Bper: create anchors
```bash
BASE_DIR=$HOME/data/anchr/Bper
cd ${BASE_DIR}
perl -e '
doc/bacteria_2_3.md view on Meta::CPAN
Clear intermediate files.
```bash
BASE_NAME=Cdip
cd $HOME/data/anchr/${BASE_NAME}
find 2_illumina -type f -name "quorum_mer_db.jf" | xargs rm
find 2_illumina -type f -name "k_u_hash_0" | xargs rm
find 2_illumina -type f -name "*.tmp" | xargs rm
find 2_illumina -type f -name "pe.renamed.fastq" | xargs rm
find 2_illumina -type f -name "se.renamed.fastq" | xargs rm
find 2_illumina -type f -name "pe.cor.sub.fa" | xargs rm
```
* Stats of processed reads
```bash
BASE_NAME=Cdip
cd ${HOME}/data/anchr/${BASE_NAME}
REAL_G=2488635
doc/bacteria_2_3.md view on Meta::CPAN
Clear intermediate files.
```bash
BASE_NAME=Ftul
cd $HOME/data/anchr/${BASE_NAME}
find 2_illumina -type f -name "quorum_mer_db.jf" | xargs rm
find 2_illumina -type f -name "k_u_hash_0" | xargs rm
find 2_illumina -type f -name "*.tmp" | xargs rm
find 2_illumina -type f -name "pe.renamed.fastq" | xargs rm
find 2_illumina -type f -name "se.renamed.fastq" | xargs rm
find 2_illumina -type f -name "pe.cor.sub.fa" | xargs rm
```
* Stats of processed reads
```bash
BASE_NAME=Ftul
cd ${HOME}/data/anchr/${BASE_NAME}
REAL_G=1892775
doc/dazzler.md view on Meta::CPAN
## dazz_db
`DBsplit`
* If the `-x` option is set then all reads less than the given length are ignored.
* Each block is of size `-s` * 1 Mbp except for the last.
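
As a rough worked example of the block-size rule, the number of blocks can be estimated from the total base count. This is only a sketch: the function name `estimate_blocks` is made up for illustration, and the count reported by `DBsplit` may differ, because reads are kept whole within a block and `-x` changes how many bases are counted.

```bash
# Hypothetical helper: ceiling of total_bp / (s * 1,000,000)
estimate_blocks() {
    local total_bp=$1 s_mbp=$2
    local block_bp=$(( s_mbp * 1000000 ))
    echo $(( (total_bp + block_bp - 1) / block_bp ))
}

estimate_blocks 120000000 50   # 120 Mbp with -s50 gives 3
```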
### Rename sequences for dazzler
This step creates two files: `renamed.fasta` and `stdout.replace.tsv`.
```bash
mkdir -p ~/data/test/dazzler
cd ~/data/test/dazzler
cat ~/data/anchr/iso_1/merge/anchor.merge.fasta \
| anchr dazzname stdin -o stdout \
| faops filter -l 0 stdin renamed.fasta
```
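To make the two outputs concrete, the awk sketch below mimics the kind of renaming involved. It is not the `anchr dazzname` implementation. It assumes PacBio-style headers of the form `prefix/serial/0_length`, which is the naming that dazz_db tools accept, and a two-column mapping of new name to original name. The function name `rename_for_dazz` and its argument order are hypothetical.

```bash
# Hypothetical sketch, not the anchr implementation.
# Reads FASTA on stdin, writes renamed FASTA to stdout and the name mapping to file $1.
rename_for_dazz() {
    local map=$1 prefix=${2:-read} start=${3:-1}
    awk -v map="$map" -v prefix="$prefix" -v n="$start" '
        function emit() {
            if (name == "") return          # nothing buffered yet
            id = prefix "/" n "/0_" length(seq)
            print ">" id
            print seq
            print id "\t" name > map        # new name, then original name
            n++
        }
        /^>/ { emit(); name = substr($1, 2); seq = ""; next }
             { seq = seq $0 }               # join wrapped sequence lines
        END  { emit() }
    '
}

# Usage: rename_for_dazz replace.tsv read 1 < anchor.merge.fasta > renamed.fasta
```

Sequence lines are joined before the length is taken, so wrapped FASTA input yields the full record length in the header.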
### Create and split DB
This creates `myDB.db` and its hidden companion files.
```bash
cd ~/data/test/dazzler
echo "Make the dazzler DB"
DBrm myDB
fasta2DB myDB renamed.fasta
DBdust myDB
# each block is of size 50 MB
DBsplit -s50 myDB
BLOCK_NUMBER=$(cat myDB.db | perl -nl -e '/^blocks\s+=\s+(\d+)/ and print $1')
echo ${BLOCK_NUMBER}
```
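The Perl one-liner above reads the block count from the text stub `myDB.db`. An equivalent awk sketch follows, assuming the stub contains a line of the form `blocks = N`, as that regular expression implies. The function name `db_block_count` is hypothetical.

```bash
# Hypothetical helper: print N from the first "blocks = N" line of a dazzler .db stub
db_block_count() {
    awk '$1 == "blocks" && $2 == "=" { print $3; exit }' "$1"
}

# Usage: BLOCK_NUMBER=$(db_block_count myDB.db)
```

Unlike the Perl version, this stops at the first matching line and tolerates leading whitespace.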
### Retrieve some records from DB
* If the `-n` option is set then the DNA sequence is **not** displayed
```bash
cd ~/data/test/dazzler
# headers
DBshow -n myDB 1-10 102 100-101
# sequences from the original file
faops some -l 0 renamed.fasta <(DBshow -n myDB 1-10 102 100-101 | sed 's/^>//') stdout
```
## daligner
`HPC.daligner`
* local alignments involving at least `-l` base pairs (default 1000)
* An average correlation rate of `-e` (default 70%) set to 80%
* The default number of threads is 4, set by `-T` option (power of 2)
* Set the `-t` parameter which suppresses the use of any *k*-mer that occurs more than *t* times in
doc/e_coli.md view on Meta::CPAN
Clear intermediate files.
```bash
BASE_NAME=e_coli
cd $HOME/data/anchr/${BASE_NAME}
find 2_illumina -type f -name "quorum_mer_db.jf" | xargs rm
find 2_illumina -type f -name "k_u_hash_0" | xargs rm
find 2_illumina -type f -name "*.tmp" | xargs rm
find 2_illumina -type f -name "pe.renamed.fastq" | xargs rm
find 2_illumina -type f -name "se.renamed.fastq" | xargs rm
find 2_illumina -type f -name "pe.cor.sub.fa" | xargs rm
```
* Stats of processed reads
```bash
BASE_NAME=e_coli
cd ${HOME}/data/anchr/${BASE_NAME}
REAL_G=4641652
doc/gage_b.md view on Meta::CPAN
* Clear intermediate files.
```bash
BASE_NAME=Bcer
cd $HOME/data/anchr/${BASE_NAME}
find 2_illumina -type f -name "quorum_mer_db.jf" | xargs rm
find 2_illumina -type f -name "k_u_hash_0" | xargs rm
find 2_illumina -type f -name "*.tmp" | xargs rm
find 2_illumina -type f -name "pe.renamed.fastq" | xargs rm
find 2_illumina -type f -name "se.renamed.fastq" | xargs rm
find 2_illumina -type f -name "pe.cor.sub.fa" | xargs rm
```
* kmergenie
```bash
BASE_NAME=Bcer
cd ${HOME}/data/anchr/${BASE_NAME}
mkdir -p 2_illumina/kmergenie
doc/gage_b.md view on Meta::CPAN
* Clear intermediate files.
```bash
BASE_NAME=Rsph
cd $HOME/data/anchr/${BASE_NAME}
find 2_illumina -type f -name "quorum_mer_db.jf" | xargs rm
find 2_illumina -type f -name "k_u_hash_0" | xargs rm
find 2_illumina -type f -name "*.tmp" | xargs rm
find 2_illumina -type f -name "pe.renamed.fastq" | xargs rm
find 2_illumina -type f -name "se.renamed.fastq" | xargs rm
find 2_illumina -type f -name "pe.cor.sub.fa" | xargs rm
```
* kmergenie
```bash
BASE_NAME=Rsph
cd ${HOME}/data/anchr/${BASE_NAME}
mkdir -p 2_illumina/kmergenie
doc/gage_b.md view on Meta::CPAN
* Clear intermediate files.
```bash
BASE_NAME=Mabs
cd $HOME/data/anchr/${BASE_NAME}
find 2_illumina -type f -name "quorum_mer_db.jf" | xargs rm
find 2_illumina -type f -name "k_u_hash_0" | xargs rm
find 2_illumina -type f -name "*.tmp" | xargs rm
find 2_illumina -type f -name "pe.renamed.fastq" | xargs rm
find 2_illumina -type f -name "se.renamed.fastq" | xargs rm
find 2_illumina -type f -name "pe.cor.sub.fa" | xargs rm
```
* kmergenie
```bash
BASE_NAME=Mabs
cd ${HOME}/data/anchr/${BASE_NAME}
mkdir -p 2_illumina/kmergenie
doc/gage_b.md view on Meta::CPAN
Clear intermediate files.
```bash
BASE_NAME=Vcho
cd $HOME/data/anchr/${BASE_NAME}
find 2_illumina -type f -name "quorum_mer_db.jf" | xargs rm
find 2_illumina -type f -name "k_u_hash_0" | xargs rm
find 2_illumina -type f -name "*.tmp" | xargs rm
find 2_illumina -type f -name "pe.renamed.fastq" | xargs rm
find 2_illumina -type f -name "se.renamed.fastq" | xargs rm
find 2_illumina -type f -name "pe.cor.sub.fa" | xargs rm
```
* Stats of processed reads
```bash
BASE_NAME=Vcho
cd ${HOME}/data/anchr/${BASE_NAME}
REAL_G=4033464
doc/gage_b.md view on Meta::CPAN
* Clear intermediate files.
```bash
BASE_NAME=MabsF
cd $HOME/data/anchr/${BASE_NAME}
find 2_illumina -type f -name "quorum_mer_db.jf" | xargs rm
find 2_illumina -type f -name "k_u_hash_0" | xargs rm
find 2_illumina -type f -name "*.tmp" | xargs rm
find 2_illumina -type f -name "pe.renamed.fastq" | xargs rm
find 2_illumina -type f -name "se.renamed.fastq" | xargs rm
find 2_illumina -type f -name "pe.cor.sub.fa" | xargs rm
```
## MabsF: down sampling
```bash
BASE_NAME=MabsF
REAL_G=5090491
cd ${HOME}/data/anchr/${BASE_NAME}
doc/gage_b.md view on Meta::CPAN
* Clear intermediate files.
```bash
BASE_NAME=RsphF
cd $HOME/data/anchr/${BASE_NAME}
find 2_illumina -type f -name "quorum_mer_db.jf" | xargs rm
find 2_illumina -type f -name "k_u_hash_0" | xargs rm
find 2_illumina -type f -name "*.tmp" | xargs rm
find 2_illumina -type f -name "pe.renamed.fastq" | xargs rm
find 2_illumina -type f -name "se.renamed.fastq" | xargs rm
find 2_illumina -type f -name "pe.cor.sub.fa" | xargs rm
```
* kmergenie
```bash
BASE_NAME=RsphF
cd ${HOME}/data/anchr/${BASE_NAME}
mkdir -p 2_illumina/kmergenie
doc/gage_b.md view on Meta::CPAN
* Clear intermediate files.
```bash
BASE_NAME=VchoF
cd $HOME/data/anchr/${BASE_NAME}
find 2_illumina -type f -name "quorum_mer_db.jf" | xargs rm
find 2_illumina -type f -name "k_u_hash_0" | xargs rm
find 2_illumina -type f -name "*.tmp" | xargs rm
find 2_illumina -type f -name "pe.renamed.fastq" | xargs rm
find 2_illumina -type f -name "se.renamed.fastq" | xargs rm
find 2_illumina -type f -name "pe.cor.sub.fa" | xargs rm
```
* kmergenie
```bash
BASE_NAME=VchoF
cd ${HOME}/data/anchr/${BASE_NAME}
mkdir -p 2_illumina/kmergenie
doc/masurca.md view on Meta::CPAN
├── outputJoinedPairs.perl
├── outputMatedReadsAsReverseComplement.perl
├── outputRecordsNotOnList
├── parallel
├── putReadsIntoGroupsBasedOnSuperReads
├── quorum
├── quorum_create_database
├── quorum_error_correct_reads
├── recompute_astat_superreads.sh
├── reduce_sr
├── rename_filter_fastq
├── rename_filter_fastq.pl
├── reportReadsToExclude.perl
├── restore_ns.pl
├── reverse_complement
├── runByDirectory
├── run_ECR.sh
├── runSRCA.pl
├── sample_mate_pairs.pl
├── samtools
├── semaphore
├── SOAPdenovo-127mer
doc/masurca.md view on Meta::CPAN
cd ~/zlc/Ecoli/anchorAlign
for id in 0_11 10_13 11_7 12_3 13_33 14_8 15_11 16_20 17_4 18_17 19_19 1_4 20_15 21_13 22_8 23_15 24_34 25_8 26_3 27_30 28_2 29_13 2_27 30_25 31_15 32_28 33_2 34_16 35_3 36_23 37_5 38_29 39_5 3_12 40_9 41_19 4_5 5_7 6_56 7_12 8_15 9_6;
do
bash ~/Scripts/cpan/App-Anchr/share/link_anchor.sh ${id}.anchor.fasta ${id}.pac.fasta ${id};
GROUP_COUNT=$(id=${id} perl -e '@p = split q{_}, $ENV{id}; print $p[1];')
perl ~/Scripts/cpan/App-Anchr/share/ovlp_layout.pl ${id}.ovlp.tsv --range "1-${GROUP_COUNT}"
done
# Exceeded memory bound: 502169772
#poa -preserve_seqorder -read_fasta 9_2.renamed.fasta -clustal 9_2.aln -hb ~/Scripts/sra/poa-blosum80.mat
#cp 9_2.renamed.fasta myDB.pp.fasta
#
#DBrm myDB
#fasta2DB myDB myDB.pp.fasta
#DBdust myDB
#
#if [ -e myDB.las ]; then
# rm myDB.las
#fi
#HPC.daligner myDB -v -M4 -e.70 -l1000 -s1000 -mdust > job.sh
#bash job.sh
#rm job.sh
#
#LA4Falcon -o myDB.db myDB.las 1-2
#
#perl ~/Scripts/sra/las2ovlp.pl 9_2.renamed.fasta <(LAshow -o myDB.db myDB.las 1)
#
#perl ~/Scripts/sra/las2ovlp.pl 9_2.renamed.fasta 9_2.show.txt -r 9_2.replace.tsv
# 3 5 10 8 4 9 7 2 11 6 1
perl ~/Scripts/egaz/sparsemem_exact.pl \
-f 0_11.renamed.fasta -g ~/data/dna-seq/e_coli/superreads/NC_000913.fa \
--length 500 -o 0_11.chr.tsv
perl ~/Scripts/sra/ovlp_layout.pl 0_11.ovlp.tsv --range 1-11
# 16 47 19 51 28 22 15 11 43 5 34 44 4 37 6 9 53 24 40 52 46 23 32 38 55 54 18 31 10 26 2 8 48 36 27 29 30 45 50 33 35 42 41 3 25 20 17 14 7 56 21 13 39 49 12 1
perl ~/Scripts/egaz/sparsemem_exact.pl \
-f 6_56.renamed.fasta -g ~/data/dna-seq/e_coli/superreads/NC_000913.fa \
--length 500 -o 6_56.chr.tsv
perl ~/Scripts/sra/ovlp_layout.pl 6_56.ovlp.tsv --range 1-56
# pip install pysam biopython
python ~/Scripts/sra/nanocorrect.py myDB all > corrected.fasta
```
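In the loop above, `GROUP_COUNT` is the part of each id after the underscore (for `0_11`, the value `11`), extracted with a Perl one-liner. A shell-only sketch of the same step uses parameter expansion, assuming ids always have the form `<serial>_<count>`. The function name `group_count` is hypothetical.

```bash
# Sketch: take the field after the first underscore of an id such as 0_11
group_count() {
    local id=$1
    echo "${id#*_}"
}

for id in 0_11 6_56; do
    echo "${id} --range 1-$(group_count "${id}")"
done
```

This avoids starting a Perl process per id, which matters little for a few dozen groups but keeps the loop body easier to read.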
doc/model_organisms.md view on Meta::CPAN
* Clear intermediate files.
```bash
BASE_NAME=s288c
cd $HOME/data/anchr/${BASE_NAME}
find 2_illumina -type f -name "quorum_mer_db.jf" | xargs rm
find 2_illumina -type f -name "k_u_hash_0" | xargs rm
find 2_illumina -type f -name "*.tmp" | xargs rm
find 2_illumina -type f -name "pe.renamed.fastq" | xargs rm
find 2_illumina -type f -name "se.renamed.fastq" | xargs rm
find 2_illumina -type f -name "pe.cor.sub.fa" | xargs rm
```
## s288c: down sampling
```bash
BASE_NAME=s288c
REAL_G=12157105
cd ${HOME}/data/anchr/${BASE_NAME}
doc/model_organisms.md view on Meta::CPAN
Clear intermediate files.
```bash
BASE_NAME=iso_1
cd $HOME/data/anchr/${BASE_NAME}
find 2_illumina -type f -name "quorum_mer_db.jf" | xargs rm
find 2_illumina -type f -name "k_u_hash_0" | xargs rm
find 2_illumina -type f -name "*.tmp" | xargs rm
find 2_illumina -type f -name "pe.renamed.fastq" | xargs rm
find 2_illumina -type f -name "se.renamed.fastq" | xargs rm
find 2_illumina -type f -name "pe.cor.sub.fa" | xargs rm
```
* Stats of processed reads
```bash
BASE_NAME=iso_1
cd ${HOME}/data/anchr/${BASE_NAME}
REAL_G=137567477
doc/model_organisms.md view on Meta::CPAN
* Clear intermediate files.
```bash
BASE_NAME=n2
cd $HOME/data/anchr/${BASE_NAME}
find 2_illumina -type f -name "quorum_mer_db.jf" | xargs rm
find 2_illumina -type f -name "k_u_hash_0" | xargs rm
find 2_illumina -type f -name "*.tmp" | xargs rm
find 2_illumina -type f -name "pe.renamed.fastq" | xargs rm
find 2_illumina -type f -name "se.renamed.fastq" | xargs rm
find 2_illumina -type f -name "pe.cor.sub.fa" | xargs rm
```
* kmergenie
```bash
BASE_NAME=n2
cd ${HOME}/data/anchr/${BASE_NAME}
mkdir -p 2_illumina/kmergenie
doc/model_organisms.md view on Meta::CPAN
* Clear intermediate files.
```bash
BASE_NAME=col_0
cd $HOME/data/anchr/${BASE_NAME}
find 2_illumina -type f -name "quorum_mer_db.jf" | xargs rm
find 2_illumina -type f -name "k_u_hash_0" | xargs rm
find 2_illumina -type f -name "*.tmp" | xargs rm
find 2_illumina -type f -name "pe.renamed.fastq" | xargs rm
find 2_illumina -type f -name "se.renamed.fastq" | xargs rm
find 2_illumina -type f -name "pe.cor.sub.fa" | xargs rm
```
* kmergenie
```bash
BASE_NAME=col_0
cd ${HOME}/data/anchr/${BASE_NAME}
mkdir -p 2_illumina/kmergenie
lib/App/Anchr.pm view on Meta::CPAN
Available commands:
commands: list the application's commands
help: display a command's help screen
anchors: select anchors from k-unitigs or superreads
break: break long reads by anchors
contained: discard contained super-reads, k-unitigs, or anchors
cover: trusted regions in the first file covered by the second
dazzname: rename FASTA reads for dazz_db
dep: check or install dependencies
group: group anchors by long reads
kunitigs: create k-unitigs from corrected reads
layout: layout anchor group
merge: merge overlapped super-reads, k-unitigs, or anchors
orient: orient overlapped sequences to the same strand
overlap: detect overlaps by daligner
overlap2: detect overlaps between two (large) files by daligner
quorum: Run quorum to discard bad reads
replace: replace IDs in .ovlp.tsv
lib/App/Anchr/Command/contained.pm view on Meta::CPAN
}
# record cwd, we'll return there
my $cwd = Path::Tiny->cwd;
my $tempdir = Path::Tiny->tempdir("anchr_contained_XXXXXXXX");
chdir $tempdir;
my $basename = $tempdir->basename();
$basename =~ s/\W+/_/g;
{ # filter short contigs then rename reads as there're duplicated names
for my $i ( 0 .. $#infiles ) {
my $cmd;
$cmd .= "faops filter -a $opt->{len} -l 0 $infiles[$i] stdout";
$cmd .= " | faops dazz -p $opt->{prefix}_$i stdin infile.$i.fasta";
App::Anchr::Common::exec_cmd( $cmd, { verbose => $opt->{verbose}, }, );
}
}
{ # overlaps
lib/App/Anchr/Command/dazzname.pm view on Meta::CPAN
package App::Anchr::Command::dazzname;
use strict;
use warnings;
use autodie;
use App::Anchr -command;
use App::Anchr::Common;
use constant abstract => 'rename FASTA reads for dazz_db';
sub opt_spec {
return (
[ "outfile|o=s", "output filename, [stdout] for screen" ],
[ "prefix=s", "prefix of names", { default => "read" }, ],
[ "start=i", "start index", { default => 1 }, ],
{ show_defaults => 1, }
);
}
lib/App/Anchr/Command/orient.pm view on Meta::CPAN
"faops order $infiles[$i] infile.$i.order.txt infile.$i.fasta",
{ verbose => $opt->{verbose}, },
);
}
}
{ # Preprocess reads to format them for dazzler
my $cmd = "cat";
$cmd .= sprintf " infile.%d.fasta", $_ for ( 0 .. $#infiles );
$cmd .= " | anchr dazzname stdin -o stdout";
$cmd .= " | faops filter -l 0 stdin renamed.fasta";
App::Anchr::Common::exec_cmd( $cmd, { verbose => $opt->{verbose}, } );
if ( !$tempdir->child("renamed.fasta")->is_file ) {
Carp::croak "Failed: create renamed.fasta\n";
}
if ( !$tempdir->child("stdout.replace.tsv")->is_file ) {
Carp::croak "Failed: create stdout.replace.tsv\n";
}
}
{ # overlaps
my $cmd;
$cmd .= "anchr overlap renamed.fasta";
$cmd .= " --len $opt->{len} --idt $opt->{idt} --parallel $opt->{parallel}";
$cmd .= " -o renamed.ovlp.tsv";
App::Anchr::Common::exec_cmd( $cmd, { verbose => $opt->{verbose}, } );
if ( !$tempdir->child("renamed.ovlp.tsv")->is_file ) {
Carp::croak "Failed: create renamed.ovlp.tsv\n";
}
}
# filter overlaps
if ( $opt->{restrict} ) {
my $cmd;
$cmd .= "anchr replace renamed.ovlp.tsv stdout.replace.tsv -o stdout";
$cmd .= " | anchr restrict stdin $opt->{restrict} -o stdout";
$cmd .= " | anchr replace stdin stdout.replace.tsv -r -o stdout";
$cmd .= " > restrict.ovlp.tsv";
App::Anchr::Common::exec_cmd( $cmd, { verbose => $opt->{verbose}, } );
if ( !$tempdir->child("restrict.ovlp.tsv")->is_file ) {
Carp::croak "Failed: create restrict.ovlp.tsv\n";
}
}
my $graph = Graph->new( directed => 0 );
{ # Build ovlp graph
my @lines;
if ( $tempdir->child("restrict.ovlp.tsv")->is_file ) {
@lines = $tempdir->child("restrict.ovlp.tsv")->lines( { chomp => 1, } );
}
else {
@lines = $tempdir->child("renamed.ovlp.tsv")->lines( { chomp => 1, } );
}
# load strands
for my $line (@lines) {
my $info = App::Anchr::Common::parse_ovlp_line($line);
# ignore self overlapping
next if $info->{f_id} eq $info->{g_id};
# ignore poor overlaps
lib/App/Anchr/Command/orient.pm view on Meta::CPAN
my @negs;
for my $i ( sort $graph->vertices ) {
my $i_strand = $graph->get_vertex_attribute( $i, q{strand} );
push @negs, $i if ( defined $i_strand and $i_strand == 1 );
}
$tempdir->child("rc.list")->spew( map {"$_\n"} @negs );
my $cmd;
$cmd .= "faops rc -l 0 -n -f rc.list";
$cmd .= " renamed.fasta renamed.rc.fasta";
App::Anchr::Common::exec_cmd( $cmd, { verbose => $opt->{verbose}, } );
if ( !$tempdir->child("renamed.rc.fasta")->is_file ) {
Carp::croak "Failed: create renamed.rc.fasta\n";
}
}
{ # Outputs. stdout is handled by faops
my $cmd;
$cmd .= "faops replace -l 0 renamed.rc.fasta stdout.replace.tsv";
$cmd .= " $opt->{outfile}";
App::Anchr::Common::exec_cmd( $cmd, { verbose => $opt->{verbose}, } );
}
chdir $cwd;
}
1;
lib/App/Anchr/Command/overlap.pm view on Meta::CPAN
my $tempdir = Path::Tiny->tempdir("anchr_ovlp_XXXXXXXX");
chdir $tempdir;
my $basename = $tempdir->basename();
$basename =~ s/\W+/_/g;
{ # Preprocess reads to format them for dazzler
my $cmd = "cat";
$cmd .= sprintf " %s", $_ for @infiles;
$cmd .= " | anchr dazzname stdin -o stdout";
$cmd .= " | faops filter -l 0 stdin renamed.fasta";
App::Anchr::Common::exec_cmd( $cmd, { verbose => $opt->{verbose}, } );
if ( !$tempdir->child("stdout.replace.tsv")->is_file ) {
Carp::croak "Failed: create stdout.replace.tsv\n";
}
}
{ # Make the dazzler DB, each block is of size 50 MB
my $cmd;
$cmd .= "fasta2DB $basename renamed.fasta";
$cmd .= " && DBdust $basename";
$cmd .= " && DBsplit -s50 $basename";
App::Anchr::Common::exec_cmd( $cmd, { verbose => $opt->{verbose}, } );
if ( !$tempdir->child("$basename.db")->is_file ) {
Carp::croak "Failed: fasta2DB\n";
}
}
{ # Run daligner
lib/App/Anchr/Command/overlap.pm view on Meta::CPAN
my $cmd = "LAshow -o $basename.db $basename.las > show.txt";
if ( $opt->{all} ) {
$cmd = "LAshow $basename.db $basename.las > show.txt";
}
App::Anchr::Common::exec_cmd( $cmd, { verbose => $opt->{verbose}, } );
if ( !$tempdir->child("show.txt")->is_file ) {
Carp::croak "Failed: LAshow\n";
}
$cmd = "anchr show2ovlp renamed.fasta show.txt";
if ( !$opt->{serial} ) {
$cmd .= " -r stdout.replace.tsv";
}
$cmd .= " -o $opt->{outfile}";
App::Anchr::Common::exec_cmd( $cmd, { verbose => $opt->{verbose}, } );
}
chdir $cwd;
}
lib/App/Anchr/Command/quorum.pm view on Meta::CPAN
NUM_THREADS=[% opt.parallel %]
save NUM_THREADS
#----------------------------#
# Renaming reads
#----------------------------#
log_info 'Processing pe and/or se library reads'
rm -rf meanAndStdevByPrefix.pe.txt
echo 'pe [% opt.size %] [% opt.std %]' >> meanAndStdevByPrefix.pe.txt
if [ ! -e pe.renamed.fastq ]; then
rename_filter_fastq \
'pe' \
<(exec expand_fastq '[% args.0 %]' ) \
<(exec expand_fastq '[% args.1 %]' ) \
> 'pe.renamed.fastq'
fi
[% IF args.2 -%]
echo 'se [% opt.size %] [% opt.std %]' >> meanAndStdevByPrefix.pe.txt
if [ ! -e se.renamed.fastq ]; then
rename_filter_fastq \
'se' \
<(exec expand_fastq '[% args.2 %]' ) \
'' \
> 'se.renamed.fastq'
fi
[% END -%]
#----------------------------#
# Stats of PE reads
#----------------------------#
head -n 80000 pe.renamed.fastq > pe_data.tmp
export PE_AVG_READ_LENGTH=$(
head -n 40000 pe_data.tmp \
| grep --text -v '^+' \
| grep --text -v '^@' \
| awk '{if(length($1)>31){n+=length($1);m++;}}END{print int(n/m)}'
)
save PE_AVG_READ_LENGTH
log_debug "Average PE read length $PE_AVG_READ_LENGTH"
KMER=$(
lib/App/Anchr/Command/quorum.pm view on Meta::CPAN
}
'
if [ ! -e quorum_mer_db.jf ]; then
log_info Creating mer database for Quorum.
quorum_create_database \
-t [% opt.parallel %] \
-s $JF_SIZE -b 7 -m 24 -q $((MIN_Q_CHAR + 5)) \
-o quorum_mer_db.jf.tmp \
pe.renamed.fastq [% IF args.2 %]se.renamed.fastq [% END %]\
&& mv quorum_mer_db.jf.tmp quorum_mer_db.jf
if [ $? -ne 0 ]; then
log_warn Increase JF_SIZE by --jf, the recommended value is genome_size*coverage/2
exit 1
fi
fi
# -m Minimum count for a k-mer to be considered "good" (1)
# -g Number of good k-mer in a row for anchor (2)
# -a Minimum count for an anchor k-mer (3)
lib/App/Anchr/Command/quorum.pm view on Meta::CPAN
# remove `--no-discard`.
# And we only want the most reliable parts of the genome rather than the whole genome, so dropping rare
# k-mers is totally OK for us. Raise `-m` from 1 to 3, `-g` from 1 to 3, and `-a` from 1 to 4.
if [ ! -e pe.cor.fa ]; then
log_info Error correct PE.
quorum_error_correct_reads \
-q $((MIN_Q_CHAR + 40)) \
--contaminant=[% opt.adapter %] \
-m 3 -s 1 -g 3 -a 4 -t [% opt.parallel %] -w 10 -e 1 \
quorum_mer_db.jf \
pe.renamed.fastq [% IF args.2 %]se.renamed.fastq [% END %]\
-o pe.cor --verbose 1>quorum.err 2>&1 \
|| {
mv pe.cor.fa pe.cor.fa.failed;
log_warn Error correction of PE reads failed. Check pe.cor.log.;
exit 1;
}
log_debug "Discard any reads with subs"
mv pe.cor.fa pe.cor.sub.fa
cat pe.cor.sub.fa | grep -E '^>\w+\s*$' -A 1 | sed '/^--$/d' > pe.cor.fa
fi
SUM_IN=$( faops n50 -H -N 0 -S pe.renamed.fastq [% IF args.2 %]se.renamed.fastq [% END %])
save SUM_IN
SUM_OUT=$( faops n50 -H -N 0 -S pe.cor.fa )
save SUM_OUT
#----------------------------#
# Estimating genome size.
#----------------------------#
log_info Estimating genome size.
[% IF opt.estsize == 'auto' -%]
script/anchr view on Meta::CPAN
__END__
perl -I lib/ script/anchr
perl -I lib/ script/anchr dep
perl -I lib/ t/02-dep.t
perl -I lib/ script/anchr dazzname t/1_4.anchor.fasta -o stdout
perl -I lib/ t/03-dazzname.t
perl -I lib/ script/anchr show2ovlp t/1_4.renamed.fasta t/1_4.show.txt -o stdout
perl -I lib/ script/anchr show2ovlp t/1_4.renamed.fasta t/1_4.show.txt -r t/1_4.replace.tsv -o stdout
perl -I lib/ t/04-show2ovlp.t
perl -I lib/ script/anchr trim t/R1.fq.gz t/R2.fq.gz -b NAMES -o stdout | bash
perl -I lib/ t/05-trim.t
perl -I lib/ script/anchr overlap t/1_4.pac.fasta -o stdout
perl -I lib/ script/anchr overlap t/1_4.pac.fasta --idt 0.8 --len 2500 --serial -o stdout
perl -I lib/ script/anchr overlap t/1_4.pac.fasta --idt 0.8 --len 2500 --all -o stdout
perl -I lib/ t/07-overlap.t
share/createSuperReadsForDirectory.perl view on Meta::CPAN
my $goodSuperReadsNamesFile = "$workingDirectory/superReadNames.txt";
my $fastaSuperReadErrorsFile = "$workingDirectory/createFastaSuperReadSequences.errors.txt";
#
# reduce SR
#
my $localGoodSequenceOutputFile = "${finalSuperReadSequenceFile}.all";
my $superReadNameAndLengthsFile = "$workingDirectory/sr_sizes.tmp";
my $reduceFile = "$workingDirectory/reduce.tmp";
my $reduceFileTranslated = "$workingDirectory/reduce.tmp.renamed";
my $tflag = "-rename-super-reads";
if ($keepKUnitigsInSuperreadNames) {
$tflag = "";
}
$cmd
= "cat $superReadCountsFile "
. "| createFastaSuperReadSequences $workingDirectory /dev/fd/0 "
. "-seqdiffmax $seqDiffMax -min-ovl-len $merLenMinus1 -minreadsinsuperread $minReadsInSuperRead "
. " -good-sr-filename $goodSuperReadsNamesFile "
. "-kunitigsfile $kUnitigsFile -good-sequence-output-file $localGoodSequenceOutputFile "
. "-super-read-name-and-lengths-file $superReadNameAndLengthsFile $tflag 2> $sequenceCreationErrorFile";
t/04-show2ovlp.t view on Meta::CPAN
my $result = test_app( 'App::Anchr' => [qw(help show2ovlp)] );
like( $result->stdout, qr{show2ovlp}, 'descriptions' );
$result = test_app( 'App::Anchr' => [qw(show2ovlp)] );
like( $result->error, qr{need .+input file}, 'need infile' );
$result = test_app( 'App::Anchr' => [qw(show2ovlp t/not_exists t/not_exists)] );
like( $result->error, qr{doesn't exist}, 'infile not exists' );
$result = test_app( 'App::Anchr' => [qw(show2ovlp t/1_4.renamed.fasta t/1_4.show.txt -o stdout)] );
is( ( scalar grep {/\S/} split( /\n/, $result->stdout ) ), 50, 'line count' );
like( $result->stdout, qr{overlap}s, 'overlaps' );
$result = test_app(
'App::Anchr' => [qw(show2ovlp t/1_4.renamed.fasta t/1_4.show.txt -r t/not_exists -o stdout)] );
like( $result->error, qr{doesn't exist}, 'infile not exists' );
$result = test_app( 'App::Anchr' =>
[qw(show2ovlp t/1_4.renamed.fasta t/1_4.show.txt -r t/1_4.replace.tsv -o stdout)] );
is( ( scalar grep {/\S/} split( /\n/, $result->stdout ) ), 50, 'line count' );
like( $result->stdout, qr{pac7556_20928}s, 'original name' );
done_testing();
t/14-contained.t view on Meta::CPAN
$result = test_app( 'App::Anchr' => [qw(contained)] );
like( $result->error, qr{need .+input file}, 'need infile' );
$result = test_app( 'App::Anchr' => [qw(contained t/1_4.pac.fasta t/not_exists)] );
like( $result->error, qr{doesn't exist}, 'infile not exists' );
$result = test_app( 'App::Anchr' => [qw(contained t/1_4.anchor.fasta --prefix anchor -v -o stdout)] );
is( ( scalar grep {/^CMD/} grep {/\S/} split( /\n/, $result->stderr ) ), 3, 'stderr line count' );
is( ( scalar grep {/\S/} split( /\n/, $result->stdout ) ), 8, 'line count' );
unlike( $result->stdout, qr{anchor576_1624}s, 'original names' );
like( $result->stdout, qr{anchor_0\/1\/}s, 'renamed' );
$result = test_app( 'App::Anchr' => [qw(contained t/contained.fasta -v -o stdout)] );
is( ( scalar grep {/^CMD/} grep {/\S/} split( /\n/, $result->stderr ) ), 3, 'stderr line count' );
is( ( scalar grep {/\S/} split( /\n/, $result->stdout ) ), 2, 'line count' );
like( $result->stdout, qr{infile_0\/}s, 'renamed' );
done_testing();