App-Anchr
doc/bacteria_2_3.md view on Meta::CPAN
# Bacteria 2+3
[TOC levels=1-3]: # " "
- [Bacteria 2+3](#bacteria-23)
- [Escherichia virus Lambda](#escherichia-virus-lambda)
    - [lambda: download](#lambda-download)
    - [lambda: preprocess PacBio reads](#lambda-preprocess-pacbio-reads)
    - [lambda: reads stats](#lambda-reads-stats)
    - [lambda: 3GS](#lambda-3gs)
- [Shigella flexneri NCTC0001, 福氏志贺氏菌](#shigella-flexneri-nctc0001-福氏志贺氏菌)
    - [Sfle: download](#sfle-download)
    - [Sfle: combinations of different quality values and read lengths](#sfle-combinations-of-different-quality-values-and-read-lengths)
    - [Sfle: down sampling](#sfle-down-sampling)
    - [Sfle: generate super-reads](#sfle-generate-super-reads)
    - [Sfle: create anchors](#sfle-create-anchors)
    - [Sfle: results](#sfle-results)
    - [Sfle: merge anchors](#sfle-merge-anchors)
    - [Sfle: 3GS](#sfle-3gs)
    - [Sfle: expand anchors](#sfle-expand-anchors)
- [Vibrio parahaemolyticus ATCC BAA-239, 副溶血弧菌](#vibrio-parahaemolyticus-atcc-baa-239-副溶血弧菌)
    - [Vpar: download](#vpar-download)
    - [Vpar: combinations of different quality values and read lengths](#vpar-combinations-of-different-quality-values-and-read-lengths)
    - [Vpar: down sampling](#vpar-down-sampling)
    - [Vpar: generate super-reads](#vpar-generate-super-reads)
    - [Vpar: create anchors](#vpar-create-anchors)
    - [Vpar: results](#vpar-results)
    - [Vpar: merge anchors](#vpar-merge-anchors)
    - [Vpar: 3GS](#vpar-3gs)
    - [Vpar: expand anchors](#vpar-expand-anchors)
- [Legionella pneumophila subsp. pneumophila ATCC 33152D-5; Philadelphia-1, 嗜肺军团菌](#legionella-pneumophila-subsp-pneumophila-atcc-33152d-5-philadelphia-1-嗜肺军团菌)
    - [Lpne: download](#lpne-download)
    - [Lpne: combinations of different quality values and read lengths](#lpne-combinations-of-different-quality-values-and-read-lengths)
    - [Lpne: down sampling](#lpne-down-sampling)
    - [Lpne: generate super-reads](#lpne-generate-super-reads)
    - [Lpne: create anchors](#lpne-create-anchors)
    - [Lpne: results](#lpne-results)
    - [Lpne: merge anchors](#lpne-merge-anchors)
    - [Lpne: 3GS](#lpne-3gs)
    - [Lpne: expand anchors](#lpne-expand-anchors)
- [Neisseria gonorrhoeae FDAARGOS_207, 淋病奈瑟氏菌](#neisseria-gonorrhoeae-fdaargos-207-淋病奈瑟氏菌)
    - [Ngon: download](#ngon-download)
    - [Ngon: combinations of different quality values and read lengths](#ngon-combinations-of-different-quality-values-and-read-lengths)
    - [Ngon: down sampling](#ngon-down-sampling)
    - [Ngon: generate super-reads](#ngon-generate-super-reads)
    - [Ngon: create anchors](#ngon-create-anchors)
    - [Ngon: results](#ngon-results)
    - [Ngon: merge anchors](#ngon-merge-anchors)
    - [Ngon: 3GS](#ngon-3gs)
    - [Ngon: expand anchors](#ngon-expand-anchors)
- [Neisseria meningitidis FDAARGOS_209, 脑膜炎奈瑟氏菌](#neisseria-meningitidis-fdaargos-209-脑膜炎奈瑟氏菌)
    - [Nmen: download](#nmen-download)
    - [Nmen: combinations of different quality values and read lengths](#nmen-combinations-of-different-quality-values-and-read-lengths)
    - [Nmen: quorum](#nmen-quorum)
    - [Nmen: down sampling](#nmen-down-sampling)
    - [Nmen: k-unitigs and anchors (sampled)](#nmen-k-unitigs-and-anchors-sampled)
    - [Nmen: merge anchors](#nmen-merge-anchors)
    - [Nmen: 3GS](#nmen-3gs)
    - [Nmen: expand anchors](#nmen-expand-anchors)
- [Bordetella pertussis FDAARGOS_195, 百日咳博德特氏杆菌](#bordetella-pertussis-fdaargos-195-百日咳博德特氏杆菌)
    - [Bper: download](#bper-download)
    - [Bper: combinations of different quality values and read lengths](#bper-combinations-of-different-quality-values-and-read-lengths)
    - [Bper: down sampling](#bper-down-sampling)
    - [Bper: generate super-reads](#bper-generate-super-reads)
    - [Bper: create anchors](#bper-create-anchors)
    - [Bper: results](#bper-results)
    - [Bper: merge anchors](#bper-merge-anchors)
- [Corynebacterium diphtheriae FDAARGOS_197, 白喉杆菌](#corynebacterium-diphtheriae-fdaargos-197-白喉杆菌)
    - [Cdip: download](#cdip-download)
    - [Cdip: combinations of different quality values and read lengths](#cdip-combinations-of-different-quality-values-and-read-lengths)
    - [Cdip: quorum](#cdip-quorum)
    - [Cdip: down sampling](#cdip-down-sampling)
    - [Cdip: k-unitigs and anchors (sampled)](#cdip-k-unitigs-and-anchors-sampled)
    - [Cdip: merge anchors](#cdip-merge-anchors)
    - [Cdip: 3GS](#cdip-3gs)
    - [Cdip: expand anchors](#cdip-expand-anchors)
- [Francisella tularensis FDAARGOS_247, 土拉热弗朗西斯氏菌](#francisella-tularensis-fdaargos-247-土拉热弗朗西斯氏菌)
    - [Ftul: download](#ftul-download)
    - [Ftul: combinations of different quality values and read lengths](#ftul-combinations-of-different-quality-values-and-read-lengths)
    - [Ftul: quorum](#ftul-quorum)
    - [Ftul: down sampling](#ftul-down-sampling)
    - [Ftul: k-unitigs and anchors (sampled)](#ftul-k-unitigs-and-anchors-sampled)
    - [Ftul: merge anchors](#ftul-merge-anchors)
    - [Ftul: 3GS](#ftul-3gs)
    - [Ftul: expand anchors](#ftul-expand-anchors)
- [Haemophilus influenzae FDAARGOS_199, 流感嗜血杆菌](#haemophilus-influenzae-fdaargos-199-流感嗜血杆菌)
    - [Hinf: download](#hinf-download)
- [Listeria monocytogenes FDAARGOS_351, 单核细胞增生李斯特氏菌](#listeria-monocytogenes-fdaargos-351-单核细胞增生李斯特氏菌)
    - [Lmon: download](#lmon-download)
- [Clostridioides difficile 630](#clostridioides-difficile-630)
    - [Cdif: download](#cdif-download)
- [Campylobacter jejuni subsp. jejuni ATCC 700819, 空肠弯曲杆菌](#campylobacter-jejuni-subsp-jejuni-atcc-700819-空肠弯曲杆菌)
    - [Cjej: download](#cjej-download)
# Escherichia virus Lambda
Project
[SRP055199](https://trace.ncbi.nlm.nih.gov/Traces/sra/?study=SRP055199)
## lambda: download
* Reference genome
    * Strain: Escherichia virus Lambda (viruses)
    * Taxid: [10710](https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Info&id=10710&lvl=3&lin=f&keep=1&srchmode=1&unlock)
    * RefSeq assembly accession: [GCF_000840245.1](ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/840/245/GCF_000840245.1_ViralProj14204/GCF_000840245.1_ViralProj14204_assembly_report.txt)
    * Proportion of paralogs (> 1000 bp): 0.0
```bash
mkdir -p ~/data/anchr/lambda/1_genome
cd ~/data/anchr/lambda/1_genome
aria2c -x 9 -s 3 -c ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/840/245/GCF_000840245.1_ViralProj14204/GCF_000840245.1_ViralProj14204_genomic.fna.gz
TAB=$'\t'
cat <<EOF > replace.tsv
NC_001416.1${TAB}1
EOF
faops replace GCF_000840245.1_ViralProj14204_genomic.fna.gz replace.tsv genome.fa
#cp ~/data/anchr/paralogs/otherbac/Results/lambda/lambda.multi.fas paralogs.fas
```
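The `replace.tsv` file above maps the RefSeq accession `NC_001416.1` to the short name `1`, and `faops replace` rewrites the FASTA headers with it. The awk sketch below illustrates that renaming idea only. It assumes a two-column, tab-separated map and headers whose first word is the sequence name. `rename_fasta_headers` is a hypothetical helper, not part of faops, and faops itself may treat edge cases differently.

```bash
# Hypothetical helper: rename FASTA headers using a two-column TSV map
# (old name <TAB> new name). Sequences without a mapping keep their name.
rename_fasta_headers() {
    local map_tsv=$1 fasta=$2
    awk -F '\t' '
        NR == FNR { new_name[$1] = $2; next }   # first file: load the map
        /^>/ {
            split(substr($0, 2), words, " ")    # first word of the header is the name
            name = words[1]
            print ">" ((name in new_name) ? new_name[name] : name)
            next
        }
        { print }                               # sequence lines pass through unchanged
    ' "$map_tsv" "$fasta"
}

# Usage (paths are illustrative):
#   rename_fasta_headers replace.tsv genome.raw.fa > genome.fa
```

The map file must not be empty, because the `NR == FNR` test would otherwise treat the FASTA file as the map.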
* PacBio
```bash
mkdir -p ~/data/anchr/lambda/3_pacbio
cd ~/data/anchr/lambda/3_pacbio
cat << EOF > sra_ftp.txt
ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR179/005/SRR1796325/SRR1796325.fastq.gz
EOF
aria2c -x 9 -s 3 -c -i sra_ftp.txt
cat << EOF > sra_md5.txt
2c663d7ea426eea0aaba9017e1a9168c SRR1796325.fastq.gz
EOF
md5sum --check sra_md5.txt
# [...]
Q20L60/anchor/pe.others.fa \
Q20L90/anchor/pe.others.fa \
Q25L60/anchor/pe.others.fa \
Q25L90/anchor/pe.others.fa \
Q30L60/anchor/pe.others.fa \
Q30L90/anchor/pe.others.fa \
--len 1000 --idt 0.98 --proportion 0.99999 --parallel 16 \
-o stdout \
| faops filter -a 1000 -l 0 stdin merge/others.contained.fasta
anchr orient merge/others.contained.fasta --len 1000 --idt 0.98 -o merge/others.orient.fasta
anchr merge merge/others.orient.fasta --len 1000 --idt 0.999 -o stdout \
| faops filter -a 1000 -l 0 stdin merge/others.merge.fasta
# sort on ref
bash ~/Scripts/cpan/App-Anchr/share/sort_on_ref.sh merge/anchor.merge.fasta 1_genome/genome.fa merge/anchor.sort
nucmer -l 200 1_genome/genome.fa merge/anchor.sort.fa
mummerplot -png out.delta -p anchor.sort --large
# mummerplot files
rm *.[fr]plot
rm out.delta
rm *.gp
mv anchor.sort.png merge/
# quast
rm -fr 9_qa
quast --no-check --threads 16 \
-R 1_genome/genome.fa \
merge/anchor.merge.fasta \
merge/others.merge.fasta \
1_genome/paralogs.fas \
--label "merge,others,paralogs" \
-o 9_qa
```
## Sfle: 3GS
```bash
BASE_DIR=$HOME/data/anchr/Sfle
cd ${BASE_DIR}
canu \
-p Sfle -d canu-raw-40x \
gnuplot=$(brew --prefix)/Cellar/$(brew list --versions gnuplot | sed 's/ /\//')/bin/gnuplot \
genomeSize=4.8m \
-pacbio-raw 3_pacbio/pacbio.40x.fasta
canu \
-p Sfle -d canu-raw-80x \
gnuplot=$(brew --prefix)/Cellar/$(brew list --versions gnuplot | sed 's/ /\//')/bin/gnuplot \
genomeSize=4.8m \
-pacbio-raw 3_pacbio/pacbio.80x.fasta
faops n50 -S -C canu-raw-40x/Sfle.trimmedReads.fasta.gz
faops n50 -S -C canu-raw-80x/Sfle.trimmedReads.fasta.gz
```
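Each `canu` call above passes an explicit `gnuplot=` path assembled from Homebrew's Cellar layout. The `sed 's/ /\//'` step turns the `name version` line printed by `brew list --versions` into `name/version`. The sketch below replays that string handling with hard-coded sample values so it runs without Homebrew. The prefix `/usr/local` and the version `5.2.8` are assumptions for illustration only.

```bash
# Hypothetical sample values standing in for `brew --prefix` and
# `brew list --versions gnuplot`; real output depends on the machine.
brew_prefix="/usr/local"
versions_line="gnuplot 5.2.8"

# Replace the first space with a slash: "gnuplot 5.2.8" -> "gnuplot/5.2.8"
cellar_subdir=$(echo "${versions_line}" | sed 's/ /\//')

gnuplot_path="${brew_prefix}/Cellar/${cellar_subdir}/bin/gnuplot"
echo "${gnuplot_path}"
```

Because `sed` replaces only the first space, a line listing several installed versions would produce a path containing a space, so the construction relies on a single installed version.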
## Sfle: expand anchors
* anchorLong
```bash
BASE_DIR=$HOME/data/anchr/Sfle
cd ${BASE_DIR}
anchr cover \
--parallel 16 \
-c 2 -m 40 \
-b 20 --len 1000 --idt 0.9 \
merge/anchor.merge.fasta \
canu-raw-40x/Sfle.trimmedReads.fasta.gz \
-o merge/anchor.cover.fasta
rm -fr anchorLong
anchr overlap2 \
--parallel 16 \
merge/anchor.cover.fasta \
canu-raw-40x/Sfle.trimmedReads.fasta.gz \
-d anchorLong \
-b 20 --len 1000 --idt 0.98
anchr overlap \
merge/anchor.cover.fasta \
--serial --len 10 --idt 0.9999 \
-o stdout \
| perl -nla -e '
    BEGIN {
        our %seen;
        our %count_of;
    }
    # keep complete 13-column records whose identity (column 4) exceeds 0.9999
    @F == 13 or next;
    $F[3] > 0.9999 or next;
    # record each unordered ID pair once
    my $pair = join( "-", sort { $a <=> $b } ( $F[0], $F[1], ) );
    next if $seen{$pair};
    $seen{$pair} = $_;
    $count_of{ $F[0] }++;
    $count_of{ $F[1] }++;
    END {
        # print pairs unless either ID occurs in more than two pairs
        for my $pair ( keys %seen ) {
            my ($f_id, $g_id) = split "-", $pair;
            next if $count_of{$f_id} > 2;
            next if $count_of{$g_id} > 2;
            print $seen{$pair};
        }
    }
' \
| sort -k 1n,1n -k 2n,2n \
> anchorLong/anchor.ovlp.tsv
ANCHOR_COUNT=$(faops n50 -H -N 0 -C anchorLong/anchor.fasta)
echo ${ANCHOR_COUNT}
rm -fr anchorLong/group
anchr group \
# [...]
| faops filter -a 1000 -l 0 stdin merge/anchor.merge.fasta
# merge others
anchr contained \
Q20L60_2000000/anchor/pe.others.fa \
Q25L60_2000000/anchor/pe.others.fa \
Q30L60_2000000/anchor/pe.others.fa \
--len 1000 --idt 0.98 --proportion 0.99999 --parallel 16 \
-o stdout \
| faops filter -a 1000 -l 0 stdin merge/others.contained.fasta
anchr orient merge/others.contained.fasta --len 1000 --idt 0.98 -o merge/others.orient.fasta
anchr merge merge/others.orient.fasta --len 1000 --idt 0.999 -o stdout \
| faops filter -a 1000 -l 0 stdin merge/others.merge.fasta
# sort on ref
bash ~/Scripts/cpan/App-Anchr/share/sort_on_ref.sh merge/anchor.merge.fasta 1_genome/genome.fa merge/anchor.sort
nucmer -l 200 1_genome/genome.fa merge/anchor.sort.fa
mummerplot -png out.delta -p anchor.sort --large
# mummerplot files
rm *.[fr]plot
rm out.delta
rm *.gp
mv anchor.sort.png merge/
# quast
rm -fr 9_qa
quast --no-check --threads 16 \
-R 1_genome/genome.fa \
merge/anchor.merge.fasta \
1_genome/paralogs.fas \
--label "merge,paralogs" \
-o 9_qa
```
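The `anchr overlap ... | perl` step in the expand-anchors blocks keeps only near-identical overlaps between anchors, stores each unordered pair once, and drops every pair in which one of the anchors takes part in more than two pairs. The sketch below re-expresses that filter in awk so it can be tried on toy input. It is not the author's code: `filter_anchor_overlaps` is a hypothetical helper, and reading column 4 as the identity is inferred from the `--idt` threshold.

```bash
# Hypothetical helper mirroring the perl filter; reads overlap lines on stdin.
filter_anchor_overlaps() {
    awk '
        NF != 13 { next }                 # only complete 13-column records
        !($4 > 0.9999) { next }           # near-identical overlaps only
        {
            # order-independent key for the pair of anchor IDs
            pair = ($1 + 0 <= $2 + 0) ? ($1 "-" $2) : ($2 "-" $1)
            if (pair in seen) next
            seen[pair] = $0
            count_of[$1]++
            count_of[$2]++
        }
        END {
            for (pair in seen) {
                split(pair, ids, "-")
                if (count_of[ids[1]] > 2 || count_of[ids[2]] > 2) continue
                print seen[pair]
            }
        }
    ' | sort -k 1n,1n -k 2n,2n            # same ordering as the original pipeline
}
```

With pairs 1-2, 2-3, 5-6, 5-7 and 5-8 all at identity 1.0, only 1-2 and 2-3 survive, because anchor 5 occurs in three pairs.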
## Vpar: 3GS
```bash
BASE_DIR=$HOME/data/anchr/Vpar
cd ${BASE_DIR}
canu \
-p Vpar -d canu-raw-40x \
gnuplot=$(brew --prefix)/Cellar/$(brew list --versions gnuplot | sed 's/ /\//')/bin/gnuplot \
genomeSize=5.2m \
-pacbio-raw 3_pacbio/pacbio.40x.fasta
canu \
-p Vpar -d canu-raw-80x \
gnuplot=$(brew --prefix)/Cellar/$(brew list --versions gnuplot | sed 's/ /\//')/bin/gnuplot \
genomeSize=5.2m \
-pacbio-raw 3_pacbio/pacbio.80x.fasta
faops n50 -S -C canu-raw-40x/Vpar.trimmedReads.fasta.gz
faops n50 -S -C canu-raw-80x/Vpar.trimmedReads.fasta.gz
```
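The `faops n50 -S -C` calls above summarise the trimmed reads by N50, total size and sequence count. As a reminder of what N50 means, the sketch below computes the same three numbers from a plain list of lengths: N50 is the length at which the running total, taken from the longest sequence downwards, first reaches half of the sum. `n50_from_lengths` is a hypothetical helper, and its output layout is only an approximation of what faops prints.

```bash
# Hypothetical helper: stdin carries one sequence length per line.
n50_from_lengths() {
    sort -rn | awk '
        { len[NR] = $1; total += $1 }
        END {
            if (NR == 0) exit
            running = 0
            for (i = 1; i <= NR; i++) {
                running += len[i]
                # first length at which the running total covers half of the sum
                if (running * 2 >= total) { n50 = len[i]; break }
            }
            printf "N50\t%d\nS\t%d\nC\t%d\n", n50, total, NR
        }
    '
}

# Usage: printf '%s\n' 2 4 6 8 10 | n50_from_lengths
```

For the lengths 2, 4, 6, 8 and 10 the sum is 30. The running total reaches 18 at the second-longest sequence, so N50 is 8.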
## Vpar: expand anchors
* anchorLong
```bash
BASE_DIR=$HOME/data/anchr/Vpar
cd ${BASE_DIR}
anchr cover \
--parallel 16 \
-c 2 -m 40 \
-b 20 --len 1000 --idt 0.9 \
merge/anchor.merge.fasta \
canu-raw-40x/Vpar.trimmedReads.fasta.gz \
-o merge/anchor.cover.fasta
rm -fr anchorLong
anchr overlap2 \
--parallel 16 \
merge/anchor.cover.fasta \
canu-raw-40x/Vpar.trimmedReads.fasta.gz \
-d anchorLong \
-b 20 --len 1000 --idt 0.98
anchr overlap \
merge/anchor.cover.fasta \
--serial --len 10 --idt 0.9999 \
-o stdout \
| perl -nla -e '
    BEGIN {
        our %seen;
        our %count_of;
    }
    # keep complete 13-column records whose identity (column 4) exceeds 0.9999
    @F == 13 or next;
    $F[3] > 0.9999 or next;
    # record each unordered ID pair once
    my $pair = join( "-", sort { $a <=> $b } ( $F[0], $F[1], ) );
    next if $seen{$pair};
    $seen{$pair} = $_;
    $count_of{ $F[0] }++;
    $count_of{ $F[1] }++;
    END {
        # print pairs unless either ID occurs in more than two pairs
        for my $pair ( keys %seen ) {
            my ($f_id, $g_id) = split "-", $pair;
            next if $count_of{$f_id} > 2;
            next if $count_of{$g_id} > 2;
            print $seen{$pair};
        }
    }
' \
| sort -k 1n,1n -k 2n,2n \
> anchorLong/anchor.ovlp.tsv
ANCHOR_COUNT=$(faops n50 -H -N 0 -C anchorLong/anchor.fasta)
echo ${ANCHOR_COUNT}
rm -fr anchorLong/group
anchr group \
# [...]
# merge others
anchr contained \
Q20L60_2000000/anchor/pe.others.fa \
Q25L60_2000000/anchor/pe.others.fa \
Q30L60_2000000/anchor/pe.others.fa \
--len 1000 --idt 0.98 --proportion 0.99999 --parallel 16 \
-o stdout \
| faops filter -a 1000 -l 0 stdin merge/others.contained.fasta
anchr orient merge/others.contained.fasta --len 1000 --idt 0.98 -o merge/others.orient.fasta
anchr merge merge/others.orient.fasta --len 1000 --idt 0.999 -o stdout \
| faops filter -a 1000 -l 0 stdin merge/others.merge.fasta
# sort on ref
bash ~/Scripts/cpan/App-Anchr/share/sort_on_ref.sh merge/anchor.merge.fasta 1_genome/genome.fa merge/anchor.sort
nucmer -l 200 1_genome/genome.fa merge/anchor.sort.fa
mummerplot -png out.delta -p anchor.sort --large
# mummerplot files
rm *.[fr]plot
rm out.delta
rm *.gp
mv anchor.sort.png merge/
# quast
rm -fr 9_qa
quast --no-check --threads 16 \
-R 1_genome/genome.fa \
merge/anchor.merge.fasta \
merge/others.merge.fasta \
1_genome/paralogs.fas \
--label "merge,others,paralogs" \
-o 9_qa
```
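Several steps above pipe into `faops filter -a 1000 -l 0`, which as used here appears to keep sequences of at least 1000 bp and write each sequence on a single line. The awk sketch below shows that filtering idea on a FASTA stream. `filter_fasta_min_len` is a hypothetical helper, not a faops replacement, and it ignores the other options faops offers.

```bash
# Hypothetical helper: keep FASTA records of at least MIN_LEN bases,
# writing each sequence unwrapped on one line. Reads stdin, writes stdout.
filter_fasta_min_len() {
    local min_len=$1
    awk -v min_len="$min_len" '
        function flush_record() {
            if (header != "" && length(seq) >= min_len) {
                print header
                print seq
            }
        }
        /^>/ { flush_record(); header = $0; seq = ""; next }
        { seq = seq $0 }                  # join wrapped sequence lines
        END { flush_record() }
    '
}

# Usage: filter_fasta_min_len 1000 < merge/others.orient.fasta
```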
## Lpne: 3GS
```bash
BASE_DIR=$HOME/data/anchr/Lpne
cd ${BASE_DIR}
canu \
-p Lpne -d canu-raw-40x \
gnuplot=$(brew --prefix)/Cellar/$(brew list --versions gnuplot | sed 's/ /\//')/bin/gnuplot \
genomeSize=3.4m \
-pacbio-raw 3_pacbio/pacbio.40x.fasta
canu \
-p Lpne -d canu-raw-80x \
gnuplot=$(brew --prefix)/Cellar/$(brew list --versions gnuplot | sed 's/ /\//')/bin/gnuplot \
genomeSize=3.4m \
-pacbio-raw 3_pacbio/pacbio.80x.fasta
faops n50 -S -C canu-raw-40x/Lpne.trimmedReads.fasta.gz
faops n50 -S -C canu-raw-80x/Lpne.trimmedReads.fasta.gz
```
## Lpne: expand anchors
* anchorLong
```bash
BASE_DIR=$HOME/data/anchr/Lpne
cd ${BASE_DIR}
anchr cover \
--parallel 16 \
-c 2 -m 40 \
-b 20 --len 1000 --idt 0.9 \
merge/anchor.merge.fasta \
canu-raw-40x/Lpne.trimmedReads.fasta.gz \
-o merge/anchor.cover.fasta
rm -fr anchorLong
anchr overlap2 \
--parallel 16 \
merge/anchor.cover.fasta \
canu-raw-40x/Lpne.trimmedReads.fasta.gz \
-d anchorLong \
-b 20 --len 1000 --idt 0.98
anchr overlap \
merge/anchor.cover.fasta \
--serial --len 10 --idt 0.9999 \
-o stdout \
| perl -nla -e '
    BEGIN {
        our %seen;
        our %count_of;
    }
    # keep complete 13-column records whose identity (column 4) exceeds 0.9999
    @F == 13 or next;
    $F[3] > 0.9999 or next;
    # record each unordered ID pair once
    my $pair = join( "-", sort { $a <=> $b } ( $F[0], $F[1], ) );
    next if $seen{$pair};
    $seen{$pair} = $_;
    $count_of{ $F[0] }++;
    $count_of{ $F[1] }++;
    END {
        # print pairs unless either ID occurs in more than two pairs
        for my $pair ( keys %seen ) {
            my ($f_id, $g_id) = split "-", $pair;
            next if $count_of{$f_id} > 2;
            next if $count_of{$g_id} > 2;
            print $seen{$pair};
        }
    }
' \
| sort -k 1n,1n -k 2n,2n \
> anchorLong/anchor.ovlp.tsv
ANCHOR_COUNT=$(faops n50 -H -N 0 -C anchorLong/anchor.fasta)
echo ${ANCHOR_COUNT}
rm -fr anchorLong/group
anchr group \
# [...]
# merge others
anchr contained \
Q20L60_2000000/anchor/pe.others.fa \
Q25L60_2000000/anchor/pe.others.fa \
Q30L60_2000000/anchor/pe.others.fa \
--len 1000 --idt 0.98 --proportion 0.99999 --parallel 16 \
-o stdout \
| faops filter -a 1000 -l 0 stdin merge/others.contained.fasta
anchr orient merge/others.contained.fasta --len 1000 --idt 0.98 -o merge/others.orient.fasta
anchr merge merge/others.orient.fasta --len 1000 --idt 0.999 -o stdout \
| faops filter -a 1000 -l 0 stdin merge/others.merge.fasta
# sort on ref
bash ~/Scripts/cpan/App-Anchr/share/sort_on_ref.sh merge/anchor.merge.fasta 1_genome/genome.fa merge/anchor.sort
nucmer -l 200 1_genome/genome.fa merge/anchor.sort.fa
mummerplot -png out.delta -p anchor.sort --large
# mummerplot files
rm *.[fr]plot
rm out.delta
rm *.gp
mv anchor.sort.png merge/
# quast
rm -fr 9_qa
quast --no-check --threads 16 \
-R 1_genome/genome.fa \
merge/anchor.merge.fasta \
merge/others.merge.fasta \
1_genome/paralogs.fas \
--label "merge,others,paralogs" \
-o 9_qa
```
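`ANCHOR_COUNT` in the expand-anchors blocks is filled from `faops n50 -H -N 0 -C`, which with these flags appears to print just the number of sequences. A quick cross-check without faops is to count header lines. `count_fasta_records` below is a hypothetical helper for that purpose.

```bash
# Hypothetical helper: count the records in a FASTA file by its ">" lines.
# Prints 0 for an empty file.
count_fasta_records() {
    awk '/^>/ { n++ } END { print n + 0 }' "$1"
}

# Usage: count_fasta_records anchorLong/anchor.fasta
```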
## Ngon: 3GS
```bash
BASE_DIR=$HOME/data/anchr/Ngon
cd ${BASE_DIR}
canu \
-p Ngon -d canu-raw-40x \
gnuplot=$(brew --prefix)/Cellar/$(brew list --versions gnuplot | sed 's/ /\//')/bin/gnuplot \
genomeSize=2.3m \
-pacbio-raw 3_pacbio/pacbio.40x.fasta
canu \
-p Ngon -d canu-raw-80x \
gnuplot=$(brew --prefix)/Cellar/$(brew list --versions gnuplot | sed 's/ /\//')/bin/gnuplot \
genomeSize=2.3m \
-pacbio-raw 3_pacbio/pacbio.80x.fasta
faops n50 -S -C canu-raw-40x/Ngon.trimmedReads.fasta.gz
faops n50 -S -C canu-raw-80x/Ngon.trimmedReads.fasta.gz
```
## Ngon: expand anchors
* anchorLong
```bash
BASE_DIR=$HOME/data/anchr/Ngon
cd ${BASE_DIR}
anchr cover \
--parallel 16 \
-c 2 -m 40 \
-b 20 --len 1000 --idt 0.9 \
merge/anchor.merge.fasta \
canu-raw-40x/Ngon.trimmedReads.fasta.gz \
-o merge/anchor.cover.fasta
rm -fr anchorLong
anchr overlap2 \
--parallel 16 \
merge/anchor.cover.fasta \
canu-raw-40x/Ngon.trimmedReads.fasta.gz \
-d anchorLong \
-b 20 --len 1000 --idt 0.98
anchr overlap \
merge/anchor.cover.fasta \
--serial --len 10 --idt 0.9999 \
-o stdout \
| perl -nla -e '
    BEGIN {
        our %seen;
        our %count_of;
    }
    # keep complete 13-column records whose identity (column 4) exceeds 0.9999
    @F == 13 or next;
    $F[3] > 0.9999 or next;
    # record each unordered ID pair once
    my $pair = join( "-", sort { $a <=> $b } ( $F[0], $F[1], ) );
    next if $seen{$pair};
    $seen{$pair} = $_;
    $count_of{ $F[0] }++;
    $count_of{ $F[1] }++;
    END {
        # print pairs unless either ID occurs in more than two pairs
        for my $pair ( keys %seen ) {
            my ($f_id, $g_id) = split "-", $pair;
            next if $count_of{$f_id} > 2;
            next if $count_of{$g_id} > 2;
            print $seen{$pair};
        }
    }
' \
| sort -k 1n,1n -k 2n,2n \
> anchorLong/anchor.ovlp.tsv
ANCHOR_COUNT=$(faops n50 -H -N 0 -C anchorLong/anchor.fasta)
echo ${ANCHOR_COUNT}
rm -fr anchorLong/group
anchr group \
# [...]
$(
parallel -k --no-run-if-empty -j 6 "
if [ -e Q{1}L{2}X{3}P{4}/anchor/pe.others.fa ]; then
echo Q{1}L{2}X{3}P{4}/anchor/pe.others.fa
fi
" ::: 25 30 ::: 60 ::: 40 80 120 160 ::: 000 001 002 003 004 005 006
) \
--len 1000 --idt 0.98 --proportion 0.99999 --parallel 16 \
-o stdout \
| faops filter -a 1000 -l 0 stdin merge/others.contained.fasta
anchr orient merge/others.contained.fasta --len 1000 --idt 0.98 -o merge/others.orient.fasta
anchr merge merge/others.orient.fasta --len 1000 --idt 0.999 -o stdout \
| faops filter -a 1000 -l 0 stdin merge/others.merge.fasta
# anchors sorted on ref
bash ~/Scripts/cpan/App-Anchr/share/sort_on_ref.sh merge/anchor.merge.fasta 1_genome/genome.fa merge/anchor.sort
nucmer -l 200 1_genome/genome.fa merge/anchor.sort.fa
mummerplot -png out.delta -p anchor.sort --large
# mummerplot files
rm *.[fr]plot
rm out.delta
rm *.gp
mv anchor.sort.png merge/
# quast
rm -fr 9_qa
quast --no-check --threads 16 \
-R 1_genome/genome.fa \
merge/anchor.merge.fasta \
merge/others.merge.fasta \
1_genome/paralogs.fas \
--label "merge,others,paralogs" \
-o 9_qa
```
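The `parallel ... ::: 25 30 ::: 60 ::: 40 80 120 160 ::: 000 ... 006` call above expands every `Q{1}L{2}X{3}P{4}` combination and echoes only those `pe.others.fa` files that exist, so missing sampling runs are skipped silently. The same enumeration can be sketched with nested loops in plain bash. `list_existing_others` is a hypothetical helper with the value lists copied from that call.

```bash
# Hypothetical helper: print the existing Q*L*X*P*/anchor/pe.others.fa paths
# under a base directory (default: current directory), iterating with the
# last list varying fastest, as `parallel -k` does.
list_existing_others() {
    local base=${1:-.}
    local q l x p f
    for q in 25 30; do
        for l in 60; do
            for x in 40 80 120 160; do
                for p in 000 001 002 003 004 005 006; do
                    f="Q${q}L${l}X${x}P${p}/anchor/pe.others.fa"
                    if [ -e "${base}/${f}" ]; then
                        echo "${f}"
                    fi
                done
            done
        done
    done
}

# Usage: anchr contained $(list_existing_others) ...
```

The loop version runs serially, which is enough here because each step is only a file-existence test.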
## Nmen: 3GS
```bash
BASE_DIR=$HOME/data/anchr/Nmen
cd ${BASE_DIR}
canu \
-p Nmen -d canu-raw-40x \
gnuplot=$(brew --prefix)/Cellar/$(brew list --versions gnuplot | sed 's/ /\//')/bin/gnuplot \
genomeSize=2.3m \
-pacbio-raw 3_pacbio/pacbio.40x.fasta
canu \
-p Nmen -d canu-raw-80x \
gnuplot=$(brew --prefix)/Cellar/$(brew list --versions gnuplot | sed 's/ /\//')/bin/gnuplot \
genomeSize=2.3m \
-pacbio-raw 3_pacbio/pacbio.80x.fasta
faops n50 -S -C canu-raw-40x/Nmen.trimmedReads.fasta.gz
faops n50 -S -C canu-raw-80x/Nmen.trimmedReads.fasta.gz
```
## Nmen: expand anchors
* anchorLong
```bash
BASE_NAME=Nmen
cd ${HOME}/data/anchr/${BASE_NAME}
anchr cover \
--parallel 16 \
-c 2 -m 40 \
-b 20 --len 1000 --idt 0.9 \
merge/anchor.merge.fasta \
canu-raw-40x/${BASE_NAME}.trimmedReads.fasta.gz \
-o merge/anchor.cover.fasta
rm -fr anchorLong
anchr overlap2 \
--parallel 16 \
merge/anchor.cover.fasta \
canu-raw-40x/${BASE_NAME}.trimmedReads.fasta.gz \
-d anchorLong \
-b 20 --len 1000 --idt 0.98
anchr overlap \
merge/anchor.cover.fasta \
--serial --len 10 --idt 0.9999 \
-o stdout \
| perl -nla -e '
    BEGIN {
        our %seen;
        our %count_of;
    }
    # keep complete 13-column records whose identity (column 4) exceeds 0.9999
    @F == 13 or next;
    $F[3] > 0.9999 or next;
    # record each unordered ID pair once
    my $pair = join( "-", sort { $a <=> $b } ( $F[0], $F[1], ) );
    next if $seen{$pair};
    $seen{$pair} = $_;
    $count_of{ $F[0] }++;
    $count_of{ $F[1] }++;
    END {
        # print pairs unless either ID occurs in more than two pairs
        for my $pair ( keys %seen ) {
            my ($f_id, $g_id) = split "-", $pair;
            next if $count_of{$f_id} > 2;
            next if $count_of{$g_id} > 2;
            print $seen{$pair};
        }
    }
' \
| sort -k 1n,1n -k 2n,2n \
> anchorLong/anchor.ovlp.tsv
ANCHOR_COUNT=$(faops n50 -H -N 0 -C anchorLong/anchor.fasta)
echo ${ANCHOR_COUNT}
rm -fr anchorLong/group
anchr group \
# [...]
parallel -k --no-run-if-empty -j 6 "
if [ -e Q{1}L{2}X{3}P{4}/anchor/pe.others.fa ]; then
echo Q{1}L{2}X{3}P{4}/anchor/pe.others.fa
fi
" ::: 25 30 ::: 60 ::: 40 80 120 160 240 ::: 000 001 002 003 004 005 006
) \
--len 1000 --idt 0.98 --proportion 0.99999 --parallel 16 \
-o stdout \
| faops filter -a 1000 -l 0 stdin merge/others.contained.fasta
anchr orient merge/others.contained.fasta --len 1000 --idt 0.98 -o merge/others.orient.fasta
anchr merge merge/others.orient.fasta --len 1000 --idt 0.999 -o stdout \
| faops filter -a 1000 -l 0 stdin merge/others.merge.fasta
# anchors sorted on ref
bash ~/Scripts/cpan/App-Anchr/share/sort_on_ref.sh merge/anchor.merge.fasta 1_genome/genome.fa merge/anchor.sort
nucmer -l 200 1_genome/genome.fa merge/anchor.sort.fa
mummerplot -png out.delta -p anchor.sort --large
# mummerplot files
rm *.[fr]plot
rm out.delta
rm *.gp
mv anchor.sort.png merge/
# quast
rm -fr 9_qa
quast --no-check --threads 16 \
-R 1_genome/genome.fa \
merge/anchor.merge.fasta \
merge/others.merge.fasta \
1_genome/paralogs.fas \
--label "merge,others,paralogs" \
-o 9_qa
```
## Cdip: 3GS
```bash
BASE_NAME=Cdip
REAL_G=2488635
cd $HOME/data/anchr/${BASE_NAME}
canu \
-p ${BASE_NAME} -d canu-raw-40x \
gnuplot=$(brew --prefix)/Cellar/$(brew list --versions gnuplot | sed 's/ /\//')/bin/gnuplot \
genomeSize=${REAL_G} \
-pacbio-raw 3_pacbio/pacbio.40x.fasta
canu \
-p ${BASE_NAME} -d canu-raw-80x \
gnuplot=$(brew --prefix)/Cellar/$(brew list --versions gnuplot | sed 's/ /\//')/bin/gnuplot \
genomeSize=${REAL_G} \
-pacbio-raw 3_pacbio/pacbio.80x.fasta
faops n50 -S -C canu-raw-40x/${BASE_NAME}.trimmedReads.fasta.gz
faops n50 -S -C canu-raw-80x/${BASE_NAME}.trimmedReads.fasta.gz
```
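The input names `pacbio.40x.fasta` and `pacbio.80x.fasta` suggest reads subsampled to roughly 40-fold and 80-fold coverage. How they were produced is not shown in this excerpt. Under that assumption, the number of bases each file should hold is simply coverage times genome size, which the sketch below works out for the `REAL_G` value used above. `bases_for_coverage` is a hypothetical helper.

```bash
REAL_G=2488635   # genome size in bp, taken from the Cdip block above

# Hypothetical helper: bases needed to reach a given fold coverage
bases_for_coverage() {
    local genome_size=$1 coverage=$2
    echo $(( genome_size * coverage ))
}

for coverage in 40 80; do
    echo "${coverage}x: $(bases_for_coverage "${REAL_G}" "${coverage}") bp"
done
# prints:
#   40x: 99545400 bp
#   80x: 199090800 bp
```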
## Cdip: expand anchors
* anchorLong
```bash
BASE_NAME=Cdip
cd ${HOME}/data/anchr/${BASE_NAME}
anchr cover \
--parallel 16 \
-c 2 -m 40 \
-b 20 --len 1000 --idt 0.9 \
merge/anchor.merge.fasta \
canu-raw-40x/${BASE_NAME}.trimmedReads.fasta.gz \
-o merge/anchor.cover.fasta
rm -fr anchorLong
anchr overlap2 \
--parallel 16 \
merge/anchor.cover.fasta \
canu-raw-40x/${BASE_NAME}.trimmedReads.fasta.gz \
-d anchorLong \
-b 20 --len 1000 --idt 0.98
anchr overlap \
merge/anchor.cover.fasta \
--serial --len 10 --idt 0.9999 \
-o stdout \
| perl -nla -e '
    BEGIN {
        our %seen;
        our %count_of;
    }
    # keep complete 13-column records whose identity (column 4) exceeds 0.9999
    @F == 13 or next;
    $F[3] > 0.9999 or next;
    # record each unordered ID pair once
    my $pair = join( "-", sort { $a <=> $b } ( $F[0], $F[1], ) );
    next if $seen{$pair};
    $seen{$pair} = $_;
    $count_of{ $F[0] }++;
    $count_of{ $F[1] }++;
    END {
        # print pairs unless either ID occurs in more than two pairs
        for my $pair ( keys %seen ) {
            my ($f_id, $g_id) = split "-", $pair;
            next if $count_of{$f_id} > 2;
            next if $count_of{$g_id} > 2;
            print $seen{$pair};
        }
    }
' \
| sort -k 1n,1n -k 2n,2n \
> anchorLong/anchor.ovlp.tsv
ANCHOR_COUNT=$(faops n50 -H -N 0 -C anchorLong/anchor.fasta)
echo ${ANCHOR_COUNT}
rm -fr anchorLong/group
anchr group \
# [...]
parallel -k --no-run-if-empty -j 6 "
if [ -e Q{1}L{2}X{3}P{4}/anchor/pe.others.fa ]; then
echo Q{1}L{2}X{3}P{4}/anchor/pe.others.fa
fi
" ::: 25 30 ::: 60 ::: 40 80 120 160 ::: 000 001 002 003 004 005 006
) \
--len 1000 --idt 0.98 --proportion 0.99999 --parallel 16 \
-o stdout \
| faops filter -a 1000 -l 0 stdin merge/others.contained.fasta
anchr orient merge/others.contained.fasta --len 1000 --idt 0.98 -o merge/others.orient.fasta
anchr merge merge/others.orient.fasta --len 1000 --idt 0.999 -o stdout \
| faops filter -a 1000 -l 0 stdin merge/others.merge.fasta
# anchors sorted on ref
bash ~/Scripts/cpan/App-Anchr/share/sort_on_ref.sh merge/anchor.merge.fasta 1_genome/genome.fa merge/anchor.sort
nucmer -l 200 1_genome/genome.fa merge/anchor.sort.fa
mummerplot -png out.delta -p anchor.sort --large
# mummerplot files
rm *.[fr]plot
rm out.delta
rm *.gp
mv anchor.sort.png merge/
# quast
rm -fr 9_qa
quast --no-check --threads 16 \
-R 1_genome/genome.fa \
merge/anchor.merge.fasta \
merge/others.merge.fasta \
1_genome/paralogs.fas \
--label "merge,others,paralogs" \
-o 9_qa
```
## Ftul: 3GS
```bash
BASE_NAME=Ftul
GENOME_SIZE=1.9m
cd $HOME/data/anchr/${BASE_NAME}
canu \
-p ${BASE_NAME} -d canu-raw-40x \
gnuplot=$(brew --prefix)/Cellar/$(brew list --versions gnuplot | sed 's/ /\//')/bin/gnuplot \
genomeSize=${GENOME_SIZE} \
-pacbio-raw 3_pacbio/pacbio.40x.fasta
canu \
-p ${BASE_NAME} -d canu-raw-80x \
gnuplot=$(brew --prefix)/Cellar/$(brew list --versions gnuplot | sed 's/ /\//')/bin/gnuplot \
genomeSize=${GENOME_SIZE} \
-pacbio-raw 3_pacbio/pacbio.80x.fasta
faops n50 -S -C canu-raw-40x/${BASE_NAME}.trimmedReads.fasta.gz
faops n50 -S -C canu-raw-80x/${BASE_NAME}.trimmedReads.fasta.gz
```
## Ftul: expand anchors
* anchorLong
```bash
BASE_NAME=Ftul
cd ${HOME}/data/anchr/${BASE_NAME}
anchr cover \
--parallel 16 \
-c 2 -m 40 \
-b 20 --len 1000 --idt 0.9 \
merge/anchor.merge.fasta \
canu-raw-40x/${BASE_NAME}.trimmedReads.fasta.gz \
-o merge/anchor.cover.fasta
rm -fr anchorLong
anchr overlap2 \
--parallel 16 \
merge/anchor.cover.fasta \
canu-raw-40x/${BASE_NAME}.trimmedReads.fasta.gz \
-d anchorLong \
-b 20 --len 1000 --idt 0.98
anchr overlap \
merge/anchor.cover.fasta \
--serial --len 10 --idt 0.9999 \
-o stdout \
| perl -nla -e '
    BEGIN {
        our %seen;
        our %count_of;
    }
    # keep complete 13-column records whose identity (column 4) exceeds 0.9999
    @F == 13 or next;
    $F[3] > 0.9999 or next;
    # record each unordered ID pair once
    my $pair = join( "-", sort { $a <=> $b } ( $F[0], $F[1], ) );
    next if $seen{$pair};
    $seen{$pair} = $_;
    $count_of{ $F[0] }++;
    $count_of{ $F[1] }++;
    END {
        # print pairs unless either ID occurs in more than two pairs
        for my $pair ( keys %seen ) {
            my ($f_id, $g_id) = split "-", $pair;
            next if $count_of{$f_id} > 2;
            next if $count_of{$g_id} > 2;
            print $seen{$pair};
        }
    }
' \
| sort -k 1n,1n -k 2n,2n \
> anchorLong/anchor.ovlp.tsv
ANCHOR_COUNT=$(faops n50 -H -N 0 -C anchorLong/anchor.fasta)
echo ${ANCHOR_COUNT}
rm -fr anchorLong/group
anchr group \