App-Anchr
doc/bacteria_2_3.md view on Meta::CPAN
- [Ngon: create anchors](#ngon-create-anchors)
- [Ngon: results](#ngon-results)
- [Ngon: merge anchors](#ngon-merge-anchors)
- [Ngon: 3GS](#ngon-3gs)
- [Ngon: expand anchors](#ngon-expand-anchors)
- [Neisseria meningitidis FDAARGOS_209](#neisseria-meningitidis-fdaargos-209)
- [Nmen: download](#nmen-download)
- [Nmen: combinations of different quality values and read lengths](#nmen-combinations-of-different-quality-values-and-read-lengths)
- [Nmen: quorum](#nmen-quorum)
- [Nmen: down sampling](#nmen-down-sampling)
- [Nmen: k-unitigs and anchors (sampled)](#nmen-k-unitigs-and-anchors-sampled)
- [Nmen: merge anchors](#nmen-merge-anchors)
- [Nmen: 3GS](#nmen-3gs)
- [Nmen: expand anchors](#nmen-expand-anchors)
- [Bordetella pertussis FDAARGOS_195](#bordetella-pertussis-fdaargos-195)
- [Bper: download](#bper-download)
- [Bper: combinations of different quality values and read lengths](#bper-combinations-of-different-quality-values-and-read-lengths)
- [Bper: down sampling](#bper-down-sampling)
- [Bper: generate super-reads](#bper-generate-super-reads)
- [Bper: create anchors](#bper-create-anchors)
- [Bper: results](#bper-results)
- [Bper: merge anchors](#bper-merge-anchors)
- [Corynebacterium diphtheriae FDAARGOS_197](#corynebacterium-diphtheriae-fdaargos-197)
- [Cdip: download](#cdip-download)
- [Cdip: combinations of different quality values and read lengths](#cdip-combinations-of-different-quality-values-and-read-lengths)
- [Cdip: quorum](#cdip-quorum)
- [Cdip: down sampling](#cdip-down-sampling)
- [Cdip: k-unitigs and anchors (sampled)](#cdip-k-unitigs-and-anchors-sampled)
- [Cdip: merge anchors](#cdip-merge-anchors)
- [Cdip: 3GS](#cdip-3gs)
- [Cdip: expand anchors](#cdip-expand-anchors)
- [Francisella tularensis FDAARGOS_247](#francisella-tularensis-fdaargos-247)
- [Ftul: download](#ftul-download)
- [Ftul: combinations of different quality values and read lengths](#ftul-combinations-of-different-quality-values-and-read-lengths)
- [Ftul: quorum](#ftul-quorum)
- [Ftul: down sampling](#ftul-down-sampling)
- [Ftul: k-unitigs and anchors (sampled)](#ftul-k-unitigs-and-anchors-sampled)
- [Ftul: merge anchors](#ftul-merge-anchors)
- [Ftul: 3GS](#ftul-3gs)
- [Ftul: expand anchors](#ftul-expand-anchors)
- [Haemophilus influenzae FDAARGOS_199](#haemophilus-influenzae-fdaargos-199)
- [Hinf: download](#hinf-download)
- [Listeria monocytogenes FDAARGOS_351](#listeria-monocytogenes-fdaargos-351)
- [Lmon: download](#lmon-download)
- [Clostridioides difficile 630](#clostridioides-difficile-630)
- [Cdif: download](#cdif-download)
- [Campylobacter jejuni subsp. jejuni ATCC 700819](#campylobacter-jejuni-subsp-jejuni-atcc-700819)
- [Cjej: download](#cjej-download)
# Escherichia virus Lambda
Project
[SRP055199](https://trace.ncbi.nlm.nih.gov/Traces/sra/?study=SRP055199)
## lambda: download
* Reference genome
    * Strain: Escherichia virus Lambda (viruses)
    * Taxid: [10710](https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Info&id=10710&lvl=3&lin=f&keep=1&srchmode=1&unlock)
    * RefSeq assembly accession:
      [GCF_000840245.1](ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/840/245/GCF_000840245.1_ViralProj14204/GCF_000840245.1_ViralProj14204_assembly_report.txt)
    * Proportion of paralogs (> 1000 bp): 0.0
```bash
mkdir -p ~/data/anchr/lambda/1_genome
cd ~/data/anchr/lambda/1_genome
aria2c -x 9 -s 3 -c ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/840/245/GCF_000840245.1_ViralProj14204/GCF_000840245.1_ViralProj14204_genomic.fna.gz
TAB=$'\t'
cat <<EOF > replace.tsv
NC_001416.1${TAB}1
EOF
faops replace GCF_000840245.1_ViralProj14204_genomic.fna.gz replace.tsv genome.fa
#cp ~/data/anchr/paralogs/otherbac/Results/lambda/lambda.multi.fas paralogs.fas
```
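The renaming step maps the accession in the FASTA header to a short sequence name through a two-column TSV. As a rough illustration of that idea only, the sketch below does the same mapping with plain `awk` on a toy file. The helper name `rename_fasta_headers` and the toy sequence are assumptions made for this example, and this is not how `faops replace` is implemented.

```bash
# Hypothetical helper: rename FASTA headers using a two-column TSV (old<TAB>new).
# Illustration only; the workflow itself uses `faops replace`.
rename_fasta_headers() {
    awk -F'\t' '
        NR == FNR { map[$1] = $2; next }      # first file: load old -> new names
        /^>/ {
            split(substr($0, 2), f, " ")      # first word of the header line
            name = f[1]
            if (name in map) name = map[name]
            print ">" name
            next
        }
        { print }                             # sequence lines pass through unchanged
    ' "$1" "$2"
}

workdir=$(mktemp -d)
printf 'NC_001416.1\t1\n' > "$workdir/replace.tsv"
printf '>NC_001416.1 toy header text\nGGGCGGCGACCT\n' > "$workdir/toy.fna"

rename_fasta_headers "$workdir/replace.tsv" "$workdir/toy.fna" > "$workdir/genome.fa"
cat "$workdir/genome.fa"
```

Short names such as `1` keep downstream file names and table rows compact, which is presumably why the accession is replaced here.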
* PacBio
```bash
mkdir -p ~/data/anchr/lambda/3_pacbio
cd ~/data/anchr/lambda/3_pacbio
cat << EOF > sra_ftp.txt
ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR179/005/SRR1796325/SRR1796325.fastq.gz
EOF
aria2c -x 9 -s 3 -c -i sra_ftp.txt
cat << EOF > sra_md5.txt
2c663d7ea426eea0aaba9017e1a9168c SRR1796325.fastq.gz
EOF
md5sum --check sra_md5.txt
cd ~/data/anchr/lambda
faops filter -l 0 3_pacbio/SRR1796325.fastq.gz 3_pacbio/pacbio.fasta
```
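The `md5sum --check` step only helps if a failed check actually stops the workflow. A minimal sketch of using its exit status for that purpose follows. The wrapper name `verify_downloads` and the toy files are assumptions for illustration, and the checksum is generated locally here, whereas a real run would use the published MD5 values.

```bash
# Hypothetical wrapper: succeeds only if every listed file verifies.
# --status suppresses output so that only the exit code is used.
verify_downloads() {
    md5sum --status --check "$1"
}

workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Toy stand-in for a downloaded file (not real SRA data).
printf 'toy reads\n' > toy_reads.txt
md5sum toy_reads.txt > toy_md5.txt

if verify_downloads toy_md5.txt; then first_check=ok; else first_check=failed; fi

# Simulate a corrupted download by changing the file after the checksum was recorded.
printf 'extra bytes\n' >> toy_reads.txt
if verify_downloads toy_md5.txt; then second_check=ok; else second_check=failed; fi

echo "intact file: $first_check, modified file: $second_check"
```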
## lambda: preprocess PacBio reads
```bash
BASE_NAME=lambda
cd ${HOME}/data/anchr/${BASE_NAME}
# Keep the first 3000 lines (1500 reads, assuming two lines per FASTA record)
head -n 3000 3_pacbio/pacbio.fasta > 3_pacbio/pacbio.40x.fasta
anchr trimlong --parallel 16 -v \
    3_pacbio/pacbio.40x.fasta \
    -o 3_pacbio/pacbio.40x.trim.fasta
```
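The output name `pacbio.40x.fasta` suggests a target of roughly 40x coverage, and `head -n` only selects whole reads when each record occupies exactly two lines. One way to sanity-check such a subsample is to divide its total bases by the genome size. The sketch below shows that arithmetic on toy data. The helper name `estimate_coverage`, the toy reads and the toy genome size are assumptions, and for real data the genome size would come from `1_genome/genome.fa`.

```bash
# Hypothetical helper: estimated coverage = total bases in reads / genome size.
estimate_coverage() {
    LC_ALL=C awk -v g="$2" '
        !/^>/ { bases += length($0) }          # count bases on sequence lines only
        END   { printf "%.1f\n", bases / g }
    ' "$1"
}

workdir=$(mktemp -d)
# Toy reads: three records of two lines each (10, 20 and 10 bases).
printf '>r1\nACGTACGTAC\n>r2\nACGTACGTACACGTACGTAC\n>r3\nACGTACGTAC\n' \
    > "$workdir/reads.fasta"

# Keep the first two records, mirroring the head -n approach.
head -n 4 "$workdir/reads.fasta" > "$workdir/subset.fasta"

# 30 bases over a toy genome of 10 bp
estimate_coverage "$workdir/subset.fasta" 10
```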
## lambda: reads stats
* Stats
```bash
BASE_DIR=$HOME/data/anchr/Vpar
cd ${BASE_DIR}
printf "| %s | %s | %s | %s |\n" \
    "Name" "N50" "Sum" "#" \
    > stat3.md
printf "|:--|--:|--:|--:|\n" >> stat3.md
printf "| %s | %s | %s | %s |\n" \
    $(echo "Genome"; faops n50 -H -S -C 1_genome/genome.fa;) >> stat3.md
printf "| %s | %s | %s | %s |\n" \
    $(echo "Paralogs"; faops n50 -H -S -C 1_genome/paralogs.fas;) >> stat3.md
printf "| %s | %s | %s | %s |\n" \
    $(echo "anchor.merge"; faops n50 -H -S -C merge/anchor.merge.fasta;) >> stat3.md
printf "| %s | %s | %s | %s |\n" \
    $(echo "others.merge"; faops n50 -H -S -C merge/others.merge.fasta;) >> stat3.md
printf "| %s | %s | %s | %s |\n" \
    $(echo "anchor.cover"; faops n50 -H -S -C merge/anchor.cover.fasta;) >> stat3.md
printf "| %s | %s | %s | %s |\n" \
    $(echo "anchorLong"; faops n50 -H -S -C anchorLong/contig.fasta;) >> stat3.md
printf "| %s | %s | %s | %s |\n" \
    $(echo "contigTrim"; faops n50 -H -S -C contigTrim/contig.fasta;) >> stat3.md
cat stat3.md
```
| Name | N50 | Sum | # |
|:-------------|--------:|--------:|---:|
| Genome | 3288558 | 5165770 | 2 |
| Paralogs | 3333 | 155714 | 62 |
| anchor.merge | 175008 | 5047543 | 73 |
| others.merge | 20100 | 20100 | 1 |
| anchor.cover | 175008 | 5024576 | 78 |
| anchorLong | 208183 | 5023596 | 63 |
| contigTrim | 1488730 | 5148305 | 11 |
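
For readers unfamiliar with the three columns, the sketch below computes them from a FASTA file with `awk` and `sort`. N50 is the length of the sequence at which the running total, taken from longest to shortest, first reaches half of the summed length. The helper name `fasta_n50` and the toy sequences are assumptions for illustration. It is a stand-in for the idea, not a replacement for `faops n50`, and it skips empty records.

```bash
# Hypothetical helper: print "N50 Sum Count" for a FASTA file.
fasta_n50() {
    LC_ALL=C awk '
        /^>/ { if (len > 0) print len; len = 0; next }   # emit previous record length
        { len += length($0) }                            # wrapped sequences are summed
        END { if (len > 0) print len }
    ' "$1" |
    sort -rn |
    awk '
        { lens[NR] = $1; sum += $1 }
        END {
            for (i = 1; i <= NR; i++) {
                acc += lens[i]
                if (acc * 2 >= sum) { print lens[i], sum, NR; exit }
            }
        }
    '
}

workdir=$(mktemp -d)
# Toy sequences of length 6 (wrapped over two lines), 4, 3, 2 and 1.
printf '>a\nACG\nTAC\n>b\nACGT\n>c\nACG\n>d\nAC\n>e\nA\n' > "$workdir/toy.fa"

# Format the result as a table row, in the same style as the stats script.
printf "| %s | %s | %s | %s |\n" "toy" $(fasta_n50 "$workdir/toy.fa")
```

In the toy file the lengths sum to 16. The running total reaches 6 and then 10, and 10 is the first value to cover half of 16, so the N50 is 4.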
* Remove the intermediate `QxxLxxx` directories (one per quality value and read length combination).
```bash
BASE_DIR=$HOME/data/anchr/Vpar
cd ${BASE_DIR}
rm -fr 2_illumina/Q{20,25,30}L*
rm -fr Q{20,25,30}L*
```
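Because `rm -fr` with globs is unforgiving, it can be worth listing what a cleanup would touch before running it. The sketch below does a non-destructive preview with `find` in a scratch directory. The directory names are toy assumptions, and `-maxdepth 2` is slightly broader than the two glob patterns above, since it would also report matching directories under any other first-level folder.

```bash
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Toy layout: three QxxLxxx directories plus one directory that must survive.
mkdir -p 2_illumina/Q20L60 2_illumina/Q25L90 Q30L120 merge

# Preview only: print matching directories without deleting anything.
preview=$(find . -maxdepth 2 -type d \
    \( -name 'Q20L*' -o -name 'Q25L*' -o -name 'Q30L*' \) | LC_ALL=C sort)
echo "$preview"
```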
# Legionella pneumophila subsp. pneumophila ATCC 33152D-5; Philadelphia-1
Project
[SRP040661](https://trace.ncbi.nlm.nih.gov/Traces/sra/?study=SRP040661)
## Lpne: download
* Reference genome
    * Strain: Legionella pneumophila subsp. pneumophila str. Philadelphia 1
    * Taxid: [272624](https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Info&id=272624&lvl=3&lin=f&keep=1&srchmode=1&unlock)
    * RefSeq assembly accession:
      [GCF_000008485.1](ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/008/485/GCF_000008485.1_ASM848v1/GCF_000008485.1_ASM848v1_assembly_report.txt)
    * Proportion of paralogs (> 1000 bp): 0.0264
```bash
mkdir -p ~/data/anchr/Lpne/1_genome
cd ~/data/anchr/Lpne/1_genome
aria2c -x 9 -s 3 -c ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/008/485/GCF_000008485.1_ASM848v1/GCF_000008485.1_ASM848v1_genomic.fna.gz
TAB=$'\t'
cat <<EOF > replace.tsv
NC_002942.5${TAB}1
EOF
faops replace GCF_000008485.1_ASM848v1_genomic.fna.gz replace.tsv genome.fa
cp ~/data/anchr/paralogs/otherbac/Results/Lpne/Lpne.multi.fas paralogs.fas
```
* Illumina
    * [SRX2179279](https://www.ncbi.nlm.nih.gov/sra/SRX2179279) SRR4272054
```bash
mkdir -p ~/data/anchr/Lpne/2_illumina
cd ~/data/anchr/Lpne/2_illumina
cat << EOF > sra_ftp.txt
ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR427/004/SRR4272054/SRR4272054_1.fastq.gz
ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR427/004/SRR4272054/SRR4272054_2.fastq.gz
EOF
aria2c -x 9 -s 3 -c -i sra_ftp.txt
cat << EOF > sra_md5.txt
6391a189c30acde364eb553e1f592a81 SRR4272054_1.fastq.gz
67ec48fd2c37e09b35f232f262c46d15 SRR4272054_2.fastq.gz
EOF
md5sum --check sra_md5.txt
ln -s SRR4272054_1.fastq.gz R1.fq.gz
ln -s SRR4272054_2.fastq.gz R2.fq.gz
```
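After linking `R1.fq.gz` and `R2.fq.gz`, a quick check that both files hold the same number of reads can catch a truncated download that happens to pass unnoticed elsewhere. The sketch below is one way to do that, assuming plain four-line FASTQ records. The helper name `count_reads` and the toy reads are assumptions for illustration.

```bash
# Hypothetical helper: number of reads in a gzipped FASTQ (4 lines per record).
count_reads() {
    gzip -dc "$1" | awk 'END { print NR / 4 }'
}

workdir=$(mktemp -d)
# Toy paired files with two reads each.
printf '@r1/1\nACGT\n+\nIIII\n@r2/1\nTTGA\n+\nIIII\n' | gzip > "$workdir/R1.fq.gz"
printf '@r1/2\nACGA\n+\nIIII\n@r2/2\nTCAA\n+\nIIII\n' | gzip > "$workdir/R2.fq.gz"

n1=$(count_reads "$workdir/R1.fq.gz")
n2=$(count_reads "$workdir/R2.fq.gz")

if [ "$n1" -eq "$n2" ]; then
    echo "paired: $n1 reads in each file"
else
    echo "mismatch: R1=$n1 R2=$n2" >&2
fi
```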
* PacBio
```bash
mkdir -p ~/data/anchr/Lpne/3_pacbio
cd ~/data/anchr/Lpne/3_pacbio
# download from sra
cat <<EOF > hdf5.txt
http://sra-download.ncbi.nlm.nih.gov/srapub_files/SRR4272055_SRR4272055_hdf5.tgz
http://sra-download.ncbi.nlm.nih.gov/srapub_files/SRR4272056_SRR4272056_hdf5.tgz
http://sra-download.ncbi.nlm.nih.gov/srapub_files/SRR4272057_SRR4272057_hdf5.tgz
EOF
```

```bash
cat << EOF > sra_ftp.txt
EOF
aria2c -x 9 -s 3 -c -i sra_ftp.txt
cat << EOF > sra_md5.txt
EOF
md5sum --check sra_md5.txt
ln -s SRR4244665_1.fastq.gz R1.fq.gz
ln -s SRR4244665_2.fastq.gz R2.fq.gz
```
# Clostridioides difficile 630
Project
[SRP040661](https://trace.ncbi.nlm.nih.gov/Traces/sra/?study=SRP040661)
## Cdif: download
* Reference genome
    * Strain: Clostridioides difficile 630
    * Taxid: [272563](https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=272563)
    * RefSeq assembly accession:
      [GCF_000009205.1](ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/009/205/GCF_000009205.1_ASM920v1/GCF_000009205.1_ASM920v1_assembly_report.txt)
    * Proportion of paralogs (> 1000 bp): 0.0661
```bash
mkdir -p ~/data/anchr/Cdif/1_genome
cd ~/data/anchr/Cdif/1_genome
aria2c -x 9 -s 3 -c ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/009/205/GCF_000009205.1_ASM920v1/GCF_000009205.1_ASM920v1_genomic.fna.gz
TAB=$'\t'
cat <<EOF > replace.tsv
NC_009089.1${TAB}1
NC_008226.1${TAB}pCD630
EOF
faops replace GCF_000009205.1_ASM920v1_genomic.fna.gz replace.tsv genome.fa
cp ~/data/anchr/paralogs/otherbac/Results/Cdif/Cdif.multi.fas paralogs.fas
```
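This genome has two records, the chromosome renamed to `1` and the plasmid renamed to `pCD630`, so it is worth confirming that both names and their lengths look right after the rename. The sketch below lists the name and length of each record with `awk`. The helper name `fasta_sizes` and the toy sequences are assumptions for illustration, and where `faops` is installed, `faops size genome.fa` should give a similar two-column listing.

```bash
# Hypothetical helper: print "name<TAB>length" for every record in a FASTA file.
fasta_sizes() {
    LC_ALL=C awk '
        /^>/ {
            if (name != "") print name "\t" len   # flush the previous record
            name = substr($1, 2)                  # first word without ">"
            len = 0
            next
        }
        { len += length($0) }
        END { if (name != "") print name "\t" len }
    ' "$1"
}

workdir=$(mktemp -d)
# Toy genome: a "chromosome" wrapped over two lines and a short "plasmid".
printf '>1\nACGTACGT\nACGT\n>pCD630\nACGTA\n' > "$workdir/genome.fa"

fasta_sizes "$workdir/genome.fa"
```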
* SRA experiment: [SRX2107163](https://www.ncbi.nlm.nih.gov/sra/SRX2107163)
# Campylobacter jejuni subsp. jejuni ATCC 700819
Project
[SRP040661](https://trace.ncbi.nlm.nih.gov/Traces/sra/?study=SRP040661)
## Cjej: download
* Reference genome
    * Strain: Campylobacter jejuni subsp. jejuni NCTC 11168 = ATCC 700819
    * Taxid: [192222](https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Info&id=192222&lvl=3&lin=f&keep=1&srchmode=1&unlock)
    * RefSeq assembly accession:
      [GCF_000009085.1](ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/009/085/GCF_000009085.1_ASM908v1/GCF_000009085.1_ASM908v1_assembly_report.txt)
    * Proportion of paralogs (> 1000 bp): 0.0196
```bash
mkdir -p ~/data/anchr/Cjej/1_genome
cd ~/data/anchr/Cjej/1_genome
aria2c -x 9 -s 3 -c ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/009/085/GCF_000009085.1_ASM908v1/GCF_000009085.1_ASM908v1_genomic.fna.gz
TAB=$'\t'
cat <<EOF > replace.tsv
NC_002163.1${TAB}1
EOF
faops replace GCF_000009085.1_ASM908v1_genomic.fna.gz replace.tsv genome.fa
cp ~/data/anchr/paralogs/otherbac/Results/Cjej/Cjej.multi.fas paralogs.fas
```
* SRA experiment: [SRX2107012](https://www.ncbi.nlm.nih.gov/sra/SRX2107012)