# dazzler: basic usage of dazz_db and daligner
## dazz_db
`DBsplit`
* If the `-x` option is set, reads shorter than the given length are ignored.
* Each block is `-s` Mbp in size, except for the last one.
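To see how `-s` and `-x` interact, the sketch below gives a rough estimate of the block count for a FASTA file. It assumes that DBsplit walks the reads in file order, skips reads shorter than `-x`, and closes a block once its accumulated length reaches `-s` Mbp. This approximates the splitting rule described above and is not DBsplit's actual code. `estimate_blocks` is a hypothetical helper name.

```bash
# Hypothetical helper: rough estimate of the number of blocks DBsplit would create.
# Assumption: reads are packed in file order, reads shorter than MIN_LEN are skipped,
# and a block is closed once its total length reaches BLOCK_MBP * 1,000,000 bp.
estimate_blocks() {
    local fasta=$1 block_mbp=$2 min_len=${3:-0}
    awk -v s="$block_mbp" -v x="$min_len" '
        BEGIN { size = s * 1000000 }
        # Add the finished read to the current block if it passes the length filter
        function flush() {
            if (seen && len >= x) {
                tot += len
                if (tot >= size) { n++; tot = 0 }
            }
        }
        /^>/ { flush(); len = 0; seen = 1; next }
        { len += length($0) }
        END { flush(); if (tot > 0) n++; print n + 0 }
    ' "$fasta"
}

# Usage: 50 Mbp blocks, ignoring reads shorter than 1000 bp
# estimate_blocks renamed.fasta 50 1000
```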
### Rename sequences for dazzler
This creates two files: `renamed.fasta` and `stdout.replace.tsv`.
```bash
mkdir -p ~/data/test/dazzler
cd ~/data/test/dazzler
cat ~/data/anchr/iso_1/merge/anchor.merge.fasta \
| anchr dazzname stdin -o stdout \
| faops filter -l 0 stdin renamed.fasta
```
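`fasta2DB` expects PacBio-style headers of the form `>name/well/start_end`, so arbitrary anchor names have to be rewritten first. The sketch below shows one way such a renaming could work: it writes `>read/N/0_LEN` headers and a two-column TSV mapping each new name to the original one. The `read` prefix, the TSV column order and the helper name `dazz_rename` are assumptions for illustration; they are not necessarily what `anchr dazzname` produces.

```bash
# Hypothetical sketch: rename FASTA records to PacBio-style headers (>read/N/0_LEN)
# and record the mapping new_name<TAB>old_name. Not the actual `anchr dazzname` logic.
dazz_rename() {
    local in_fa=$1 out_fa=$2 map_tsv=$3
    awk -v out="$out_fa" -v map="$map_tsv" '
        # Emit the buffered record (sequence joined onto one line) with its new header
        function emit() {
            if (name == "") return
            n++
            new = "read/" n "/0_" length(seq)
            print ">" new > out
            print seq > out
            print new "\t" name > map
        }
        /^>/ { emit(); name = substr($1, 2); seq = ""; next }
        { seq = seq $0 }
        END { emit() }
    ' "$in_fa"
}

# Usage (file names are placeholders):
# dazz_rename anchor.merge.fasta renamed.fasta renamed.map.tsv
```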
### Create and split DB
This creates `myDB.db` and its hidden companion files.
```bash
cd ~/data/test/dazzler
echo "Make the dazzler DB"
DBrm myDB
fasta2DB myDB renamed.fasta
DBdust myDB
# split into blocks of 50 Mbp each
DBsplit -s50 myDB
BLOCK_NUMBER=$(perl -nl -e '/^blocks\s+=\s+(\d+)/ and print $1' myDB.db)
echo "${BLOCK_NUMBER}"
```
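The `blocks = N` line that the Perl one-liner extracts can also be read with `awk`. A minimal sketch, assuming (as the regex above does) that `myDB.db` contains a line with `blocks`, `=` and the count as whitespace-separated fields; `db_block_count` is a hypothetical helper name.

```bash
# Hypothetical helper: print the block count recorded in a dazzler .db file.
# Assumes a line of the form "blocks = N", the same pattern the Perl one-liner matches.
db_block_count() {
    awk '$1 == "blocks" && $2 == "=" { print $3; exit }' "$1"
}

# Usage:
# BLOCK_NUMBER=$(db_block_count myDB.db)
```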
### Retrieve some records from DB
* If the `-n` option is set, the DNA sequence is **not** displayed.
```bash
cd ~/data/test/dazzler
# headers
DBshow -n myDB 1-10 102 100-101
# sequences from the original file
faops some -l 0 renamed.fasta <(DBshow -n myDB 1-10 102 100-101 | sed 's/^>//') stdout
```
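`DBshow` accepts read ranges such as `1-10 102 100-101`. Assuming these are 1-based, inclusive ranges, the hypothetical helper below expands such a list into individual read indices, which helps when checking how many records the commands above should return (one per index, assuming no overlapping ranges).

```bash
# Hypothetical helper: expand range arguments like "1-10 102 100-101"
# into one read index per line (ranges assumed 1-based and inclusive).
expand_ranges() {
    local arg
    for arg in "$@"; do
        if [[ $arg =~ ^([0-9]+)-([0-9]+)$ ]]; then
            seq "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}"
        elif [[ $arg =~ ^[0-9]+$ ]]; then
            echo "$arg"
        else
            echo "invalid range: $arg" >&2
            return 1
        fi
    done
}

# Usage: 10 + 1 + 2 = 13 indices for the example above
# expand_ranges 1-10 102 100-101 | wc -l
```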
## daligner
`HPC.daligner`
* Finds local alignments involving at least `-l` base pairs (default 1000).
* An average correlation rate of `-e` (default 70%) set to 80%
* The number of threads is set by the `-T` option (default 4; must be a power of 2).
* The `-t` parameter suppresses the use of any *k*-mer that occurs more than *t* times in
either the subject or the target block.
* Alternatively, the `-M` parameter specifies a memory usage limit (in GB), and the program
automatically selects a value of *t* that meets it.
* One or more interval tracks can be given with the `-m` option (m for mask), e.g. the `dust`
track created by `DBdust`.
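The idea behind `-t` can be illustrated by counting *k*-mers and listing those that occur more than *t* times, i.e. the ones daligner would skip. This is a toy sketch on plain FASTA text (forward strand only, all counts held in memory, so only practical on small inputs), not daligner's implementation; `frequent_kmers` and its arguments are illustrative.

```bash
# Toy sketch: list k-mers that occur more than T times in a FASTA file.
# Forward strand only; illustrates what daligner's -t filters out, nothing more.
frequent_kmers() {
    local fasta=$1 k=$2 t=$3
    awk -v k="$k" -v t="$t" '
        # Count every k-mer of one complete sequence
        function count(s,   i) {
            for (i = 1; i + k - 1 <= length(s); i++) cnt[substr(s, i, k)]++
        }
        /^>/ { count(seq); seq = ""; next }
        { seq = seq toupper($0) }
        END {
            count(seq)
            for (kmer in cnt) if (cnt[kmer] > t) print kmer "\t" cnt[kmer]
        }
    ' "$fasta" | sort
}

# Usage: 12-mers seen more than 20 times
# frequent_kmers renamed.fasta 12 20
```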
### Create jobs by `HPC.daligner` and execute it
Three `.las` files (`myDB.[1-3].las`) are generated and then concatenated into `myDB.las`.
```bash
cd ~/data/test/dazzler
# remove .las files left over from a previous run
rm -f myDB*.las
HPC.daligner -v -M16 -e.96 -l500 -s500 -mdust myDB > job.sh
bash job.sh
LAcat -v myDB.#.las > myDB.las
```
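Since `LAcat` stitches the per-block files together, a missing block file (for example from a failed job) could leave `myDB.las` incomplete. A quick pre-check is sketched below, assuming the `PREFIX.N.las` naming shown above; `check_las_blocks` is a hypothetical helper, and N would be the `BLOCK_NUMBER` computed earlier.

```bash
# Hypothetical helper: verify that PREFIX.1.las .. PREFIX.N.las all exist and are non-empty.
check_las_blocks() {
    local prefix=$1 n=$2 i missing=0
    for ((i = 1; i <= n; i++)); do
        if [[ ! -s "${prefix}.${i}.las" ]]; then
            echo "missing or empty: ${prefix}.${i}.las" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Usage:
# check_las_blocks myDB "${BLOCK_NUMBER}" && LAcat -v myDB.#.las > myDB.las
```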
Contents of the generated `job.sh`:
```bash
# Daligner jobs (3)
daligner -v -e0.96 -l500 -s500 -M16 -mdust myDB.1 myDB.1
daligner -v -e0.96 -l500 -s500 -M16 -mdust myDB.2 myDB.1 myDB.2
daligner -v -e0.96 -l500 -s500 -M16 -mdust myDB.3 myDB.1 myDB.2 myDB.3
# Check initial .las files jobs (3) (optional but recommended)
LAcheck -vS myDB myDB.1.myDB.1 myDB.1.myDB.2 myDB.1.myDB.3
LAcheck -vS myDB myDB.2.myDB.1 myDB.2.myDB.2 myDB.2.myDB.3
LAcheck -vS myDB myDB.3.myDB.1 myDB.3.myDB.2 myDB.3.myDB.3
# Level 1 merge jobs (3)
LAmerge -v myDB.1 myDB.1.myDB.1 myDB.1.myDB.2 myDB.1.myDB.3
LAmerge -v myDB.2 myDB.2.myDB.1 myDB.2.myDB.2 myDB.2.myDB.3
LAmerge -v myDB.3 myDB.3.myDB.1 myDB.3.myDB.2 myDB.3.myDB.3
# Check level 2 .las files jobs (3) (optional but recommended)
LAcheck -vS myDB myDB.1
LAcheck -vS myDB myDB.2
LAcheck -vS myDB myDB.3
# Remove level 1 .las files (optional)
rm myDB.1.myDB.1.las myDB.1.myDB.2.las myDB.1.myDB.3.las
rm myDB.2.myDB.1.las myDB.2.myDB.2.las myDB.2.myDB.3.las
rm myDB.3.myDB.1.las myDB.3.myDB.2.las myDB.3.myDB.3.las
```
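The level 1 merge jobs in `job.sh` follow a regular pattern: for each block *i*, one `LAmerge` combines `myDB.i.myDB.1` to `myDB.i.myDB.N` into `myDB.i`. The sketch below reproduces only that pattern, as printed commands, to make the listing easier to read; it is not a replacement for `HPC.daligner`, and `print_merge_jobs` is a hypothetical name.

```bash
# Hypothetical sketch: print the level 1 LAmerge commands seen in job.sh
# for a database split into N blocks (pattern copied from the listing above).
print_merge_jobs() {
    local db=$1 n=$2 i j line
    for ((i = 1; i <= n; i++)); do
        line="LAmerge -v ${db}.${i}"
        for ((j = 1; j <= n; j++)); do
            line+=" ${db}.${i}.${db}.${j}"
        done
        echo "$line"
    done
}

# Usage:
# print_merge_jobs myDB 3
```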
The three `daligner` lines above are equivalent to the following six block-pair comparisons:
```bash
daligner -v -e0.96 -l500 -s500 -M16 -mdust myDB.1 myDB.1
daligner -v -e0.96 -l500 -s500 -M16 -mdust myDB.1 myDB.2
daligner -v -e0.96 -l500 -s500 -M16 -mdust myDB.1 myDB.3
daligner -v -e0.96 -l500 -s500 -M16 -mdust myDB.2 myDB.2
daligner -v -e0.96 -l500 -s500 -M16 -mdust myDB.2 myDB.3
daligner -v -e0.96 -l500 -s500 -M16 -mdust myDB.3 myDB.3
```
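In other words, each unordered pair of blocks (i, j) with i ≤ j is compared once, giving N(N+1)/2 comparisons for N blocks (6 for N = 3). This relies on `daligner A B` also producing the B-versus-A results, which is consistent with `myDB.2.myDB.1` appearing in `job.sh`. The hypothetical helper below prints that list for any N, passing extra options through unchanged.

```bash
# Hypothetical sketch: print one daligner call per unordered block pair (i <= j),
# matching the equivalent listing above. Extra daligner options are passed through.
print_pair_jobs() {
    local db=$1 n=$2 i j
    shift 2
    local opts="$*"
    for ((i = 1; i <= n; i++)); do
        for ((j = i; j <= n; j++)); do
            echo "daligner ${opts:+$opts }${db}.${i} ${db}.${j}"
        done
    done
}

# Usage (6 commands for 3 blocks):
# print_pair_jobs myDB 3 -v -e0.96 -l500 -s500 -M16 -mdust
```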
Results.