# dazzler: basic usage of dazz_db and daligner

## dazz_db

`DBsplit`

* If the `-x` option is set, all reads shorter than the given length are ignored.
* Each block holds `-s` * 1 Mbp of sequence, except for the last, which may be smaller.
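
The block-size rule above can be turned into a rough estimate of how many blocks a split will produce. This is only a sketch: `estimate_blocks` is a hypothetical helper, not part of dazz_db, the 120 Mbp total is a made-up figure, and the real count may differ slightly because blocks are made up of whole reads.

```bash
# Hypothetical helper: estimate the number of blocks for a given -s value.
# $1 = total bases in the DB (after any -x filtering), $2 = -s value in Mbp.
estimate_blocks() {
    total_bp=$1
    s_mbp=$2
    block_bp=$(( s_mbp * 1000000 ))
    # Ceiling division: every block is full except possibly the last.
    echo $(( (total_bp + block_bp - 1) / block_bp ))
}

# Made-up example: 120 Mbp of reads split with -s50
estimate_blocks 120000000 50
```

With these assumed inputs the estimate is 3 blocks: two full 50 Mbp blocks and a 20 Mbp remainder.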

### Rename sequences for dazzler

This step creates two files: `renamed.fasta` and `stdout.replace.tsv`.

```bash
mkdir -p ~/data/test/dazzler
cd ~/data/test/dazzler

cat ~/data/anchr/iso_1/merge/anchor.merge.fasta \
    | anchr dazzname stdin -o stdout \
    | faops filter -l 0 stdin renamed.fasta
```

### Create and split DB

This step creates `myDB.db` and its hidden companion files (such as `.myDB.idx` and `.myDB.bps`).

```bash
cd ~/data/test/dazzler

echo "Make the dazzler DB"
DBrm myDB
fasta2DB myDB renamed.fasta
DBdust myDB
# each block holds 50 Mbp
DBsplit -s50 myDB

BLOCK_NUMBER=$(cat myDB.db | perl -nl -e '/^blocks\s+=\s+(\d+)/ and print $1')
echo ${BLOCK_NUMBER}
```
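
If Perl is not available, the same extraction can probably be done with `sed`. The sketch below runs against a mock stub file so that it works without a real database; it assumes only that the stub contains a line of the form `blocks = N`, which is what the regular expression above matches.

```bash
# Mock stub holding only the line of interest; a real myDB.db contains more.
stub=$(mktemp)
printf 'blocks = 3\n' > "$stub"

# Print the digits captured from the first matching line.
BLOCK_NUMBER=$(sed -n -E 's/^blocks[[:space:]]+=[[:space:]]+([0-9]+).*/\1/p' "$stub" | head -n 1)
echo "${BLOCK_NUMBER}"

rm -f "$stub"
```

Replacing `"$stub"` with `myDB.db` should give the same value as the Perl one-liner.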

### Retrieve some records from DB

* If the `-n` option is set, the DNA sequence is **not** displayed.

```bash
cd ~/data/test/dazzler

# headers
DBshow -n myDB 1-10 102 100-101

# sequences from the original file
faops some -l 0 renamed.fasta <(DBshow -n myDB 1-10 102 100-101 | sed 's/^>//') stdout
```
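
The `sed` step in the second command turns FASTA-style header lines into a plain list of names for `faops some`. A minimal illustration, using two invented header lines in place of real `DBshow -n` output (`strip_marker` is a hypothetical wrapper name):

```bash
# Hypothetical wrapper around the same sed expression used above.
strip_marker() {
    sed 's/^>//'
}

# Invented headers; real ones come from `DBshow -n`.
printf '>read/1/0_1000\n>read/2/0_2000\n' | strip_marker
```

Only the leading `>` is removed; the rest of each line is passed through unchanged.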

## daligner

`HPC.daligner`

* Reports local alignments involving at least `-l` base pairs (default 1000).
* Requires an average correlation rate of at least `-e` (default 70%); the example below raises it to 96%.
* `-T` sets the number of threads (default 4); the value must be a power of 2.
* `-t` suppresses the use of any *k*-mer that occurs more than *t* times in either the subject or
  target block.
* Alternatively, `-M` lets the program automatically select a value of *t* that meets the given
  memory usage limit (in Gb).
* `-m` specifies one or more interval tracks (m for mask).
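
Since `-T` expects a power of 2, a wrapper script might validate the value before building the command line. The following is a sketch; `is_power_of_two` is a hypothetical helper and not part of daligner.

```bash
# Hypothetical helper: succeeds only if $1 is a positive power of two.
is_power_of_two() {
    # A power of two has exactly one bit set, so n & (n - 1) is zero.
    [ "$1" -gt 0 ] && [ $(( $1 & ($1 - 1) )) -eq 0 ]
}

for t in 4 6 8; do
    if is_power_of_two "$t"; then
        echo "-T${t} ok"
    else
        echo "-T${t} is not a power of 2"
    fi
done
```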

### Create jobs with `HPC.daligner` and execute them

Three `.las` files (`myDB.[1-3].las`) are generated, then concatenated into `myDB.las`.

```bash
cd ~/data/test/dazzler

if [[ -e myDB.las || -e myDB.1.las ]]; then
    rm myDB*.las
fi
HPC.daligner -v -M16 -e.96 -l500 -s500 -mdust myDB > job.sh
bash job.sh

LAcat -v myDB.#.las > myDB.las
```

Contents of `job.sh`

```bash
# Daligner jobs (3)
daligner -v -e0.96 -l500 -s500 -M16 -mdust myDB.1 myDB.1
daligner -v -e0.96 -l500 -s500 -M16 -mdust myDB.2 myDB.1 myDB.2
daligner -v -e0.96 -l500 -s500 -M16 -mdust myDB.3 myDB.1 myDB.2 myDB.3
# Check initial .las files jobs (3) (optional but recommended)
LAcheck -vS myDB myDB.1.myDB.1 myDB.1.myDB.2 myDB.1.myDB.3
LAcheck -vS myDB myDB.2.myDB.1 myDB.2.myDB.2 myDB.2.myDB.3
LAcheck -vS myDB myDB.3.myDB.1 myDB.3.myDB.2 myDB.3.myDB.3
# Level 1 merge jobs (3)
LAmerge -v myDB.1 myDB.1.myDB.1 myDB.1.myDB.2 myDB.1.myDB.3
LAmerge -v myDB.2 myDB.2.myDB.1 myDB.2.myDB.2 myDB.2.myDB.3
LAmerge -v myDB.3 myDB.3.myDB.1 myDB.3.myDB.2 myDB.3.myDB.3
# Check level 2 .las files jobs (3) (optional but recommended)
LAcheck -vS myDB myDB.1
LAcheck -vS myDB myDB.2
LAcheck -vS myDB myDB.3
# Remove level 1 .las files (optional)
rm myDB.1.myDB.1.las myDB.1.myDB.2.las myDB.1.myDB.3.las
rm myDB.2.myDB.1.las myDB.2.myDB.2.las myDB.2.myDB.3.las
rm myDB.3.myDB.1.las myDB.3.myDB.2.las myDB.3.myDB.3.las
```

The three `daligner` lines in `job.sh` are equivalent to the following pairwise calls:

```bash
daligner -v -e0.96 -l500 -s500 -M16 -mdust myDB.1 myDB.1
daligner -v -e0.96 -l500 -s500 -M16 -mdust myDB.1 myDB.2
daligner -v -e0.96 -l500 -s500 -M16 -mdust myDB.1 myDB.3
daligner -v -e0.96 -l500 -s500 -M16 -mdust myDB.2 myDB.2
daligner -v -e0.96 -l500 -s500 -M16 -mdust myDB.2 myDB.3
daligner -v -e0.96 -l500 -s500 -M16 -mdust myDB.3 myDB.3
```
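
The pairwise list covers every unordered pair of blocks, including each block against itself. For an arbitrary block count it could be generated with a nested loop, as sketched below. `pairwise_jobs` is a hypothetical helper; the options are copied from `job.sh`, and the commands are only printed, not executed.

```bash
# Hypothetical helper: print one daligner call per block pair (i <= j).
# $1 = number of blocks, e.g. the BLOCK_NUMBER read from myDB.db.
pairwise_jobs() {
    n=$1
    opts="-v -e0.96 -l500 -s500 -M16 -mdust"
    i=1
    while [ "$i" -le "$n" ]; do
        j=$i
        while [ "$j" -le "$n" ]; do
            echo "daligner ${opts} myDB.${i} myDB.${j}"
            j=$(( j + 1 ))
        done
        i=$(( i + 1 ))
    done
}

pairwise_jobs 3
```

For *n* blocks this prints *n*(*n*+1)/2 lines, so three blocks give the six calls listed above.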

Results.


