App-Anchr


doc/pacbio_consensus.md


Test run.

```bash
source ~/share/pitchfork/deployment/setup-env.sh

quiver --help
```
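
Before going further, it can help to confirm that the environment script really put the expected tools on `PATH`. The following is a minimal sketch; `require_tools` is a hypothetical helper name and is not part of pitchfork or FALCON.

```bash
# Hypothetical helper: report every named command that is not on PATH.
# Returns 0 when all commands are found, 1 otherwise.
require_tools() {
    local missing=0 tool
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Usage, after sourcing setup-env.sh:
#   require_tools quiver || echo "environment is incomplete"
```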

## Installing falcon-integrate directly (no longer recommended)

[wiki page](https://github.com/PacificBiosciences/FALCON-integrate/wiki/Installation)

```bash
mkdir -p $HOME/share
cd $HOME/share

git clone https://github.com/PacificBiosciences/FALCON-integrate.git
cd FALCON-integrate
git checkout master  # or whatever version you want
make init
source env.sh
make config-edit-user
make -j all

# Test data is stored on Dropbox, which is unreachable behind the GFW
# make test
```

After the build finishes, an `fc_env` directory is created that contains the executables. `tree -L 2 fc_env` reports `6 directories, 79 files`.
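
If `tree` is not installed, roughly the same summary can be produced with `find`. This is only a sketch: `summarize_tree` is a hypothetical helper, and its counts may differ slightly from `tree` (for example in how symbolic links are classified).

```bash
# Hypothetical helper: print "N directories, M files" for everything
# found under a directory, down to a given depth (default 2).
summarize_tree() {
    local dir=$1 depth=${2:-2} dirs files
    # Directories below the root, up to the requested depth
    dirs=$(find "$dir" -mindepth 1 -maxdepth "$depth" -type d | wc -l | tr -d '[:space:]')
    # Everything that is not a directory (regular files, symlinks, ...)
    files=$(find "$dir" -mindepth 1 -maxdepth "$depth" ! -type d | wc -l | tr -d '[:space:]')
    echo "$dirs directories, $files files"
}

# Usage:
#   summarize_tree fc_env 2
```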

# falcon example data

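The data in falcon-examples is downloaded from Dropbox by a niche tool, `git-sym`, so the instructions in its documentation cannot be followed from behind the Great Firewall.

Many of its settings are also hard-coded cluster paths and SGE configuration, which adds a lot of complexity and makes the examples hard to follow.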

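Notes:

* FASTA files **must** use the `.fasta` extension
* Sequence names in the FASTA files must meet the requirements of falcon (fasta2DB of dazz_db). Default SRA names **do not meet them**;
  the error message is `Pacbio header line format error`
* [This issue](https://github.com/PacificBiosciences/FALCON/issues/251) has a script that helps fix the problem. A local copy is kept as
  `falcon_name_fasta.pl`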

* Clear intermediate dirs

    ```bash
    find "$HOME/data/pacbio" -type d -name 'm_*' -print0 | xargs -0 rm -fr
    find "$HOME/data/pacbio" -type d -name 'job_*' -print0 | xargs -0 rm -fr
    ```
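
The header requirement in the notes above can be checked before running falcon. The sketch below assumes the expected form is `>movie/zmw/start_end`, optionally followed by extra text such as ` RQ=0.85`; that pattern is an approximation of what fasta2DB accepts, not a copy of its parser. Both function names and the `falcon_read` prefix are hypothetical, and `rename_for_falcon` only illustrates the idea behind a renaming script; it is not the `falcon_name_fasta.pl` mentioned above.

```bash
# Hypothetical helper: print header lines that do NOT look like
# ">movie/zmw/start_end". Prints nothing when all headers look fine.
bad_pacbio_headers() {
    grep '^>' "$1" \
        | grep -Ev '^>[^/[:space:]]+/[0-9]+/[0-9]+_[0-9]+([[:space:]].*)?$' \
        || true
}

# Hypothetical helper: rewrite every record as ">prefix/N/0_LENGTH",
# where N is the record number and LENGTH the sequence length.
# Each sequence is joined onto a single line.
rename_for_falcon() {
    awk -v prefix="${2:-falcon_read}" '
        function emit() {
            if (n) printf(">%s/%d/0_%d\n%s\n", prefix, n, length(seq), seq)
        }
        /^>/ { emit(); n++; seq = ""; next }
        { seq = seq $0 }
        END { emit() }
    ' "$1"
}

# Usage:
#   bad_pacbio_headers reads.fasta | head
#   rename_for_falcon reads.fasta > reads.renamed.fasta
```

Keeping the check separate from the rename makes it possible to confirm that a file needs rewriting before touching it.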

## The [*E. coli* example](https://github.com/PacificBiosciences/FALCON/wiki/Setup:-Complete-example) in `falcon/example`

* Download the following three files through a proxy

```bash
mkdir -p $HOME/data/pacbio/rawdata/ecoli_test
cd $HOME/data/pacbio/rawdata/ecoli_test

proxychains4 wget -c https://www.dropbox.com/s/tb78i5i3nrvm6rg/m140913_050931_42139_c100713652400000001823152404301535_s1_p0.1.subreads.fasta
proxychains4 wget -c https://www.dropbox.com/s/v6wwpn40gedj470/m140913_050931_42139_c100713652400000001823152404301535_s1_p0.2.subreads.fasta
proxychains4 wget -c https://www.dropbox.com/s/j61j2cvdxn4dx4g/m140913_050931_42139_c100713652400000001823152404301535_s1_p0.3.subreads.fasta

# N50 14124
# C   105451
faops n50 -C *.subreads.fasta
```
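
To make the two numbers above concrete: N50 is the read length at which reads of that length or longer hold at least half of all bases, and C is the number of sequences. The sketch below computes both with `awk` and `sort`, for cases where `faops` is not available. `fasta_n50` is a hypothetical helper, and its handling of ties or empty records may differ from `faops`.

```bash
# Hypothetical helper: print N50 and sequence count (C) for FASTA files.
fasta_n50() {
    # First pass: print one sequence length per record
    awk '
        /^>/ { if (seen) print len; seen = 1; len = 0; next }
        { len += length($0) }
        END { if (seen) print len }
    ' "$@" \
    | sort -rn \
    | awk '
        { lens[NR] = $1; total += $1 }
        END {
            # Walk from longest to shortest until half of all bases are covered
            run = 0
            for (i = 1; i <= NR; i++) {
                run += lens[i]
                if (run * 2 >= total) { print "N50\t" lens[i]; break }
            }
            print "C\t" NR
        }
    '
}

# Usage:
#   fasta_n50 *.subreads.fasta
```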

* Configuration file and run

```bash
source ~/share/pitchfork/deployment/setup-env.sh

if [ -d $HOME/data/pacbio/ecoli_test ];
then
    rm -fr $HOME/data/pacbio/ecoli_test
fi
mkdir -p $HOME/data/pacbio/ecoli_test
cd $HOME/data/pacbio/ecoli_test
find $HOME/data/pacbio/rawdata/ecoli_test -name "*.fasta" > input.fofn

# https://github.com/PacificBiosciences/FALCON/blob/master/examples/fc_run_ecoli.cfg
cat <<EOF > fc_run.cfg
[General]
job_type = local

# list of files of the initial bas.h5 files
input_fofn = input.fofn

input_type = raw
#input_type = preads

# The length cutoff used for seed reads used for initial mapping
length_cutoff = 12000

# The length cutoff used for seed reads used for pre-assembly
length_cutoff_pr = 12000

# Cluster queue setting
sge_option_da =
sge_option_la =
sge_option_pda =
sge_option_pla =
sge_option_fc =
sge_option_cns =

pa_concurrent_jobs = 4
ovlp_concurrent_jobs = 4

pa_HPCdaligner_option =  -v -B4 -t16 -e.70 -l1000 -s1000
ovlp_HPCdaligner_option = -v -B4 -t32 -h60 -e.96 -l500 -s1000

pa_DBsplit_option = -x500 -s50
ovlp_DBsplit_option = -x500 -s50

falcon_sense_option = --output_multi --min_idt 0.70 --min_cov 4 --max_n_read 200 --n_core 2

overlap_filtering_setting = --max_diff 100 --max_cov 100 --min_cov 20 --bestn 10 --n_core 2

EOF

# macOS, i7-6700k, 32G RAM, SSD


