Snakemake:按染色体分裂后未知的输出/输入文件 [英] Snakemake: unknown output/input files after splitting by chromosome

查看:156
本文介绍了Snakemake:按染色体分裂后未知的输出/输入文件的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

为了加快某些蛇行步骤,我想:

To speed up a certain snakemake step I would like to:

  • 使用
    每条染色体分割我的bam文件 bamtools split -in sample.bam --reference
    这将导致文件名为sample.REF_{chromosome}.bam
  • 对每个执行
  • 变体调用会导致例如sample.REF_{chromosome}.vcf
  • 使用
    使用vcf-concat(VCFtools)重组获得的vcf文件 vcf-concat file1.vcf file2.vcf file3.vcf > sample.vcf
  • split my bamfile per chromosome using
    bamtools split -in sample.bam --reference
    this results in files named as sample.REF_{chromosome}.bam
  • perform variant calling on each resulting in e.g. sample.REF_{chromosome}.vcf
  • recombine the obtained vcf files using vcf-concat (VCFtools) using
    vcf-concat file1.vcf file2.vcf file3.vcf > sample.vcf

问题是我不知道bam文件中可能存在哪些染色体.因此,我无法准确指定bamtools split的输出.此外,我不确定如何输入vcf-concat来获取所有vcf文件.

The problem is that I don't know a priori which chromosomes may be in my bam file. So I cannot specify accurately the output of bamtools split. Furthermore, I'm not sure how to make the input of vcf-concat to take all vcf files.

我想到要使用samples.fofn并做类似的事情

I thought of using a samples.fofn and do something like

rule split_bam:
    input:
        bam = "alignment/{sample}.bam",
        pattern = "alignment/{sample}.REF_"
    output:
        alignment/anon.splitbams.fofn
    log:
        "logs/bamtools_split/{sample}.log"
    shell:
        "bamtools split -in {input.bam} -reference && \
         ls alignment/{input.pattern}*.bam | sed 's/.bam/.vcf/' > {output}"

并使用相同的fofn串联所获得的vcf文件.但这感觉很笨拙,我很感谢您的建议.

And use the same fofn for concatenating the obtained vcf files. But this feels like a very awkward hack and I'd appreciate your suggestions.

编辑20180409

EDIT 20180409

正如@jeeyem所建议的那样,我尝试了dynamic()函数,但无法弄清楚.

As suggested by @jeeyem I tried the dynamic() functions, but I can't figure it out.

我完整的蛇文件在 GitHub 上,动态部分位于第99-133行.

My complete snakefile is on GitHub, the dynamic part is at lines 99-133.

我得到的错误是: InputFunctionException in line 44 of /home/wdecoster/DR34/SV-nanopore.smk: KeyError: 'anon___snakemake_dynamic' Wildcards: sample=anon___snakemake_dynamic (带有anon匿名的{sample}标识符)

The error I get is: InputFunctionException in line 44 of /home/wdecoster/DR34/SV-nanopore.smk: KeyError: 'anon___snakemake_dynamic' Wildcards: sample=anon___snakemake_dynamic (with anon an anonymized {sample} identifier)

使用--debug-dag运行可给出(错误之前的最后一部分): candidate job cat_vcfs wildcards: sample=anon candidate job nanosv wildcards: sample=anon___snakemake_dynamic, chromosome=_ candidate job samtools_index wildcards: aligner=split_ngmlr, sample=anon___snakemake_dynamic.REF__ candidate job split_bam wildcards: sample=anon___snakemake_dynamic, chromosome=_ InputFunctionException in line 44 of /home/wdecoster/DR34/SV-nanopore.smk: KeyError: 'anon___snakemake_dynamic' Wildcards: sample=anon___snakemake_dynamic

Running with --debug-dag gives (last parts before erroring): candidate job cat_vcfs wildcards: sample=anon candidate job nanosv wildcards: sample=anon___snakemake_dynamic, chromosome=_ candidate job samtools_index wildcards: aligner=split_ngmlr, sample=anon___snakemake_dynamic.REF__ candidate job split_bam wildcards: sample=anon___snakemake_dynamic, chromosome=_ InputFunctionException in line 44 of /home/wdecoster/DR34/SV-nanopore.smk: KeyError: 'anon___snakemake_dynamic' Wildcards: sample=anon___snakemake_dynamic

哪个显示通配符被误解了?

Which shows that the wildcard is misinterpreted?

干杯, 哇

推荐答案

您可以从bam标头或相应的.fai文件中查找染色体名称,以用作参考.这可以在您的Snakefile的开头完成.然后,您可以使用expand("alignment/{{sample}}.REF_{chromosome}.bam", chromosome=chromosomes)定义该规则的输出文件.无需使用动态.

You can lookup the chromosome names from the bam header, or a corresponding .fai file for the used reference. This can be done at the beginning of your Snakefile. Then, you can use expand("alignment/{{sample}}.REF_{chromosome}.bam", chromosome=chromosomes) to define the output files of that rule. No need to use dynamic.

这篇关于Snakemake:按染色体分裂后未知的输出/输入文件的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆