Execute a certain rule at the very end


Problem description

I am currently writing a Snakefile that does a lot of post-alignment quality control (CollectInsertSizeMetrics, CollectAlignmentSummaryMetrics, CollectGcBiasMetrics, ...). At the very end of the Snakefile, I run MultiQC to combine all the metrics into one HTML report.

I know that if I use the output of rule A as the input of rule B, rule B will only be executed after rule A has finished. The problem in my case is that the input of MultiQC is a directory, which exists right from the start. Inside this directory, MultiQC searches for certain files and then creates the report. When I currently run my Snakefile, MultiQC is executed before all quality-control steps have finished (e.g. fastqc takes quite some time), so their results are missing from the final report.

So my question is whether there is an option to specify that a certain rule is executed last. I know that I could use --wait-for-files to wait for a certain fastqc report, but that seems very inflexible.

The last rule currently looks like this:

rule multiQC:
    input:
        input_dir = "post-alignment-qc"

    output:
        output_html="post-alignment-qc/multiQC/multiqc-report.html"

    log:
        err='post-alignment-qc/logs/fastQC/multiqc_stderr.err'

    benchmark:
        "post-alignment-qc/benchmark/multiQC/multiqc.tsv"

    shell:
         "multiqc -f -n {output.output_html} {input.input_dir} 2> {log.err}"

Any help is appreciated!

Recommended answer

You could give the multiqc rule the files produced by the individual QC rules as its input. Snakemake then treats every QC job as a prerequisite, so multiqc starts only once all of those files are available:

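samples = ['a', 'b', 'c']

# The first rule contains wildcards, so request the report explicitly, e.g.
#   snakemake --cores 1 post-alignment-qc/multiQC/multiqc-report.html

# The shell commands of the two QC rules are placeholders; substitute the real tool invocations.
rule collectInsertSizeMetrics:
    input:
        '{sample}.bam'
    output:
        'post-alignment-qc/{sample}.insertSizeMetrics.txt'  # <- Some file produced by CollectInsertSizeMetrics
    shell:
        "CollectInsertSizeMetrics {input} > {output}"

rule CollectAlignmentSummaryMetrics:
    input:
        '{sample}.bam'
    output:
        'post-alignment-qc/{sample}.CollectAlignmentSummaryMetrics.txt'
    shell:
        "CollectAlignmentSummaryMetrics {input} > {output}"

# Listing every QC output as input makes Snakemake wait for all of them.
# The output and log paths are modeled on the multiQC rule in the question.
rule multiqc:
    input:
        expand('post-alignment-qc/{sample}.insertSizeMetrics.txt', sample=samples),
        expand('post-alignment-qc/{sample}.CollectAlignmentSummaryMetrics.txt', sample=samples)
    output:
        output_html='post-alignment-qc/multiQC/multiqc-report.html'
    log:
        err='post-alignment-qc/logs/fastQC/multiqc_stderr.err'
    shell:
        "multiqc -f -n {output.output_html} post-alignment-qc 2> {log.err}"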

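To see why listing the files works where the directory did not, here is a minimal plain-Python sketch of the idea. It is not Snakemake's implementation: `expand_paths`, `dependencies`, the job names, and `report.html` are hypothetical stand-ins introduced only for illustration, and it assumes Python 3.9+ for `graphlib`. A job depends on another job only if one of its inputs is that job's declared output. No rule declares the directory as an output, so it creates no dependency, whereas the metric files do.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def expand_paths(pattern, samples):
    """Hypothetical stand-in for what expand() yields for a single wildcard."""
    return [pattern.format(sample=s) for s in samples]


def dependencies(jobs):
    """Map each job to the set of jobs whose declared outputs it uses as inputs."""
    producer = {out: name for name, job in jobs.items() for out in job["output"]}
    return {
        name: {producer[i] for i in job["input"] if i in producer}
        for name, job in jobs.items()
    }


samples = ["a", "b", "c"]
metrics = expand_paths("post-alignment-qc/{sample}.insertSizeMetrics.txt", samples)

# One QC job per sample, each producing one metrics file.
jobs = {
    f"insert_size_{s}": {"input": [f"{s}.bam"], "output": [m]}
    for s, m in zip(samples, metrics)
}

# Directory as input: no job declares it as output, so nothing has to run first.
jobs["multiqc"] = {"input": ["post-alignment-qc"], "output": ["report.html"]}
deps_dir = dependencies(jobs)

# Metric files as input: every QC job becomes a prerequisite of multiqc.
jobs["multiqc"] = {"input": metrics, "output": ["report.html"]}
deps_files = dependencies(jobs)
order = list(TopologicalSorter(deps_files).static_order())

print(deps_dir["multiqc"])  # set()
print(order[-1])            # multiqc
```

With the directory as input, the multiqc job has no prerequisites and may be scheduled at any time. With the expanded file list, it can only come after all three QC jobs.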