最后执行一定的规则 [英] Execute certain rule at the very end
问题描述
我目前正在编写一个 Snakefile,它进行了大量的对齐后质量控制(CollectInsertSizeMetics、CollectAlignmentSummaryMetrics、CollectGcBiasMetrics
、...).在 Snakefile 的最后,我正在运行 multiQC 以将所有指标合并到一个 html 报告中.
I am currently writing a Snakefile, which does a lot of post-alignment quality control (CollectInsertSizeMetics, CollectAlignmentSummaryMetrics, CollectGcBiasMetrics
, ...).
At the very end of the Snakefile, I am running multiQC to combine all the metrics in one html report.
我知道如果我使用规则 A 的输出作为规则 B 的输入,规则 B 只会在规则 A 完成后执行.我的问题是 multiQC 的输入是一个目录,从一开始就存在.在此目录中,multiQC 将搜索某些文件,然后创建报告.如果我当前正在执行我的 Snakemake 文件,multiQC 将在执行所有质量控制之前执行(例如 fastqc
需要相当长的时间),因此最终报告中缺少这些.
I know that if I use the output of rule A as input of rule B, rule B will only be executed after rule A is finished.
The problem in my case is that the input of multiQC is a directory, which exists right from the start. Inside of this directory, multiQC will search for certain files and then create the report.
If I am currently executing my Snakemake file, multiQC will be executed before all quality controls will be performed (e.g. fastqc
takes quite some time), thus these are missing in the final report.
所以我的问题是,如果有一个选项,指定最后执行某个规则.我知道我可以使用 --wait-for-files
来等待某个 fastqc
报告,但这似乎非常不灵活.
So my question is, if there is an option, that specifies that a certain rule is executed last.
I know that I could use --wait-for-files
to wait for a certain fastqc
report, but that seems very inflexible.
当前的最后一条规则如下所示:
The last rule currently looks like this:
rule multiQC:
input:
input_dir = "post-alignment-qc"
output:
output_html="post-alignment-qc/multiQC/mutliqc-report.html"
log:
err='post-alignment-qc/logs/fastQC/multiqc_stderr.err'
benchmark:
"post-alignment-qc/benchmark/multiQC/multiqc.tsv"
shell:
"multiqc -f -n {output.output_html} {input.input_dir} 2> {log.err}"
感谢任何帮助!
推荐答案
您可以将各个 QC 规则生成的文件提供给 multiqc
规则的输入.这样,一旦所有这些文件都可用,multiqc 就会启动:
You could give to the input of multiqc
rule the files produced by the individual QC rules. In this way, multiqc will start once all those files are available:
samples = ['a', 'b', 'c']
rule collectInsertSizeMetrics:
input:
'{sample}.bam',
output:
'post-alignment-qc/{sample}.insertSizeMetrics.txt' # <- Some file produced by CollectInsertSizeMetrics
shell:
"CollectInsertSizeMetics {input} > {output}"
rule CollectAlignmentSummaryMetrics:
output:
'post-alignment-qc/{sample}.CollectAlignmentSummaryMetrics.txt'
rule multiqc:
input:
expand('post-alignment-qc/{sample}.insertSizeMetrics.txt', sample=samples),
expand('post-alignment-qc/{sample}.CollectAlignmentSummaryMetrics.txt', sample=samples),
shell:
"multiqc -f -n {output.output_html} post-alignment-qc 2> {log.err}"
这篇关于最后执行一定的规则的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!