Parallelise output of input function in Snakemake


Question

Hello Snakemake community,

I am having trouble defining a function correctly in Snakemake and calling it in the params section. The function returns a list, and my aim is to use each item of the list as a parameter of a shell command. In other words, I would like to run multiple jobs of the same shell command in parallel, each with a different parameter.

This is the function:

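import os, glob
def get_scontigs_names(wildcards):
    # Collect every path in reference/ whose name starts with "Supercontig"
    scontigs = glob.glob(os.path.join("reference", "Supercontig*"))
    files = [os.path.basename(s) for s in scontigs]
    # Keep the part of each file name before the first underscore
    name = [i.split('_')[0] for i in files]
    return name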

The output is a list that looks like:

['Supercontig0','Supercontig100','Supercontig2',...]
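To check what such a function returns without touching real data, the same logic can be run against a throwaway directory. This is only a sketch: the helper name supercontig_names and the "_seq.fa" file suffix are assumptions made for illustration, and the names are taken from the part of each file name before the first underscore, as in the answer's version of the function.

```python
import glob
import os
import tempfile


def supercontig_names(reference_dir):
    """Return the sorted name prefixes of the Supercontig* files in reference_dir."""
    paths = glob.glob(os.path.join(reference_dir, "Supercontig*"))
    files = [os.path.basename(p) for p in paths]
    # Keep the part before the first underscore, e.g. "Supercontig0"
    return sorted(f.split("_")[0] for f in files)


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical file names; the "_seq.fa" suffix is an assumption.
    for fname in ("Supercontig0_seq.fa", "Supercontig100_seq.fa", "Supercontig2_seq.fa"):
        open(os.path.join(tmp, fname), "w").close()
    names = supercontig_names(tmp)

print(names)
```

The sort is lexicographic, which is why Supercontig100 comes before Supercontig2, as in the list above.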

The Snakemake rules are:

rule all:
    input:
        "updated/all_supercontigs.sorted.vcf.gz"
rule update_vcf:
    input:
        len="genome/genome_contigs_len_cumsum.txt",
        vcf="filtered/all.vcf.gz"
    output:
        cat="updated/all_supercontigs.updated.list"
    params:
        scaf=get_scontigs_names
    shell:
        """
        python 3.7 scripts/update_genomic_reg.py -len {input.len} -vcf {input.vcf} -scaf {params.scaf}
        ls updated/*.updated.vcf.gz > {output.cat}
        """

This code is incorrect because all the items of the list are loaded into the shell command when I call {params.scaf}. The current shell command looks like:

python 3.7 scripts/update_genomic_reg.py -len genome/genome_contigs_len_cumsum.txt -vcf filtered/all.vcf.gz -scaf Supercontig0 Supercontig100 Supercontig2 ...

What I would like to get is:

python 3.7 scripts/update_genomic_reg.py -len genome/genome_contigs_len_cumsum.txt -vcf filtered/all.vcf.gz -scaf Supercontig0

python 3.7 scripts/update_genomic_reg.py -len genome/genome_contigs_len_cumsum.txt -vcf filtered/all.vcf.gz -scaf Supercontig100

And so on.
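The difference between the two outcomes can be imitated in plain Python. As far as I understand, Snakemake joins the items of a list-valued placeholder with spaces when it fills in a shell command, so a single rule without wildcards yields a single job that receives the whole list. The sketch below is not Snakemake code; the function names and the shortened command string are assumptions for illustration.

```python
# Shortened, hypothetical command prefix used only for this illustration
BASE = "python scripts/update_genomic_reg.py -scaf"


def render_single_job(scaf):
    """One job: every list item is joined with spaces into one command."""
    return BASE + " " + " ".join(scaf)


def render_job_per_item(scaf):
    """Desired behaviour: one command (one job) per list item."""
    return [BASE + " " + name for name in scaf]


scaf = ["Supercontig0", "Supercontig100", "Supercontig2"]
joined = render_single_job(scaf)
per_item = render_job_per_item(scaf)

print(joined)
for cmd in per_item:
    print(cmd)
```

Getting the second form in Snakemake means letting the item vary per job, which is what a wildcard in the output file name does.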

I have tried to use wildcards inside the function, but I am failing to give it the correct attribute.

There are several posts about input functions and wildcards, plus the Snakemake docs, but I could not really apply them to my case. Can somebody help me with this, please?

Recommended answer

I have found a solution to my question, inspired by @dariober.

rule all:
    input:
        "updated/all_supercontigs.updated.list"

import os, glob

def get_scontigs_names(wildcards):
    scontigs = glob.glob(os.path.join("reference", "Supercontig*"))
    files = [os.path.basename(s) for s in scontigs]
    name = [i.split('_')[0] for i in files]
    return name

rule update_vcf:
    input:
        len="genome/genome_contigs_len_cumsum.txt",
        vcf="filtered/all.vcf.gz"
    output:
        vcf="updated/all_{supercontig}.updated.vcf.gz"
    params:
        py3=config["modules"]["py3"],
        scaf=get_scontigs_names
    shell:
        """
        {params.py3} scripts/update_genomic_reg.py -len {input.len} -vcf {input.vcf} -scaf {wildcards.supercontig}
        """


rule list_updated:
    input:
        # Assumption: the names come from get_scontigs_names, which ignores its argument
        expand("updated/all_{supercontig}.updated.vcf.gz",
               supercontig=get_scontigs_names(None))
    output:
        "updated/all_supercontigs.updated.list"
    shell:
        """
        ls {input} > {output}
        """

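The reason this runs one job per supercontig is that the expand call asks for one output file per name, and each requested file is matched against the output pattern of update_vcf, which fills in {wildcards.supercontig} for that job. The plain-Python sketch below imitates these two steps; it is not Snakemake's implementation, and expand_targets and match_wildcard are hypothetical helpers.

```python
import re

PATTERN = "updated/all_{supercontig}.updated.vcf.gz"


def expand_targets(pattern, names):
    """Imitate expand(): one concrete file name per supercontig."""
    return [pattern.format(supercontig=name) for name in names]


def match_wildcard(pattern, path):
    """Imitate wildcard matching: recover the supercontig from a requested file."""
    regex = re.escape(pattern).replace(
        re.escape("{supercontig}"), "(?P<supercontig>.+)"
    )
    m = re.fullmatch(regex, path)
    return m.group("supercontig") if m else None


targets = expand_targets(PATTERN, ["Supercontig0", "Supercontig100"])
recovered = [match_wildcard(PATTERN, t) for t in targets]

print(targets)
print(recovered)
```

Because every target is a separate file, the jobs are independent of each other and Snakemake can schedule them in parallel when it is given more than one core.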