Snakemake - Override LSF (bsub) cluster config in a rule-specific manner

Question

Is it possible to define default memory and resource settings in the cluster config file and then override them per rule when needed? Is the resources field in a rule directly tied to the cluster config file, or is it just a fancy alternative to the params field for readability purposes?

In the example below, how do I use the default cluster config for rule a, but apply custom values (memory=40000 and rusage=15000) for rule b?

cluster.json:

{
    "__default__":
    {
        "memory": 20000,
        "resources": "\"rusage[mem=8000] span[hosts=1]\"",
        "output": "logs/cluster/{rule}.{wildcards}.out",
        "error": "logs/cluster/{rule}.{wildcards}.err"
    }
}

Snakefile:

rule all:
    input:
        'a_out.txt', 'b_out.txt'

rule a:
    input:
        'a.txt'
    output:
        'a_out.txt'
    shell:
        'touch {output}'

rule b:
    input:
        'b.txt'
    output:
        'b_out.txt'
    shell:
        'touch {output}'

Run command:

snakemake --cluster-config cluster.json \
          --cluster "bsub -M {cluster.memory} -R {cluster.resources} -o logs.txt" \
          -j 50

I understand that it is possible to define rule-specific resource requirements in the cluster config file, but I would prefer to define them directly in the Snakefile, if possible.

Otherwise, if there is a better way of implementing this, please let me know.

Recommended answer

In new.cluster.json you can define resources for specific rules. In your case, you would do the following:

{
    "__default__":
    {
        "memory": 20000,
        "resources": "\"rusage[mem=8000] span[hosts=1]\"",
        "output": "logs/cluster/{rule}.{wildcards}.out",
        "error": "logs/cluster/{rule}.{wildcards}.err"
    },
    "b":
    {
        "memory": 40000,
        "resources": "\"rusage[mem=15000] span[hosts=1]\"",
        "output": "logs/cluster/{rule}.{wildcards}.out",
        "error": "logs/cluster/{rule}.{wildcards}.err"
    }
}
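
The "b" entry above repeats every key for clarity. The Snakemake documentation describes the other entries of a cluster config as inheriting from __default__, so a rule-specific entry should only need the keys that differ. Below is a minimal sketch of that overlay in plain Python. The function name effective_cluster_config is hypothetical, and the code imitates the described behaviour rather than calling Snakemake itself:

```python
def effective_cluster_config(config, rule):
    """Overlay a rule's entry on top of the __default__ entry."""
    merged = dict(config.get("__default__", {}))  # copy the defaults
    merged.update(config.get(rule, {}))           # rule-specific keys win
    return merged


config = {
    "__default__": {
        "memory": 20000,
        "resources": '"rusage[mem=8000] span[hosts=1]"',
        "output": "logs/cluster/{rule}.{wildcards}.out",
    },
    # Trimmed entry: only the values that differ from the defaults.
    "b": {
        "memory": 40000,
        "resources": '"rusage[mem=15000] span[hosts=1]"',
    },
}

settings_a = effective_cluster_config(config, "a")  # no entry: pure defaults
settings_b = effective_cluster_config(config, "b")  # overrides plus inherited keys
print(settings_a["memory"], settings_b["memory"], settings_b["output"])
```

Rule a has no entry of its own, so it receives the defaults unchanged, while rule b keeps the default log path and changes only memory and resources.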

Then, in the Snakefile, you can use these values by loading new.cluster.json and referring to it in your rule:

import json

with open('new.cluster.json') as fh:
    cluster_config = json.load(fh)

rule all:
    input:
        'a_out.txt', 'b_out.txt'

rule a:
    input:
        'a.txt'
    output:
        'a_out.txt'
    shell:
        'touch {output}'

rule b:
    input:
        'b.txt'
    output:
        'b_out.txt'
    resources:
        mem_mb=cluster_config["b"]["memory"]
    shell:
        'touch {output}'
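
Note that the resources field of a rule is separate from the cluster config file: a value such as mem_mb reaches bsub only if the string passed to --cluster refers to it, for example as {resources.mem_mb}. The sketch below imitates that placeholder substitution with ordinary Python string formatting. The template and the helper render_submit are assumptions for illustration, not Snakemake internals:

```python
from types import SimpleNamespace

# Hypothetical submit template: {resources.mem_mb} would come from the rule,
# {cluster.resources} from the cluster config file.
SUBMIT_TEMPLATE = "bsub -M {resources.mem_mb} -R {cluster.resources} -o logs.txt"


def render_submit(mem_mb, lsf_resources):
    """Fill the placeholders the way a formatted --cluster string would be."""
    return SUBMIT_TEMPLATE.format(
        resources=SimpleNamespace(mem_mb=mem_mb),
        cluster=SimpleNamespace(resources=lsf_resources),
    )


# Values for rule b, taken from the question.
rule_b_cmd = render_submit(40000, '"rusage[mem=15000] span[hosts=1]"')
print(rule_b_cmd)
```

With a template like this, every rule needs a mem_mb value. Rules that do not declare one (rule a here) would need a default, for instance through the --default-resources option if your Snakemake version provides it.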

If you look through this repository, you can see how I use these cluster configs in the wild.
