Unknown output in snakemake

Views: 70 · Published: 2021/4/15 19:47:03 · Tags: bioinformatics, snakemake

Question

I'm working on implementing a very simple pipeline in snakemake in hopes of replacing a chain of annoying bash scripts with one cohesive Snakefile.

I'm having trouble writing a rule that splits a file into smaller pieces (using GNU split), and then leads to a second rule where the output is concatenated together.

I don't know what to write for the input in the concat step, since I don't know how to define all the files fitting the pattern bam_files/test*. I tried with glob, but that decidedly doesn't seem to work (it seems like it's actually skipping split altogether with the glob included). Is there any better way that I could be doing this?

# test snakemake pipeline
import glob


SAMPLE_IDS = ["test"]

rule all: 
    input: 
        expand("bam_files/{FASTQ}.out", FASTQ=SAMPLE_IDS)


rule split: 
    input: 
        expand("{FASTQ}.txt", FASTQ=SAMPLE_IDS)
    output: 
        "bam_files/{FASTQ}."
    shell:
        "cat {input} | split -l 1000 -d - {output}."


rule concat: 
    input:
        split_files = glob.glob("bam_files/{FASTQ}.*")
    output: 
        "bam_files/{FASTQ}.out"
    shell: 
        "cat {input} > {output}"
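
A likely reason the glob attempt appears to skip split: `glob.glob(...)` is an ordinary Python call, so it runs once while the Snakefile is parsed, before any rule has produced files. Python's glob also does not substitute `{FASTQ}`; it treats the braces as literal characters. The input list is then most likely empty, and Snakemake has no file that ties concat to split. The plain-Python sketch below reproduces this outside Snakemake, assuming a temporary directory and two hypothetical piece names (`test.00`, `test.01`):

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as workdir:
    base = glob.escape(workdir)  # guard against special characters in the temp path

    # Pattern exactly as written in the Snakefile: glob does not fill in
    # "{FASTQ}", so it could only match a file name containing those braces.
    literal_pattern = os.path.join(base, "bam_files", "{FASTQ}.*")
    # Same pattern with the sample name filled in by hand.
    filled_pattern = os.path.join(base, "bam_files", "test.*")

    # "Parse time": nothing has been split yet, so both lists are empty.
    before_literal = glob.glob(literal_pattern)
    before_filled = glob.glob(filled_pattern)

    # Stand-in for the split rule: create two pieces (hypothetical names).
    os.makedirs(os.path.join(workdir, "bam_files"))
    for suffix in ("00", "01"):
        piece = os.path.join(workdir, "bam_files", f"test.{suffix}")
        with open(piece, "w") as handle:
            handle.write("piece\n")

    # Only a glob that runs after the pieces exist can see them.
    after_literal = glob.glob(literal_pattern)
    after_filled = sorted(os.path.basename(p) for p in glob.glob(filled_pattern))
```

The list captured before the pieces exist stays empty, which is the situation the concat rule is in when its input is computed at parse time.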

Recommended answer

I think this should work:

SAMPLE_IDS = ["test"]

rule all: 
    input: 
        expand("bam_files/{FASTQ}.out", FASTQ=SAMPLE_IDS)


rule split: 
    input: 
        "{FASTQ}.txt"
    output: 
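        # dynamic(): the number of PART files is unknown until split has run
        dynamic("bam_files/{FASTQ}.{PART}")
    params:
        length=1000
    shell:
        # inside shell commands, wildcards are referenced as {wildcards.NAME}
        "cat {input} | split -l {params.length} -d - bam_files/{wildcards.FASTQ}."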


rule concat: 
    input:
        split_files = dynamic("bam_files/{FASTQ}.{PART}")
    output: 
        "bam_files/{FASTQ}.out"
    shell: 
        "cat {input} > {output}"

It looks like the split rule should be taking one file {FASTQ}.txt at a time and producing {FASTQ}.1, {FASTQ}.2, ... or something similar. Because you don't know ahead of time how many files it will produce, you need to use dynamic() for both split.output and concat.input.
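
The ordering behind this advice can be sketched in plain Python: produce the pieces first, look up what exists only afterwards, then concatenate. This is an illustration of the idea rather than Snakemake code. The helper names (`split_lines`, `collect_parts`, `concat`) and the 2500-line test file are assumptions made for the example, and `split_lines` is only a rough stand-in for `split -l N -d`. As far as I know, newer Snakemake releases deprecate dynamic() in favor of checkpoints, which rest on the same principle of resolving inputs after the producing step has finished.

```python
import glob
import os
import tempfile


def split_lines(src, prefix, lines_per_part=1000):
    """Rough stand-in for `split -l N -d`: write prefix00, prefix01, ..."""
    with open(src) as handle:
        lines = handle.readlines()
    for index, start in enumerate(range(0, len(lines), lines_per_part)):
        with open(f"{prefix}{index:02d}", "w") as part:
            part.writelines(lines[start:start + lines_per_part])


def collect_parts(prefix):
    """List the pieces that exist right now, ordered by numeric suffix."""
    pattern = glob.escape(prefix) + "[0-9]*"
    return sorted(glob.glob(pattern), key=lambda path: int(path[len(prefix):]))


def concat(parts, dest):
    """Equivalent of `cat parts... > dest`."""
    with open(dest, "w") as out:
        for path in parts:
            with open(path) as part:
                out.write(part.read())


with tempfile.TemporaryDirectory() as workdir:
    src = os.path.join(workdir, "test.txt")
    with open(src, "w") as handle:
        handle.writelines(f"read {i}\n" for i in range(2500))

    prefix = os.path.join(workdir, "test.")
    split_lines(src, prefix)       # step 1: produce an unknown number of pieces
    parts = collect_parts(prefix)  # step 2: only now look at what exists
    dest = os.path.join(workdir, "joined.out")
    concat(parts, dest)            # step 3: join the pieces back together

    part_names = [os.path.basename(p) for p in parts]
    with open(src) as original, open(dest) as joined:
        round_trip_ok = original.read() == joined.read()
```

With 2500 input lines and 1000 lines per piece, three pieces are found, and joining them in suffix order reproduces the original file.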
