Snakemake:如何有效地使用配置文件 [英] Snakemake: How to use config file efficiently

查看:2142
本文介绍了Snakemake:如何有效地使用配置文件的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我在 snakemake 中使用以下配置文件格式进行一些测序分析练习(我有大量样本,每个样本包含2个fastq文件:

I'm using the following config file format in snakemake for a some sequencing analysis practice (I have loads of samples each containing 2 fastq files:

samples:
Sample1_XY:
    - fastq_files/SRR4356728_1.fastq.gz
    - fastq_files/SRR4356728_2.fastq.gz
Sample2_AB:
    - fastq_files/SRR6257171_1.fastq.gz
    - fastq_files/SRR6257171_2.fastq.gz 

我在我的管道开始时使用以下规则来运行fastqc并对齐fastqc文件:

I'm using the following rules at the start of my pipeline to run fastqc and for alignment of the fastqc files:

import os
# read config info into this namespace
configfile: "config.yaml"

rule all:
    input:
    expand("FastQC/{sample}_fastqc.zip", sample=config["samples"]),
    expand("bam_files/{sample}.bam", sample=config["samples"]),
    "FastQC/fastq_multiqc.html"

rule fastqc:
    input:
        sample=lambda wildcards: config['samples'][wildcards.sample]
    output:
        # Output needs to end in '_fastqc.html' for multiqc to work
        html="FastQC/{sample}_fastqc.html",
        zip="FastQC/{sample}_fastqc.zip"
    params: ""
        wrapper:
        "0.21.0/bio/fastqc"

rule bowtie2:
    input:
         sample=lambda wildcards: config['samples'][wildcards.sample]
    output:
         "bam_files/{sample}.bam"
    log:
         "logs/bowtie2/{sample}.txt"
    params:
         index=config["index"],  # prefix of reference genome index (built with bowtie2-build),
    extra=""
         threads: 8
    wrapper:
         "0.21.0/bio/bowtie2/align"

 rule multiqc_fastq:
    input:
         expand("FastQC/{sample}_fastqc.html", sample=config["samples"])
    output:
         "FastQC/fastq_multiqc.html"
    params:
    log:
         "logs/multiqc.log"
    wrapper:
         "0.21.0/bio/multiqc"

我的问题在于fastqc规则。

My issue is with the fastqc rule.

目前fastqc规则和bowtie2规则都创建了一个使用两个输入生成的输出文件 SRRXXXXXXX_1.fastq.gz SRRXXXXXXX_2.fastq.gz

Currently both the fastqc rule and the bowtie2 rule create one output file generated using two inputs SRRXXXXXXX_1.fastq.gz and SRRXXXXXXX_2.fastq.gz.

我需要fastq规则来生成两个文件,每个 fastq.gz 文件都有一个文件但是我不确定如何从fastqc规则输入语句正确索引配置文件,或者如何组合expand和wildcards命令来解决这个问题。我可以通过在输入语句的末尾添加 [0] [1] 来获取单个fastq文件,但不是两个单独/分开运行。

I need the fastq rule to generate two files, a separate one for each of the fastq.gz files but I'm unsure how to index the config file correctly from the fastqc rule input statement, or how to combine the the expand and wildcards commands to solve this. I can get an individual fastq file by adding [0] or [1] to the end of the input statement, but not both run individually/separately.

我一直在努力获取正确的索引格式来分别访问每个文件。当前格式是我管理的唯一一个允许 snakemake -np 生成作业列表的格式。

I've been messing around trying to get the correct indexing format to access each file separately. The current format is the only one I've managed that allows snakemake -np to generate a job list.

任何提示都将不胜感激。

Any tips would be greatly appreciated.

推荐答案

sample将有两个fastq文件,它们的格式为 *** _ 1.fastq.gz *** _ 2.fastq.gz 。在这种情况下,下面的配置和代码将起作用。

It appears each sample would have two fastq files, and they are named in format ***_1.fastq.gz and ***_2.fastq.gz. In that case, config and code below would work.

config.yaml:

config.yaml:

samples:
    Sample_A: fastq_files/SRR4356728
    Sample_B: fastq_files/SRR6257171

Snakefile:

Snakefile:

# read config info into this namespace
configfile: "config.yaml"
print (config['samples'])

rule all:
    input:
        expand("FastQC/{sample}_{num}_fastqc.zip", sample=config["samples"], num=['1', '2']),
        expand("bam_files/{sample}.bam", sample=config["samples"]),
        "FastQC/fastq_multiqc.html"

rule fastqc:
    input:
        sample=lambda wildcards: f"{config['samples'][wildcards.sample]}_{wildcards.num}.fastq.gz"
    output:
        # Output needs to end in '_fastqc.html' for multiqc to work
        html="FastQC/{sample}_{num}_fastqc.html",
        zip="FastQC/{sample}_{num}_fastqc.zip"
    wrapper:
        "0.21.0/bio/fastqc"

rule bowtie2:
    input:
         sample=lambda wildcards: expand(f"{config['samples'][wildcards.sample]}_{{num}}.fastq.gz", num=[1,2])
    output:
         "bam_files/{sample}.bam"
    wrapper:
         "0.21.0/bio/bowtie2/align"

rule multiqc_fastq:
    input:
        expand("FastQC/{sample}_{num}_fastqc.html", sample=config["samples"], num=['1', '2'])
    output:
        "FastQC/fastq_multiqc.html"
    wrapper:
        "0.21.0/bio/multiqc"

这篇关于Snakemake:如何有效地使用配置文件的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆