Using multiple filenames as wildcards in Snakemake
Question
I am trying to create a rule in snakemake that runs bedtools closest on one file against a bunch of files in another directory.
What I have is, under the /home/bedfiles directory, 20 bed files:
1A.bed, 2B_83.bed, 3f_33.bed ...
What I want is, under the /home/bedfiles directory, 20 modified bed files:
1A_modified, 2B_83_modified, 3f_33_modified ...
So the bash command would be:
    filelist='/home/bedfiles/*.bed'
    for mfile in $filelist;
    do
        # ${mfile%.bed} strips the trailing .bed so the result is e.g. 1A_modified
        bedtools closest -a /home/other/merged.txt -b ${mfile} > ${mfile%.bed}_modified
    done
So this command would create files with the _modified suffix in the /home/bedfiles directory.
I want to implement this with Snakemake, but I keep getting a syntax error that I have no idea how to fix. My attempt is:
Step 1: Getting the first part of the bed file names in the directory
    FIRSTPART = [f.split(".")[0] for f in os.listdir("/home/bedfiles") if f.endswith('.bed')]
Step 2: Defining the output names and folder
    MODIFIED = expand("/home/bedfiles/{first}_modified", first=FIRSTPART)
Step 3: Writing this in rule all:
    rule all:
        input: MODIFIED
Step 4: Making a specific rule to run bedtools closest
    rule closest:
        input:
            input1 = "/home/other/merged.txt" , \
            input2 = expand("/home/bedfiles/{first}.bed", first=FIRSTPART)
        output:
            expand("/home/bedfiles/{first}_modified", first=FIRSTPART)
        shell:
            """ bedtools closest -a {input.input1} -b {input.input2} > {output} """
And it throws this error at the input line of rule all:
    invalid syntax
Do you know how to get past this error, or any other way to implement this?
PS: Writing the names of the files one by one is not possible.
Recommended answer
Remove the call to expand in your definition of input and output in closest. You're currently passing in a list of 20 filenames as input.input2 and a list of 20 filenames as output.
That is, your rule closest is currently trying to run once and create 20 files, whereas it should run 20 times and create a single file each time.
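The difference can be seen in plain Python. This is only a minimal sketch: expand_like is a hypothetical stand-in for Snakemake's expand that handles a single placeholder, and the three prefixes are the example names from the question.

```python
def expand_like(pattern, **wildcards):
    """Hypothetical stand-in for Snakemake's expand(); one placeholder only."""
    (name, values), = wildcards.items()
    return [pattern.format(**{name: value}) for value in values]

# Example prefixes taken from the question
FIRSTPART = ["1A", "2B_83", "3f_33"]

# What the original rule declared: a single job whose output is the whole list
all_outputs = expand_like("/home/bedfiles/{first}_modified", first=FIRSTPART)

# What is wanted instead: one job per prefix, each with one input and one output
jobs = [
    {"input": f"/home/bedfiles/{p}.bed", "output": f"/home/bedfiles/{p}_modified"}
    for p in FIRSTPART
]
```

The full list still belongs in rule all, because that is where the complete set of targets is requested; the per-file rule only needs to describe one input/output pair.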
In closest you want input.input2 to be a single file and output to be a single file each time that rule is run:
    import os

    # Prefixes of all .bed files in the input directory
    FIRSTPART = [f.split(".")[0] for f in os.listdir("/home/bedfiles") if f.endswith('.bed')]
    print("These are the input files:")
    print([f + ".bed" for f in FIRSTPART])

    # Full list of targets, requested once in rule all
    MODIFIED = expand("/home/bedfiles/{first}_modified", first=FIRSTPART)
    print("These will be created")
    print(MODIFIED)

    rule all:
        input: MODIFIED

    # Runs once per target; {prefix} is filled in from the requested output
    rule closest:
        message: """
        Converts /home/other/merged.txt and /some/dir/xyz.bed
        into /some/dir/xyz_modified
        """
        input:
            input1 = "/home/other/merged.txt",
            input2 = "{prefix}.bed"
        output: "{prefix}_modified"
        shell:
            """
            bedtools closest -a {input.input1} -b {input.input2} > {output}
            """
Here's an experiment:
Move yourself into a temporary directory and within that directory do the following:
    mkdir bedfiles
    touch bedfiles/{a,b,c,d}.bed
Then add a file called Snakefile into your current directory that contains the following code:
    import os
    import os.path
    import re

    input_dir = "bedfiles"
    # Only pick up .bed files, so outputs from an earlier run are not treated as inputs
    input_files = [os.path.join(input_dir, f) for f in os.listdir(input_dir) if f.endswith(".bed")]
    print(input_files)
    # Escape the dot so only a literal ".bed" suffix is replaced
    output_files = [re.sub(r"\.bed$", "_modified", f) for f in input_files]
    print(output_files)

    rule all:
        input: output_files

    rule mover:
        input: "{prefix}.bed"
        output: "{prefix}_modified"
        shell:
            """ cp {input} {output} """
Then run it using snakemake at the command line. Snakemake is goal-oriented; it works out how to make your desired outputs based on the existing files.
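As a rough illustration of that goal-oriented lookup, the sketch below takes a requested target, matches it against an output pattern, and derives the input that rule would need. This is a simplification and not Snakemake's actual code: match_output is a hypothetical helper that supports a single {prefix} placeholder.

```python
import re

def match_output(pattern, target):
    """Match target against a pattern with one {prefix} placeholder.

    Returns a dict of wildcard values, or None if the target does not fit.
    """
    before, after = pattern.split("{prefix}")
    regex = re.escape(before) + r"(?P<prefix>.+)" + re.escape(after)
    m = re.fullmatch(regex, target)
    return m.groupdict() if m else None

# A target requested by rule all in the experiment
target = "bedfiles/a_modified"
wildcards = match_output("{prefix}_modified", target)

# Fill the same wildcard into the rule's input pattern
needed_input = "{prefix}.bed".format(**wildcards)
```

Here the prefix resolves to bedfiles/a, so the rule needs bedfiles/a.bed, which already exists on disk and the job can run.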