How do I pass Biopython SeqIO.convert() over multiple files in a directory?


Question

I'm writing a Python script (version 2.7) that will convert every input file (.nexus format) within a specified directory into .fasta format. The Biopython function SeqIO.convert handles the conversion perfectly for individually specified files, but when I try to automate the process over a directory using os.walk, I'm unable to correctly pass the pathname of each input file to SeqIO.convert. Where am I going wrong? Do I need to use join() from the os.path module and pass the full path names on to SeqIO.convert?

    #Import modules
    import sys
    import re
    import os
    import fileinput

    from Bio import SeqIO

    #Specify directory of interest
    PSGDirectory = "/Users/InputDirectory"
    #Create a class that will run the SeqIO.convert function repeatedly
    def process(filename):
      count = SeqIO.convert("files", "nexus", "files.fa", "fasta", alphabet= IUPAC.ambiguous_dna)
    #Make sure os.walk works correctly
    for path, dirs, files in os.walk(PSGDirectory):
       print path
       print dirs
       print files

    #Now recursively do the count command on each file inside PSGDirectory
    for files in os.walk(PSGDirectory):
       print("Converted %i records" % count)
       process(files)      

When I run the script I get this error message:

    Traceback (most recent call last):
      File "nexus_to_fasta.psg", line 45, in <module>
        print("Converted %i records" % count)
    NameError: name 'count' is not defined

This conversation was very helpful, but I don't know where to insert the join() function statements. Here is an example of one of my nexus files. Thanks for your help!

Recommended answer

There are a few things going on.

First, your process function isn't returning count, so the name count is never defined outside the function. You probably want:

def process(filename):
    # Note the capitalization: the module is SeqIO, not seqIO
    return SeqIO.convert("files", "nexus", "files.fa", "fasta", alphabet=IUPAC.ambiguous_dna)
    # assuming SeqIO.convert actually returns the number you want

Also, when you write for files in os.walk(PSGDirectory), each item you iterate over is the 3-tuple (root, dirs, files) that os.walk yields, not an individual file. You want to do something like this (note the use of os.path.join):

for root, dirs, files in os.walk(PSGDirectory):
    for filename in files:
        fullpath = os.path.join(root, filename)
        print(process(fullpath))
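To make the shape of what os.walk yields concrete, here is a minimal sketch that only collects paths and converts nothing. The function name `nexus_paths` and the `.nexus` extension filter are assumptions added for illustration; they are not part of the original question or of Biopython.

```python
import os


def nexus_paths(top, extension=".nexus"):
    """Return the full path of every file under `top` ending in `extension`.

    Hypothetical helper for illustration only.
    """
    paths = []
    # Each item os.walk yields is a (root, dirs, files) tuple, not a filename.
    for root, dirs, files in os.walk(top):
        for filename in files:
            if filename.endswith(extension):
                # `filename` is a bare name; join it to `root` to get a usable path.
                paths.append(os.path.join(root, filename))
    return sorted(paths)
```

Filtering on the extension keeps stray files (notes, previously written .fa output) from being handed to the converter.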

Update:

So I looked at the documentation for SeqIO.convert, and it expects to be called with:

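  • in_file - input handle or filename
  • in_format - input file format, lower case string
  • out_file - output handle or filename
  • out_format - output file format, lower case string
  • alphabet - optional alphabet to assume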

in_file is the name of the file to convert, and originally you were calling SeqIO.convert with the literal string "files" rather than the filename passed to the function.

So your process function should probably be something like this:

def process(filename):
    # Pass the real path in, and derive the output name from it
    return SeqIO.convert(filename, "nexus", filename + '.fa', "fasta", alphabet=IUPAC.ambiguous_dna)
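Putting the pieces together, the following is a hedged sketch of the whole loop rather than a definitive implementation. The names `convert_tree` and `convert_with_biopython` are hypothetical. The converter is passed in as a callable so the directory logic can be exercised without Biopython installed, and the output name replaces the `.nexus` extension instead of appending `.fa` to it, which is a design choice that differs slightly from the snippet above. The Biopython call uses only the four positional arguments; the `alphabet=IUPAC.ambiguous_dna` argument from the question would additionally need `from Bio.Alphabet import IUPAC`, and to my knowledge `Bio.Alphabet` was removed in Biopython 1.78, so it is left out here.

```python
import os


def convert_tree(top, convert, in_ext=".nexus", out_ext=".fa"):
    """Call convert(src, dst) for every `in_ext` file under `top`.

    Returns a dict mapping each source path to whatever `convert` returned.
    Hypothetical helper for illustration only.
    """
    results = {}
    for root, dirs, files in os.walk(top):
        for filename in files:
            base, ext = os.path.splitext(filename)
            if ext != in_ext:
                continue  # skip anything that is not an input file
            src = os.path.join(root, filename)
            dst = os.path.join(root, base + out_ext)
            results[src] = convert(src, dst)
    return results


def convert_with_biopython(src, dst):
    """Convert one NEXUS file to FASTA; returns the number of records written."""
    # Imported lazily so the rest of this sketch runs without Biopython.
    from Bio import SeqIO
    return SeqIO.convert(src, "nexus", dst, "fasta")
```

With Biopython installed, something like `convert_tree("/Users/InputDirectory", convert_with_biopython)` would write one `.fa` file next to each `.nexus` file and report the record count per file.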
