Reading an entire directory of .pdb files using BioPython


Question

I was recently tasked with writing a Python program that finds the atoms within 2 angstroms of every metal in a protein read from a .pdb (Protein Data Bank) file. This is the script I wrote for it:

from Bio.PDB import *
parser = PDBParser(PERMISSIVE=True)

def print_coordinates(list):
    neighborList = list
    for y in neighborList:
        print "     ", y.get_coord()

structure_id = '5m6n'
fileName = '5m6n.pdb'
structure = parser.get_structure(structure_id, fileName)

atomList = Selection.unfold_entities(structure, 'A')

ns = NeighborSearch(atomList)

for x in structure.get_atoms():
    if x.name == 'ZN' or x.name == 'FE' or x.name == 'CU' or x.name == 'MG' or x.name == 'CA' or x.name == 'MN':
        center = x.get_coord()
        neighbors = ns.search(center,2.0)
        neighborList = Selection.unfold_entities(neighbors, 'A')

        print x.get_id(), ': ', neighborList
        print_coordinates(neighborList)
    else:
        continue

But this only works for a single .pdb file, and I would like to be able to read an entire directory of them. Since I have only used Java until now, I am not sure how to do this in Python 2.7. My idea is to put the script in a try/catch statement with a while loop inside it, then throw an exception when it reaches the end. That is how I would have done it in Java, but I am not sure how to do it in Python. I would love to hear any ideas or sample code anyone might have.

Recommended answer

Your code has some redundancies. For instance, this does the same thing:

from Bio.PDB import PDBParser, NeighborSearch, Selection

parser = PDBParser(PERMISSIVE=True)

def print_coordinates(neighborList):
    for y in neighborList:
        print("      {0}".format(y.get_coord()))

structure_id = '5m6n'
fileName = '5m6n.pdb'
structure = parser.get_structure(structure_id, fileName)
metals = ['ZN', 'FE', 'CU', 'MG', 'CA', 'MN']

# Note: 'CA' is also the atom name of every alpha carbon, so matching on
# atom.name picks those up too; comparing atom.element avoids the clash.
atomList = [atom for atom in structure.get_atoms() if atom.name in metals]
ns = NeighborSearch(Selection.unfold_entities(structure, 'A'))

for atom in atomList:
    # search() returns the atoms within 2 angstroms of the metal's position
    neighbors = ns.search(atom.coord, 2)
    print("{0}: {1}".format(atom.name, neighbors))
    print_coordinates(neighbors)
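
To make the radius search above concrete, here is a minimal pure-Python sketch of the same idea that runs without BioPython. It is a brute-force stand-in, not how NeighborSearch is implemented: the `Atom` namedtuple, `distance` and `neighbors_of_metals` are hypothetical names introduced only for illustration. It filters on the element rather than the atom name, so an alpha carbon named 'CA' is not mistaken for calcium, and it leaves the metal itself out of its own neighbor list (a search centered on the metal's coordinates would otherwise include it at distance 0).

```python
import math
from collections import namedtuple

# Stand-in for a Bio.PDB atom, used only in this sketch (hypothetical type)
Atom = namedtuple('Atom', ['name', 'element', 'coord'])

METALS = {'ZN', 'FE', 'CU', 'MG', 'CA', 'MN'}

def distance(a, b):
    """Euclidean distance between two atoms, in the units of their coords."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a.coord, b.coord)))

def neighbors_of_metals(atoms, radius=2.0):
    """Map each metal atom to the other atoms within `radius` of it."""
    result = {}
    for metal in atoms:
        if metal.element not in METALS:
            continue
        result[metal] = [
            other for other in atoms
            if other is not metal and distance(metal, other) <= radius
        ]
    return result

atoms = [
    Atom('ZN', 'ZN', (0.0, 0.0, 0.0)),
    Atom('NE2', 'N', (1.5, 0.0, 0.0)),
    Atom('CA', 'C', (5.0, 0.0, 0.0)),  # alpha carbon: name 'CA', element 'C'
    Atom('O', 'O', (0.0, 1.9, 0.0)),
]

found = neighbors_of_metals(atoms)
for metal, near in found.items():
    print("{0}: {1}".format(metal.name, [a.name for a in near]))
```

The brute-force loop compares every metal with every atom, which is fine for a sketch but slow on large structures; that cost is the reason to build a NeighborSearch index once per structure, as the answer's code does.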

To answer your question, you can get a list of all your pdb files using the glob module and nest your code in a for loop that iterates over all the files. Supposing your pdb files are in /home/pdb_files/:

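import os
from glob import glob
from Bio.PDB import PDBParser, NeighborSearch, Selection

parser = PDBParser(PERMISSIVE=True)
# Match only .pdb files so that other files in the directory are skipped
pdb_files = glob('/home/pdb_files/*.pdb')

def print_coordinates(neighborList):
    for y in neighborList:
        print("      {0}".format(y.get_coord()))

for fileName in pdb_files:
    # '/home/pdb_files/5m6n.pdb' -> '5m6n'
    structure_id = os.path.splitext(os.path.basename(fileName))[0]
    structure = parser.get_structure(structure_id, fileName)
    # The rest of your code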
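
The try/catch idea from the question is not needed to detect the end of the directory, because a Python for loop simply stops when the file list is exhausted. Exception handling can still be useful per file, so that one unreadable file does not abort the whole run. The following is a minimal sketch of that pattern using only the standard library; `iter_pdb_files`, `process_directory` and `handle_file` are hypothetical names introduced here, not part of BioPython.

```python
import os
from glob import glob

def iter_pdb_files(directory):
    """Yield (structure_id, path) for each .pdb file in `directory`, sorted by path."""
    pattern = os.path.join(directory, '*.pdb')
    for path in sorted(glob(pattern)):
        # '/some/dir/5m6n.pdb' -> '5m6n'
        structure_id = os.path.splitext(os.path.basename(path))[0]
        yield structure_id, path

def process_directory(directory, handle_file):
    """Call handle_file(structure_id, path) for every .pdb file.

    Files whose handler raises are recorded and skipped, so the loop
    continues with the remaining files. Returns a list of (path, error).
    """
    failed = []
    for structure_id, path in iter_pdb_files(directory):
        try:
            handle_file(structure_id, path)
        except Exception as err:
            failed.append((path, err))
    return failed
```

In the answer's setting, `handle_file` could be a function that calls `parser.get_structure(structure_id, path)` and then runs the neighbor search on the resulting structure. The returned list shows afterwards which files were skipped and why.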
