How to extract chains from a PDB file?

Question

I would like to extract chains from PDB files. I have a file named pdb.txt which contains PDB IDs, as shown below. The first four characters represent the PDB ID and the last character is the chain ID.

1B68A 
1BZ4B
4FUTA
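Because each entry packs the ID and the chain into one token, the file can be split with plain string slicing. The sketch below is illustrative only: `parse_chain_ids` is a hypothetical helper name, and it assumes every non-blank line holds exactly five characters (four-character ID plus one chain letter) once surrounding whitespace is stripped. It lower-cases the ID, matching what the answer's script does.

```python
def parse_chain_ids(lines):
    """Split entries such as '1B68A' into (pdb_id, chain_id) pairs.

    Hypothetical helper; assumes the layout described above: the first
    four characters are the PDB ID and the fifth is the chain ID.
    """
    entries = []
    for line in lines:
        entry = line.strip()  # drop trailing spaces/newlines, e.g. "1B68A "
        if not entry:
            continue  # skip blank lines
        if len(entry) != 5:
            raise ValueError(f"unexpected entry: {entry!r}")
        # Lower-case the ID (as the answer's script does); keep the chain as written
        entries.append((entry[:4].lower(), entry[4]))
    return entries


if __name__ == "__main__":
    sample = ["1B68A \n", "1BZ4B\n", "4FUTA\n", "\n"]
    print(parse_chain_ids(sample))
    # → [('1b68', 'A'), ('1bz4', 'B'), ('4fut', 'A')]
```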

I would like to:
1) read the file line by line,
2) download the atomic coordinates of each chain from the corresponding PDB file, and
3) save the output to a folder.

I used the following script to extract chains. But this code prints only A chains from pdb files.

for i in 1B68 1BZ4 4FUT
do 
wget -c "http://www.pdb.org/pdb/download/downloadFile.do?fileFormat=pdb&compression=NO&structureId="$i -O $i.pdb
grep  ATOM $i.pdb | grep 'A' > $i\_A.pdb
done
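The `grep 'A'` step cannot select a chain. Every line that survives `grep ATOM` already contains the letter A (in the record name "ATOM" itself), so the second grep keeps everything. In the fixed-width PDB format the chain identifier is stored in column 22, so a chain filter has to look at that column. The sketch below only illustrates the point. `chain_atom_records` is a hypothetical helper, it keeps only `ATOM` records (like the original `grep ATOM`, so ligands and waters in `HETATM` records are dropped), and for real work the BioPython approach in the answer is the safer choice.

```python
def chain_atom_records(pdb_lines, chain_id):
    """Return the ATOM records that belong to one chain (hypothetical helper).

    In the fixed-width PDB format the record name occupies columns 1-6
    and the chain identifier sits in column 22 (index 21 in Python).
    """
    selected = []
    for line in pdb_lines:
        record = line[:6].strip()
        if record == "ATOM" and len(line) > 21 and line[21] == chain_id:
            selected.append(line)
    return selected


if __name__ == "__main__":
    # Two minimal ATOM records, truncated after the residue number
    atom_a = "ATOM      1  N   ALA A   1"
    atom_b = "ATOM      2  N   ALA B   1"
    lines = [atom_a, atom_b]
    print(len([l for l in lines if "A" in l]))  # what grep 'A' keeps → 2
    print(len(chain_atom_records(lines, "A")))  # column-22 check → 1
```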

Solution

The following BioPython code should suit your needs well.

It uses a subclass of PDB.Select to accept only the desired chains (in your case, one chain) and PDBIO() to write out a structure containing just those chains.

import os
from Bio import PDB


class ChainSplitter:
    def __init__(self, out_dir=None):
        """ Create parsing and writing objects, specify output directory. """
        self.parser = PDB.PDBParser()
        self.writer = PDB.PDBIO()
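        if out_dir is None:
            out_dir = os.path.join(os.getcwd(), "chain_PDBs")
        # PDBIO.save() does not create missing directories, so do it here
        os.makedirs(out_dir, exist_ok=True)
        self.out_dir = out_dir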

    def make_pdb(self, pdb_path, chain_letters, overwrite=False, struct=None):
        """ Create a new PDB file containing only the specified chains.

        Returns the path to the created file.

        :param pdb_path: full path to the crystal structure
        :param chain_letters: iterable of chain characters (case insensitive)
        :param overwrite: write over the output file if it exists
        """
        chain_letters = [chain.upper() for chain in chain_letters]

        # Input/output files
        pdb_fn = os.path.basename(pdb_path)
        # PDBList names legacy-format files "pdbXXXX.ent", so the ID is pdb_fn[3:7]
        pdb_id = pdb_fn[3:7]
        out_name = "pdb%s_%s.ent" % (pdb_id, "".join(chain_letters))
        out_path = os.path.join(self.out_dir, out_name)
        print("OUT PATH:", out_path)
        plural = "s" if (len(chain_letters) > 1) else ""  # for printing

        # Skip PDB generation if the file already exists
        if (not overwrite) and (os.path.isfile(out_path)):
            print("Chain%s %s of '%s' already extracted to '%s'." %
                    (plural, ", ".join(chain_letters), pdb_id, out_name))
            return out_path

        print("Extracting chain%s %s from %s..." % (plural,
                ", ".join(chain_letters), pdb_fn))

        # Get structure, write new file with only given chains
        if struct is None:
            struct = self.parser.get_structure(pdb_id, pdb_path)
        self.writer.set_structure(struct)
        self.writer.save(out_path, select=SelectChains(chain_letters))

        return out_path


class SelectChains(PDB.Select):
    """ Only accept the specified chains when saving. """
    def __init__(self, chain_letters):
        self.chain_letters = chain_letters

    def accept_chain(self, chain):
        return (chain.get_id() in self.chain_letters)


if __name__ == "__main__":
    """ Parses PDB id's desired chains, and creates new PDB structures. """
    import sys
    if not len(sys.argv) == 2:
        print "Usage: $ python %s 'pdb.txt'" % __file__
        sys.exit()

    pdb_textfn = sys.argv[1]

    pdbList = PDB.PDBList()
    splitter = ChainSplitter("/home/steve/chain_pdbs")  # Change me.

    with open(pdb_textfn) as pdb_textfile:
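        for line in pdb_textfile:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            pdb_id = line[:4].lower()
            chain = line[4]
            # Ask for the legacy PDB format: recent Biopython releases
            # download mmCIF by default, which PDBParser cannot read
            pdb_fn = pdbList.retrieve_pdb_file(pdb_id, file_format="pdb")
            splitter.make_pdb(pdb_fn, chain)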
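`make_pdb` recovers the ID with `pdb_fn[3:7]`, which only works for file names shaped like `pdb1b68.ent`, the name `PDBList.retrieve_pdb_file` uses for the legacy PDB format. If the download is an mmCIF file such as `1b68.cif`, the slice silently yields a wrong ID. A slightly stricter sketch is shown below; `pdb_id_from_filename` and `chain_file_name` are hypothetical helpers, and the `pdbXXXX.ent` pattern is an assumption about PDBList's naming rather than a documented guarantee. The output name mirrors the one `make_pdb` builds.

```python
import re

# Assumed naming for legacy-format downloads: "pdb" + lower-case 4-char ID + ".ent"
_ENT_NAME = re.compile(r"pdb([0-9a-z]{4})\.ent")


def pdb_id_from_filename(pdb_fn):
    """Return the 4-character ID from a name like 'pdb1b68.ent' (hypothetical helper)."""
    match = _ENT_NAME.fullmatch(pdb_fn)
    if match is None:
        raise ValueError(f"not a pdbXXXX.ent file name: {pdb_fn!r}")
    return match.group(1)


def chain_file_name(pdb_id, chain_letters):
    """Build the same output name as make_pdb, e.g. 'pdb1b68_A.ent' (hypothetical helper)."""
    return "pdb%s_%s.ent" % (pdb_id, "".join(c.upper() for c in chain_letters))
```

Failing loudly on an unexpected file name is the point of the design: a wrong ID would otherwise only show up later as a mislabeled output file.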


One final note: don't write your own parser for PDB files. The format specification is ugly (really ugly), and the number of faulty PDB files out there is staggering. Use a tool like BioPython that will handle parsing for you!

Furthermore, instead of using wget, you should use tools that interact with the PDB database for you. They take into account FTP connection limitations, the changing nature of the PDB database, and more. I should know: I updated Bio.PDBList to account for changes in the database. =)
