How to extract chains from a PDB file?
Problem description
I would like to extract chains from PDB files. I have a file named pdb.txt which contains PDB IDs as shown below. The first four characters are the PDB ID and the last character is the chain ID.
1B68A
1BZ4B
4FUTA
I would like to 1) read the file line by line
2) download the atomic coordinates of each chain from the corresponding PDB files.
3) save the output to a folder.
I used the following script to extract chains. However, this code only extracts chain A from each PDB file.
for i in 1B68 1BZ4 4FUT
do
    wget -c "http://www.pdb.org/pdb/download/downloadFile.do?fileFormat=pdb&compression=NO&structureId="$i -O $i.pdb
    grep ATOM $i.pdb | grep 'A' > $i\_A.pdb
done
The following BioPython code should suit your needs well.
It uses PDB.Select to select only the desired chains (in your case, one chain) and PDBIO() to create a structure containing just that chain.
import os
import sys

from Bio import PDB


class ChainSplitter:
    def __init__(self, out_dir=None):
        """ Create parsing and writing objects, specify output directory. """
        self.parser = PDB.PDBParser()
        self.writer = PDB.PDBIO()
        if out_dir is None:
            out_dir = os.path.join(os.getcwd(), "chain_PDBs")
        # PDBIO.save() does not create missing directories, so make sure it exists
        os.makedirs(out_dir, exist_ok=True)
        self.out_dir = out_dir

    def make_pdb(self, pdb_path, chain_letters, overwrite=False, struct=None):
        """ Create a new PDB file containing only the specified chains.

        Returns the path to the created file.

        :param pdb_path: full path to the crystal structure
        :param chain_letters: iterable of chain characters (case insensitive)
        :param overwrite: write over the output file if it exists
        :param struct: optional, already parsed structure to reuse
        """
        chain_letters = [chain.upper() for chain in chain_letters]

        # Input/output files
        (pdb_dir, pdb_fn) = os.path.split(pdb_path)
        pdb_id = pdb_fn[3:7]  # assumes file names of the form "pdb1b68.ent"
        out_name = "pdb%s_%s.ent" % (pdb_id, "".join(chain_letters))
        out_path = os.path.join(self.out_dir, out_name)
        print("OUT PATH:", out_path)
        plural = "s" if (len(chain_letters) > 1) else ""  # for printing

        # Skip PDB generation if the file already exists
        if (not overwrite) and (os.path.isfile(out_path)):
            print("Chain%s %s of '%s' already extracted to '%s'." %
                  (plural, ", ".join(chain_letters), pdb_id, out_name))
            return out_path

        print("Extracting chain%s %s from %s..." % (plural,
              ", ".join(chain_letters), pdb_fn))

        # Get structure, write new file with only given chains
        if struct is None:
            struct = self.parser.get_structure(pdb_id, pdb_path)
        self.writer.set_structure(struct)
        self.writer.save(out_path, select=SelectChains(chain_letters))

        return out_path


class SelectChains(PDB.Select):
    """ Only accept the specified chains when saving. """
    def __init__(self, chain_letters):
        self.chain_letters = chain_letters

    def accept_chain(self, chain):
        return (chain.get_id() in self.chain_letters)


if __name__ == "__main__":
    # Parses PDB IDs and desired chains, and creates new PDB structures.
    if not len(sys.argv) == 2:
        print("Usage: $ python %s 'pdb.txt'" % __file__)
        sys.exit()

    pdb_textfn = sys.argv[1]

    pdbList = PDB.PDBList()
    splitter = ChainSplitter("/home/steve/chain_pdbs")  # Change me.

    with open(pdb_textfn) as pdb_textfile:
        for line in pdb_textfile:
            pdb_id = line[:4].lower()
            chain = line[4]
            # Newer Biopython releases download mmCIF by default; request the
            # legacy PDB format so that PDBParser and the "pdbXXXX.ent" file
            # name assumed in make_pdb() still apply.
            pdb_fn = pdbList.retrieve_pdb_file(pdb_id, file_format="pdb")
            splitter.make_pdb(pdb_fn, chain)
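The main loop above slices each line directly, so a blank or short line in pdb.txt would raise an IndexError at line[4]. A more defensive way to read the ID list might look like the sketch below. It uses only the standard library, and parse_chain_ids is a hypothetical helper name that is not part of Biopython. It assumes every entry is exactly five characters long (a four-character PDB ID followed by a one-character chain ID), as in the question.

```python
def parse_chain_ids(lines):
    """Yield (pdb_id, chain_id) pairs from lines such as '1B68A'.

    Hypothetical helper, not part of Biopython. Blank lines are skipped
    and entries that are not exactly five characters raise ValueError.
    """
    for line_number, raw in enumerate(lines, start=1):
        entry = raw.strip()
        if not entry:
            continue  # ignore blank lines, e.g. a trailing newline at end of file
        if len(entry) != 5:
            raise ValueError(
                "line %d: expected 5 characters, got %r" % (line_number, entry)
            )
        # First four characters are the PDB ID, the fifth is the chain ID.
        yield entry[:4].lower(), entry[4]


if __name__ == "__main__":
    sample = ["1B68A\n", "1BZ4B\n", "\n", "4FUTA\n"]
    for pdb_id, chain_id in parse_chain_ids(sample):
        print(pdb_id, chain_id)
```

In the main block, the loop body could then iterate over parse_chain_ids(pdb_textfile) and pass each pair on to retrieve_pdb_file and make_pdb, instead of slicing the raw line.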
One final note: don't write your own parser for PDB files. The format specification is ugly (really ugly), and the number of faulty PDB files out there is staggering. Use a tool like BioPython that will handle parsing for you!
Furthermore, instead of using wget, you should use tools that interact with the PDB database for you. They account for FTP connection limits, the changing nature of the PDB database, and more. I should know - I updated Bio.PDBList to account for changes in the database. =)