How to extract specific columns from a space-separated file in Python?
Problem description
I'm trying to process a file from the Protein Data Bank whose fields are separated by spaces (not \t). I have a .txt file from which I want to extract specific rows and, from those rows, only a few columns.
I need to do it in Python. I first tried on the command line and had no problem using awk, but I have no idea how to do the same in Python.
Here is an excerpt of my file:
[...]
SEQRES 6 B 80 ALA LEU SER ILE LYS LYS ALA GLN THR PRO GLN GLN TRP
SEQRES 7 B 80 LYS PRO
HELIX 1 1 THR A 68 SER A 81 1 14
HELIX 2 2 CYS A 97 LEU A 110 1 14
HELIX 3 3 ASN A 122 SER A 133 1 12
[...]
For example, I'd like to take only the 'HELIX' rows and, from those, the 4th, 6th, 7th and 9th columns. I started by reading the file line by line with a for loop and picking out the rows that start with 'HELIX'... and that's as far as I got.
This is the code I have right now, but the print doesn't work properly: it only prints the first line of each block (HELIX, SHEET and DBREF).
#!/usr/bin/python
import sys
for line in open(sys.argv[1]):
    if 'HELIX' in line:
        helix = line.split()
    elif 'SHEET' in line:
        sheet = line.split()
    elif 'DBREF' in line:
        dbref = line.split()
print (helix), (sheet), (dbref)
Recommended answer
If you have already extracted the line, you can split it with line.split(). This gives you a list from which you can pick out the elements you need:
>>> test='HELIX 2 2 CYS A 97'
>>> test.split()
['HELIX', '2', '2', 'CYS', 'A', '97']
>>> test.split()[3]
'CYS'
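Putting the pieces together, here is a minimal sketch of the whole task, not a definitive implementation. The function name `extract_columns` and its parameters are hypothetical, chosen for illustration only. It assumes every matching row has at least as many whitespace-separated fields as the highest column requested. Unlike the code in the question, which overwrites `helix` on every iteration so that only one row survives, it appends each matching row to a list.

```python
def extract_columns(lines, record="HELIX", columns=(4, 6, 7, 9)):
    """Return the selected 1-based columns of every row whose first field is `record`.

    `extract_columns`, `record` and `columns` are illustrative names,
    not part of any library.
    """
    rows = []
    for line in lines:
        fields = line.split()
        # Compare the first field instead of using `in`, so a row that merely
        # contains the word somewhere else is not matched by mistake.
        if fields and fields[0] == record:
            # Columns are counted from 1 (as in awk), list indices from 0.
            rows.append([fields[c - 1] for c in columns])
    return rows


sample = """\
SEQRES 7 B 80 LYS PRO
HELIX 1 1 THR A 68 SER A 81 1 14
HELIX 2 2 CYS A 97 LEU A 110 1 14
HELIX 3 3 ASN A 122 SER A 133 1 12
"""

for row in extract_columns(sample.splitlines()):
    print(" ".join(row))
# THR 68 SER 81
# CYS 97 LEU 110
# ASN 122 SER 133
```

To run it on a real file, pass an open file object instead of the sample lines, for example `with open(sys.argv[1]) as handle: rows = extract_columns(handle)`. Note that PDB records are defined by fixed character positions, so splitting on whitespace can miscount columns when two fields touch or an optional field is blank. Slicing by position is the safer choice for such rows.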