How to extract specific columns from a space-separated file in Python?

Question

I'm trying to process a file from the Protein Data Bank (PDB) whose fields are separated by spaces, not tabs (\t). I have a .txt file from which I want to extract specific rows and, from those rows, only a few columns.

I need to do this in Python. I first tried the command line and had no problem doing it with awk, but I don't know how to do the same in Python.

Here is an excerpt of my file:


[...]
SEQRES   6 B   80  ALA LEU SER ILE LYS LYS ALA GLN THR PRO GLN GLN TRP          
SEQRES   7 B   80  LYS PRO                                                      
HELIX    1   1 THR A   68  SER A   81  1                                  14    
HELIX    2   2 CYS A   97  LEU A  110  1                                  14    
HELIX    3   3 ASN A  122  SER A  133  1                                  12    
[...]

For example, I'd like to take only the HELIX rows and, from them, the 4th, 6th, 7th and 9th columns. I started by reading the file line by line with a for loop and extracting the rows that start with 'HELIX', but I got no further than that.
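
A minimal sketch of that approach, assuming every field in the HELIX lines is separated by at least one space and none is blank, as in the excerpt. The column numbers above are 1-based, so the 4th, 6th, 7th and 9th columns are list indices 3, 5, 6 and 8. The names `helix_columns` and `print_helix_columns` are illustrative only.

```python
from operator import itemgetter

# The question's 1-based columns 4, 6, 7 and 9 as 0-based list indices.
WANTED_COLUMNS = itemgetter(3, 5, 6, 8)


def helix_columns(lines):
    """Yield a tuple of the 4th, 6th, 7th and 9th fields of each HELIX line."""
    for line in lines:
        fields = line.split()  # split on any run of whitespace
        # Match the record name exactly instead of searching the whole line.
        if fields and fields[0] == "HELIX":
            yield WANTED_COLUMNS(fields)


def print_helix_columns(path):
    """Print the selected columns, space-separated, one HELIX record per line."""
    with open(path) as pdb_file:
        for columns in helix_columns(pdb_file):
            print(" ".join(columns))
```

For the first HELIX line of the excerpt this yields `('THR', '68', 'SER', '81')`. Comparing `fields[0]` with `"HELIX"`, rather than testing `'HELIX' in line`, avoids picking up lines that only mention HELIX somewhere in their text.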

This is the code I have right now, but the print doesn't work properly: it only prints a single line for each record type (HELIX, SHEET and DBREF).

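#!/usr/bin/env python3
import sys

with open(sys.argv[1]) as pdb_file:
    for line in pdb_file:
        # Each assignment replaces the previous value, so after the loop
        # every variable holds the split fields of only one matching line.
        if 'HELIX' in line:
            helix = line.split()
        elif 'SHEET' in line:
            sheet = line.split()
        elif 'DBREF' in line:
            dbref = line.split()

print(helix, sheet, dbref)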
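
In the script above, each assignment to `helix`, `sheet` or `dbref` replaces the previous value, so once the loop ends each name holds the split fields of only the last matching line. A minimal sketch of one possible fix, assuming you want to keep every matching record: append each split line to a list keyed by its record name. `collect_records`, `print_records` and `RECORD_TYPES` are illustrative names, not part of the original script.

```python
from collections import defaultdict

# Record names to keep; matched against the first field of each line.
RECORD_TYPES = ("HELIX", "SHEET", "DBREF")


def collect_records(lines, record_types=RECORD_TYPES):
    """Return a dict mapping each record name to a list of its split lines."""
    records = defaultdict(list)
    for line in lines:
        fields = line.split()
        if fields and fields[0] in record_types:
            # Append instead of assigning, so earlier matches are kept.
            records[fields[0]].append(fields)
    return records


def print_records(path):
    """Print every collected record, grouped by record type."""
    with open(path) as pdb_file:
        records = collect_records(pdb_file)
    for name in RECORD_TYPES:
        # .get() returns an empty list for types absent from the file,
        # where the original script would raise NameError instead.
        for fields in records.get(name, []):
            print(fields)
```

Because the record name is compared with the first field, a REMARK line that merely mentions HELIX is not collected, whereas the `'HELIX' in line` test would catch it. PDB files may also contain DBREF1/DBREF2 records; add them to `RECORD_TYPES` if you need them.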

Recommended answer

If you have already extracted the line, you can split it with line.split(). This returns a list of the whitespace-separated fields, from which you can pick the elements you need by index (counting from 0):

>>> test='HELIX 2 2 CYS A 97'
>>> test.split()
['HELIX', '2', '2', 'CYS', 'A', '97']
>>> test.split()[3]
'CYS'
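
`split()` works for the excerpt because every field there is separated by at least one space. PDB files are a fixed-column format, though, and a record with a blank field (an empty chain identifier, for example) shifts every later index by one. A sketch of an alternative that slices by character position instead; the positions are taken from the HELIX record layout in the wwPDB format description (version 3.3) and should be checked against the format version of your files. `parse_helix` and its dictionary keys are illustrative names.

```python
def parse_helix(line):
    """Extract residue names, chain IDs and numbers from a PDB HELIX record.

    Column numbers in the comments are 1-based and inclusive, as in the PDB
    format description; Python slices are 0-based and end-exclusive.
    """
    if not line.startswith("HELIX"):
        raise ValueError("not a HELIX record")
    return {
        "init_res_name": line[15:18].strip(),  # columns 16-18
        "init_chain_id": line[19:20].strip(),  # column 20
        "init_seq_num": int(line[21:25]),      # columns 22-25
        "end_res_name": line[27:30].strip(),   # columns 28-30
        "end_chain_id": line[31:32].strip(),   # column 32
        "end_seq_num": int(line[33:37]),       # columns 34-37
    }
```

For the first HELIX line of the excerpt, `parse_helix(line)["init_seq_num"]` is `68` and `parse_helix(line)["end_res_name"]` is `'SER'`. Slicing assumes the lines keep their original spacing, so don't strip or re-join them before parsing.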
