Python: chain a list from a TSV file


Question

I have a TSV file containing paths of links. Within each path, the links are separated by a ';'.

In the example below we can see that each row is split into columns. I only want to read the path column, which starts with '14th':

6a3701d319fc3754    1297740409  166    14th_century;15th_century;16th_century;Pacific_Ocean;Atlantic_Ocean;Accra;Africa;Atlantic_slave_trade;African_slave_trade    NULL
3824310e536af032    1344753412  88     14th_century;Europe;Africa;Atlantic_slave_trade;African_slave_trade  3
415612e93584d30e    1349298640  138    14th_century;Niger;Nigeria;British_Empire;Slavery;Africa;Atlantic_slave_trade;African_slave_trade

I want to split each path into a chain like this:

['14th_century', 'Niger', 'Nigeria', ...]

How do I read the file and drop the first 3 columns so that only the path is left?

UPDATE:

I have tried this:

import re
with open('test.tsv') as f:
    lines = f.readlines()
for line in lines[22:len(lines)]:
    re.sub(r"^\s+", " ", line, flags = re.MULTILINE)
    e_line = line.split(' ')
    real_line = e_line[0]
    print(real_line.split(';'))

But the problem is that it does not remove the first 3 columns.

Recommended answer

If the separator between the columns is a single space, and not a run of spaces or a tab, you could do this:

with open('file_name') as f:
    for line in f:
        # Split on single spaces; index 3 is the fourth column (the path).
        e_line = line.rstrip('\n').split(' ')
        real_line = e_line[3]
        print(real_line.split(';'))
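
If the columns are separated by tabs or by runs of spaces, as the sample rows in the question appear to be, calling split() with no argument splits on any run of whitespace. In the question's attempt, the return value of re.sub is discarded and e_line[0] selects the first column, which is why the leading columns never go away. Below is a minimal sketch, assuming the path is always the fourth column and contains no whitespace. The name path_chains is a hypothetical helper, the sample rows are shortened versions of the ones in the question, and the lines[22:] slice from the question is left out.

```python
def path_chains(lines):
    """Yield the fourth column of each line as a list of links."""
    for line in lines:
        fields = line.split()  # no argument: split on any run of whitespace
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        yield fields[3].split(';')


# Shortened sample rows: one space-separated, one tab-separated.
sample = [
    "3824310e536af032    1344753412  88     14th_century;Europe;Africa  3\n",
    "415612e93584d30e\t1349298640\t138\t14th_century;Niger;Nigeria\n",
]
for chain in path_chains(sample):
    print(chain)
```

For a real file, the open file object can be passed directly, for example path_chains(f) inside a with open('test.tsv') as f: block, since iterating over a file yields its lines.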
