Python: chain a list from a TSV file
Problem description
I have a TSV file containing some paths of links. Within a path, each link is separated by a ';'.
In the example below we can see that the text in the file is split into columns, and I only want to read the last column, which is a path starting with '14th':
6a3701d319fc3754 1297740409 166 14th_century;15th_century;16th_century;Pacific_Ocean;Atlantic_Ocean;Accra;Africa;Atlantic_slave_trade;African_slave_trade NULL
3824310e536af032 1344753412 88 14th_century;Europe;Africa;Atlantic_slave_trade;African_slave_trade 3
415612e93584d30e 1349298640 138 14th_century;Niger;Nigeria;British_Empire;Slavery;Africa;Atlantic_slave_trade;African_slave_trade
I want to somehow split the path into a chain like this:
['14th_century', 'Niger', 'Nigeria'....]
How do I read the file and remove the first 3 columns, so that I am left with only the last one?
UPDATE:
I have tried this:
import re
with open('test.tsv') as f:
    lines = f.readlines()
for line in lines[22:len(lines)]:
    re.sub(r"^\s+", " ", line, flags = re.MULTILINE)
    e_line = line.split(' ')
    real_line = e_line[0]
    print real_line.split(';')
But the problem is that it does not remove the first 3 columns.
Recommended answer
If the separator between the first columns is only a single space, and not a series of spaces or a tab, you could do this:
with open('file_name') as f:
    lines = f.readlines()
for line in lines:
    # Split on single spaces; the path is the fourth field (index 3).
    e_line = line.rstrip('\n').split(' ')
    real_line = e_line[3]
    print(real_line.split(';'))
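If the columns are in fact separated by tabs, as the .tsv extension suggests, splitting on a single space will not isolate the path. A minimal sketch of a tab-aware variant follows, using the standard csv module. It assumes the path is always the fourth column (index 3) and that rows without one can be skipped. The function name read_paths and the in-memory sample are hypothetical and only mirror two rows of the question's data.

```python
import csv
import io


def read_paths(tsv_file, path_column=3):
    """Yield the link chain from each row of a tab-separated file object."""
    reader = csv.reader(tsv_file, delimiter='\t')
    for row in reader:
        # Skip blank or short rows that have no path column.
        if len(row) <= path_column or not row[path_column]:
            continue
        yield row[path_column].split(';')


# Hypothetical sample mirroring two rows of the question's data.
sample = (
    "3824310e536af032\t1344753412\t88\t"
    "14th_century;Europe;Africa;Atlantic_slave_trade;African_slave_trade\t3\n"
    "415612e93584d30e\t1349298640\t138\t"
    "14th_century;Niger;Nigeria;British_Empire;Slavery;Africa;"
    "Atlantic_slave_trade;African_slave_trade\n"
)

chains = list(read_paths(io.StringIO(sample)))
for chain in chains:
    print(chain)
```

For a real file, the same function could be called as read_paths(f) inside with open('test.tsv', newline='') as f. Taking the column by index rather than "the last one" also copes with rows that carry an extra trailing field such as NULL or 3.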