Convert tsv file so I can use it for nodes and edges in Python
Problem description
I have this tsv file that I would like to read, and then somehow count the number of nodes in a path.
This is what part of the tsv file looks like:
6a3701d319fc3754 1297740409 166 14th_century;15th_century;16th_century;Pacific_Ocean;Atlantic_Ocean;Accra;Africa;Atlantic_slave_trade;African_slave_trade NULL
3824310e536af032 1344753412 88 14th_century;Europe;Africa;Atlantic_slave_trade;African_slave_trade 3
The paths are only the fields that look like this: 14th_century;15th_century, i.e. node names separated by ';'.
My code so far:
import networkx as nx
fh = open("test.tsv", 'rb')
G = nx.read_edgelist("test.tsv", create_using=nx.DiGraph())
print(G.nodes())
print(G.edges())
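nx.read_edgelist treats each line as a single edge (two node names, optionally followed by edge data), which is not how these lines are laid out: the nodes live inside one ';'-separated field. A minimal sketch of pulling the paths out with the standard csv module instead, assuming the real file is tab-separated and that the path is always the fourth field (the helper read_paths is hypothetical):

```python
import csv
import io

# Sample rows rebuilt with real tabs (an assumption about the actual file).
# With the real file, use: with open("test.tsv", newline="") as fh: paths = list(read_paths(fh))
SAMPLE = (
    "6a3701d319fc3754\t1297740409\t166\t"
    "14th_century;15th_century;16th_century;Pacific_Ocean;Atlantic_Ocean;"
    "Accra;Africa;Atlantic_slave_trade;African_slave_trade\tNULL\n"
    "3824310e536af032\t1344753412\t88\t"
    "14th_century;Europe;Africa;Atlantic_slave_trade;African_slave_trade\t3\n"
)

def read_paths(handle):
    """Yield each path as a list of node names (assumes the path is column 4)."""
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 4:
            continue  # skip blank or malformed lines
        yield row[3].split(";")

paths = list(read_paths(io.StringIO(SAMPLE)))
print(paths[1])
# ['14th_century', 'Europe', 'Africa', 'Atlantic_slave_trade', 'African_slave_trade']
```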
So my question is: how do I count the number of nodes touched by a path?
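"Nodes touched by a path" can mean the number of steps (repeats counted) or the number of distinct nodes. A small sketch computing both per path, plus the distinct nodes across all paths; the helper count_nodes and the inline sample paths are illustrative, not from the original code:

```python
from typing import Dict, List

def count_nodes(path: str) -> Dict[str, int]:
    """Count steps and distinct nodes in one ';'-separated path (hypothetical helper)."""
    nodes = path.split(";")
    return {"steps": len(nodes), "distinct": len(set(nodes))}

paths: List[str] = [
    "14th_century;15th_century;16th_century;Pacific_Ocean;Atlantic_Ocean;"
    "Accra;Africa;Atlantic_slave_trade;African_slave_trade",
    "14th_century;Europe;Africa;Atlantic_slave_trade;African_slave_trade",
]

for p in paths:
    print(count_nodes(p))
# {'steps': 9, 'distinct': 9}
# {'steps': 5, 'distinct': 5}

# Distinct nodes touched by any of the paths
all_nodes = set()
for p in paths:
    all_nodes.update(p.split(";"))
print(len(all_nodes))  # 10
```

The two counts only differ when a path revisits a node, e.g. "a;b;a" has 3 steps but 2 distinct nodes.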
Recommended answer
I'm using the pandas library here for speed; you can install it with pip install pandas, and there is more information at http://pandas.pydata.org/
First, construct the DataFrame from your sample data:
In [39]:
temp = """6a3701d319fc3754 1297740409 166 14th_century;15th_century;16th_century;Pacific_Ocean;Atlantic_Ocean;Accra;Africa;Atlantic_slave_trade;African_slave_trade NULL
3824310e536af032 1344753412 88 14th_century;Europe;Africa;Atlantic_slave_trade;African_slave_trade 3"""
# construct the dataframe
# in your case replace io.String() with the path to your tsv file
df = pd.read_csv(io.StringIO(temp), sep='\s+', header=None, names=['a','b','c','d','e'])
df
Out[39]:
a b c \
0 6a3701d319fc3754 1297740409 166
1 3824310e536af032 1344753412 88
d e
0 14th_century;15th_century;16th_century;Pacific... NaN
1 14th_century;Europe;Africa;Atlantic_slave_trad... 3
[2 rows x 5 columns]
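With the DataFrame in place, counting the nodes in each path is a vectorised string operation. A sketch using the pandas .str accessor; the column names n_nodes and n_distinct are made up for illustration:

```python
import io
import pandas as pd

temp = """6a3701d319fc3754 1297740409 166 14th_century;15th_century;16th_century;Pacific_Ocean;Atlantic_Ocean;Accra;Africa;Atlantic_slave_trade;African_slave_trade NULL
3824310e536af032 1344753412 88 14th_century;Europe;Africa;Atlantic_slave_trade;African_slave_trade 3"""
df = pd.read_csv(io.StringIO(temp), sep=r"\s+", header=None, names=["a", "b", "c", "d", "e"])

# Split each path string into a list of node names
nodes = df["d"].str.split(";")
# Number of steps in the path (repeated nodes are counted each time)
df["n_nodes"] = nodes.str.len()
# Number of distinct nodes the path touches
df["n_distinct"] = nodes.apply(lambda xs: len(set(xs)))
print(df[["d", "n_nodes", "n_distinct"]])
```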
In [65]:
# use itertools to flatten our list of lists
import itertools

def to_edge_list(path):
    # split the path on semicolons
    split_list = path.split(';')
    # treat the first node of the path as the main node
    primary_node = split_list[0]
    # connect the main node to every other node in the path
    edge_list = [(primary_node, node) for node in split_list[1:]]
    return edge_list

# now use itertools to flatten the list of lists into a single list
combined_edge_list = list(itertools.chain.from_iterable(df['d'].apply(to_edge_list)))
print(combined_edge_list)
[('14th_century', '15th_century'), ('14th_century', '16th_century'), ('14th_century', 'Pacific_Ocean'), ('14th_century', 'Atlantic_Ocean'), ('14th_century', 'Accra'), ('14th_century', 'Africa'), ('14th_century', 'Atlantic_slave_trade'), ('14th_century', 'African_slave_trade'), ('14th_century', 'Europe'), ('14th_century', 'Africa'), ('14th_century', 'Atlantic_slave_trade'), ('14th_century', 'African_slave_trade')]
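to_edge_list connects the first node to every other node (a star), rather than following the path step by step. If you instead want one edge per consecutive pair of nodes, a sketch using zip is below; this is a different modelling choice, not the answer's method, and the helper path_to_edges is hypothetical:

```python
from typing import List, Tuple

def path_to_edges(path: str) -> List[Tuple[str, str]]:
    """Return a (node, next_node) pair for each step in a ';'-separated path."""
    nodes = path.split(";")
    # zip stops at the shorter sequence, so a one-node path yields no edges
    return list(zip(nodes, nodes[1:]))

print(path_to_edges("14th_century;Europe;Africa;Atlantic_slave_trade;African_slave_trade"))
# [('14th_century', 'Europe'), ('Europe', 'Africa'), ('Africa', 'Atlantic_slave_trade'), ('Atlantic_slave_trade', 'African_slave_trade')]
```

A path of n nodes gives n - 1 edges this way, and the resulting pairs can be passed to G.add_edges_from just like combined_edge_list.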
Now construct our networkx graph from the edge list:
In [66]:
import networkx as nx
G = nx.MultiDiGraph()
G.add_edges_from(combined_edge_list)
G.edges()
Out[66]:
[('14th_century', '15th_century'),
('14th_century', 'Africa'),
('14th_century', 'Africa'),
('14th_century', 'Atlantic_slave_trade'),
('14th_century', 'Atlantic_slave_trade'),
('14th_century', 'African_slave_trade'),
('14th_century', 'African_slave_trade'),
('14th_century', '16th_century'),
('14th_century', 'Accra'),
('14th_century', 'Europe'),
('14th_century', 'Atlantic_Ocean'),
('14th_century', 'Pacific_Ocean')]
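Once the graph exists, networkx can report its size directly. A sketch that rebuilds the same star-shaped edges so it runs on its own; it also shows that a MultiDiGraph keeps the repeated edges (such as the two 14th_century to Africa edges above), while converting to a DiGraph collapses them:

```python
import networkx as nx

paths = [
    "14th_century;15th_century;16th_century;Pacific_Ocean;Atlantic_Ocean;"
    "Accra;Africa;Atlantic_slave_trade;African_slave_trade",
    "14th_century;Europe;Africa;Atlantic_slave_trade;African_slave_trade",
]

G = nx.MultiDiGraph()
for p in paths:
    nodes = p.split(";")
    # same star-shaped edges as to_edge_list: first node -> every other node
    G.add_edges_from((nodes[0], n) for n in nodes[1:])

print(G.number_of_nodes())  # 10 distinct nodes across both paths
print(G.number_of_edges())  # 12, because parallel edges are kept
# A plain DiGraph merges the three repeated edges into one each
print(nx.DiGraph(G).number_of_edges())  # 9
```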
Draw the graph (it doesn't look pretty, but what the hell):
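The drawing step can be sketched with nx.draw, assuming matplotlib is installed. The Agg backend and the output filename graph.png are assumptions so the sketch runs without a display; the edges are the distinct ones from the example above:

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; drop this line and call plt.show() for a window
import matplotlib.pyplot as plt
import networkx as nx

edges = [
    ("14th_century", "15th_century"), ("14th_century", "16th_century"),
    ("14th_century", "Pacific_Ocean"), ("14th_century", "Atlantic_Ocean"),
    ("14th_century", "Accra"), ("14th_century", "Africa"),
    ("14th_century", "Atlantic_slave_trade"), ("14th_century", "African_slave_trade"),
    ("14th_century", "Europe"),
]
G = nx.DiGraph(edges)

fig, ax = plt.subplots(figsize=(8, 6))
# force-directed layout; a fixed seed makes the picture repeatable
pos = nx.spring_layout(G, seed=42)
nx.draw(G, pos, ax=ax, with_labels=True, node_size=500, font_size=8, arrows=True)
fig.savefig("graph.png")
plt.close(fig)
```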