Create a NetworkX Graph from a CSV File in Python 3

Question

I am trying to build a NetworkX social network graph from a CSV file. I am using NetworkX 2.1 and Python 3.

I followed this post with no luck, because I keep receiving the error: AttributeError: 'list' object has no attribute 'decode'.

My goal is to draw edges with higher weights as thicker lines.

Here is my code so far:

import networkx as nx
import csv

Data  = open('testest.csv', "r", encoding='utf8')
read = csv.reader(Data)
Graphtype=nx.Graph()   # use net.Graph() for undirected graph

G = nx.read_edgelist(read, create_using=Graphtype, nodetype=int, data=(('weight',float),))

for x in G.nodes():
      print ("Node:", x, "has total #degree:",G.degree(x), " , In_degree: ", G.out_degree(x)," and out_degree: ", G.in_degree(x))   
for u,v in G.edges():
      print ("Weight of Edge ("+str(u)+","+str(v)+")", G.get_edge_data(u,v))

nx.draw(G)
plt.show()

Is there a simpler way to approach this? The data is relatively simple.

Thanks for your help!

Recommended answer

You are misusing the function read_edgelist. According to the documentation, each line it parses needs to be a string, while csv.reader parses the lines in the input file into lists of strings (for example, 202,237,1 -> ['202', '237', '1']). The AttributeError is therefore raised because read_edgelist is trying to parse the lists provided by csv.reader, where it expects strings.
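The type mismatch described above can be seen without NetworkX at all. The following is a minimal sketch using only the standard library; the two-line `sample` string is a hypothetical stand-in for the CSV file, and the sketch only illustrates the difference in types, not the internals of read_edgelist.

```python
import csv
import io

# Hypothetical two-line sample standing in for the CSV file.
sample = "202,237,1\n203,237,2\n"

# csv.reader yields one list of strings per row.
rows = list(csv.reader(io.StringIO(sample)))

# Iterating over the file object itself yields one string per line.
lines = list(io.StringIO(sample))

print(type(rows[0]).__name__, rows[0])
print(type(lines[0]).__name__, repr(lines[0]))

# Lists have no decode() method, which is what the error message reports.
print(hasattr(rows[0], "decode"))
```

Passing the file object (or its lines) to the NetworkX parser, rather than the csv.reader object, avoids handing it lists.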

We can correctly parse the graph from the input file without using the csv module. However, we still need to deal with the first line (the headers) of the input file, which should not be parsed. There are two methods. The first method skips the first line using next:

Data = open('test.csv', "r")
next(Data, None)  # skip the first line in the input file
Graphtype = nx.Graph()

G = nx.parse_edgelist(Data, delimiter=',', create_using=Graphtype,
                      nodetype=int, data=(('weight', float),))
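To try the first method without a file on disk, the same call can be fed an in-memory buffer. This is a sketch under assumptions: the contents of `sample`, including the header names, are hypothetical and only mimic the layout the answer describes.

```python
import io
import networkx as nx

# Hypothetical file contents; the header names are an assumption.
sample = io.StringIO("target,source,weight\n202,237,1\n203,237,2.5\n")
next(sample, None)  # skip the header line

G = nx.parse_edgelist(sample, delimiter=',', create_using=nx.Graph(),
                      nodetype=int, data=(('weight', float),))

# Each edge carries its weight as a float attribute.
for u, v, w in G.edges(data='weight'):
    print(u, v, w)
```

Because parse_edgelist accepts any iterable of strings, the same code works with an open text file, a list of lines, or a buffer like the one here.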

The second method is a bit "hacky": since the first line starts with target, we mark the character t as the start of a comment in the input file. This only works because the data lines are purely numeric and contain no t.

Data = open('test.csv', "r")
Graphtype = nx.Graph()

G = nx.parse_edgelist(Data, comments='t', delimiter=',', create_using=Graphtype,
                      nodetype=int, data=(('weight', float),))

In both methods, we have to use parse_edgelist instead of read_edgelist because the input file uses \r for newlines. To use read_edgelist, the file needs to be opened in binary mode, in which lines are split only when the newlines are either \r\n or \n. An input file with \r newlines therefore cannot be split into lines, and so cannot be parsed correctly. Opening the file in text mode, as above, lets Python's universal newline handling recognize \r as a line ending.
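The difference between the two modes can be checked with the standard library alone. The sketch below writes a small temporary file with bare \r line endings (the contents are hypothetical) and counts the lines seen in each mode.

```python
import os
import tempfile

# Write a small hypothetical file that uses bare \r line endings.
payload = b"target,source,weight\r202,237,1\r203,237,2\r"
with tempfile.NamedTemporaryFile(delete=False, suffix=".csv") as tmp:
    tmp.write(payload)
    path = tmp.name

# Text mode with the default newline=None applies universal newlines,
# so \r is recognized as a line ending.
with open(path, "r") as f:
    text_lines = f.readlines()

# Binary mode splits only on \n, so the whole file comes back as one line.
with open(path, "rb") as f:
    binary_lines = f.readlines()

os.remove(path)

print(len(text_lines), "lines in text mode")
print(len(binary_lines), "line(s) in binary mode")
```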

Also, since you want to find the in-degrees and out-degrees, the graph should be created using DiGraph, not Graph.
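A minimal sketch of the directed version follows. The edge lines are hypothetical, and treating the first column as the source of each edge is an assumption about the data. Note also that the print statement in the question appears to pair the In_degree label with out_degree and vice versa; the labels below match the calls.

```python
import networkx as nx

# Hypothetical edge lines, assumed to be in "source,target,weight" order.
lines = ["202,237,1", "203,237,2", "237,204,3"]

G = nx.parse_edgelist(lines, delimiter=',', create_using=nx.DiGraph(),
                      nodetype=int, data=(('weight', float),))

for node in G.nodes():
    print("Node:", node, "degree:", G.degree(node),
          "in-degree:", G.in_degree(node),
          "out-degree:", G.out_degree(node))
```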

The key point here is to skip the header in the input file. Another way to achieve this is to first read the input file into a pandas.DataFrame and then convert it to a graph. By default, from_pandas_edgelist looks for columns named source and target.

import networkx as nx
import pandas as pd

df = pd.read_csv('test.csv')
Graphtype = nx.Graph()
G = nx.from_pandas_edgelist(df, edge_attr='weight', create_using=Graphtype)
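Once the graph exists, the goal stated in the question (thicker edges for higher weights) can be approached by passing one line width per edge to the drawing function. This is only a sketch: the edges are hypothetical, the graph is built directly rather than from a file, and `draw_weighted` is an illustrative helper name, not part of NetworkX.

```python
import networkx as nx

# Hypothetical weighted edges standing in for the parsed CSV data.
G = nx.Graph()
G.add_weighted_edges_from([(237, 202, 1.0), (237, 203, 2.0), (204, 237, 4.0)])

# One line width per edge, in the same order as G.edges().
widths = [G[u][v]['weight'] for u, v in G.edges()]


def draw_weighted(graph, edge_widths):
    """Draw the graph with edge thickness taken from edge_widths."""
    import matplotlib.pyplot as plt  # imported here so the rest runs without it
    pos = nx.spring_layout(graph)
    nx.draw(graph, pos, with_labels=True, width=edge_widths)
    plt.show()
```

Calling draw_weighted(G, widths) opens the plot. If the weights span a large range, scaling them first (for example dividing by the maximum weight and multiplying by a chosen maximum width) may give a more readable picture.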
