Load nodes with attributes and edges from DataFrame to NetworkX

Problem description

    I am new to working with graphs in Python using NetworkX. Until now I have used Gephi, where the standard steps (though not the only possible ones) are:

    1. Load the node information from a table/spreadsheet; one of the columns should be the ID and the rest are metadata about the nodes (the nodes are people, so gender, groups... normally used for coloring). Like:

      id;NormalizedName;Gender
      per1;Jesús;male
      per2;Abraham;male
      per3;Isaac;male
      per4;Jacob;male
      per5;Judá;male
      per6;Tamar;female
      ...
      

    2. Then load the edges, also from a table/spreadsheet, using the same node names as in the ID column of the nodes spreadsheet. It normally has four columns (Target, Source, Weight and Type):

      Target;Source;Weight;Type
      per1;per2;3;Undirected
      per3;per4;2;Undirected
      ...
      

    These are the two DataFrames that I have and want to load in Python. From reading about NetworkX, it seems it is not really possible to load two tables (one for nodes, one for edges) into the same graph, and I am not sure what the best way would be:

    1. Should I create a graph with only the node information from the DataFrame, and then add (append) the edges from the other DataFrame? If so, since nx.from_pandas_dataframe() expects information about the edges, I guess I shouldn't use it to create the nodes... Should I just pass the information as lists?

    2. Should I create a graph with only the edge information from the DataFrame and then add the information from the other DataFrame to each node as attributes? Is there a better way to do that than iterating over the DataFrame and the nodes?

    Solution

    Create the weighted graph from the edge table using nx.from_pandas_dataframe (renamed nx.from_pandas_edgelist in later NetworkX 2.x releases):

    import networkx as nx
    import pandas as pd
    
    edges = pd.DataFrame({'source' : [0, 1],
                          'target' : [1, 2],
                          'weight' : [100, 50]})
    
    nodes = pd.DataFrame({'node' : [0, 1, 2],
                          'name' : ['Foo', 'Bar', 'Baz'],
                          'gender' : ['M', 'F', 'M']})
    
    G = nx.from_pandas_dataframe(edges, 'source', 'target', 'weight')
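
On recent NetworkX releases, where nx.from_pandas_dataframe is no longer available, the same graph can be built with nx.from_pandas_edgelist. A minimal sketch, assuming a recent NetworkX 2.x or 3.x install and the same example edge table as above:

```python
import networkx as nx
import pandas as pd

edges = pd.DataFrame({'source': [0, 1],
                      'target': [1, 2],
                      'weight': [100, 50]})

# edge_attr names the column (or list of columns) copied onto each edge.
G = nx.from_pandas_edgelist(edges, source='source', target='target',
                            edge_attr='weight')

print(sorted(G.nodes()))
print(G[0][1]['weight'])
```

Passing the column names as keywords keeps the call readable when the edge table has differently named columns, such as Source and Target in the question.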
    

    Then add the node attributes from dictionaries using set_node_attributes:

    nx.set_node_attributes(G, 'name', pd.Series(nodes.name, index=nodes.node).to_dict())
    nx.set_node_attributes(G, 'gender', pd.Series(nodes.gender, index=nodes.node).to_dict())
    

    Or iterate over the graph to add the node attributes (G.node was later removed from NetworkX 2.x in favor of G.nodes):

    for i in sorted(G.nodes()):
        G.node[i]['name'] = nodes.name[i]
        G.node[i]['gender'] = nodes.gender[i]
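
The first option from the question (nodes first, then edges) also works and has one practical difference: nodes that appear in no edge still end up in the graph. A rough sketch, assuming NetworkX 2.0 or later (for G.nodes[...]); the fourth node, 'Qux', is a hypothetical addition to the example data to show an isolated node:

```python
import networkx as nx
import pandas as pd

nodes = pd.DataFrame({'node': [0, 1, 2, 3],
                      'name': ['Foo', 'Bar', 'Baz', 'Qux'],  # 'Qux' is made up
                      'gender': ['M', 'F', 'M', 'F']})
edges = pd.DataFrame({'source': [0, 1],
                      'target': [1, 2],
                      'weight': [100, 50]})

G = nx.Graph()

# Each row becomes one node; the remaining columns become its attributes.
for record in nodes.to_dict('records'):
    node_id = record.pop('node')
    G.add_node(node_id, **record)

# Add the edges from the second table as (source, target, weight) tuples.
G.add_weighted_edges_from(
    edges[['source', 'target', 'weight']].itertuples(index=False, name=None)
)

print(G.nodes[3])    # attributes of the isolated node
print(G.degree(3))   # it has no edges
```

Whether this or the edge-list route is preferable probably depends on whether isolated nodes matter for the analysis.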
    

    Update:

    As of NetworkX 2.0, the argument order of nx.set_node_attributes changed to (G, values, name=None).

    Using the example from above:

    nx.set_node_attributes(G, pd.Series(nodes.gender, index=nodes.node).to_dict(), 'gender')
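
For tables shaped like the ones in the question (semicolon-separated, string IDs), an end-to-end sketch might look like the following. It assumes a recent NetworkX (2.1 or later) and reads the sample rows from in-memory strings; real data would come from files. The attribute dictionary is built with set_index because the IDs are strings rather than row positions, so pd.Series(column, index=ids) would reindex the column and yield NaN values.

```python
import io

import networkx as nx
import pandas as pd

# Sample rows copied from the question (the elided rows are left out).
nodes_csv = """id;NormalizedName;Gender
per1;Jesús;male
per2;Abraham;male
per3;Isaac;male
per4;Jacob;male
per5;Judá;male
per6;Tamar;female
"""
edges_csv = """Target;Source;Weight;Type
per1;per2;3;Undirected
per3;per4;2;Undirected
"""

nodes = pd.read_csv(io.StringIO(nodes_csv), sep=';')
edges = pd.read_csv(io.StringIO(edges_csv), sep=';')

# Build the graph from the edge table, keeping Weight and Type on each edge.
G = nx.from_pandas_edgelist(edges, source='Source', target='Target',
                            edge_attr=['Weight', 'Type'])

# Nodes that appear in no edge (per5, per6) are not created by the edge list.
G.add_nodes_from(nodes['id'])

# {node_id: {'NormalizedName': ..., 'Gender': ...}}, keyed by the id column.
attrs = nodes.set_index('id').to_dict('index')
nx.set_node_attributes(G, attrs)

print(G.nodes['per6'])
print(G['per1']['per2'])
```

Passing a dict of dicts to nx.set_node_attributes sets every attribute column in one call, which avoids one call per column.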
    
