Construct a bipartite graph from columns of a Python dataframe


Question

I have a dataframe with three columns:

data['subdomain'], data['domain'], data['IP']

I want to build a bipartite graph that connects every subdomain to the domain it belongs to, with the weight being the number of times that correspondence occurs.

For example, my data could be:

subdomain , domain, IP
test1, example.org, 10.20.30.40
something, site.com, 30.50.70.90
test2, example.org, 10.20.30.41
test3, example.org, 10.20.30.42
else, website.com, 90.80.70.10

I want a bipartite graph showing that example.org has a weight of 3, since it has 3 edges, and so on. I then want to group these results together into a new dataframe.

I have been trying with NetworkX, but I have no experience with it, especially when the edges need to be computed.

B=nx.Graph()
B.add_nodes_from(data['subdomain'],bipartite=0)
B.add_nodes_from(data['domain'],bipartite=1)
B.add_edges_from (...)


Recommended answer

You could use

B.add_weighted_edges_from(
    [(row['domain'], row['subdomain'], 1) for idx, row in df.iterrows()], 
    weight='weight')

to add weighted edges, or you could use

B.add_edges_from(
    [(row['domain'], row['subdomain']) for idx, row in df.iterrows()])

to add edges without weights.

You may not need weights, since the degree of a node is the number of edges adjacent to that node. For example,

>>> B.degree('example.org')
3
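
The question also asks for the results to be collected into a new dataframe. The following is a minimal sketch of one way to do that, assuming the column names from the question and that no subdomain label is identical to a domain label (otherwise the two would collapse into a single node). The name `degree_df` is introduced here purely for illustration, and `zip` is used instead of `iterrows()` only as a shorter way to pair the two columns.

```python
import networkx as nx
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    'subdomain': ['test1', 'something', 'test2', 'test3', 'else'],
    'domain': ['example.org', 'site.com', 'example.org',
               'example.org', 'website.com'],
    'IP': ['10.20.30.40', '30.50.70.90', '10.20.30.41',
           '10.20.30.42', '90.80.70.10'],
})

B = nx.Graph()
B.add_nodes_from(df['subdomain'], bipartite=0)
B.add_nodes_from(df['domain'], bipartite=1)
# One unweighted edge per (domain, subdomain) row.
B.add_edges_from(zip(df['domain'], df['subdomain']))

# Collect the degree of every domain node into a new dataframe.
# `degree_df` is a hypothetical name, not part of the original answer.
domains = df['domain'].unique()
degree_df = pd.DataFrame({
    'domain': domains,
    'weight': [B.degree(d) for d in domains],
})
print(degree_df)
```

If the graph itself is not needed, `df.groupby('domain')['subdomain'].nunique()` should give the same per-domain counts directly from pandas.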

Here is a complete example:

import pandas as pd
import networkx as nx
import matplotlib.pyplot as plt

df = pd.DataFrame(
    {'IP': ['10.20.30.40',
      '30.50.70.90',
      '10.20.30.41',
      '10.20.30.42',
      '90.80.70.10'],
     'domain': ['example.org',
      'site.com',
      'example.org',
      'example.org',
      'website.com'],
     'subdomain': ['test1', 'something', 'test2', 'test3', 'else']})

B = nx.Graph()
B.add_nodes_from(df['subdomain'], bipartite=0)
B.add_nodes_from(df['domain'], bipartite=1)
B.add_weighted_edges_from(
    [(row['domain'], row['subdomain'], 1) for idx, row in df.iterrows()], 
    weight='weight')

print(B.edges(data=True))
# Output (the edge order may differ depending on the Python/networkx version):
# [('test1', 'example.org', {'weight': 1}), ('test3', 'example.org', {'weight': 1}), ('test2', 'example.org', {'weight': 1}), ('website.com', 'else', {'weight': 1}), ('site.com', 'something', {'weight': 1})]

pos = {node:[0, i] for i,node in enumerate(df['domain'])}
pos.update({node:[1, i] for i,node in enumerate(df['subdomain'])})
nx.draw(B, pos, with_labels=False)
for p in pos:  # raise text positions
    pos[p][1] += 0.25
nx.draw_networkx_labels(B, pos)

plt.show()
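
Note that in the example above every edge carries a weight of 1. Adding the same (domain, subdomain) pair to an `nx.Graph` a second time updates the existing edge rather than accumulating its weight. If the edge weight should instead be the number of times a pair occurs in the data, one possible approach is to count the pairs in pandas first. The sketch below uses a small made-up dataframe with one repeated pair to show this; the data and the name `counts` are assumptions for illustration only.

```python
import networkx as nx
import pandas as pd

# Hypothetical data: the pair (example.org, test1) appears twice.
df = pd.DataFrame({
    'subdomain': ['test1', 'test1', 'test2', 'something'],
    'domain': ['example.org', 'example.org', 'example.org', 'site.com'],
})

# Count how often each (domain, subdomain) pair occurs.
counts = (df.groupby(['domain', 'subdomain'])
            .size()
            .reset_index(name='weight'))

B = nx.Graph()
B.add_nodes_from(df['subdomain'], bipartite=0)
B.add_nodes_from(df['domain'], bipartite=1)
# Each row of `counts` becomes a (domain, subdomain, weight) tuple.
B.add_weighted_edges_from(counts.itertuples(index=False, name=None))

# Plain degree counts distinct neighbours; weighted degree sums the weights.
print(B.degree('example.org'))                   # 2
print(B.degree('example.org', weight='weight'))  # 3
```

The plain degree then counts distinct subdomains per domain, while the weighted degree counts rows, so which one is appropriate depends on what "weight" should mean for the data.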

