Assigning Group ID to components in networkx
Problem description
I have a graph whose nodes are hotel "parentid"s and their "phone_search" values. My main aim in building this graph was to connect all "parentid"s that share a "phone_search" value (recursively). For example, if parentid A has phone_search 1,2; B has 2,3; C has 3,4; D has 5,6; and E has 6,7, then A, B and C will be grouped in one cluster, and D and E in another cluster.
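A minimal sketch of the A to E example above, assuming a hard-coded `hotels` dict (hypothetical) in place of the MongoDB collection. Hotels and phone numbers become nodes of one graph, so two hotels that share a phone end up in the same connected component, and chains such as A-2-B-3-C are followed transitively.

```python
import networkx as nx

# Hypothetical sample data from the question: parentid -> phone_search values
hotels = {
    "A": ["1", "2"],
    "B": ["2", "3"],
    "C": ["3", "4"],
    "D": ["5", "6"],
    "E": ["6", "7"],
}

G = nx.Graph()
for parentid, phones in hotels.items():
    for phone in phones:
        # one edge per (hotel, phone) pair; a shared phone links two hotels
        G.add_edge(parentid, phone)

# Keep only the hotel nodes from each connected component
clusters = [sorted(n for n in comp if n in hotels) for comp in nx.connected_components(G)]
print(sorted(clusters))
# [['A', 'B', 'C'], ['D', 'E']]
```

This assumes parentids and phone strings never collide as node labels; if they could, prefixing them (for example `"hotel:A"`, `"phone:1"`) would keep the two kinds of node apart.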
This is my code to build the network:
from pymongo import MongoClient  # client for MongoDB
import networkx as nx
import pickle

G = nx.Graph()

# Initialize the MongoDB client and connect to the hotelData collection
client = MongoClient()
db = client.hotel
collection = db.hotelData

for hotel in collection.find():
    try:
        # phone_search is normally a "|"-separated string of numbers
        phones = hotel["phone_search"].split("|")
    except AttributeError:
        # not a string (e.g. a single stored number): treat it as one phone
        phones = [hotel["phone_search"]]
    for phone in phones:
        if phone == '':
            continue  # skip empty entries left by the split
        # one edge per (parentid, phone) pair; shared phones connect hotels
        G.add_edge(hotel["parentid"], phone)

# nx.write_gml(G, "export.gml")
# pickle writes bytes, so open the file in binary mode
with open('/home/justdial/newHotel/graph.txt', 'wb') as f:
    pickle.dump(G, f)
What I want to do: I want to assign a group ID to each component and store it in a dictionary, so that I can access the groups directly from the dictionary every time.
Example: Gid 1 will contain the parentids and phone_searches that are in the same cluster. Similarly, Gid 2 will contain the nodes from another cluster, and so on.
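One way to build such a dictionary, sketched under the assumption that you keep a set of the parentids while building the graph (the `edges` list and `parentids` set below are hypothetical stand-ins for the MongoDB data). IDs start at 1 to match "Gid 1", "Gid 2", and each group splits its nodes back into hotels and phones.

```python
import networkx as nx

# Hypothetical sample (parentid, phone) edges, as built in the question's loop
edges = [("A", "1"), ("A", "2"), ("B", "2"), ("B", "3"), ("D", "5"), ("E", "5")]
parentids = {p for p, _ in edges}  # remember which nodes are hotels

G = nx.Graph(edges)

groups = {}
for gid, comp in enumerate(nx.connected_components(G), start=1):
    groups[gid] = {
        "parentids": sorted(n for n in comp if n in parentids),
        "phones": sorted(n for n in comp if n not in parentids),
    }
print(groups)
# {1: {'parentids': ['A', 'B'], 'phones': ['1', '2', '3']}, 2: {'parentids': ['D', 'E'], 'phones': ['5']}}
```

The numbering depends on the order in which `connected_components` visits nodes, so a given cluster is not guaranteed to keep the same Gid if the graph is rebuilt from differently ordered data.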
I have one more question. Is looking up nodes in a dictionary by group ID faster than performing a BFS on the networkx graph?
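A sketch of the trade-off, not a benchmark: building the lookup dicts costs one traversal of the whole graph, after which each query is an average O(1) dictionary lookup. `nx.node_connected_component` instead runs a traversal from the node on every call, costing time proportional to that component's size. For repeated queries on a graph that does not change, the dictionary should therefore be faster; the dicts must be rebuilt whenever edges are added. The names `node_to_gid` and `members` are illustrative.

```python
import networkx as nx

G = nx.caveman_graph(3, 4)  # 3 disconnected cliques of 4 nodes each

# One pass over the graph: node -> component ID, component ID -> node set
node_to_gid = {}
members = {}
for gid, comp in enumerate(nx.connected_components(G)):
    members[gid] = comp
    for n in comp:
        node_to_gid[n] = gid

# Dictionary lookups: no graph traversal per query
gid = node_to_gid[5]
print(gid, sorted(members[gid]))
# 1 [4, 5, 6, 7]

# Traversal from the node: walks the whole component on every call
print(sorted(nx.node_connected_component(G, 5)))
# [4, 5, 6, 7]
```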
Recommended answer
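Basically, you want a list of nodes grouped by their connected component (not cluster), which is fairly straightforward. You need connected_components(); older networkx releases also offered connected_component_subgraphs(), which was removed in networkx 2.4.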
import networkx as nx

G = nx.caveman_graph(3, 4)  # generate an example with 3 components of four members each
components = nx.connected_components(G)  # generator of node sets, one per component
comp_dict = {idx: sorted(comp) for idx, comp in enumerate(components)}
print(comp_dict)
# {0: [0, 1, 2, 3], 1: [4, 5, 6, 7], 2: [8, 9, 10, 11]}
In case you want the component IDs as node attributes:
attr = {n: comp_id for comp_id, nodes in comp_dict.items() for n in nodes}
nx.set_node_attributes(G, attr, "component")  # networkx >= 2.0 argument order: (G, values, name)
print(list(G.nodes(data=True)))
# [(0, {'component': 0}), (1, {'component': 0}), (2, {'component': 0}), (3, {'component': 0}), (4, {'component': 1}), (5, {'component': 1}), (6, {'component': 1}), (7, {'component': 1}), (8, {'component': 2}), (9, {'component': 2}), (10, {'component': 2}), (11, {'component': 2})]
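Since the question pickles the graph, a small sketch of reading the IDs back after a round trip. `pickle.dumps`/`pickle.loads` stand in for the question's file here (an assumption to keep the example self-contained), and `nx.get_node_attributes` returns the stored values as a plain `{node: component_id}` dict.

```python
import pickle
import networkx as nx

G = nx.caveman_graph(3, 4)
attr = {n: cid for cid, comp in enumerate(nx.connected_components(G)) for n in comp}
nx.set_node_attributes(G, attr, "component")

# Pickle the graph to bytes and load it back (stand-in for writing a file)
G2 = pickle.loads(pickle.dumps(G))

# Read the component IDs back as a plain dict
comp_of = nx.get_node_attributes(G2, "component")
print(comp_of[9])
# 2
```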