What is the correct graph data structure to differentiate between nodes with the same name?
Question
I'm learning about graphs (they seem super useful) and was wondering if I could get some advice on a possible way to structure mine.
Simply put, let's say I get purchase order data every day; some days it's the same as the day before and on others it's different. For example, yesterday I had an order of pencils and erasers, so I created two nodes to represent them, and today I get an order for an eraser and a marker, and so on. After each day, my program also looks at who ordered what, and if Bob ordered a pencil yesterday and then an eraser today, it creates a directed edge. My logic is that I can see who bought what on each day and track Bob's purchase behaviour (and maybe use it to infer patterns for him or other users).
My problem is that I'm using networkx (Python), and when I create a node 'pencil' for yesterday and then another node 'pencil' for day 2, I can't differentiate them.
I thought about (and have been) naming it day2-pencil and then scanning the entire graph and stripping out the 'day2-' to track pencil orders. This seems wrong to me (not to mention expensive on the processor). I think the key would be to somehow mark each day as its own subgraph, so that when I want to study a specific day or a few days, I don't have to scan the entire graph.
As my test data gets larger, it's getting more and more confusing, so I'm wondering what the best practice is. Any general suggestions would be great (networkx seems pretty full-featured, so it probably has a way of doing this).
Thanks in advance!
Update: Still no luck, but this may be helpful:
import networkx as nx
G=nx.Graph()
G.add_node('pencil', day='1/1/12', colour='blue')
G.add_node('eraser', day='1/1/12', colour='rubberish colour. I know thats not a real colour')
G.add_node('pencil', day='1/2/12', colour='blue')
The result I get from typing G.node is:
{'pencil': {'colour': 'blue', 'day': '1/2/12'}, 'eraser': {'colour': 'rubberish colour. I know thats not a real colour', 'day': '1/1/12'}}
It's obviously overwriting the pencil from 1/1/12 with the 1/2/12 one, and I'm not sure how to make a distinct one.
Recommended answer
This mostly depends on your goal. What you want to analyze is the deciding factor in your graph design. But, looking at your data, a general structure would be nodes for Customers and Products, connected by edges that represent Days (I don't know if this helps, but this is in fact a bipartite graph).
So your structure would be something like this:
node(Person) --- edge(Day) ---> node(Product)
Let's say Bob buys a pencil on 1/1/12:
node(Bob) --- 1/1/12 ---> node(Pencil)
OK, now Bob goes and buys another pencil on 1/2/12:
           -- 1/1/12 --
          /            \
node(Bob)               > node(Pencil)
          \            /
           -- 1/2/12 --
And so on...
This is actually possible with networkx. Since you have multiple edges between the same pair of nodes, you have to choose between MultiGraph and MultiDiGraph, depending on whether your edges are directed.
In : import networkx
In : g = networkx.MultiDiGraph()
In : g.add_node("Bob")
In : g.add_node("Alice")
In : g.add_node("Pencil")
In : g.add_edge("Bob","Pencil",key="1/1/12")
In : g.add_edge("Bob","Pencil",key="1/2/12")
In : g.add_edge("Alice","Pencil",key="1/3/12")
In : g.add_edge("Alice","Pencil",key="1/2/12")
In : g.edges(keys=True)
Out:
[('Bob', 'Pencil', '1/2/12'),
('Bob', 'Pencil', '1/1/12'),
('Alice', 'Pencil', '1/3/12'),
('Alice', 'Pencil', '1/2/12')]
So far, not bad. You can actually query things like "Did Alice buy a Pencil on 1/1/12?".
In : g.has_edge("Alice","Pencil","1/1/12")
Out: False
In : g.has_edge("Alice","Pencil","1/2/12")
Out: True
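The same structure can also answer the other thing the question asks for, following one customer's purchases over time. Here is a minimal sketch, assuming the edge keys are the date strings used above. The helper name purchases_by is made up for illustration and is not part of networkx; out_edges(..., keys=True) and the g[u][v] lookup are regular networkx calls.

```python
import networkx as nx

g = nx.MultiDiGraph()
g.add_edge("Bob", "Pencil", key="1/1/12")
g.add_edge("Bob", "Eraser", key="1/2/12")
g.add_edge("Alice", "Pencil", key="1/2/12")

def purchases_by(graph, customer):
    """Return (day, product) pairs for every edge leaving `customer`."""
    return [(day, product)
            for _, product, day in graph.out_edges(customer, keys=True)]

# Everything Bob bought, with the day taken from the edge key.
bob = purchases_by(g, "Bob")

# Days on which Bob bought a pencil: the keys of the parallel edges.
bob_pencil_days = list(g["Bob"]["Pencil"])
```

This only touches the edges leaving Bob, so it does not need to walk the whole graph.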
Things might get bad if you want all orders on a specific day. By bad, I don't mean code-wise, but computation-wise. Code-wise it is rather simple:
In : [(from_node, to_node) for from_node, to_node, key in g.edges(keys=True) if key=="1/2/12"]
Out: [('Bob', 'Pencil'), ('Alice', 'Pencil')]
But this scans all the edges in the network and filters out the ones you want. I don't think networkx has a better built-in way.
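If per-day lookups matter, one possible workaround is to keep your own index from day to edges next to the graph, and build a per-day subgraph from it when you need one. This is only a sketch under assumptions: the orders_by_day dict and the add_order helper are hypothetical names of mine, not networkx features, and the index stays correct only if every edge is added through the helper. edge_subgraph is a method on networkx graphs in current releases; for multigraphs it takes (u, v, key) tuples.

```python
from collections import defaultdict
import networkx as nx

g = nx.MultiDiGraph()
orders_by_day = defaultdict(list)  # day -> list of (customer, product, day)

def add_order(customer, product, day):
    """Add one purchase edge keyed by day and record it in the side index."""
    g.add_edge(customer, product, key=day)
    orders_by_day[day].append((customer, product, day))

add_order("Bob", "Pencil", "1/1/12")
add_order("Bob", "Pencil", "1/2/12")
add_order("Alice", "Pencil", "1/3/12")
add_order("Alice", "Pencil", "1/2/12")

# Look up one day's orders from the index instead of filtering g.edges().
day_edges = orders_by_day["1/2/12"]

# A read-only view of the graph restricted to that day's edges.
day_graph = g.edge_subgraph(day_edges)
```

Note that recording the same (customer, product, day) twice would list it twice in the index while the graph keeps a single edge, so a set per day may suit better if repeats are possible.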