What is the correct graph data structure to differentiate between nodes with the same name?

Question

I'm learning about graphs (they seem super useful) and was wondering if I could get some advice on a possible way to structure my graphs.

Put simply, let's say I get purchase order data every day; some days it's the same as the day before and on others it's different. For example, yesterday I had an order of pencils and erasers, so I created two nodes to represent them; today I get an order for an eraser and a marker, and so on. After each day, my program also looks at who ordered what: if Bob ordered a pencil yesterday and an eraser today, it creates a directed edge. My reasoning is that I can see who bought what on each day and track Bob's purchase behaviour (and maybe use it to infer patterns for him or other users).
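A minimal sketch of the daily directed-edge step described above. The `orders` data and the `add_transitions` helper are hypothetical, and storing transitions in a networkx `MultiDiGraph` keyed by `(customer, day)` is an assumption made for illustration, not the asker's actual code.

```python
import networkx as nx

# Hypothetical sample data: day -> {customer: set of products ordered that day}.
orders = {
    "1/1/12": {"Bob": {"pencil"}, "Alice": {"eraser"}},
    "1/2/12": {"Bob": {"eraser"}, "Alice": {"eraser", "marker"}},
}

def add_transitions(g, orders, prev_day, day):
    """Add a product -> product edge for every customer who ordered on both days."""
    prev, curr = orders[prev_day], orders[day]
    for customer in prev.keys() & curr.keys():
        for before in prev[customer]:
            for after in curr[customer]:
                # The key records who made the transition and the day it ended on,
                # so repeated transitions between the same two products stay distinct.
                g.add_edge(before, after, key=(customer, day))

g = nx.MultiDiGraph()
add_transitions(g, orders, "1/1/12", "1/2/12")
print(sorted(g.edges(keys=True)))
```

Here Bob produces a pencil -> eraser edge, and Alice produces eraser -> eraser and eraser -> marker edges.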

My problem is that I'm using networkx (Python), creating a node 'pencil' for yesterday and then another node 'pencil' for day 2, and I can't differentiate them.

I thought of (and have been) naming it day2-pencil, then scanning the entire graph and stripping out the 'day2-' prefix to track pencil orders. This seems wrong to me (not to mention expensive on the processor). I think the key would be to somehow mark each day as its own subgraph, so that when I want to study a specific day or a few days, I don't have to scan the entire graph.
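A minimal sketch of the prefix-naming workaround described above, showing why it feels expensive: every lookup has to visit and parse every node name. The node names and the `orders_for_product` helper are hypothetical.

```python
# Hypothetical node names following the "day<N>-<product>" convention.
node_names = ["day1-pencil", "day1-eraser", "day2-eraser", "day2-marker", "day2-pencil"]

def orders_for_product(names, product):
    """Return the days on which `product` appears, by scanning and parsing every name."""
    days = []
    for name in names:                      # touches every node: O(number of nodes)
        day, _, item = name.partition("-")  # split off the "dayN-" prefix
        if item == product:
            days.append(day)
    return days

print(orders_for_product(node_names, "pencil"))  # ['day1', 'day2']
```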

As my test data gets larger, it's getting more and more confusing, so I'm wondering what the best practice is. Any general suggestions would be great (networkx seems pretty full-featured, so it probably has a way of doing this).

Thanks in advance!

Update: still no luck, but this may be helpful:

import networkx as nx
G = nx.Graph()
G.add_node('pencil', day='1/1/12', colour='blue')
G.add_node('eraser', day='1/1/12', colour='rubberish colour. I know thats not a real colour')
# intended as a second, separate 'pencil' node for 1/2/12
G.add_node('pencil', day='1/2/12', colour='blue')

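The result I get from the command G.node is (G.node was removed in networkx 2.4; on current versions, dict(G.nodes(data=True)) gives the equivalent mapping):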

{'pencil': {'colour': 'blue', 'day': '1/2/12'}, 'eraser': {'colour': 'rubberish colour. I know thats not a real colour', 'day': '1/1/12'}}

It's obviously overwriting the pencil from 1/1/12 with the one from 1/2/12; I'm not sure whether I can make a distinct one.
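The overwrite can be confirmed in isolation. A minimal sketch, assuming networkx 2.x or later (where `G.nodes[...]` replaces `G.node[...]`): adding a node whose name already exists does not create a second node; networkx updates the existing node's attribute dict.

```python
import networkx as nx

G = nx.Graph()
G.add_node("pencil", day="1/1/12", colour="blue")
G.add_node("eraser", day="1/1/12")
# Same node name again: networkx treats it as the existing node and updates its attributes.
G.add_node("pencil", day="1/2/12", colour="blue")

print(G.number_of_nodes())   # 2
print(G.nodes["pencil"])     # {'day': '1/2/12', 'colour': 'blue'}
```

A node is identified only by its hashable name, so two same-named nodes can only be told apart if their names differ (or the day moves onto edges, as the answer below suggests).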

Recommended answer

This mostly depends on your goal. What you want to analyze is the deciding factor in your graph design. But, looking at your structure, a general design would be nodes for Customers and Products, connected by edges that represent Days (I don't know if this helps, but this is in fact a bipartite graph: every edge runs from a customer to a product, never between two customers or two products).

So your structure would be something like this:

node(Person) --- edge(Day) ---> node(Product)

Let's say, Bob buys a pencil on 1/1/12:

node(Bob) --- 1/1/12 ---> node(Pencil)

OK, now Bob goes and buys another pencil on 1/2/12:

          -- 1/1/12 --
         /            \
node(Bob)              > node(Pencil)
         \            /
          -- 1/2/12 --

And so on...

This is actually possible with networkx. Since you have multiple edges between the same pair of nodes, you have to choose between MultiGraph and MultiDiGraph, depending on whether your edges are directed.
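To see why a multigraph is needed here, a minimal sketch (assuming networkx 2.x or later) that records the same two purchases in a simple `DiGraph` and in a `MultiDiGraph`:

```python
import networkx as nx

purchases = [("Bob", "Pencil", "1/1/12"), ("Bob", "Pencil", "1/2/12")]

simple = nx.DiGraph()
multi = nx.MultiDiGraph()
for person, product, day in purchases:
    # A simple DiGraph keeps at most one Bob -> Pencil edge, so the second call
    # just overwrites the 'day' attribute of the existing edge.
    simple.add_edge(person, product, day=day)
    # A MultiDiGraph keeps a separate edge per key, one per purchase day.
    multi.add_edge(person, product, key=day)

print(simple.number_of_edges())  # 1
print(multi.number_of_edges())   # 2
```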

In : import networkx

In : g = networkx.MultiDiGraph()

In : g.add_node("Bob")
In : g.add_node("Alice")

In : g.add_node("Pencil")

In : g.add_edge("Bob","Pencil",key="1/1/12")
In : g.add_edge("Bob","Pencil",key="1/2/12")

In : g.add_edge("Alice","Pencil",key="1/3/12")
In : g.add_edge("Alice","Pencil",key="1/2/12")

In : g.edges(keys=True)
Out:
[('Bob', 'Pencil', '1/2/12'),
 ('Bob', 'Pencil', '1/1/12'),
 ('Alice', 'Pencil', '1/3/12'),
 ('Alice', 'Pencil', '1/2/12')]
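The bipartite observation made earlier can be checked directly on this graph. A minimal sketch, assuming the `bipartite` node attribute convention (0 for customers, 1 for products) described in networkx's bipartite documentation; the attribute is only bookkeeping, and `is_bipartite` checks the structure itself.

```python
import networkx as nx
from networkx.algorithms import bipartite

g = nx.MultiDiGraph()
# The 'bipartite' node attribute records which side each node is on.
g.add_nodes_from(["Bob", "Alice"], bipartite=0)
g.add_nodes_from(["Pencil"], bipartite=1)
g.add_edge("Bob", "Pencil", key="1/1/12")
g.add_edge("Bob", "Pencil", key="1/2/12")
g.add_edge("Alice", "Pencil", key="1/3/12")
g.add_edge("Alice", "Pencil", key="1/2/12")

print(bipartite.is_bipartite(g))  # True
customers = {n for n, side in g.nodes(data="bipartite") if side == 0}
print(sorted(customers))          # ['Alice', 'Bob']
```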

So far, not bad. You can actually query things like "Did Alice buy a pencil on 1/1/12?":

In : g.has_edge("Alice","Pencil","1/1/12")
Out: False

In : g.has_edge("Alice","Pencil","1/2/12")
Out: True
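The reverse query, "on which days did Bob buy a pencil?", is also cheap, because it only touches the edges between those two nodes. A minimal sketch, assuming networkx 2.x or later: for a multigraph, `g[u][v]` maps each edge key to that edge's attribute dict.

```python
import networkx as nx

g = nx.MultiDiGraph()
g.add_edge("Bob", "Pencil", key="1/1/12")
g.add_edge("Bob", "Pencil", key="1/2/12")
g.add_edge("Alice", "Pencil", key="1/3/12")

# The keys of g["Bob"]["Pencil"] are exactly the days on which Bob bought a pencil.
bob_pencil_days = list(g["Bob"]["Pencil"])
print(bob_pencil_days)  # ['1/1/12', '1/2/12']

# Number of separate Bob -> Pencil purchases.
print(g.number_of_edges("Bob", "Pencil"))  # 2
```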

Things might get bad if you want all orders on a specific day. By bad, I don't mean code-wise but computation-wise. Code-wise it is rather simple:

In : [(from_node, to_node) for from_node, to_node, key in g.edges(keys=True) if key=="1/2/12"]
Out: [('Bob', 'Pencil'), ('Alice', 'Pencil')]

But this scans every edge in the network and filters out the ones you want, so each such query costs time proportional to the total number of edges. I don't think networkx offers a better way.
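One common workaround, which is plain Python bookkeeping rather than a networkx feature, is to maintain a secondary index from day to edges alongside the graph. A minimal sketch; the `PurchaseGraph` class and its methods are hypothetical names chosen for illustration.

```python
from collections import defaultdict

import networkx as nx

class PurchaseGraph:
    """Hypothetical wrapper: a MultiDiGraph plus a day -> edges index kept in sync."""

    def __init__(self):
        self.graph = nx.MultiDiGraph()
        self.by_day = defaultdict(list)

    def add_purchase(self, person, product, day):
        # Re-adding an existing (person, product, day) edge only updates it in the
        # graph, so only genuinely new edges are added to the index.
        if not self.graph.has_edge(person, product, key=day):
            self.by_day[day].append((person, product))
        self.graph.add_edge(person, product, key=day)

    def orders_on(self, day):
        # Dictionary lookup instead of scanning every edge in the graph.
        return list(self.by_day.get(day, []))

pg = PurchaseGraph()
pg.add_purchase("Bob", "Pencil", "1/1/12")
pg.add_purchase("Bob", "Pencil", "1/2/12")
pg.add_purchase("Alice", "Pencil", "1/3/12")
pg.add_purchase("Alice", "Pencil", "1/2/12")

print(pg.orders_on("1/2/12"))  # [('Bob', 'Pencil'), ('Alice', 'Pencil')]
```

The trade-off is extra memory, and every insertion (and any deletion) has to go through the wrapper, or the index drifts out of sync with the graph.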
