Performance of arbitrary queries with Neo4j

Question

I was reading a paper published by Neo4j (a while ago): http://dist.neo4j.org/neo-technology-introduction.pdf

On the second-to-last page, the Drawbacks section states that Neo4j is not good for arbitrary queries.

Say I had user nodes with the following properties: NAME, AGE, GENDER

And the following relationships: LIKE (points to a topic node such as Sports or Technology) and FRIEND (points to another user).

Is Neo4j not very efficient at querying something similar to:

Find FRIENDS (of a given node) that LIKE Sports, Tech, and Reading and that are OVER_THE_AGE of 21.

Therefore, you must first find the FRIEND edges of USER1, then find the LIKE edges of each friend and check whether the target node is called Sports, and you must also check whether the friend's age property is > 21.
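
As a rough illustration only, the traversal described above can be sketched in plain Python. The dictionaries `users`, `friends`, and `likes` and the function `friends_who_like` are hypothetical stand-ins for the graph, not Neo4j APIs, and the sketch assumes a friend must like every listed topic.

```python
# Hypothetical in-memory stand-in for the graph described in the question.
users = {
    "user1": {"name": "Ann", "age": 30, "gender": "F"},
    "user2": {"name": "Bob", "age": 25, "gender": "M"},
    "user3": {"name": "Cid", "age": 19, "gender": "M"},
    "user4": {"name": "Dee", "age": 40, "gender": "F"},
}
# FRIEND edges: user id -> ids of that user's friends.
friends = {"user1": ["user2", "user3", "user4"]}
# LIKE edges: user id -> names of the topic nodes that user likes.
likes = {
    "user2": {"Sports", "Tech", "Reading"},
    "user3": {"Sports", "Tech", "Reading"},
    "user4": {"Sports"},
}


def friends_who_like(user_id, topics, min_age):
    """Follow FRIEND edges, then keep friends older than min_age who LIKE every topic."""
    wanted = set(topics)
    result = []
    for friend_id in friends.get(user_id, []):
        # Property check on the friend node.
        if users[friend_id]["age"] <= min_age:
            continue
        # Follow the friend's LIKE edges and require all wanted topics.
        if wanted <= likes.get(friend_id, set()):
            result.append(friend_id)
    return result


print(friends_who_like("user1", ["Sports", "Tech", "Reading"], 21))
```

The work here is proportional to the number of friends USER1 has and the LIKE edges of those friends, not to the total number of users.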

Is this a poor data model to begin with, especially for a graph database? The reason for the LIKE relationship is so that you can also find all people who LIKE Sports.

What would be a better database choice for this? Redis, Cassandra, HBase, PostgreSQL? And why?

Does anyone have any empirical data regarding this?

Recommended answer

This is a general question about the nature of graph databases. Hopefully one of the Neo4j devs will jump in here, but here is my understanding.

You can think of any database as being "naturally indexed" in a certain way. In a relational database, when you look up a record in storage, the next record is generally stored right next to it. We might call this a "natural index" because, if what you want to do is scan through a bunch of records, the relational structure is fundamentally set up to make that perform really well.

Graph databases, on the other hand, are generally naturally indexed by relationships. (Neo4j devs, jump in if this needs refinement in terms of how Neo4j does storage on disk.) This means that, in general, graph databases traverse relationships very quickly but perform less well on mass/bulk queries.

Now, we're only talking about relative performance. Here's an example of an RDBMS-style query. I'd expect MySQL to blow away Neo4j in performance on this query:

MATCH (n) WHERE n.name = 'Abe' RETURN n;

Note that this exploits no relationships at all and forces the DB to scan ALL nodes. You could improve this by narrowing it down to a certain label, or by indexing on name, but in general, if you had a MySQL table of "people" with a "name" column, an RDBMS is going to kick ass on queries like this, and graph is going to do less well.
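
To make the scan-versus-index point concrete, here is a minimal Python sketch. It is not how either database is implemented; `people`, `find_by_scan`, `build_name_index`, and `find_by_index` are hypothetical names chosen for illustration.

```python
# Hypothetical records; `people` plays the role of every node in the store.
people = [{"id": i, "name": name} for i, name in enumerate(["Abe", "Bea", "Cal", "Abe"])]


def find_by_scan(name):
    """Check every record: the cost grows with the total number of records."""
    return [p["id"] for p in people if p["name"] == name]


def build_name_index(records):
    """Map each name to the ids that carry it; built once, up front."""
    index = {}
    for p in records:
        index.setdefault(p["name"], []).append(p["id"])
    return index


name_index = build_name_index(people)


def find_by_index(name):
    """One dictionary lookup instead of a pass over every record."""
    return name_index.get(name, [])


print(find_by_scan("Abe"), find_by_index("Abe"))
```

Both functions return the same ids; the difference is only in how much data each has to touch to get there.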

OK, so that's the downside. What's the upside? Let's take a look at this query:

MATCH (n)-[r:foo|bar*..5]->(m) RETURN m;

This is an entirely different beast. The real action of the query is in matching a variable-length path between n and m. How would we do this in relational? We might set up a "nodes" table and an "edges" table, then add a PK/FK relationship between them. You could then write an SQL query that recursively joins the two tables to traverse that "path". Believe me, I have tried this in SQL, and it requires wizard-level skill to express the "between 1 and 5 hops" part of that query. Also, an RDBMS will perform like a dog on this query, because it's not terribly selective, and the recursive query is quite expensive, doing all those repetitive joins.
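
For comparison, here is one way the relational version might look, as a sketch using Python's built-in `sqlite3` module and a recursive common table expression. The schema (`nodes`, `edges` with `src`, `dst`, `type`), the sample data, and the function `reachable_within` are assumptions made for illustration. It only approximates the Cypher pattern: it walks nodes up to a depth limit and does not enforce Cypher's rule that a relationship appears at most once per path.

```python
import sqlite3


def reachable_within(conn, start, max_hops=5):
    """Return ids of nodes reachable from `start` in 1..max_hops hops over foo/bar edges."""
    rows = conn.execute(
        """
        WITH RECURSIVE walk(node_id, depth) AS (
            -- Base case: direct neighbours of the start node.
            SELECT e.dst, 1
            FROM edges e
            WHERE e.src = :start AND e.type IN ('foo', 'bar')
            UNION
            -- Recursive case: one more join per additional hop.
            SELECT e.dst, w.depth + 1
            FROM walk w JOIN edges e ON e.src = w.node_id
            WHERE w.depth < :max_hops AND e.type IN ('foo', 'bar')
        )
        SELECT DISTINCT node_id FROM walk ORDER BY node_id
        """,
        {"start": start, "max_hops": max_hops},
    ).fetchall()
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE nodes (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE edges (
        src  INTEGER REFERENCES nodes(id),
        dst  INTEGER REFERENCES nodes(id),
        type TEXT
    );
    """
)
conn.executemany("INSERT INTO nodes VALUES (?, ?)", [(i, f"n{i}") for i in range(1, 9)])
# A chain 1 -> 2 -> ... -> 8 of foo/bar edges, plus one edge of an unrelated type.
conn.executemany(
    "INSERT INTO edges VALUES (?, ?, ?)",
    [(1, 2, "foo"), (2, 3, "bar"), (3, 4, "foo"), (4, 5, "bar"),
     (5, 6, "foo"), (6, 7, "bar"), (7, 8, "foo"), (1, 8, "baz")],
)

print(reachable_within(conn, 1))
```

Each extra hop is another join against `edges`, which is the repeated work the paragraph above describes.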

On queries like this, Neo4j is going to kick the RDBMS's ass.

So, on your question about arbitrary queries: no system in the world is good at arbitrary queries, that is to say, all queries. Systems have strengths and weaknesses. Neo4j can execute arbitrary queries, but there's no guarantee that, for a given class of queries, it will outperform the alternatives. That observation is general, though; the same is true of MySQL, MongoDB, and anything else you choose.

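OK, so bottom line and observations:

  1. Graph databases perform well on a class of queries where RDBMS (and others) perform poorly.
  2. Graph databases aren't tuned for high performance on mass/bulk queries like the example I provided. They can do them, and you can tune their performance to improve things there, but they're never going to be as good as an RDBMS.
  3. This is fundamentally because of how they're laid out, and how they think about/store the data.
  4. So what should you do? If your problem consists of a lot of relationship/path traversal type problems, graph is a big win! (I.e., your data is a graph, and traversing relationships is important to you.) If your problem consists of scanning through large collections of objects, then the relational model is probably a better fit.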

Use tools in their area of strength. Don't use Neo4j like a relational database, or it will perform about as well as if you tried to use a screwdriver to pound nails. :)
