Performance of arbitrary queries with Neo4j

Question

I was reading a paper published by Neo4J a while ago: http://dist.neo4j.org/neo-technology-introduction.pdf

On the second-to-last page, the Drawbacks section states that Neo4J is not good for arbitrary queries.

Say I had user nodes with the following properties: NAME, AGE, GENDER

And the following relationships: LIKE (points to Sports, Technology, etc. NODE) and FRIEND (Points to another USER).

Is Neo4J not very efficient in querying something similar to:

Find FRIENDS (of given node) that LIKE Sports, Tech, & Reading that were OVER_THE_AGE 21.

To answer that, you must first find the FRIEND edges of USER1, then follow the LIKE edges of each friend and check whether the target node is called Sports (and likewise for Tech and Reading), and finally check whether the age property of that friend is > 21.
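The traversal described above can be sketched in a few lines of Python over a toy in-memory graph. This is only an illustration of the steps, not how Neo4J executes the query. The user IDs, the sample data, and the `friends_who_like` helper are all made up, and the sketch assumes "LIKE Sports, Tech, & Reading" means a friend must like all three.

```python
# Toy in-memory graph. All IDs and values below are illustrative assumptions.
users = {
    "user1": {"name": "Ann", "age": 30, "gender": "F"},
    "user2": {"name": "Bob", "age": 25, "gender": "M"},
    "user3": {"name": "Cid", "age": 19, "gender": "M"},
    "user4": {"name": "Dee", "age": 40, "gender": "F"},
}

# FRIEND edges: user -> set of users.
friends = {"user1": {"user2", "user3", "user4"}}

# LIKE edges: user -> set of topic node names.
likes = {
    "user2": {"Sports", "Tech", "Reading"},
    "user3": {"Sports", "Tech", "Reading"},
    "user4": {"Sports"},
}


def friends_who_like(user_id, topics, min_age):
    """Return friends of user_id who like every topic and are older than min_age."""
    wanted = set(topics)
    return sorted(
        friend
        for friend in friends.get(user_id, ())       # step 1: follow FRIEND edges
        if users[friend]["age"] > min_age             # step 2: filter on the age property
        and wanted <= likes.get(friend, set())        # step 3: check the LIKE edges
    )


result = friends_who_like("user1", ["Sports", "Tech", "Reading"], 21)
print(result)
```

The work is proportional to the number of friends of the starting user and their LIKE edges, not to the total number of users in the graph.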

Is this a poor data model to begin with? And especially for graph databases? The reason for the LIKE relationship is in the event that you want to find all people who LIKE Sports.

What would be the better database choice for this? Redis, Cassandra, HBase, PostgreSQL? And Why?

Does anyone have any empirical data regarding this?

Solution

This is a general question about the nature of graph databases. Hopefully one of the neo4j devs will jump in here, but here is my understanding.

You can think of any database as being "naturally indexed" in a certain way. In a relational database, when you look up a record in storage, generally the next record is stored right next to it in storage. We might call this a "natural index" because if what you want to do is scan through a bunch of records, the relational structure is just fundamentally set up to make that perform really well.

Graph databases on the other hand are generally naturally indexed by relationships. (Neo4J devs, jump in if this needs refinement in terms of how neo4j does storage on disk). This means that in general, graph databases traverse relationships very quickly, but perform less well on mass/bulk queries.
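A rough way to picture "naturally indexed by relationships" is an adjacency structure in which each node holds direct references to its own edges. The Python sketch below is a simplified model under that assumption, not Neo4J's actual on-disk format; the `Graph` class and the sample data are hypothetical.

```python
from collections import defaultdict


class Graph:
    """Minimal adjacency-list graph: each node keeps its own outgoing edges."""

    def __init__(self):
        # node -> list of (relationship_type, target_node)
        self.out = defaultdict(list)

    def add_edge(self, source, rel_type, target):
        self.out[source].append((rel_type, target))

    def neighbors(self, node, rel_type):
        # Only this node's edges are inspected, never the rest of the graph.
        return [target for rel, target in self.out[node] if rel == rel_type]


g = Graph()
g.add_edge("alice", "FRIEND", "bob")
g.add_edge("alice", "LIKE", "Sports")
g.add_edge("bob", "LIKE", "Tech")

# Add many unrelated nodes; they do not affect the cost of the lookups below.
for i in range(1000):
    g.add_edge(f"user{i}", "LIKE", "Reading")

alice_friends = g.neighbors("alice", "FRIEND")
print(alice_friends)
```

Following an edge costs the same whether the graph has three nodes or a thousand, which is the property the paragraph above relies on.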

Now, we're only talking about relative performance. Here's an example of an RDBMS style query. I'd expect MySQL to blow away neo4j in performance on this query:

MATCH (n) WHERE n.name = 'Abe' RETURN n;

Note that this exploits no relationships at all, and forces the DB to scan ALL nodes. You could improve this by narrowing it down to a certain label, or by indexing on name, but in general, if you had a MySQL table of "people" with a "name" column, an RDBMS is going to kick ass on queries like this, and graph is going to do less well.
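The difference between scanning every node and using an index on name can be illustrated with plain Python collections. This is a toy model with made-up data and hypothetical helper names (`scan_by_name`, `build_name_index`); it says nothing about how either database implements its indexes.

```python
# 10,000 filler records plus the one we are looking for (illustrative data).
people = [{"id": i, "name": f"person{i}"} for i in range(10_000)]
people.append({"id": 10_000, "name": "Abe"})


def scan_by_name(records, name):
    """Full scan: every record is inspected, like the unindexed query."""
    return [r for r in records if r["name"] == name]


def build_name_index(records):
    """Build a name -> records mapping once, so later lookups skip the scan."""
    index = {}
    for r in records:
        index.setdefault(r["name"], []).append(r)
    return index


name_index = build_name_index(people)

scanned = scan_by_name(people, "Abe")      # touches all 10,001 records
indexed = name_index.get("Abe", [])        # a single dictionary lookup

print(scanned == indexed)
```

Both approaches return the same record; only the amount of work differs, and that work is paid either per query (scan) or once up front (index).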

OK, so that's the downside. What's the upside? Let's take a look at this query:

MATCH (n)-[r:foo|bar*..5]->(m) RETURN m;

This is an entirely different beast. The real action of the query is in matching a variable-length path between n and m. How would we do this in relational? We might set up a "nodes" and an "edges" table, then add a PK/FK relationship between them. You could then write an SQL query that recursively joins the two tables to traverse that "path". Believe me, I have tried this in SQL, and it requires wizard-level skill to express the "between 1 and 5 hops" part of that query. Also, an RDBMS will perform like a dog on this query, because it's not terribly selective, and the recursive query is quite expensive, doing all those repetitive joins.
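For a sense of what the relational version looks like, here is a minimal sketch using a recursive common table expression in SQLite, driven from Python's standard `sqlite3` module. The table layout, column names, and sample chain of edges are assumptions made for illustration. Unlike the Cypher query, the sketch starts from one fixed node, and it bounds the walk by depth rather than reproducing Cypher's path-matching semantics exactly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE nodes (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE edges (
        src  INTEGER REFERENCES nodes(id),
        dst  INTEGER REFERENCES nodes(id),
        type TEXT
    );
""")

# Sample data: a chain 1 -> 2 -> ... -> 8 with alternating edge types.
conn.executemany(
    "INSERT INTO nodes VALUES (?, ?)", [(i, f"n{i}") for i in range(1, 9)]
)
conn.executemany(
    "INSERT INTO edges VALUES (?, ?, ?)",
    [(i, i + 1, "foo" if i % 2 else "bar") for i in range(1, 8)],
)

# Nodes reachable from :start in 1 to :max_hops hops over foo or bar edges.
REACHABLE_SQL = """
WITH RECURSIVE walk(node, depth) AS (
    SELECT dst, 1
    FROM edges
    WHERE src = :start AND type IN ('foo', 'bar')
    UNION
    SELECT e.dst, w.depth + 1
    FROM walk AS w
    JOIN edges AS e ON e.src = w.node
    WHERE w.depth < :max_hops AND e.type IN ('foo', 'bar')
)
SELECT DISTINCT node FROM walk ORDER BY node;
"""

reachable = [
    row[0] for row in conn.execute(REACHABLE_SQL, {"start": 1, "max_hops": 5})
]
print(reachable)
```

Each level of recursion is another join against the `edges` table, which is where the repeated-join cost mentioned above comes from.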

On queries like this, neo4j is going to kick RDBMS's ass.

So, on your question about arbitrary queries: no system in the world is good at arbitrary queries, that is to say, all queries. Systems have strengths and weaknesses. Neo4J can execute arbitrary queries, but there's no guarantee that, for any given class of queries, it will outperform the alternatives. That observation is general, though; the same is true of MySQL, MongoDB, and anything else you choose.

OK, so bottom lines, and observations:

  1. Graph databases perform well on a class of queries where RDBMS (and others) perform poorly.
  2. Graph databases aren't tuned for high performance on mass/bulk queries like the example I provided. They can do them, and you can tune their performance to improve things there, but they're never going to be as good as an RDBMS.
  3. This is because of fundamentally how they're laid out, how they think about/store the data.
  4. So what should you do? If your problem consists of a lot of relationship/path traversal type problems, graph is a big win! (I.e., your data is a graph, and traversing relationships is important to you). If your problem consists of scanning large collections of objects, then the relational model is probably a better fit.

Use tools in their area of strength. Don't use neo4j like a relational database, or it will perform about as well as if you tried to use a screwdriver to pound nails. :)
