Neo4j: label vs. indexed property?

Question

Suppose you're Twitter, and:

  • You have (:User) and (:Tweet) nodes;
  • Tweets can get flagged; and
  • You want to query the list of flagged tweets currently awaiting moderation.

You can either add a label for those tweets, e.g. :AwaitingModeration, or add and index a property, e.g. isAwaitingModeration = true|false.
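As a rough sketch, the two options could look like this in Cypher. The `id` property on tweets is a hypothetical identifier introduced only for illustration, and the `CREATE INDEX ON` form is the Neo4j 2.x-era syntax (newer releases use `CREATE INDEX FOR ... ON ...` instead).

```cypher
// Option A: flag a tweet by adding a label.
// `id: 42` is a made-up identifier for illustration.
MATCH (t:Tweet {id: 42})
SET t:AwaitingModeration;

// List everything awaiting moderation by scanning the label.
MATCH (t:AwaitingModeration)
RETURN t;

// Option B: flag a tweet with an indexed boolean property.
CREATE INDEX ON :Tweet(isAwaitingModeration);

MATCH (t:Tweet {id: 42})
SET t.isAwaitingModeration = true;

// List everything awaiting moderation via the property.
MATCH (t:Tweet)
WHERE t.isAwaitingModeration = true
RETURN t;
```

With option A, a tweet leaves the moderation queue when the label is removed (`REMOVE t:AwaitingModeration`); with option B, the property is set to `false` or removed.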

Is one option inherently better than the other?

I know the best answer is probably to try and load test both :), but is there anything from Neo4j's implementation POV that makes one option more robust or suited for this kind of query?

Does it depend on the volume of tweets in this state at any given moment? If it's in the 10s vs. the 1000s, does that make a difference?

My impression is that labels are better suited for a large volume of nodes, whereas indexed properties are better for smaller volumes (ideally, unique nodes), but I'm not sure if that's actually true.

Thanks!

Recommended answer

This is a common question when we model datasets for customers and a typical use case for Active/NonActive entities.

Here is some feedback based on my experience, valid for Neo4j 2.1.6:

Point 1. There is no difference in db accesses between matching on a label and matching on an indexed property when you simply return the nodes.
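One way to check this on your own data is to profile both forms and compare the reported DbHits. The sketch below reuses the `:AwaitingModeration` label and `isAwaitingModeration` property from the question and assumes an index on `:Tweet(isAwaitingModeration)` already exists; whether the planner actually picks that index depends on the Neo4j version.

```cypher
// Direct match on the label.
PROFILE MATCH (t:AwaitingModeration)
RETURN t;

// Direct match on the property (assumes the index exists).
PROFILE MATCH (t:Tweet)
WHERE t.isAwaitingModeration = true
RETURN t;
```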

Point 2. The difference shows up when such nodes are at the end of a pattern, for example:

MATCH (n:User {id:1})
WITH n
MATCH (n)-[:WRITTEN]->(post:Post)
WHERE post.published = true
RETURN n, collect(post) as posts;

-

neo4j-sh (?)$ PROFILE MATCH (n:User) WHERE n._id = 'c084e0ca-22b6-35f8-a786-c07891f108fc'
> WITH n
> MATCH (n)-[:WRITTEN]->(post:BlogPost)
> WHERE post.active = true
> RETURN n, size(collect(post)) as posts;
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| n                                                                                                                                                         | posts |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Node[118]{_id:"c084e0ca-22b6-35f8-a786-c07891f108fc",login:"joy.wiza",password:"7425b990a544ae26ea764a4473c1863253240128",email:"hayes.shaina@yahoo.com"} | 1     |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row

ColumnFilter(0)
  |
  +Extract
    |
    +ColumnFilter(1)
      |
      +EagerAggregation
        |
        +Filter
          |
          +SimplePatternMatcher
            |
            +SchemaIndex

+----------------------+------+--------+----------------------+----------------------------------------------------------------------------+
|             Operator | Rows | DbHits |          Identifiers |                                                                      Other |
+----------------------+------+--------+----------------------+----------------------------------------------------------------------------+
|      ColumnFilter(0) |    1 |      0 |                      |                                                      keep columns n, posts |
|              Extract |    1 |      0 |                      |                                                                      posts |
|      ColumnFilter(1) |    1 |      0 |                      |                                           keep columns n,   AGGREGATION153 |
|     EagerAggregation |    1 |      0 |                      |                                                                          n |
|               Filter |    1 |      3 |                      | (hasLabel(post:BlogPost(1)) AND Property(post,active(8)) == {  AUTOBOOL1}) |
| SimplePatternMatcher |    1 |     12 | n, post,   UNNAMED84 |                                                                            |
|          SchemaIndex |    1 |      2 |                 n, n |                                                {  AUTOSTRING0}; :User(_id) |
+----------------------+------+--------+----------------------+----------------------------------------------------------------------------+

Total database accesses: 17

In this case, Cypher will not make use of the index :Post(published).

Thus, using a label is more performant in this case, for example if you have an ActivePost label:

neo4j-sh (?)$ PROFILE MATCH (n:User) WHERE n._id = 'c084e0ca-22b6-35f8-a786-c07891f108fc'
> WITH n
> MATCH (n)-[:WRITTEN]->(post:ActivePost)
> RETURN n, size(collect(post)) as posts;
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| n                                                                                                                                                         | posts |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Node[118]{_id:"c084e0ca-22b6-35f8-a786-c07891f108fc",login:"joy.wiza",password:"7425b990a544ae26ea764a4473c1863253240128",email:"hayes.shaina@yahoo.com"} | 1     |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row

ColumnFilter(0)
  |
  +Extract
    |
    +ColumnFilter(1)
      |
      +EagerAggregation
        |
        +Filter
          |
          +SimplePatternMatcher
            |
            +SchemaIndex

+----------------------+------+--------+----------------------+----------------------------------+
|             Operator | Rows | DbHits |          Identifiers |                            Other |
+----------------------+------+--------+----------------------+----------------------------------+
|      ColumnFilter(0) |    1 |      0 |                      |            keep columns n, posts |
|              Extract |    1 |      0 |                      |                            posts |
|      ColumnFilter(1) |    1 |      0 |                      | keep columns n,   AGGREGATION130 |
|     EagerAggregation |    1 |      0 |                      |                                n |
|               Filter |    1 |      1 |                      |     hasLabel(post:ActivePost(2)) |
| SimplePatternMatcher |    1 |      4 | n, post,   UNNAMED84 |                                  |
|          SchemaIndex |    1 |      2 |                 n, n |      {  AUTOSTRING0}; :User(_id) |
+----------------------+------+--------+----------------------+----------------------------------+

Total database accesses: 7

Point 3. Always use labels for positives. In the case above, having a Draft label instead would force you to execute the following query:

MATCH (n:User {id:1})
WITH n
MATCH (n)-[:POST]->(post:Post)
WHERE NOT post :Draft
RETURN n, collect(post) as posts;

This means that Cypher has to open the label header of each node and filter on it.
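A minimal sketch of the "positive" alternative: swap the label when the state changes, so that reads match on what a post is rather than on what it is not. The `:Published` label and the `id` property on posts are hypothetical names used only for illustration.

```cypher
// When a draft is published, replace the negative-state label
// with a positive one. `id: 7` is a made-up identifier.
MATCH (post:Draft {id: 7})
REMOVE post:Draft
SET post:Published;

// Reads can then match the positive label directly, with no NOT filter.
MATCH (n:User {id: 1})
WITH n
MATCH (n)-[:POST]->(post:Published)
RETURN n, collect(post) AS posts;
```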

Point 4. Avoid the need to match on multiple labels:

MATCH (n:User {id:1})
WITH n
MATCH (n)-[:POST]->(post:Post:ActivePost)
RETURN n, collect(post) as posts;

neo4j-sh (?)$ PROFILE MATCH (n:User) WHERE n._id = 'c084e0ca-22b6-35f8-a786-c07891f108fc'
> WITH n
> MATCH (n)-[:WRITTEN]->(post:BlogPost:ActivePost)
> RETURN n, size(collect(post)) as posts;
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| n                                                                                                                                                         | posts |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Node[118]{_id:"c084e0ca-22b6-35f8-a786-c07891f108fc",login:"joy.wiza",password:"7425b990a544ae26ea764a4473c1863253240128",email:"hayes.shaina@yahoo.com"} | 1     |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row

ColumnFilter(0)
  |
  +Extract
    |
    +ColumnFilter(1)
      |
      +EagerAggregation
        |
        +Filter
          |
          +SimplePatternMatcher
            |
            +SchemaIndex

+----------------------+------+--------+----------------------+---------------------------------------------------------------+
|             Operator | Rows | DbHits |          Identifiers |                                                         Other |
+----------------------+------+--------+----------------------+---------------------------------------------------------------+
|      ColumnFilter(0) |    1 |      0 |                      |                                         keep columns n, posts |
|              Extract |    1 |      0 |                      |                                                         posts |
|      ColumnFilter(1) |    1 |      0 |                      |                              keep columns n,   AGGREGATION139 |
|     EagerAggregation |    1 |      0 |                      |                                                             n |
|               Filter |    1 |      2 |                      | (hasLabel(post:BlogPost(1)) AND hasLabel(post:ActivePost(2))) |
| SimplePatternMatcher |    1 |      8 | n, post,   UNNAMED84 |                                                               |
|          SchemaIndex |    1 |      2 |                 n, n |                                   {  AUTOSTRING0}; :User(_id) |
+----------------------+------+--------+----------------------+---------------------------------------------------------------+

Total database accesses: 12

For Cypher, this results in the same process as in point 3.

Point 5. If possible, avoid the need to match on labels by using well-named relationship types:

MATCH (n:User {id:1})
WITH n
MATCH (n)-[:PUBLISHED]->(p)
RETURN n, collect(p) as posts

-

MATCH (n:User {id:1})
WITH n
MATCH (n)-[:DRAFTED]->(post)
RETURN n, collect(post) as posts;

neo4j-sh (?)$ PROFILE MATCH (n:User) WHERE n._id = 'c084e0ca-22b6-35f8-a786-c07891f108fc'
> WITH n
> MATCH (n)-[:DRAFTED]->(post)
> RETURN n, size(collect(post)) as posts;
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| n                                                                                                                                                         | posts |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Node[118]{_id:"c084e0ca-22b6-35f8-a786-c07891f108fc",login:"joy.wiza",password:"7425b990a544ae26ea764a4473c1863253240128",email:"hayes.shaina@yahoo.com"} | 3     |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row

ColumnFilter(0)
  |
  +Extract
    |
    +ColumnFilter(1)
      |
      +EagerAggregation
        |
        +SimplePatternMatcher
          |
          +SchemaIndex

+----------------------+------+--------+----------------------+----------------------------------+
|             Operator | Rows | DbHits |          Identifiers |                            Other |
+----------------------+------+--------+----------------------+----------------------------------+
|      ColumnFilter(0) |    1 |      0 |                      |            keep columns n, posts |
|              Extract |    1 |      0 |                      |                            posts |
|      ColumnFilter(1) |    1 |      0 |                      | keep columns n,   AGGREGATION119 |
|     EagerAggregation |    1 |      0 |                      |                                n |
| SimplePatternMatcher |    3 |      0 | n, post,   UNNAMED84 |                                  |
|          SchemaIndex |    1 |      2 |                 n, n |      {  AUTOSTRING0}; :User(_id) |
+----------------------+------+--------+----------------------+----------------------------------+

Total database accesses: 2

This is more performant because it uses the full power of the graph: it just follows the relationships from the user node, resulting in no more db accesses than matching the user node itself, and therefore no filtering on labels.
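With this model, a state change becomes a relationship change. As a hedged sketch, publishing a draft could replace the DRAFTED relationship with a PUBLISHED one; the `id` property on the post is a hypothetical identifier, not something taken from the dataset profiled above.

```cypher
// Move a post from "drafted" to "published" by swapping
// the relationship type. `id: 7` is a made-up identifier.
MATCH (n:User {id: 1})-[d:DRAFTED]->(post {id: 7})
CREATE (n)-[:PUBLISHED]->(post)
DELETE d;
```

The trade-off is that every state transition must rewrite a relationship, in exchange for reads that need no label or property filter.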

That's my €0.02.
