Neo4j: label vs. indexed property?

Question

Suppose you're Twitter, and:

  • You have (:User) and (:Tweet) nodes;
  • Tweets can get flagged; and
  • You want to query the list of flagged tweets currently awaiting moderation.

You can either add a label for those tweets, e.g. :AwaitingModeration, or add and index a property, e.g. isAwaitingModeration = true|false.
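
As a rough sketch of what the two options look like in Cypher: the tweet `id` property and its value are hypothetical, and the index statement uses the legacy syntax of Neo4j 2.x/3.x (newer versions use a different `CREATE INDEX` form).

```cypher
// Option A: mark the tweet with an extra label.
// The `id` property and its value are made up for illustration.
MATCH (t:Tweet {id: 42})
SET t:AwaitingModeration;

// List everything awaiting moderation via the label.
MATCH (t:AwaitingModeration)
RETURN t;

// Option B: index a boolean property instead.
// Legacy index syntax (Neo4j 2.x/3.x).
CREATE INDEX ON :Tweet(isAwaitingModeration);

MATCH (t:Tweet {id: 42})
SET t.isAwaitingModeration = true;

// List everything awaiting moderation via the indexed property.
MATCH (t:Tweet)
WHERE t.isAwaitingModeration = true
RETURN t;
```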

Is one option inherently better than the other?

I know the best answer is probably to try and load test both :), but is there anything from Neo4j's implementation POV that makes one option more robust or suited for this kind of query?

Does it depend on the volume of tweets in this state at any given moment? If it's in the 10s vs. the 1000s, does that make a difference?

My impression is that labels are better suited for a large volume of nodes, whereas indexed properties are better for smaller volumes (ideally, unique nodes), but I'm not sure if that's actually true.

Thanks!

Answer

Update: a follow-up blog post has been published.

This is a common question when we model datasets for customers and a typical use case for Active/NonActive entities.

Here is some feedback from my own experience, valid for Neo4j 2.1.6:

Point 1. There is no difference in db accesses between matching on a label and matching on an indexed property when you simply return the nodes.
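
One way to check this on your own data is to profile both lookups side by side. This is only a sketch: it borrows the BlogPost/ActivePost/active names from the examples in this answer, assumes an index on :BlogPost(active) already exists, and the db hits you see will depend on your Neo4j version and dataset.

```cypher
// Lookup by label: reads the nodes carrying the ActivePost label.
PROFILE MATCH (post:ActivePost)
RETURN post;

// Lookup by indexed property: assumes an index on :BlogPost(active).
PROFILE MATCH (post:BlogPost)
WHERE post.active = true
RETURN post;
```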

Point 2. The difference appears when such nodes are at the end of a pattern, for example:

MATCH (n:User {id:1})
WITH n
MATCH (n)-[:WRITTEN]->(post:Post)
WHERE post.published = true
RETURN n, collect(post) as posts;

-

neo4j-sh (?)$ PROFILE MATCH (n:User) WHERE n._id = 'c084e0ca-22b6-35f8-a786-c07891f108fc'
> WITH n
> MATCH (n)-[:WRITTEN]->(post:BlogPost)
> WHERE post.active = true
> RETURN n, size(collect(post)) as posts;
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| n                                                                                                                                                         | posts |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Node[118]{_id:"c084e0ca-22b6-35f8-a786-c07891f108fc",login:"joy.wiza",password:"7425b990a544ae26ea764a4473c1863253240128",email:"hayes.shaina@yahoo.com"} | 1     |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row

ColumnFilter(0)
  |
  +Extract
    |
    +ColumnFilter(1)
      |
      +EagerAggregation
        |
        +Filter
          |
          +SimplePatternMatcher
            |
            +SchemaIndex

+----------------------+------+--------+----------------------+----------------------------------------------------------------------------+
|             Operator | Rows | DbHits |          Identifiers |                                                                      Other |
+----------------------+------+--------+----------------------+----------------------------------------------------------------------------+
|      ColumnFilter(0) |    1 |      0 |                      |                                                      keep columns n, posts |
|              Extract |    1 |      0 |                      |                                                                      posts |
|      ColumnFilter(1) |    1 |      0 |                      |                                           keep columns n,   AGGREGATION153 |
|     EagerAggregation |    1 |      0 |                      |                                                                          n |
|               Filter |    1 |      3 |                      | (hasLabel(post:BlogPost(1)) AND Property(post,active(8)) == {  AUTOBOOL1}) |
| SimplePatternMatcher |    1 |     12 | n, post,   UNNAMED84 |                                                                            |
|          SchemaIndex |    1 |      2 |                 n, n |                                                {  AUTOSTRING0}; :User(_id) |
+----------------------+------+--------+----------------------+----------------------------------------------------------------------------+

Total database accesses: 17

In this case, Cypher will not make use of the index :Post(published).

Thus, using a label is more performant here, for example with an ActivePost label:

neo4j-sh (?)$ PROFILE MATCH (n:User) WHERE n._id = 'c084e0ca-22b6-35f8-a786-c07891f108fc'
> WITH n
> MATCH (n)-[:WRITTEN]->(post:ActivePost)
> RETURN n, size(collect(post)) as posts;
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| n                                                                                                                                                         | posts |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Node[118]{_id:"c084e0ca-22b6-35f8-a786-c07891f108fc",login:"joy.wiza",password:"7425b990a544ae26ea764a4473c1863253240128",email:"hayes.shaina@yahoo.com"} | 1     |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row

ColumnFilter(0)
  |
  +Extract
    |
    +ColumnFilter(1)
      |
      +EagerAggregation
        |
        +Filter
          |
          +SimplePatternMatcher
            |
            +SchemaIndex

+----------------------+------+--------+----------------------+----------------------------------+
|             Operator | Rows | DbHits |          Identifiers |                            Other |
+----------------------+------+--------+----------------------+----------------------------------+
|      ColumnFilter(0) |    1 |      0 |                      |            keep columns n, posts |
|              Extract |    1 |      0 |                      |                            posts |
|      ColumnFilter(1) |    1 |      0 |                      | keep columns n,   AGGREGATION130 |
|     EagerAggregation |    1 |      0 |                      |                                n |
|               Filter |    1 |      1 |                      |     hasLabel(post:ActivePost(2)) |
| SimplePatternMatcher |    1 |      4 | n, post,   UNNAMED84 |                                  |
|          SchemaIndex |    1 |      2 |                 n, n |      {  AUTOSTRING0}; :User(_id) |
+----------------------+------+--------+----------------------+----------------------------------+

Total database accesses: 7

Point 3. Always use labels for positives. In the case above, having a Draft label instead would force you to execute the following query:

MATCH (n:User {id:1})
WITH n
MATCH (n)-[:POST]->(post:Post)
WHERE NOT post:Draft
RETURN n, collect(post) as posts;

This means that Cypher has to open the label header of each node and filter on it.
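
A minimal sketch of keeping a positive label instead, assuming you maintain an ActivePost label at publish time. The post `id` property and its value are hypothetical.

```cypher
// Publishing: add the positive label when the post goes live.
MATCH (post:Post {id: 10})
SET post:ActivePost;

// Unpublishing: remove the label again.
MATCH (post:Post {id: 10})
REMOVE post:ActivePost;

// Reads can then match the positive label directly,
// with no negated label check in the WHERE clause.
MATCH (n:User {id:1})-[:POST]->(post:ActivePost)
RETURN n, collect(post) as posts;
```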

Point 4. Avoid the need to match on multiple labels:

MATCH (n:User {id:1})
WITH n
MATCH (n)-[:POST]->(post:Post:ActivePost)
RETURN n, collect(post) as posts;

neo4j-sh (?)$ PROFILE MATCH (n:User) WHERE n._id = 'c084e0ca-22b6-35f8-a786-c07891f108fc'
> WITH n
> MATCH (n)-[:WRITTEN]->(post:BlogPost:ActivePost)
> RETURN n, size(collect(post)) as posts;
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| n                                                                                                                                                         | posts |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Node[118]{_id:"c084e0ca-22b6-35f8-a786-c07891f108fc",login:"joy.wiza",password:"7425b990a544ae26ea764a4473c1863253240128",email:"hayes.shaina@yahoo.com"} | 1     |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row

ColumnFilter(0)
  |
  +Extract
    |
    +ColumnFilter(1)
      |
      +EagerAggregation
        |
        +Filter
          |
          +SimplePatternMatcher
            |
            +SchemaIndex

+----------------------+------+--------+----------------------+---------------------------------------------------------------+
|             Operator | Rows | DbHits |          Identifiers |                                                         Other |
+----------------------+------+--------+----------------------+---------------------------------------------------------------+
|      ColumnFilter(0) |    1 |      0 |                      |                                         keep columns n, posts |
|              Extract |    1 |      0 |                      |                                                         posts |
|      ColumnFilter(1) |    1 |      0 |                      |                              keep columns n,   AGGREGATION139 |
|     EagerAggregation |    1 |      0 |                      |                                                             n |
|               Filter |    1 |      2 |                      | (hasLabel(post:BlogPost(1)) AND hasLabel(post:ActivePost(2))) |
| SimplePatternMatcher |    1 |      8 | n, post,   UNNAMED84 |                                                               |
|          SchemaIndex |    1 |      2 |                 n, n |                                   {  AUTOSTRING0}; :User(_id) |
+----------------------+------+--------+----------------------+---------------------------------------------------------------+

Total database accesses: 12

This results in the same process for Cypher as in Point 3.

Point 5. If possible, avoid the need to match on labels by using well-typed, well-named relationships:

MATCH (n:User {id:1})
WITH n
MATCH (n)-[:PUBLISHED]->(p)
RETURN n, collect(p) as posts

-

MATCH (n:User {id:1})
WITH n
MATCH (n)-[:DRAFTED]->(post)
RETURN n, collect(post) as posts;

neo4j-sh (?)$ PROFILE MATCH (n:User) WHERE n._id = 'c084e0ca-22b6-35f8-a786-c07891f108fc'
> WITH n
> MATCH (n)-[:DRAFTED]->(post)
> RETURN n, size(collect(post)) as posts;
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| n                                                                                                                                                         | posts |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Node[118]{_id:"c084e0ca-22b6-35f8-a786-c07891f108fc",login:"joy.wiza",password:"7425b990a544ae26ea764a4473c1863253240128",email:"hayes.shaina@yahoo.com"} | 3     |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row

ColumnFilter(0)
  |
  +Extract
    |
    +ColumnFilter(1)
      |
      +EagerAggregation
        |
        +SimplePatternMatcher
          |
          +SchemaIndex

+----------------------+------+--------+----------------------+----------------------------------+
|             Operator | Rows | DbHits |          Identifiers |                            Other |
+----------------------+------+--------+----------------------+----------------------------------+
|      ColumnFilter(0) |    1 |      0 |                      |            keep columns n, posts |
|              Extract |    1 |      0 |                      |                            posts |
|      ColumnFilter(1) |    1 |      0 |                      | keep columns n,   AGGREGATION119 |
|     EagerAggregation |    1 |      0 |                      |                                n |
| SimplePatternMatcher |    3 |      0 | n, post,   UNNAMED84 |                                  |
|          SchemaIndex |    1 |      2 |                 n, n |      {  AUTOSTRING0}; :User(_id) |
+----------------------+------+--------+----------------------+----------------------------------+

Total database accesses: 2

This is more performant because it uses the full power of the graph: it just follows the relationships from the node, resulting in no more db accesses than matching the user node, and thus no filtering on labels.
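
With this model, a state change becomes a change of relationship type. A rough sketch of publishing a draft, where the post `id` property and its value are hypothetical:

```cypher
// Swap the DRAFTED relationship for a PUBLISHED one when the post goes live.
MATCH (n:User {id:1})-[r:DRAFTED]->(post {id: 10})
CREATE (n)-[:PUBLISHED]->(post)
DELETE r;
```

The trade-off is that every state transition has to rewrite a relationship, in exchange for reads that never filter on labels or properties.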

That's my 0,02€
