Postgres物化路径-使用ltree有什么好处? [英] Postgres Materialized Path - What are the benefits of using ltree?

查看:280
本文介绍了Postgres物化路径-使用ltree有什么好处?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

材料化路径是一种用于表示SQL中层次结构的方法.每个节点都包含路径本身及其所有祖先(grandparent/parent/self).

Materialized Path is a method for representing hierarchy in SQL. Each node contains the path itself and all its ancestors (grandparent/parent/self).

MP的django-treebeard实现( docs ):

  1. 路径的每一步都是固定长度,以保持一致的性能.

  1. Each step of the path is a fixed length for consistent performance.

每个节点包含depthnumchild字段(快速读取,而写入成本最低).

Each node contains depth and numchild fields (fast reads at minimal cost to writes).

已为路径字段建立索引(使用标准b树索引):

The path field is indexed (with a standard b-tree index):

物化路径方法在数据库中大量使用LIKE,并使用WHERE path LIKE'002003%'之类的子句.如果您认为LIKE太慢,那是对的,但是在这种情况下,路径字段已在数据库中建立索引,并且所有不以%字符开头的LIKE子句都将使用索引.这就是使物化路径方法如此之快的原因.

The materialized path approach makes heavy use of LIKE in your database, with clauses like WHERE path LIKE '002003%'. If you think that LIKE is too slow, you’re right, but in this case the path field is indexed in the database, and all LIKE clauses that don’t start with a % character will use the index. This is what makes the materialized path approach so fast.

get_ancestors的实现(链接):

匹配具有包含当前路径子集的路径的节点(steplen是步骤的固定长度).

Match nodes with a path that contains a subset of the current path (steplen is the fixed length of a step).

paths = [
    self.path[0:pos]
    for pos in range(0, len(self.path), self.steplen)[1:]
]
return get_result_class(self.__class__).objects.filter(
    path__in=paths).order_by('depth')

get_descendants的实现(链接):

匹配深度大于self的节点和以当前路径开头的路径.

Match nodes with a depth greater than self and a path which starts with current path.

return cls.objects.filter(
    path__startswith=parent.path,
    depth__gte=parent.depth
).order_by(
    'path'
)

此方法的潜在缺点:

  1. 深度嵌套的层次结构会导致路径变长,从而影响读取性能.
  2. 移动节点需要更新所有后代的路径.

Postgres包括ltree扩展名,该扩展名提供了自定义 GiST 索引(文档).

Postgres includes the ltree extension which provides a custom GiST index (docs).

我不清楚ltree相对于django-treebeard的实现有哪些好处.该文章认为,只有ltree可以回答get_ancestors问题,但是如前所述,弄清楚节点的祖先(或后代)是微不足道的.

I am not clear which benefits ltree provides over django-treebeard's implementation. This article argues that only ltree can answer the get_ancestors question, but as demonstrated earlier, figuring out the ancestors (or descendants) of a node is trivial.

[此外,如果找到了这个Django ltree库- https://github. com/mariocesar/django-ltree] .

[As an aside, if found this Django ltree library - https://github.com/mariocesar/django-ltree].

两种方法都使用索引(django-treebeard使用b树,ltree使用自定义GiST).我有兴趣了解ltree GiST的实现,以及对于这种特殊用例(物化路径),为什么它可能比标准b树更有效的索引.

Both approaches use an index (django-treebeard uses b-tree, ltree uses a custom GiST). I am interested in understanding the implementation of the ltree GiST and why it might be a more efficient index than a standard b-tree for this particular use case (materialized path).

其他链接

哪些选项可用于在关系数据库中存储分层数据?

https://news.ycombinator.com/item?id=709970

推荐答案

TL; DR 可重用的标签,复杂的搜索模式和针对多个后代节点(或路径未经过的单个节点)的祖先搜索尚未被检索到)无法使用物化路径索引来完成.

TL;DR Reusable labels, complex search patterns, and ancestry searches against multiple descendant nodes (or a single node whose path hasn't yet been retrieved) can't be accomplished using a materialized path index.

对于那些对血腥细节感兴趣的人...

For those interested in the gory details...

首先,仅当您不重用节点描述中的任何标签时,您的问题才有意义.如果是的话,l树实际上是两者中的唯一选择.但是物化路径实现通常不需要它,因此我们将其放在一边.

Firstly, your question is only relevant if you are not reusing any labels in your node description. If you were, the l-tree is really the only option of the two. But materialized path implementations don't typically need this, so let's put that aside.

一个明显的区别是l树提供给您的搜索类型的灵活性.考虑以下示例(来自您问题中链接的ltree文档):

One obvious difference will be in the flexibility in the types of searches that l-tree gives you. Consider these examples (from the ltree docs linked in your question):

foo         Match the exact label path foo
*.foo.*     Match any label path containing the label foo
*.foo       Match any label path whose last label is foo

第一个查询显然可以通过实例化路径来实现.最后一个也是可以实现的,您可以在其中将查询作为同级查找进行调整.但是,不能通过单个索引查找直接实现中间情况.您要么必须将此分解为两个查询(所有后代+所有祖先),要么求助于表扫描.

The first query is obviously achievable with materialized path. The last is also achievable, where you'd adjust the query as a sibling lookup. The middle case, however, isn't directly achievable with a single index lookup. You'd either have to break this up into two queries (all descendants + all ancestors), or resort to a table scan.

然后真的有像这样的复杂查询(同样来自文档):

And then there are really complex queries like this one (also from the docs):

Top.*{0,2}.sport*@.!football|tennis.Russ*|Spain

物化路径索引在这里将毫无用处,因此需要全表扫描来处理.如果要以SARGable查询的形式执行此操作,则l-tree是唯一的选择.

A materialized path index would be useless here, and a full table scan would be required to handle this. l-tree is the only option if you want to perform this as a SARGable query.

但是对于标准的分层操作,请找到以下任何一项:

But for the standard hierarchical operations, finding any of:

  • 父母
  • 孩子
  • 后裔
  • 根节点
  • 叶节点

物化路径与l树一样有效.与上面链接的文章相反,使用b树搜索同一个祖先的所有后代非常可行.如果您的索引准备正确,则查询格式WHERE path LIKE 'A.%'是SARGable的(我必须使用varchar_pattern_ops明确标记我的路径索引,以使其正常工作.)

materialized path will work just as well as l-tree. Contrary to the article linked above, searching for all descendants of a common ancestor is very doable using a b-tree. The query format WHERE path LIKE 'A.%' is SARGable provided your index is prepared properly (I had to explicitly tag my path index with varchar_pattern_ops to get this to work).

此列表中缺少的是查找后代的所有祖先.不幸的是,查询格式WHERE 'A.B.C.D' LIKE path || '.%'不会使用索引.一些库实现的一种解决方法是从路径中解析祖先节点,然后直接查询它们:WHERE id IN ('A', 'B', 'C').但是,仅当您针对已检索其路径的特定节点的祖先时才有效. l树将在这一方面取胜.

What is missing from this list is finding all ancestors for a descendant. The query format WHERE 'A.B.C.D' LIKE path || '.%' is unfortunately not going to use the index. One workaround that some libraries implement is to parse out the ancestor nodes from the path, and query them directly: WHERE id IN ('A', 'B', 'C'). However, this will only work if you're targeting ancestors of a specific node whose path you have already retrieved. l-tree is going to win on this one.

这篇关于Postgres物化路径-使用ltree有什么好处?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆