Postgres物化路径-使用ltree有什么好处? [英] Postgres Materialized Path - What are the benefits of using ltree?
问题描述
材料化路径是一种用于表示SQL中层次结构的方法.每个节点都包含路径本身及其所有祖先(grandparent/parent/self
).
Materialized Path is a method for representing hierarchy in SQL. Each node contains the path itself and all its ancestors (grandparent/parent/self
).
MP的django-treebeard
实现( docs ):
-
路径的每一步都是固定长度,以保持一致的性能.
Each step of the path is a fixed length for consistent performance.
每个节点包含depth
和numchild
字段(快速读取,而写入成本最低).
Each node contains depth
and numchild
fields (fast reads at minimal cost to writes).
已为路径字段建立索引(使用标准b树索引):
The path field is indexed (with a standard b-tree index):
物化路径方法在数据库中大量使用LIKE,并使用WHERE path LIKE'002003%'之类的子句.如果您认为LIKE太慢,那是对的,但是在这种情况下,路径字段已在数据库中建立索引,并且所有不以%字符开头的LIKE子句都将使用索引.这就是使物化路径方法如此之快的原因.
The materialized path approach makes heavy use of LIKE in your database, with clauses like WHERE path LIKE '002003%'. If you think that LIKE is too slow, you’re right, but in this case the path field is indexed in the database, and all LIKE clauses that don’t start with a % character will use the index. This is what makes the materialized path approach so fast.
get_ancestors
的实现(链接):
匹配具有包含当前路径子集的路径的节点(steplen
是步骤的固定长度).
Match nodes with a path that contains a subset of the current path (steplen
is the fixed length of a step).
paths = [
self.path[0:pos]
for pos in range(0, len(self.path), self.steplen)[1:]
]
return get_result_class(self.__class__).objects.filter(
path__in=paths).order_by('depth')
get_descendants
的实现(链接):
匹配深度大于self的节点和以当前路径开头的路径.
Match nodes with a depth greater than self and a path which starts with current path.
return cls.objects.filter(
path__startswith=parent.path,
depth__gte=parent.depth
).order_by(
'path'
)
此方法的潜在缺点:
- 深度嵌套的层次结构会导致路径变长,从而影响读取性能.
- 移动节点需要更新所有后代的路径.
Postgres包括ltree
扩展名,该扩展名提供了自定义 GiST 索引(文档).
Postgres includes the ltree
extension which provides a custom GiST index (docs).
我不清楚ltree
相对于django-treebeard
的实现有哪些好处.该文章认为,只有ltree
可以回答get_ancestors
问题,但是如前所述,弄清楚节点的祖先(或后代)是微不足道的.
I am not clear which benefits ltree
provides over django-treebeard
's implementation. This article argues that only ltree
can answer the get_ancestors
question, but as demonstrated earlier, figuring out the ancestors (or descendants) of a node is trivial.
[此外,如果找到了这个Django ltree
库- https://github. com/mariocesar/django-ltree] .
[As an aside, if found this Django ltree
library - https://github.com/mariocesar/django-ltree].
两种方法都使用索引(django-treebeard
使用b树,ltree
使用自定义GiST).我有兴趣了解ltree
GiST的实现,以及对于这种特殊用例(物化路径),为什么它可能比标准b树更有效的索引.
Both approaches use an index (django-treebeard
uses b-tree, ltree
uses a custom GiST). I am interested in understanding the implementation of the ltree
GiST and why it might be a more efficient index than a standard b-tree for this particular use case (materialized path).
其他链接
https://news.ycombinator.com/item?id=709970
推荐答案
TL; DR 可重用的标签,复杂的搜索模式和针对多个后代节点(或路径未经过的单个节点)的祖先搜索尚未被检索到)无法使用物化路径索引来完成.
TL;DR Reusable labels, complex search patterns, and ancestry searches against multiple descendant nodes (or a single node whose path hasn't yet been retrieved) can't be accomplished using a materialized path index.
对于那些对血腥细节感兴趣的人...
For those interested in the gory details...
首先,仅当您不重用节点描述中的任何标签时,您的问题才有意义.如果是的话,l树实际上是两者中的唯一选择.但是物化路径实现通常不需要它,因此我们将其放在一边.
Firstly, your question is only relevant if you are not reusing any labels in your node description. If you were, the l-tree is really the only option of the two. But materialized path implementations don't typically need this, so let's put that aside.
一个明显的区别是l树提供给您的搜索类型的灵活性.考虑以下示例(来自您问题中链接的ltree
文档):
One obvious difference will be in the flexibility in the types of searches that l-tree gives you. Consider these examples (from the ltree
docs linked in your question):
foo Match the exact label path foo
*.foo.* Match any label path containing the label foo
*.foo Match any label path whose last label is foo
第一个查询显然可以通过实例化路径来实现.最后一个也是可以实现的,您可以在其中将查询作为同级查找进行调整.但是,不能通过单个索引查找直接实现中间情况.您要么必须将此分解为两个查询(所有后代+所有祖先),要么求助于表扫描.
The first query is obviously achievable with materialized path. The last is also achievable, where you'd adjust the query as a sibling lookup. The middle case, however, isn't directly achievable with a single index lookup. You'd either have to break this up into two queries (all descendants + all ancestors), or resort to a table scan.
然后真的有像这样的复杂查询(同样来自文档):
And then there are really complex queries like this one (also from the docs):
Top.*{0,2}.sport*@.!football|tennis.Russ*|Spain
物化路径索引在这里将毫无用处,因此需要全表扫描来处理.如果要以SARGable查询的形式执行此操作,则l-tree是唯一的选择.
A materialized path index would be useless here, and a full table scan would be required to handle this. l-tree is the only option if you want to perform this as a SARGable query.
但是对于标准的分层操作,请找到以下任何一项:
But for the standard hierarchical operations, finding any of:
- 父母
- 孩子
- 后裔
- 根节点
- 叶节点
物化路径与l树一样有效.与上面链接的文章相反,使用b树搜索同一个祖先的所有后代非常可行.如果您的索引准备正确,则查询格式WHERE path LIKE 'A.%'
是SARGable的(我必须使用varchar_pattern_ops
明确标记我的路径索引,以使其正常工作.)
materialized path will work just as well as l-tree. Contrary to the article linked above, searching for all descendants of a common ancestor is very doable using a b-tree. The query format WHERE path LIKE 'A.%'
is SARGable provided your index is prepared properly (I had to explicitly tag my path index with varchar_pattern_ops
to get this to work).
此列表中缺少的是查找后代的所有祖先.不幸的是,查询格式WHERE 'A.B.C.D' LIKE path || '.%'
不会使用索引.一些库实现的一种解决方法是从路径中解析祖先节点,然后直接查询它们:WHERE id IN ('A', 'B', 'C')
.但是,仅当您针对已检索其路径的特定节点的祖先时才有效. l树将在这一方面取胜.
What is missing from this list is finding all ancestors for a descendant. The query format WHERE 'A.B.C.D' LIKE path || '.%'
is unfortunately not going to use the index. One workaround that some libraries implement is to parse out the ancestor nodes from the path, and query them directly: WHERE id IN ('A', 'B', 'C')
. However, this will only work if you're targeting ancestors of a specific node whose path you have already retrieved. l-tree is going to win on this one.
这篇关于Postgres物化路径-使用ltree有什么好处?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!