高效的笛卡尔积算法 [英] Efficient Cartesian Product algorithm

查看:673
本文介绍了高效的笛卡尔积算法的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

能否有人请证实对我来说是更有效的笛​​卡尔积算法比我现在用一个(假如有)。我环顾四周SO和GOOGLE了一下,但什么都看不到明显的,所以我可能失去了一些东西。

Can somebody please demonstrate for me a more efficient Cartesian product algorithm than the one I am using currently (assuming there is one). I've looked around SO and googled a bit but can't see anything obvious so I could be missing something.

foreach (int i in is) {
   foreach (int j in js) {
      //Pair i and j
   }
}

这是我做什么我的code一个高度简化的版本。两个整数查找其用于检索来自两个查找1 /多个对象和所有对象被配对在一起形成的新对象的密钥。

This is a highly simplified version of what I do in my code. The two integers are lookup keys which are used to retrieve one/more objects and all the objects from the two lookups are paired together into new objects.

的code在一个更大的更复杂的系统这小块成为一个主要性能瓶颈,因为数据集它操作过秤。一些这可以容易通过改善用于存储对象​​中的数据结构来减轻和查找涉及但主要的问题,我感到仍是笛卡尔乘积本身

This small block of code in a much larger more complex system becomes a major performance bottleneck as the dataset it's operating over scales. Some of this could likely be mitigated by improving the data structures used to store the objects and the lookups involved but the main issue I feel is still the computation of the Cartesian product itself.

修改

所以,在我的具体的算法使用的一些更多的背景,看看是否有可能,我可以回应马克的评论使用任何技巧。整个系统是在台图形数据处理SPARQL查询一个SPARQL查询引擎,SPARQL是一个基于图形的语言,所以每个查询包含了一系列的匹配对阵图(S)模式。在随后的两个图案具有没有公共变量(它们是不相交的),有必要计算由两个图案制作来获得一组可能的解决方案的总体查询溶液的笛卡尔乘积的情况。可以有任意数量的图案,我也可以来计算笛卡尔乘积多次从而导致在可能的解决办法相当迅速扩大,如果查询是由一系列不相交的图案。

So some more background on my specific usage of the algorithm to see if there may be any tricks that I can use in response to Marc's comment. The overall system is a SPARQL query engine which processes SPARQL queries over sets of Graph data, SPARQL is a pattern based language so each query consists of a series of patterns which are matched against the Graph(s). In the case where two subsequent patterns have no common variables (they are disjoint) it is necessary to compute the Cartesian product of the solutions produced by the two patterns to get the set of possible solutions for the overall query. There may be any number of patterns and I may have to compute Cartesian products multiple times which can lead to a fairly exponential expansion in possible solutions if the query is composed of a series of disjoint patterns.

不知是否有可能适用于任何技巧

Somehow from the existing answers I doubt whether there any tricks that could apply

更新

所以,我想我会发布一个更新了我,以尽量减少需要做笛卡尔积,因此一般优化查询引擎来实现。注意,这并不总是能完全消除需要的产品,但它几乎总是能够优化这样两组接合的尺寸要小得多。

So I thought I would post an update on what I implemented in order to minimise the need to do Cartesian products and thus optimise the query engine generally. Note that it's not always possible to completely eliminate the need for products but it's nearly always possible to optimise so the size of the two sets being joined is much smaller.

由于每个BGP(基本图形模式),这是执行为块(本质)的引擎是免费的重新排列BGP中的模式以获得最佳性能的一组三重模式的。例如,考虑下面的BGP:

Since each BGP (Basic Graph Pattern) which is a set of Triple Patterns is executed as a block (in essence) the engine is free to reorder patterns within a BGP for optimal performance. For example consider the following BGP:

?a :someProperty ?b .
?c :anotherProperty ?d .
?b a :Class .

执行的作为是查询需要笛卡尔乘积由于第一图案的结果是从所述第二图案不相交,以便所述第一两种模式的结果是它们的单个结果笛卡尔乘积。这个结果将包含远远的结果比我们实际需要,因为第三模式限制了第一图案的可能的结果,但我们并没有采取这种限制,直到后来。但是,如果我们重新排列,像这样:

Executed as is the query requires a cartesian product since the results of the first pattern are disjoint from the second pattern so the results of the first two patterns is a cartesian product of their individual results. This result will contain far more results than we actually need since the third pattern restricts the possible results of the first pattern but we don't apply this restriction till afterwards. But if we reorder like so:

?b a :Class .
?a :someProperty ?b .
?c :anotherProperty ?d .

我们仍然需要一个笛卡尔积得到最终的结果,因为在第2和第3的模式仍然是不相交的,但通过重新排序,我们限制的第二个模式意味着我们的笛卡尔积的大小的结果的大小会小很多

We'll still need a cartesian product to get the final results since the 2nd and 3rd patterns are still disjoint but by reordering we restrict the size of the results of the 2nd pattern meaning the size of our cartesian product will be much smaller.

有,我们做一些其他各种的优化,但我不打算将其发布到这里,因为它开始进入SPARQL引擎内部的相当详细的讨论。如果有人有兴趣进一步的细节只是发表评论或给我鸣叫@RobVesse

There are some various other optimisations we make but I'm not going to post them here as it starts to get into fairly detailed discussion of SPARQL engine internals. If anyone is interested in further details just leave a comment or send me a tweet @RobVesse

推荐答案

笛卡尔乘积的复杂度为O( N 2 ),没有捷径。

The complexity of cartesian product is O(n2), there is no shortcut.

在特定的情况下,为了你迭代你的两个轴是重要的。例如,如果你的code为访问每一个插槽阵列 - 或者每一个像素的图像 - 那么你应该尝试访问的自然秩序的插槽。图像通常是奠定在扫描线,因此在像素上任何的的是相邻的。因此,你应该迭代的的上外环和 X 的在内。

In specific cases, the order you iterate your two axis is important. For example, if your code is visiting every slot in an array — or every pixel in an in image — then you should try to visit the slots in natural order. An image is typically laid out in ‘scanlines’, so the pixels on any Y are adjacent. Therefore, you should iterate over the Y on the outer loop and the X on the inner.

无论你需要的笛卡儿积或wherther是一种更有效的算法取决于问题你解决。

Whether you need the cartesian product or wherther is a more efficient algorithm depends on the problem you are solving.

这篇关于高效的笛卡尔积算法的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆