可以蛮力算法的规模有多大? [英] Can brute force algorithms scale?

查看:130
本文介绍了可以蛮力算法的规模有多大?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有我解决的试验和错误(我认为这就是所谓的蛮力)一道数学题,并且程序正常工作时,有几个选择,但我添加更多的变量/数据需要更长的时间和更长运行

I have a math problem that I solve by trial and error (I think this is called brute force), and the program works fine when there are a few options, but as I add more variables/data it takes longer and longer to run.

我的问题是,虽然,工程样机,这是与成千上万的变量和大型数据集的有用的;所以,我不知道是否有可能扩大蛮力算法。我如何能接近缩放呢?

My problem is although, the prototype works, it is useful with thousands of variables and large data sets; so, I'm wondering if it is possible to scale brute force algorithms. How can I approach scaling it?

我开始学习和玩 Hadoop的(和的 HBase的);虽然它看起来很有希望,我想验证我想要做的也不是不可能的。

I was starting to learn and play around with Hadoop (and HBase); although it looks promising, I wanted to verify that what I'm trying to do isn't impossible.

如果有帮助,我用Java编写的程序(并可以使用它,如果可能的话),但最终将它移植到Python,因为我感觉更舒服。

If it helps, I wrote the program in Java (and can use it if possible), but ended up porting it to Python, because I feel more comfortable with it.

更新:为了提供更深入的了解,我想我会增加的code的简化版本,以获得的想法。基本上,如果我知道的和是100,我试图找到可以等于它的变量的所有组合。这很简单,我的版本,我可能会使用较大的数字,还有更多变数。这是不定的,我相信没有算法存在的解决它没有暴力。

Update: To provide more insight, I think I'll add a simplified version of the code to get the idea. Basically if I know the sum is 100, I am trying to find all combinations of the variables that could equal it. This is simple, in my version I may use larger numbers and many more variables. It's the Diophantine, and I believe there is no algorithm that exists to solve it without brute force.

int sum = 100;
int a1 = 20;
int a2 = 5;
int a3 = 10;
for (int i = 0; i * a1 <= sum; i++) {
    for (int j = 0; i * a1 + j * a2 <= sum; j++) {
        for (int k = 0; i * a1 + j * a2 + k * a3 <= sum; k++) {
            if (i * a1 + j * a2 + k * a3 == sum) {
              System.out.println(i + "," + j + "," + k);
            }
        }
    }   
}

我是新来的节目,我很抱歉,如果我没有正确地制定了这个问题。这更是一个普遍的问题。

I am new to programming, and I am sorry if I'm not framing this question correctly. This is more of a general question.

推荐答案

通常情况下,你可以量化如何算法将通过使用大O表示法来分析其发展速度扩展。当你说你的算法通过蛮力,目前还不清楚到什么程度就会扩大。如果你的蛮力解决方案可通过列出一组数据的所有可能的子集或组合,那么几乎可以肯定不会缩放(它会渐进复杂度为O(2 N )或为O(n !), 分别)。如果你的蛮力解决方案的工作原理是找出所有元素对和检查每个,它可以合理地很好地扩展(O(N 2 ))。如果没有了解自己的算法工作,但更多的信息,就很难说了。

Typically, you can quantify how well an algorithm will scale by using big-O notation to analyze its growth rate. When you say that your algorithm works by "brute force," it's unclear to what extent it will scale. If your "brute force" solution works by listing all possible subsets or combinations of a set of data, then it almost certainly will not scale (it will have asymptotic complexity O(2n) or O(n!), respectively). If your brute force solution works by finding all pairs of elements and checking each, it may scale reasonably well (O(n2)). Without more information about how your algorithm works, though, it's difficult to say.

您可能想看看约大O 这个优秀的帖子作为一个起点,如何推理程序的长期scalablility。通常来说,任何有增长率为O(n log n)的,为O(n),O(log n)的,或O(1)规模非常出色,增长率为O(n 2 什么)或为O(n 3 )将扩大到一个点,并与增速Ø任何东西(2 N )或更高版本将无法扩展的。

You may want to look at this excellent post about big-O as a starting point for how to reason about the long-term scalablility of your program. Typically speaking, anything that has growth rate O(n log n), O(n), O(log n), or O(1) scale extremely well, anything with growth rate O(n2) or O(n3) will scale up to a point, and anything with growth rate O(2n) or higher will not scale at all.

另一种选择是查找你正在试图解决,看看如何充分研究它的问题。有些问题目前已知有很大的解决方案,而如果你是其中之一,它可能是值得看别人纷纷拿出。或许有一个非常干净的,非强力解决方案,扩展真的很好!一些其它问题推测有完全没有可伸缩算法(所谓的 NP难问题 的)。如果是这样的话,那么你就应该是pretty的信心,有没有办法让一个可扩展的方法。

Another option would be to look up the problem you're trying to solve to see how well-studied it is. Some problems are known to have great solutions, and if yours is one of them it might be worth seeing what others have come up with. Perhaps there is a very clean, non-brute-force solution that scales really well! Some other problems are conjectured to have no scalable algorithms at all (the so-called NP-hard problems). If that's the case, then you should be pretty confident that there's no way to get a scalable approach.

最后,你可以随时问在这里堆栈溢出的新问题,描述你想要做什么,并要求输入。也许社会可以帮助您更有效地比你最初预期的解决您的问题!

And finally, you can always ask a new question here at Stack Overflow describing what you're trying to do and asking for input. Maybe the community can help you solve your problem more efficiently than you initially expected!

编辑:既然你正在试图解决的问题的描述,现在你正在做一个用于每个变量循环从0到你想要的目标的数量。在该算法的复杂度为O(U K ),其中k是变量的数目和U是总和。这种方法将不可以很好地进行缩放的。在上述情况下引入每一个新变量将使得algori2thm运行100倍慢,这绝对不会规模非常好,如果你想100变量!

Given the description of the problem that you are trying to solve, right now you are doing one for loop per variable from 0 up to the number you're trying to target. The complexity of this algorithm is O(Uk), where k is the number of variables and U is the sum. This approach will not scale very well at all. Introducing each new variable in the above case will make the algori2thm run 100 times slower, which definitely will not scale very well if you want 100 variables!

不过,我认为有一个相当不错的算法,其运行时间为O(U 2 K),使用O(屋)的内存,以解决这个问题。直觉如下:假设我们要总结1,2,和4中得到10有很多方法可以做到这一点:

However, I think that there is a fairly good algorithm whose runtime is O(U2k) that uses O(Uk) memory to solve the problem. The intuition is as follows: Suppose that we want to sum up 1, 2, and 4 to get 10. There are many ways to do this:

2 * 4 +  1 * 2 +  0 * 1
2 * 4 +  0 * 2 +  2 * 1
1 * 4 +  3 * 2 +  0 * 1
1 * 4 +  2 * 2 +  2 * 1
1 * 4 +  1 * 2 +  4 * 1
1 * 4 +  0 * 2 +  6 * 1
0 * 4 +  5 * 2 +  0 * 1
0 * 4 +  4 * 2 +  2 * 1
0 * 4 +  3 * 2 +  4 * 1
0 * 4 +  2 * 2 +  6 * 1
0 * 4 +  1 * 2 +  8 * 1
0 * 4 +  0 * 2 + 10 * 1

关键的观察是,我们可以写所有这些出来的款项,但更重要的是,作为款项,其中的每一项都是不超过previous内有较大:

The key observation is that we can write all of these out as sums, but more importantly, as sums where each term in the sum is no greater than the previous term:

2 * 4 +  1 * 2 +  0 * 1 = 4 + 4 + 2
2 * 4 +  0 * 2 +  2 * 1 = 4 + 4 + 1 + 1
1 * 4 +  3 * 2 +  0 * 1 = 4 + 2 + 2 + 2
1 * 4 +  2 * 2 +  2 * 1 = 4 + 2 + 2 + 1 + 1
1 * 4 +  1 * 2 +  4 * 1 = 4 + 2 + 1 + 1 + 1 + 1
1 * 4 +  0 * 2 +  6 * 1 = 4 + 1 + 1 + 1 + 1 + 1 + 1
0 * 4 +  5 * 2 +  0 * 1 = 2 + 2 + 2 + 2 + 2
0 * 4 +  4 * 2 +  2 * 1 = 2 + 2 + 2 + 2 + 1 + 1
0 * 4 +  3 * 2 +  4 * 1 = 2 + 2 + 2 + 1 + 1 + 1 + 1
0 * 4 +  2 * 2 +  6 * 1 = 2 + 2 + 1 + 1 + 1 + 1 + 1 + 1
0 * 4 +  1 * 2 +  8 * 1 = 2 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1
0 * 4 +  0 * 2 + 10 * 1 = 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1

因此​​,这提供了有关如何生成所有可能的方式来总结目标一个有趣的想法。这样做是为了解决第一个系数,然后生成所有可能的方式,使之和的休息锻炼身体。换句话说,我们可以认为这个问题递归。如果我们列出顺序变量为x <子> 1 ,X 2 ,...,X <子> N ,那么我们可以尝试修复某些特定的系数对于x <子> 1 ,然后求解总结和的问题 - C_1 X_1 只用x 2 ,... ,X <子> N 。

So this gives an interesting idea about how to generate all possible ways to sum up to the target. The idea is to fix the first coefficient, then generate all possible ways to make the rest of the sum work out. In other words, we can think about the problem recursively. If we list the variables in order as x1, x2, ..., xn, then we can try fixing some particular coefficient for x1, then solving the problem of summing up sum - c_1 x_1 using just x2, ..., xn.

到目前为止,这似乎并不是所有的花哨的 - 事实上,它是$ P $你在做什么pcisely以上 - 但有一个窍门,我们可以使用。只要我们将要思考这个问题递归,让我们想想以相反的方式的问题。而不是开始,并试图打破它,相反,如果我们开始用什么0,并试图建立的一切,我们可以?

So far this doesn't seem all that fancy - in fact, it's precisely what you're doing above - but there is one trick we can use. As long as we're going to be thinking about this problem recursively, let's think about the problem in the opposite manner. Rather than starting with sum and trying to break it down, what if instead we started with 0 and tried to build up everything that we could?

这里的想法。假设我们已经事先知道所有我们可以只用款项出现x <子> 1 的数字。那么对于之间每隔数k 0和,包容性,我们可以K掉出现x 2 和X <子> 1 出的任意组合,其中k - ç 2 X 2 的东西,可以做出来的第X <子> 1 的组合。但由于我们pcomputed这个$ P $,我们就可以遍历了对C 2 ,计算k的所有可能的法律价值 - ç 2 X 2 < /分>,看看我们是否知道如何做到这一点。布尔值,假设我们存储一个巨大的ûX(k + 1)表,使得表项[X,Y]卖场我们可以总结第一y值,具有包容性,总结了以precisely U盘的方法?,我们可以在表中有效地填充。这就是所谓的动态规划的,是一个功能强大的算法工具。

Here's the idea. Suppose that we already know in advance all the numbers we can make using just sums of x1. Then for every number k between 0 and sum, inclusive, we can make k out of x2 and x1 out of any combination where k - c2 x2 is something that can be made out of combinations of x1. But since we've precomputed this, we can just iterate up over all possible legal values of c2, compute k - c2 x2, and see if we know how to make it. Assuming we store a giant U x (k + 1) table of boolean values such that table entry [x, y] stores "can we sum up the first y values, inclusive, in a way that sums up to precisely U?," we can fill in the table efficiently. This is called dynamic programming and is a powerful algorithmic tool.

更具体地说,这里是如何这可能工作。鉴于K个变量,创建成U x值(K + 1)表笔。然后,设置T [0] [0] =真和T [X] [0] = FALSE对于所有的x> 0这里的理由是,T [0] [0]表示可以得到使用的总和为零第一个零变量的线性组合?答案是肯定的(空之和为零!),但取得的任何变量没有线性组合的任何其他款项,我们绝对无法做到这一点。

More concretely, here's how this might work. Given k variables, create a U x (k + 1) table T of values. Then, set T[0][0] = true and T[x][0] = false for all x > 0. The rationale here is that T[0][0] means "can we get the sum zero using a linear combination of the first zero variables?" and the answer is definitely yes (the empty sum is zero!), but for any other sum made of no a linear combination of no variables we definitely cannot make it.

现在,对于i = 1 .. K,我们将尽量填写T [X] [I]的值。请记住,T [X] [I]表示:我们可以使x作为第i个变量的线性组合?好吧,我们知道我们可以做到这一点,如果有一定的系数C,这样的k - CX <子>我可以使用出现x <子> 1 的线性组合进行,X <子> 2 ,...,X <子>我 - 1 。但是,对于任何C,这只是T是否[X - CX <子>我] [我 - 1]是真实的。因此,我们可以说

Now, for i = 1 .. k, we'll try to fill in the values of T[x][i]. Remember that T[x][i] means "can we make x as a linear combination of the first i variables?" Well, we know that we can do this if there is some coefficient c such that k - c xi can be made using a linear combination of x1, x2, ..., xi - 1. But for any c, that's just whether T[x - c xi][i - 1] is true. Thus we can say

for i = 1 to k
    for z = 0 to sum:
        for c = 1 to z / x_i:
            if T[z - c * x_i][i - 1] is true:
                set T[z][i] to true

检查的循环,我们看到了外部循环运行k次,内循环运行每次迭代的总和倍,最内层循环也运行在大多数每次迭代的总和次。他们的产品是(用我们的符号上面)O(U 2 K),这是比将O更好的(U K )算法,你原本。

Inspecting the loops, we see that the outer loop runs k times, the inner loop runs sum times per iteration, and the innermost loop runs also at most sum times per iteration. Their product is (using our notation from above) O(U2 k), which is way better than the O(Uk) algorithm that you had originally.

但你如何使用这些信息列出了所有可能的方式来总结一下目标?这里的关键是要认识到,你可以使用表,以避免浪费大量的精力在搜索时,很多人都行不通的每一个可能的组合。

But how do you use this information to list off all of the possible ways to sum up to the target? The trick here is to realize that you can use the table to avoid wasting a huge amount of effort searching over every possible combination when many of them aren't going to work.

让我们看一个例子。假设我们有这个表完全计算,并要列​​出了所有的解决方案。一种想法是考虑列出所有的解决方案,其中最后一个变量的系数是零,那么,当最后一个变量是1等。与以前具有的方法的问题是,对于某些系数可能没有在所有的任何解决方案。但是,与我们上面构建的表格中,我们可以修剪掉那些分支。例如,假设我们想看看是否有一些开始使用X <子>的任何解决方案氏/分>有系数0。这意味着,我们要问,如果有任何的方式来总结的线性组合前k - 1变量,使这些值的总和。这是可能的,当且仅当T [总和] [K - 1]是真实的。如果这是真的,那么我们就可以递归地尝试,总结了以办法分配系数的值的其余部分。如果没有,那么我们跳过这个系数,并继续到下一个。

Let's see an example. Suppose that we have this table completely computed and want to list off all solutions. One idea is to think about listing all solutions where the coefficient of the last variable is zero, then when the last variable is one, etc. The issue with the approach you had before is that for some coefficients there might not be any solutions at all. But with the table we have constructed above, we can prune out those branches. For example, suppose that we want to see if there are any solutions that start with xk having coefficient 0. This means that we're asking if there are any ways to sum up a linear combination of the first k - 1 variables so that the sum of those values is sum. This is possible if and only if T[sum][k - 1] is true. If it is true, then we can recursively try assigning coefficients to the rest of the values in a way that sums up to sum. If not, then we skip this coefficient and go on to the next.

递归,这看起来是这样的:

Recursively, this looks something like this:

function RecursivelyListAllThatWork(k, sum) // Using last k variables, make sum
    /* Base case: If we've assigned all the variables correctly, list this
     * solution.
     */
    if k == 0:
        print what we have so far
        return

    /* Recursive step: Try all coefficients, but only if they work. */
    for c = 0 to sum / x_k:
       if T[sum - c * x_k][k - 1] is true:
           mark the coefficient of x_k to be c
           call RecursivelyListAllThatWork(k - 1, sum - c * x_k)
           unmark the coefficient of x_k

这递归将列出所有的工作方案,在我们刚刚构建跳过了大量无用功表中的值。一旦你建立了这个表,可以通过外包出去的工作到多台计算机,让他们每个列表的整体解决方案的一个子集,并处理它们全部并行瓜分这项工作了。

This recursively will list all the solutions that work, using the values in the table we just constructed to skip a huge amount of wasted effort. Once you've built this table, you could divvy this work up by farming out the task to multiple computers, having them each list a subset of the total solutions, and processing them all in parallel.

希望这有助于!

这篇关于可以蛮力算法的规模有多大?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆