发现长模式 [英] Discover long patterns

查看：91 发布时间：2015/11/30 15:50:02 algorithm dynamic-programming

本文介绍了发现长模式的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

由于数字的排序列表，我想找个时间最长的序列，其中连续元素之间的差异几何级数增长。因此，如果该列表是

Given a sorted list of numbers, I would like to find the longest subsequence where the differences between successive elements are geometrically increasing. So if the list is

 1, 2, 3, 4, 7, 15, 27, 30, 31, 81

那么序列是 1，3，7，15，31 。另外考虑 1，2，5，6，11，15，23，41，47 具有子 5，11，23，47 带= 3和k = 2。

then the subsequence is 1, 3, 7, 15, 31. Alternatively consider 1, 2, 5, 6, 11, 15, 23, 41, 47 which has subsequence 5, 11, 23, 47 with a = 3 and k = 2.

可以这样为O解决（的 N 的²）的时间？其中n是列表的长度。

Can this be solved in O(n²) time? Where n is the length of the list.

我既在一般情况下感兴趣的，其中的差异的进展是 AK ， AK 的²， AK ³等，其中两个的在和 K 的是整数，而在特殊情况下的在的＆NBSP =＆NBSP; 1，所以差的进展是的 K 的 K 的²， K 的³等。

I am interested both in the general case where the progression of differences is ak, ak², ak³, etc., where both a and k are integers, and in the special case where a = 1, so the progression of difference is k, k², k³, etc.

推荐答案

更新

我已经提出，它平均需要O（M + N ^ 2）和O内存需求（M + N）算法的改进。主要表现是一样的，以下协议描述的，但计算可能的因素，K为ECH性差异D，I preLOAD表。此表只需不到一秒钟被并购构建= 10 ^ 7。

I have made an improvement of the algorithm that it takes an average of O(M + N^2) and memory needs of O(M+N). Mainly is the same that the protocol described below, but to calculate the possible factors A,K for ech diference D, I preload a table. This table takes less than a second to be constructed for M=10^7.

我已经做了C实现的时间不超过10分，以解决N = 10 ^ 5 diferent随机整数元素。

I have made a C implementation that takes less than 10minutes to solve N=10^5 diferent random integer elements.

下面是源$ C $ C在C：要执行只是做：GCC -O3 -o findgeo findgeo.c

Here is the source code in C: To execute just do: gcc -O3 -o findgeo findgeo.c

#include <stdio.h> #include <stdlib.h> #include <math.h> #include <memory.h> #include <time.h> struct Factor { int a; int k; struct Factor *next; }; struct Factor *factors = 0; int factorsL=0; void ConstructFactors(int R) { int a,k,C; int R2; struct Factor *f; float seconds; clock_t end; clock_t start = clock(); if (factors) free(factors); factors = malloc (sizeof(struct Factor) *((R>>1) + 1)); R2 = R>>1 ; for (a=0;a<=R2;a++) { factors[a].a= a; factors[a].k=1; factors[a].next=NULL; } factorsL=R2+1; R2 = floor(sqrt(R)); for (k=2; k<=R2; k++) { a=1; C=a*k*(k+1); while (C<R) { C >>= 1; f=malloc(sizeof(struct Factor)); *f=factors[C]; factors[C].a=a; factors[C].k=k; factors[C].next=f; a++; C=a*k*(k+1); } } end = clock(); seconds = (float)(end - start) / CLOCKS_PER_SEC; printf("Construct Table: %f\n",seconds); } void DestructFactors() { int i; struct Factor *f; for (i=0;i<factorsL;i++) { while (factors[i].next) { f=factors[i].next->next; free(factors[i].next); factors[i].next=f; } } free(factors); factors=NULL; factorsL=0; } int ipow(int base, int exp) { int result = 1; while (exp) { if (exp & 1) result *= base; exp >>= 1; base *= base; } return result; } void findGeo(int **bestSolution, int *bestSolutionL,int *Arr, int L) { int i,j,D; int mustExistToBeBetter; int R=Arr[L-1]-Arr[0]; int *possibleSolution; int possibleSolutionL=0; int exp; int NextVal; int idx; int kMax,aMax; float seconds; clock_t end; clock_t start = clock(); kMax = floor(sqrt(R)); aMax = floor(R/2); ConstructFactors(R); *bestSolutionL=2; *bestSolution=malloc(0); possibleSolution = malloc(sizeof(int)*(R+1)); struct Factor *f; int *H=malloc(sizeof(int)*(R+1)); memset(H,0, sizeof(int)*(R+1)); for (i=0;i<L;i++) { H[ Arr[i]-Arr[0] ]=1; } for (i=0; i<L-2;i++) { for (j=i+2; j<L; j++) { D=Arr[j]-Arr[i]; if (D & 1) continue; f = factors + (D >>1); while (f) { idx=Arr[i] + f->a * f->k - Arr[0]; if ((f->k <= kMax)&& (f->a<aMax)&&(idx<=R)&&H[idx]) { if (f->k ==1) { mustExistToBeBetter = Arr[i] + f->a * (*bestSolutionL); } else { mustExistToBeBetter = Arr[i] + f->a * f->k * (ipow(f->k,*bestSolutionL) - 1)/(f->k-1); } if (mustExistToBeBetter< Arr[L-1]+1) { idx= floor(mustExistToBeBetter - Arr[0]); } else { idx = R+1; } if ((idx<=R)&&H[idx]) { possibleSolution[0]=Arr[i]; possibleSolution[1]=Arr[i] + f->a*f->k; possibleSolution[2]=Arr[j]; possibleSolutionL=3; exp = f->k * f->k * f->k; NextVal = Arr[j] + f->a * exp; idx=NextVal - Arr[0]; while ( (idx<=R) && H[idx]) { possibleSolution[possibleSolutionL]=NextVal; possibleSolutionL++; exp = exp * f->k; NextVal = NextVal + f->a * exp; idx=NextVal - Arr[0]; } if (possibleSolutionL > *bestSolutionL) { free(*bestSolution); *bestSolution = possibleSolution; possibleSolution = malloc(sizeof(int)*(R+1)); *bestSolutionL=possibleSolutionL; kMax= floor( pow (R, 1/ (*bestSolutionL) )); aMax= floor(R / (*bestSolutionL)); } } } f=f->next; } } } if (*bestSolutionL == 2) { free(*bestSolution); possibleSolutionL=0; for (i=0; (i<2)&&(i<L); i++ ) { possibleSolution[possibleSolutionL]=Arr[i]; possibleSolutionL++; } *bestSolution = possibleSolution; *bestSolutionL=possibleSolutionL; } else { free(possibleSolution); } DestructFactors(); free(H); end = clock(); seconds = (float)(end - start) / CLOCKS_PER_SEC; printf("findGeo: %f\n",seconds); } int compareInt (const void * a, const void * b) { return *(int *)a - *(int *)b; } int main(void) { int N=100000; int R=10000000; int *A = malloc(sizeof(int)*N); int *Sol; int SolL; int i; int *S=malloc(sizeof(int)*R); for (i=0;i<R;i++) S[i]=i+1; for (i=0;i<N;i++) { int r = rand() % (R-i); A[i]=S[r]; S[r]=S[R-i-1]; } free(S); qsort(A,N,sizeof(int),compareInt); /* int step = floor(R/N); A[0]=1; for (i=1;i<N;i++) { A[i]=A[i-1]+step; } */ findGeo(&Sol,&SolL,A,N); printf("["); for (i=0;i<SolL;i++) { if (i>0) printf(","); printf("%d",Sol[i]); } printf("]\n"); printf("Size: %d\n",SolL); free(Sol); free(A); return EXIT_SUCCESS; }

笔画演示

我会努力证明我所提出的算法是的平均为均匀分布的随机序列。我不是一个数学家，我不习惯做这样的示范，所以请填写纠正我的错误，你可以看到。

I will try to demonstrate that the algorithm that I proposed is in average for an equally distributed random sequence. I’m not a mathematician and I am not used to do this kind of demonstrations, so please fill free to correct me any error that you can see.

有4缩进环路，两个第一是N ^ 2因子。的M是为可能的因素表的计算）。

There are 4 indented loops, the two firsts are the N^2 factor. The M is for the calculation of the possible factors table).

第三回路的每对执行只在平均一次。你可以看到这个检查pre计算的因素表的大小。它的大小是M时，N-> INF。因此，对于每一对的平均值的步骤是M / M = 1。

The third loop is executed only once in average for each pair. You can see this checking the size of the pre-calculated factors table. It’s size is M when N->inf. So the average steps for each pair is M/M=1.

因此，证明发生了以检查所述环路。（而其中，遍历取得了良好的序列被执行小于或等于O（N ^ 2）对所有的对。

So the proof happens to check that the forth loop. (The one that traverses the good made sequences is executed less that or equal O(N^2) for all the pairs.

要证明，我将考虑两种情况：一种其中M >> N和其他其中M〜= N其中M是初始阵列的最大差值：M = S（N）-S（1）。

To demonstrate that, I will consider two cases: one where M>>N and other where M ~= N. Where M is the maximum difference of the initial array: M= S(n)-S(1).

有关第一种情况下，（M >> N）找到一个重合的概率为p = N / M。启动一个序列，它必须重合在第二和B + 1个元素，其中b是最好的序列的长度直到现在。因此，循环将进入倍。而这一系列的平均长度（假设无限系列）。这样的次该循环将被执行的总次数是。这是接近0当M >> N。这里的问题是，当M〜= N

For the first case, (M>>N) the probability to find a coincidence is p=N/M. To start a sequence, it must coincide the second and the b+1 element where b is the length of the best sequence until now. So the loop will enter times. And the average length of this series (supposing an infinite series) is . So the total number of times that the loop will be executed is . And this is close to 0 when M>>N. The problem here is when M~=N.

现在让我们考虑这种情况下，M〜= N。让我们考虑b是最好的序列长度直到现在。为的情况下A = K = 1，则该序列必须启动铌之前，所以序列的数量将是铌，和时间，这将去的循环将是最大的（Nb）的* B。

Now lets consider this case where M~=N. Lets consider that b is the best sequence length until now. For the case A=k=1, then the sequence must start before N-b, so the number of sequences will be N-b, and the times that will go for the loop will be a maximum of (N-b)*b.

有关A> 1和k = 1，我们可以推断其中，d是M / N（数字之间的平均距离）。如果我们对所有A的1至DN / B，那么我们看到的顶限：

For A>1 and k=1 we can extrapolate to where d is M/N (the average distance between numbers). If we add for all A’s from 1 to dN/b then we see a top limit of:

有关，其中k> = 2，我们看到的序列前必须启动的情况下，所以循环将进入一个平均并添加所有从1到牛顿/ K ^ B，它给出了一个上限

For the cases where k>=2, we see that the sequence must start before , So the loop will enter an average of and adding for all As from 1 to dN/k^b, it gives a limit of

下面，最坏的情况是，当b为最小。因为我们正在考虑的最小系列，让我们考虑为b的非常最坏情况= 2所以通行证对于给定的k中第四循环的数量将小于

Here, the worst case is when b is minimum. Because we are considering minimum series, lets consider a very worst case of b= 2 so the number of passes for the 4th loop for a given k will be less than

。

.

如果我们加上所有的k的2至无穷将是：

And if we add all k’s from 2 to infinite will be:

于是将所有的通行证对于k = 1和k> = 2，我们有一个最大的：

So adding all the passes for k=1 and k>=2, we have a maximum of:

注意，D = M / N = 1 / P。

Note that d=M/N=1/p.

因此，我们有两个限制，一说去无限时，D = 1 / P = M / N变为1等的去无限当D去无限。因此，我们的极限是最小两个，而最坏的情况是，当这两个equetions交叉。因此，如果我们解方程：

So we have two limits, One that goes to infinite when d=1/p=M/N goes to 1 and other that goes to infinite when d goes to infinite. So our limit is the minimum of both, and the worst case is when both equetions cross. So if we solve the equation:

我们可以看到，最大的是当d = 1.353

we see that the maximum is when d=1.353

因此，它是表明第四环路将被处理总共小于1.55N ^ 2倍。

So it is demonstrated that the forth loops will be processed less than 1.55N^2 times in total.

当然，这是平均情况下。对于最坏的情况下，我无法找到一个方法来产生一系列的第四环比O（N ^ 2）高，我坚信，他们不存在，但我不是一个数学家来证明这一点。

Of course, this is for the average case. For the worst case I am not able to find a way to generate series whose forth loop are higher than O(N^2), and I strongly believe that they does not exist, but I am not a mathematician to prove it.

旧答案

下面是在平均为O（（N ^ 2）* cube_root（M）），其中M是该阵列的第一个和最后一个元件之间的差别的溶液。和O存储器要求（M + N）。

Here is a solution in average of O((n^2)*cube_root(M)) where M is the difference between the first and last element of the array. And memory requirements of O(M+N).

1.-构造的长度为M的排列H，使得M [I - S [0]] =真，如果我存在初始阵列中，假如果它不存在

1.- Construct an array H of length M so that M[i - S[0]]=true if i exists in the initial array and false if it does not exist.

2：对于每一对数组S [J]，S [I]做的：

2.- For each pair in the array S[j], S[i] do:

2.1检查是否它可以是一种可能的解决方案，在第一和第三元件。要做到这一点，计算所有可能的A，满足方程ķ对S（I）= S（J）+ AK + AK ^ 2。检查<一href="http://stackoverflow.com/questions/18286012/given-a-natural-number-a-i-want-to-find-all-the-pairs-of-natural-numbers-b-c">this SO质疑，看看如何解决这个问题。并检查存在的第二个元素：S [I] + A * K

2.1 Check if it can be the first and third elements of a possible solution. To do so, calculate all possible A,K pairs that meet the equation S(i) = S(j) + AK + AK^2. Check this SO question to see how to solve this problem. And check that exist the second element: S[i]+ A*K

2.2检查还存在的元素一个位置进一步指出，最好的解决办法，我们有。举例来说，如果我们有到现在为止最好的解决办法就是4个元素，然后检查存在的元素A [J] + A * K + A * K ^ 2 + A * K ^ 3 + A * K ^ 4

2.2 Check also that exist the element one position further that the best solution that we have. For example, if the best solution that we have until now is 4 elements long then check that exist the element A[j] + A*K + A*K^2 + A*K^3 + A*K^4

2.3如果2.1和2.2是真实的，然后循环多久是这个系列并设置为bestSolution直到现在较长的最后一个。

2.3 If 2.1 and 2.2 are true, then iterate how long is this series and set as the bestSolution until now is is longer that the last.

下面是code在javascript：

Here is the code in javascript:

function getAKs(A) { if (A / 2 != Math.floor(A / 2)) return []; var solution = []; var i; var SR3 = Math.pow(A, 1 / 3); for (i = 1; i <= SR3; i++) { var B, C; C = i; B = A / (C * (C + 1)); if (B == Math.floor(B)) { solution.push([B, C]); } B = i; C = (-1 + Math.sqrt(1 + 4 * A / B)) / 2; if (C == Math.floor(C)) { solution.push([B, C]); } } return solution; } function getBestGeometricSequence(S) { var i, j, k; var bestSolution = []; var H = Array(S[S.length-1]-S[0]); for (i = 0; i < S.length; i++) H[S[i] - S[0]] = true; for (i = 0; i < S.length; i++) { for (j = 0; j < i; j++) { var PossibleAKs = getAKs(S[i] - S[j]); for (k = 0; k < PossibleAKs.length; k++) { var A = PossibleAKs[k][0]; var K = PossibleAKs[k][17]; var mustExistToBeBetter; if (K==1) { mustExistToBeBetter = S[j] + A * bestSolution.length; } else { mustExistToBeBetter = S[j] + A * K * (Math.pow(K,bestSolution.length) - 1)/(K-1); } if ((H[S[j] + A * K - S[0]]) && (H[mustExistToBeBetter - S[0]])) { var possibleSolution=[S[j],S[j] + A * K,S[i]]; exp = K * K * K; var NextVal = S[i] + A * exp; while (H[NextVal - S[0]] === true) { possibleSolution.push(NextVal); exp = exp * K; NextVal = NextVal + A * exp; } if (possibleSolution.length > bestSolution.length) { bestSolution = possibleSolution; } } } } } return bestSolution; } //var A= [ 1, 2, 3,5,7, 15, 27, 30,31, 81]; var A=[]; for (i=1;i<=3000;i++) { A.push(i); } var sol=getBestGeometricSequence(A); $("#result").html(JSON.stringify(sol));

您可以在这里检查code： http://jsfiddle.net/6yHyR/1/

You can check the code here: http://jsfiddle.net/6yHyR/1/

我保持其他的解决办法，因为我认为它仍然是好于当n M是非常大的。

I maintain the other solution because I believe that it is still better when M is very big compared to N.

这篇关于发现长模式的文章就介绍到这了，希望我们推荐的答案对大家有所帮助，也希望大家多多支持IT屋！