查找排序整型数组的交集组 [英] Find intersection group of sorted integer arrays

查看:99
本文介绍了查找排序整型数组的交集组的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

让我们有一些短期的整数数组排序的,我们需要找到路口等于或多于predefined不变。 这里是code和它说明什么,我想要做的更好的话,我可以用言语解释。 现在的问题是速度。我的code正在非常缓慢。这需要对2000的元素阵列约15秒(在我的机器很慢)。 Ofcourse我可以实现我自己的正交法和parallize code,但它给出了一个非常有限的改善。执行时间成长为N ^ 2或东西,已经为500K阵列它需要一个非常非常长的时间。所以,我怎么可以重写算法的性能比较好?我不仅限于C#语言,也许CPU或GPU拥有这样的工作好特别说明。

 示例:

输入:
1,3,7,8
2,3,8,10
3,10,11,12,13,14

minSupport = 1

输出:

1和2:2,8
1和3:3
2和3:3,10
 


  VAR minSupport = 2;
    VAR随机=新的随机(DateTime.Now.Millisecond);

    //数字是每个数组都是独一无二的
    VAR sortedArrays = Enumerable.Range(0,2000)
    。选择(X => Enumerable.Range(0,30)。选择(T => random.Next(1000))是不同的()
    .ToList())了ToList()。
    VAR的结果=新名单,其中,INT []>();
    VAR resultIntersection =新的名单,其中,名单,其中,INT>>();

    的foreach(在sortedArrays VAR阵列)
    {
        的Array.Sort();
    }

    变种SW = Stopwatch.StartNew();

    //****主要部分*****//

    的for(int i = 0; I< sortedArrays.Count-1;我++)
    {
        对于(INT J = I + 1; J< sortedArrays.Count; J ++)
        {
            VAR相交= sortedArrays [I] .Intersect(sortedArrays [J])了ToList()。
            如果(intersect.Count()> = minSupport)
            {
                result.Add(新[] {I,J});
                resultIntersection.Add(交叉);
            }
        }
    }

    // ***************** //

    sw.Stop();

    Console.WriteLine(sw.Elapsed);
 

编辑:

现在大约需要9秒VS 15秒,老算法在2000元。嗯... ofcourse它的速度不够快。

  // ****主要部分***** //

    //这个号码(最多值阵列包含)是已知的
    VAR包括maxValue = 1000;

    VAR reverseIndexDict =新字典< INT,列表和LT; INT>>();

    的for(int i = 0; I< maxValue(最大值);我++)
    {
        reverseIndexDict [我] =新的名单,其中,INT>();
    }

    的for(int i = 0; I< sortedArrays.Count;我++)
    {
        对于(INT J = 0; J< sortedArrays [I] .Count之间; J ++)
        {
            。reverseIndexDict [sortedArrays [I] [J]加入(我);
        }
    }

    VAR tempArr =新的名单,其中,INT>();
    的for(int i = 0; I< sortedArrays.Count;我++)
    {
        tempArr.Clear();
        对于(INT J = 0; J< sortedArrays [I] .Count之间; J ++)
        {
            tempArr.AddRange(reverseIndexDict [J]);
        }

        result.AddRange(tempArr.GroupBy(X => X)。凡(X => x.Count()> = minSupport)。选择(X =>新建[] {我,x.Key})。了ToList());

    }

    结果= result.Where(X =>!×〔0] = X [1])。了ToList();


    的for(int i = 0; I< result.Count;我++)
    {
        resultIntersection.Add(sortedArrays [结果[I] [0]相交(sortedArrays [结果[I] [1])了ToList());
    }



    // ***************** //
 

编辑:

有些improvent。

  // ****主要部分***** //

    //这个号码(最多值阵列包含)是已知的
    VAR包括maxValue = 1000;

    VAR reverseIndexDict =新的名单,其中,INT> maxValue(最大值)];

    的for(int i = 0; I< maxValue(最大值);我++)
    {
        reverseIndexDict [我] =新的名单,其中,INT>();
    }

    的for(int i = 0; I< sortedArrays.Count;我++)
    {
        对于(INT J = 0; J< sortedArrays [I] .Count之间; J ++)
        {
            。reverseIndexDict [sortedArrays [I] [J]加入(我);
        }
    }



    的for(int i = 0; I< sortedArrays.Count;我++)
    {
        VAR tempArr =新字典< INT,列表和LT; INT>>();

        对于(INT J = 0; J< sortedArrays [I] .Count之间; J ++)
        {
            VAR sortedArraysij = sortedArrays [I] [J]。


            对于(INT K = 0; K< reverseIndexDict [sortedArraysij] .Count之间; k ++)
            {
                如果(!tempArr.ContainsKey(reverseIndexDict [sortedArraysij] [K]))
                {
                    tempArr [reverseIndexDict [sortedArraysij] [K] =新的[] {} sortedArraysij .ToList();
                }
                其他
                {
                   tempArr [reverseIndexDict [sortedArraysij] [K]添加(sortedArrays [I] [J])。
                }

            }
        }


        对于(INT J = 0; J< reverseIndexDict.Length; J ++)
        {
            如果(reverseIndexDict [J] .Count之间的> = minSupport)
            {
                result.Add(新[] {I,J});
                resultIntersection.Add(reverseIndexDict [J]);
            }
        }

    }

    //这里我们要过滤集合

    // ***************** //
 

解决方案

有两种解决方法:

  1. 让我们假设你有3个排序的数组,你必须找到它们之间的交叉点。遍历第一阵列和运行上的两个阵列的其余部分在第一阵列的元件的二进制搜索。如果在两个列表中的相应的二进制搜索了积极的,然后增加路口的计数器。

     的结果=名单
    在数组1元素:
        STATUS1 =的binarySearch(元素,ARRAY2)
        状态2 =的binarySearch(元素,ARRAY2)
        状态=状态和放大器;状态
        如果状态==真:
            算上++
            如果count == MAX_INTERSECTION:
                result.append(元)
                打破
     

    时间复杂度:N * M *日志(N),
    在这里,
    N =数组中的元素数
    M =数组数

  2. 该解决方案仅如果数组数为正整数。计算最大和最小数目在所有的排序完毕数组的总元素。因为它是排序的,我们可以通过测量给定的排序阵列的开始和结束元素确定它。让最多的是最大和最小的数字是分钟。创建规模最大的一个阵列 - 分钟,用零填充它。让我们假设你有3个数组,现在就开始遍历第一阵列并进入相应的指标,并增加在pviously创建的数组的$ P $ a的值。如下所述:

     元素是5阵列1中,New_array [5] + = 1
     

    遍历所有三个已排序的列表,并执行上述操作。在结束遍历new_array查找值等于3,这些索引的交集的结果。

    时间复杂度:O(N)+ O(N)+ ... = O(N)
    空间复杂度:O(maximum_element - minimum_element)
    在这里,
    N =数组中的元素数量。

Let's we have some integer short sorted arrays and we need to find intersection equal or more then predefined constant. Here is code and it demonstrates what i want to do better then i can explain it in words. The problem is SPEED. My code is working very slow. It takes about 15 sec on 2000 elements array(on my slow machine). Ofcourse i can implement my own intersection method and parallize code but it give a very limited improvement. Execution time growing as N^2 or something and already for 500k arrays it takes a very very long time. So how can i rewrite algorithm for better perfomance? I am not limited c# language maybe CPU or GPU has good special instructions for such job.

Example:

Input:
1,3,7,8
2,3,8,10
3,10,11,12,13,14

minSupport = 1

Output:

1 and 2: 2, 8
1 and 3: 3
2 and 3: 3, 10


    var minSupport = 2;
    var random = new Random(DateTime.Now.Millisecond);

    // Numbers is each array are unique
    var sortedArrays = Enumerable.Range(0,2000)
    .Select(x => Enumerable.Range(0,30).Select(t => random.Next(1000)).Distinct()
    .ToList()).ToList();
    var result = new List<int[]>();
    var resultIntersection = new List<List<int>>();

    foreach (var array in sortedArrays)
    {
        array.Sort();
    }

    var sw = Stopwatch.StartNew();

    //****MAIN PART*****//

    for (int i = 0; i < sortedArrays.Count-1; i++)
    {
        for (int j = i+1; j < sortedArrays.Count; j++)
        {
            var intersect = sortedArrays[i].Intersect(sortedArrays[j]).ToList();
            if(intersect.Count()>=minSupport)
            {
                result.Add( new []{i,j});
                resultIntersection.Add(intersect);
            }
        }
    }

    //*****************//

    sw.Stop();

    Console.WriteLine(sw.Elapsed);

EDIT:

Now it takes about 9 sec vs 15 sec with old algorithm on 2000 elements. Well...ofcourse it is not fast enough.

//****MAIN PART*****//

    // This number(max value which array can contains) is known
    var maxValue = 1000;

    var reverseIndexDict = new Dictionary<int,List<int>>();

    for (int i = 0; i < maxValue; i++)
    {
        reverseIndexDict[i] = new List<int>();
    }

    for (int i = 0; i < sortedArrays.Count; i++)
    {
        for (int j = 0; j < sortedArrays[i].Count; j++)
        {
            reverseIndexDict[sortedArrays[i][j]].Add(i);
        }
    }

    var tempArr = new List<int>();
    for (int i = 0; i < sortedArrays.Count; i++)
    {
        tempArr.Clear();
        for (int j = 0; j < sortedArrays[i].Count; j++)
        {
            tempArr.AddRange(reverseIndexDict[j]);
        }

        result.AddRange(tempArr.GroupBy(x => x).Where(x => x.Count()>=minSupport).Select(x => new[]{i,x.Key}).ToList());

    }

    result = result.Where(x => x[0]!=x[1]).ToList();


    for (int i = 0; i < result.Count; i++)
    {
        resultIntersection.Add(sortedArrays[result[i][0]].Intersect(sortedArrays[result[i][1]]).ToList());
    }



    //*****************//

EDIT:

Some improvent.

//****MAIN PART*****//

    // This number(max value which array can contains) is known
    var maxValue = 1000;

    var reverseIndexDict = new List<int>[maxValue];

    for (int i = 0; i < maxValue; i++)
    {
        reverseIndexDict[i] = new List<int>();
    }

    for (int i = 0; i < sortedArrays.Count; i++)
    {
        for (int j = 0; j < sortedArrays[i].Count; j++)
        {
            reverseIndexDict[sortedArrays[i][j]].Add(i);
        }
    }



    for (int i = 0; i < sortedArrays.Count; i++)
    {
        var tempArr = new Dictionary<int, List<int>>();

        for (int j = 0; j < sortedArrays[i].Count; j++)
        {
            var sortedArraysij = sortedArrays[i][j];


            for (int k = 0; k < reverseIndexDict[sortedArraysij].Count; k++)
            {
                if(!tempArr.ContainsKey(reverseIndexDict[sortedArraysij][k]))
                {
                    tempArr[reverseIndexDict[sortedArraysij][k]] = new[]{sortedArraysij}.ToList();
                }
                else
                {
                   tempArr[reverseIndexDict[sortedArraysij][k]].Add(sortedArrays[i][j]);
                }

            }
        }


        for (int j = 0; j < reverseIndexDict.Length; j++)
        {
            if(reverseIndexDict[j].Count>=minSupport)
            {
                result.Add(new[]{i,j});
                resultIntersection.Add(reverseIndexDict[j]);
            }
        }

    }

    // and here we are filtering collections

    //*****************//

解决方案

There are two solutions:

  1. Let us suppose you have 3 sorted arrays and you have to find the intersection between them. Traverse the first array and run a binary search on the rest of the two arrays for the element in first array. If the respective binary search on two list gave positive, then increment the counter of intersection.

    result = List
    for element in Array1:
        status1 = binarySearch(element, Array2)
        status2 = binarySearch(element, Array2)
        status = status & status
        if status == True:
            count++
            if count == MAX_INTERSECTION:
                result.append(element)
                break
    

    Time Complexity : N * M * Log(N),
    where,
    N = Number of element in the array
    M = Number of arrays

  2. This solution works only if the number in the arrays are positive integers. Calculate the maximum and the minimum number out of the total elements in all the sorted arrays. As it is sorted, we can determine it by surveying the start and end element of the sorted arrays given. Let the greatest number be max and the lowest number be min. Create an array of size max - min and fill it with zero. Let us suppose you have 3 Arrays, now start traversing the first array and and go to the respective index and increment the value in the previously created array. As mentioned below:

    element is 5 in Array 1, the New_array[5]+=1
    

    Traverse all the three sorted list and perform the operation mentioned above. At the end traverse the new_array and look for value equal to 3, these indexes are the intersection result.

    Time Complexity : O(N) + O(N) + .. = O(N)
    Space Complexity : O(maximum_element - minimum_element)
    where,
    N = number of elements in the array.

这篇关于查找排序整型数组的交集组的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆