比较两个排序INT数组 [英] Comparing two sorted int arrays
问题描述
我有几百万固定大小(100),INT阵列。每个数组进行排序,并具有独特的元素。对于每个阵列,我想找到这有70%的通用元素所有阵列。现在,我得到大约100万的比较(使用Arrays.binarySearch())每秒,这对我们来说是太慢了。
I have millions of fixed-size (100) int arrays. Each array is sorted and has unique elements. For each array, I want to find all arrays which have 70% common elements. Right now I am getting around 1 million comparisons (using Arrays.binarySearch()) per second, which is too slow for us.
谁能推荐一个更好的搜索算法?
Can anyone recommend a better searching algorithm?
推荐答案
这样的事情应该做的工作(前提是数组排序,并含有独特的元素):
Something like this should do the job (provided that the arrays are sorted and contain unique elements):
public static boolean atLeastNMatchingElements(final int n,
final int[] arr1,
final int[] arr2){
/* check assumptions */
assert (arr1.length == arr2.length);
final int arrLength = arr2.length;
{ /* optimization */
final int maxOffset = Math.max(arrLength - n, 0);
if(arr1[maxOffset] < arr2[0] || arr2[maxOffset] < arr1[0]){
return false;
}
}
int arr2Offset = 0;
int matches = 0;
/* declare variables only once, outside loop */
int compResult; int candidate;
for(int i = 0; i < arrLength; i++){
candidate = arr1[i];
while(arr2Offset < arrLength - 1){
compResult = arr2[arr2Offset] - candidate;
if(compResult > 0){
break;
} else{
arr2Offset++;
if(compResult == 0){
matches++;
break;
}
}
}
if(matches == n){
return true;
}
/* optimization */
else if(matches < n - (arrLength - arr2Offset)){
return false;
}
}
return false;
}
示例用法:
public static void main(final String[] args){
final int[] arr1 = new int[100];
final int[] arr2 = new int[100];
int x = 0, y = 0;
for(int i = 0; i < 100; i++){
if(i % 10 == 0){
x++;
}
if(i % 12 == 0){
y++;
}
arr1[i] = x;
arr2[i] = y;
x++;
y++;
}
System.out.println(atLeastNMatchingElements(70, arr1, arr2));
System.out.println(atLeastNMatchingElements(95, arr1, arr2));
}
输出:
真正的结果
假
true
false
我现在试图优化上述code。请检查code块是否标记为
Premature Optimizations™
I have now tried to optimize the above code. Please check whether the code blocks marked as
/* optimization */
做一个明显的区别。优化后,我会重构code把它降低到一个或两个return语句。
make a noticeable difference. After optimization, I would refactor the code to get it down to one or two return statements.
这篇关于比较两个排序INT数组的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!