有效地生成唯一的整数对 [英] Efficiently generating unique pairs of integers

查看:160
本文介绍了有效地生成唯一的整数对的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

在MATLAB中,我想在 [1,m] 范围内生成 n 对随机整数,每对都是独一无二的。为了唯一性,我认为对中数字的顺序是无关紧要的,因此 [3,10] 等于 [10,3]
此外,每对应由两个不同的整数组成;即 [3,4] 即可,但 [3,3] 将被拒绝。
编辑:每个可能的对应该选择具有相同的可能性。



(显然对参数的约束是 n< = m(m-1)/ 2 。)



我能够在<$时成功执行此操作c $ c> m 很小,如下:

  m = 500; n = 10; %设置参数

A =((1:m)'* ones(1,m));每列的%具有数字1 - > m
idxs1 = squareform(tril(A', - 1))';
idxs2 = squareform(tril(A,-1))';
all_pairs = [idxs1,idxs2]; %这包含所有可能的对

idx_to_use = randperm(size(all_pairs,1),n); %选择随机n对
对= all_pairs(idx_to_use,:)

对=

254 414
247 334
111 146
207 297
45 390
229 411
9 16
75 395
12 338
25 442

然而,矩阵 A 的大小 mxm ,意思是当 m 变大(例如超过10,000)时,MATLAB内存不足。



<我考虑生成一堆随机数 randi(m,[n,2]),并反复拒绝重复的行,但我担心陷入困境 n 接近 m(m-1)/ 2 时的循环。



是否有更简单,更清晰的方法来生成唯一的不同整数对?

解决方案

以正确的方式查看时容易,轻松。



你希望生成n对整数,[p,q],这样p和q位于区间[1,m]和p

有多少个可能的对?对的总数仅为m *(m-1)/ 2。 (即,从1到m-1的数字之和。)



因此我们可以生成范围为[1,m *(m-1)的n个随机整数)/ 2]。 Randperm做得很好。 (较旧的matlab版本不允许第二个参数为randperm。)

  k = randperm(m / 2 *(m-1) )中,n); 

(请注意,我用滑稽的方式用m写了这个表达式,或许除以2一个奇怪的地方。这可以避免某些m值接近上限的精度问题。)



现在,如果我们将每个可能的对[p,q]与其中一个相关联在k中的整数,我们可以向后工作,从k中生成的整数到一对[p,q]。因此,该列表中的前几对是:

  {[1,2],[1,3],[2, 3],[1,4],[2,4],[3,4],...,[m-1,m]} 

我们可以把它们看作是一个严格的上三角形数组中的元素,大小为m乘m,因此那些元素位于主对角线上方。

  q = floor(sqrt(8 *(k-1)+ 1)/ 2 + 1/2); 
p = k - q。*(q-1)/ 2;

看到这些公式从k中的展开元素中恢复p和q。我们可以说服自己这确实有效,但也许一个简单的方法就是这个测试:

  k = 1:21 ; 
q = floor(sqrt(8 *(k-1)+ 1)/ 2 + 3/2);
p = k - (q-1)。*(q-2)/ 2;
[k; p; q]'

ans =
1 1 2
2 1 3
3 2 3
4 1 4
5 2 4
6 3 4
7 1 5
8 2 5
9 3 5
10 4 5
11 1 6
12 2 6
13 3 6
14 4 6
15 5 6
16 1 7
17 2 7
18 3 7
19 4 7
20 5 7
21 6 7

另一种方式测试它是为了表明所有对都是为一个小案例生成的。

  m = 5; 
n = 10;
k = randperm(m / 2 *(m-1),n);
q = floor(sqrt(8 *(k-1)+ 1)/ 2 + 3/2);
p = k - (q-1)。*(q-2)/ 2;

sortrows([p; q]',[2 1])$ ​​b $ b ans =
1 2
1 3
2 3
1 4
2 4
3 4
1 5
2 5
3 5
4 5

是的,看起来一切都很完美。现在尝试使用m和n的一些大数来测试使用的时间。

  tic 
m = 1e6;
n = 100000;
k = randperm(m / 2 *(m-1),n);
q = floor(sqrt(8 *(k-1)+ 1)/ 2 + 3/2);
p = k - (q-1)。*(q-2)/ 2;
toc

经过的时间是0.014689秒。

此方案适用于大约1e8的m,因为双精度错误导致失败精确。在m / 2 *(m-1)超过2 ^ 53之前,精确限制应为m不大于134217728。一个很好的功能是不需要拒绝重复对。


In MATLAB, I would like to generate n pairs of random integers in the range [1, m], where each pair is unique. For uniqueness, I consider the order of the numbers in the pair to be irrelevant such that [3, 10] is equal to [10, 3]. Also, each pair should consist of two distinct integers; i.e. [3, 4] is ok but [3, 3] would be rejected. EDIT: Each possible pair should be chosen with equal likelihood.

(Obviously a constraint on the parameters is that n <= m(m-1)/2.)

I have been able to successfully do this when m is small, like so:

m = 500; n = 10;                   % setting parameters

A = ((1:m)'*ones(1, m));           % each column has the numbers 1 -> m
idxs1 = squareform(tril(A', -1))'; 
idxs2 = squareform(tril(A, -1))';   
all_pairs = [idxs1, idxs2];        % this contains all possible pairs

idx_to_use = randperm( size(all_pairs, 1), n );  % choosing random n pairs
pairs = all_pairs(idx_to_use, :)       

pairs =

   254   414
   247   334
   111   146
   207   297
    45   390
   229   411
     9    16
    75   395
    12   338
    25   442

However, the matrix A is of size m x m, meaning when m becomes large (e.g. upwards of 10,000), MATLAB runs out of memory.

I considered generating a load of random numbers randi(m, [n, 2]), and repeatedly rejecting the rows which repeated, but I was concerned about getting stuck in a loop when n was close to m(m-1)/2.

Is there an easier, cleaner way of generating unique pairs of distinct integers?

解决方案

Easy, peasy, when viewed in the proper way.

You wish to generate n pairs of integers, [p,q], such that p and q lie in the interval [1,m], and p

How many possible pairs are there? The total number of pairs is just m*(m-1)/2. (I.e., the sum of the numbers from 1 to m-1.)

So we could generate n random integers in the range [1,m*(m-1)/2]. Randperm does this nicely. (Older matlab releases do not allow the second argument to randperm.)

k = randperm(m/2*(m-1),n);

(Note that I've written this expression with m in a funny way, dividing by 2 in perhaps a strange place. This avoids precision problems for some values of m near the upper limits.)

Now, if we associate each possible pair [p,q] with one of the integers in k, we can work backwards, from the integers generated in k, to a pair [p,q]. Thus the first few pairs in that list are:

{[1,2], [1,3], [2,3], [1,4], [2,4], [3,4], ..., [m-1,m]}

We can think of them as the elements in a strictly upper triangular array of size m by m, thus those elements above the main diagonal.

q = floor(sqrt(8*(k-1) + 1)/2 + 1/2);
p = k - q.*(q-1)/2;

See that these formulas recover p and q from the unrolled elements in k. We can convince ourselves that this does indeed work, but perhaps a simple way here is just this test:

k = 1:21;
q = floor(sqrt(8*(k-1) + 1)/2 + 3/2);
p = k - (q-1).*(q-2)/2;
[k;p;q]'

ans =
     1     1     2
     2     1     3
     3     2     3
     4     1     4
     5     2     4
     6     3     4
     7     1     5
     8     2     5
     9     3     5
    10     4     5
    11     1     6
    12     2     6
    13     3     6
    14     4     6
    15     5     6
    16     1     7
    17     2     7
    18     3     7
    19     4     7
    20     5     7
    21     6     7

Another way of testing it is to show that all pairs get generated for a small case.

m = 5;
n = 10;
k = randperm(m/2*(m-1),n);
q = floor(sqrt(8*(k-1) + 1)/2 + 3/2);
p = k - (q-1).*(q-2)/2;

sortrows([p;q]',[2 1])
ans =
     1     2
     1     3
     2     3
     1     4
     2     4
     3     4
     1     5
     2     5
     3     5
     4     5

Yup, it looks like everything works perfectly. Now try it for some large numbers for m and n to test the time used.

tic
m = 1e6;
n = 100000;
k = randperm(m/2*(m-1),n);
q = floor(sqrt(8*(k-1) + 1)/2 + 3/2);
p = k - (q-1).*(q-2)/2;
toc

Elapsed time is 0.014689 seconds.

This scheme will work for m as large as roughly 1e8, before it fails due to precision errors in double precision. The exact limit should be m no larger than 134217728 before m/2*(m-1) exceeds 2^53. A nice feature is that no rejection for repeat pairs need be done.

这篇关于有效地生成唯一的整数对的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆