生成一个不属于给定 40 亿的整数 [英] Generate an integer that is not among four billion given ones

查看:26
本文介绍了生成一个不属于给定 40 亿的整数的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我收到了这个面试问题:

<块引用>

给定一个包含 40 亿个整数的输入文件,提供一个算法来生成一个不包含在文件中的整数.假设您有 1 GB 内存.如果您只有 10MB 的内存,请跟进您会怎么做.

我的分析:

文件大小为 4×109×4 字节 = 16GB.

我们可以进行外部排序,从而让我们知道整数的范围.

我的问题是在已排序的大整数集中检测缺失整数的最佳方法是什么?

我的理解(阅读所有答案后):

假设我们谈论的是 32 位整数,则有 232 = 4*109 个不同的整数.

案例 1:我们有 1GB = 1 * 109 * 8 位 = 80 亿位内存.

解决方案:

如果我们使用一位代表一个不同的整数,就足够了.我们不需要排序.

实施:

int radix = 8;字节[]位域=新字节[0xffffffff/基数];void F() 抛出 FileNotFoundException{扫描仪 in = new Scanner(new FileReader("a.txt"));while(in.hasNextInt()){int n = in.nextInt();位域[n/radix] |= (1 << (n%radix));}for(int i = 0; i< bitfield.lenght; i++){for(int j =0; j

案例 2:10MB 内存 = 10 * 106 * 8 位 = 8000 万位

<块引用>

解决方案:

对于所有可能的 16 位前缀,有 216 个整数 = 65536,我们需要 216 * 4 * 8 = 200 万位.我们需要构建 65536 个桶.对于每个桶,我们需要 4 个字节来保存所有的可能性,因为最坏的情况是所有 40 亿个整数都属于同一个桶.

  1. 通过第一次通过文件构建每个存储桶的计数器.
  2. 扫描桶,找到第一个命中少于 65536 的桶.
  3. 构建新的存储桶,其高 16 位前缀是我们在步骤 2 中找到的通过文件的第二遍
  4. 扫描step3中构建的bucket,找到第一个没有的bucket成功了.

代码与上面的非常相似.

结论:我们通过增加文件传递来减少内存.

<小时>

对那些迟到的人的澄清:正如所问的,这个问题并没有说文件中没有包含一个整数——至少大多数人不是这样解释它的.不过,评论线程中的许多评论都与任务的这种变化有关.不幸的是,将其引入到评论线程的评论后来被其作者删除,所以现在看来​​,对它的孤立回复只是误解了一切.非常混乱,抱歉.

解决方案

假设整数"表示 32 位:10 MB 的空间足以让您数一数有多少个数字在具有任何给定 16 位前缀的输入文件中,一次通过输入文件的所有可能的 16 位前缀.至少有一个桶被命中的次数少于 216 次.再做一遍以找出该桶中哪些可能的数字已被使用.

如果这意味着超过 32 位,但仍然是有界大小:执行上述操作,忽略所有碰巧落在(有符号或无符号;您的选择)32 位范围之外的输入数字.

如果整数"表示数学整数:通读输入一次并跟踪您见过的最长数字的最大数字长度.完成后,输出最大值加一一个多一位的随机数.(文件中的一个数字可能是一个需要超过 10 MB 才能准确表示的 bignum,但是如果输入是一个文件,那么您至少可以表示任何适合的长度它).

I have been given this interview question:

Given an input file with four billion integers, provide an algorithm to generate an integer which is not contained in the file. Assume you have 1 GB memory. Follow up with what you would do if you have only 10 MB of memory.

My analysis:

The size of the file is 4×109×4 bytes = 16 GB.

We can do external sorting, thus letting us know the range of the integers.

My question is what is the best way to detect the missing integer in the sorted big integer sets?

My understanding (after reading all the answers):

Assuming we are talking about 32-bit integers, there are 232 = 4*109 distinct integers.

Case 1: we have 1 GB = 1 * 109 * 8 bits = 8 billion bits memory.

Solution:

If we use one bit representing one distinct integer, it is enough. we don't need sort.

Implementation:

int radix = 8;
byte[] bitfield = new byte[0xffffffff/radix];
void F() throws FileNotFoundException{
    Scanner in = new Scanner(new FileReader("a.txt"));
    while(in.hasNextInt()){
        int n = in.nextInt();
        bitfield[n/radix] |= (1 << (n%radix));
    }

    for(int i = 0; i< bitfield.lenght; i++){
        for(int j =0; j<radix; j++){
            if( (bitfield[i] & (1<<j)) == 0) System.out.print(i*radix+j);
        }
    }
}

Case 2: 10 MB memory = 10 * 106 * 8 bits = 80 million bits

Solution:

For all possible 16-bit prefixes, there are 216 number of integers = 65536, we need 216 * 4 * 8 = 2 million bits. We need build 65536 buckets. For each bucket, we need 4 bytes holding all possibilities because the worst case is all the 4 billion integers belong to the same bucket.

  1. Build the counter of each bucket through the first pass through the file.
  2. Scan the buckets, find the first one who has less than 65536 hit.
  3. Build new buckets whose high 16-bit prefixes are we found in step2 through second pass of the file
  4. Scan the buckets built in step3, find the first bucket which doesnt have a hit.

The code is very similar to above one.

Conclusion: We decrease memory through increasing file pass.


A clarification for those arriving late: The question, as asked, does not say that there is exactly one integer that is not contained in the file—at least that's not how most people interpret it. Many comments in the comment thread are about that variation of the task, though. Unfortunately the comment that introduced it to the comment thread was later deleted by its author, so now it looks like the orphaned replies to it just misunderstood everything. It's very confusing, sorry.

解决方案

Assuming that "integer" means 32 bits: 10 MB of space is more than enough for you to count how many numbers there are in the input file with any given 16-bit prefix, for all possible 16-bit prefixes in one pass through the input file. At least one of the buckets will have be hit less than 216 times. Do a second pass to find of which of the possible numbers in that bucket are used already.

If it means more than 32 bits, but still of bounded size: Do as above, ignoring all input numbers that happen to fall outside the (signed or unsigned; your choice) 32-bit range.

If "integer" means mathematical integer: Read through the input once and keep track of the largest number length of the longest number you've ever seen. When you're done, output the maximum plus one a random number that has one more digit. (One of the numbers in the file may be a bignum that takes more than 10 MB to represent exactly, but if the input is a file, then you can at least represent the length of anything that fits in it).

这篇关于生成一个不属于给定 40 亿的整数的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆