算法问题的文件搜索索引 [英] Algorithm Question on File Search Indexing

查看:93
本文介绍了算法问题的文件搜索索引的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

有一个问题,我有解决的办法也。但我不明白的解决方案。请帮助一些组例子和淋浴的一些经验。

  

问题

     

由于含有大约3亿的社会保障号码(9位数字),发现一个9位数的数字,是不是在文件中的文件。你有无限的硬盘空间的RAM,但只有2MB在您的处置。

     

答案

     

在第一步骤中,我们建立一个数组2 ^ 16被初始化为0,并在文件中的每个数字,我们以它的16个最高显著位来索引到该阵列并递增数目

的整数      

由于有小于2 ^ 32号的文件中,存在势必会小于2 ^ 16阵列中的(至少)一个号码。这告诉我们,有至少一个数字的可能数目与高位比特之间缺少

     

在第二遍,我们只能只专注于满足这一标准,用大小为2的位向量^ 16,以确定丢失号码的一个数字。

解决方案

为使说明更简单,让我们假设你有两个数字,每一个数字是介于0和3的列表,但你不能放过16位记为各16可能数量,是否已经遇到它。你要做的就是创建一个数组 4 3位整数和 A [1] ,您存储多少个号码的第一个数字你遇到过。 (二位整数是不够的,因为你需要的值0,4和它们之间的所有数字。)

如果你有文件

  

00,12,03,31,01,32,02

您的阵列是这样的:

  

4,1,0,2

现在你知道,从0开始的所有数字是在该文件中,但对于每个剩余的,有至少一个缺少的。让我们挑1.我们知道有至少一个数字从1开始,是不是在文件中。因此,产生4位的阵列,每个数字从1开始设置相应位,并在最后,挑了未设置位之一,在我们的例子中,如果可能是0。现在我们有解决方案:10

在这种情况下,使用该方法是12位和16位之间的差。随着你的号码,这是32 KB和119 MB之间的差异。

There is one question and I have the solution to it also. But I couldn't understand the solution. Kindly help with some set of examples and shower some experience.

Question

Given a file containing roughly 300 million social security numbers (9-digit numbers), find a 9-digit number that is not in the file. You have unlimited drive space but only 2MB of RAM at your disposal.

Answer

In the first step, we build an array 2^16 integers that is initialized to 0 and for every number in the file, we take its 16 most significant bits to index into this array and increment the number.

Since there are less than 2^32 numbers in the file, there is bound to be (at least) one number in the array that is less than 2^16. This tells us that there is at least one number missing among the possible numbers with those upper bits.

In the second pass, we can focus only only on the numbers that match this criterion and use a bit-vector of size 2^16 to identify one of the missing numbers.

解决方案

To make the explanation simpler, let's say you have a list of two-digit numbers, where each digit is between 0 and 3, but you can't spare the 16 bits to remember for each of the 16 possible numbers, whether you have already encountered it. What you do is to create an array a of 4 3-bit integers and in a[i], you store how many numbers with the first digit i you encountered. (Two-bit integers wouldn't be enough, because you need the values 0, 4 and all numbers between them.)

If you had the file

00, 12, 03, 31, 01, 32, 02

your array would look like this:

4, 1, 0, 2

Now you know that all numbers starting with 0 are in the file, but for each of the remaining, there is at least one missing. Let's pick 1. We know there is at least one number starting with 1 that is not in the file. So, create an array of 4 bits, for each number starting with 1 set the appropriate bit and in the end, pick one of the bits that wasn't set, in our example if could be 0. Now we have the solution: 10.

In this case, using this method is the difference between 12 bits and 16 bits. With your numbers, it's the difference between 32 kB and 119 MB.

这篇关于算法问题的文件搜索索引的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆