C Library for compressing sequential positive integers

Views: 150 Published: 2016/8/19 16:19:20 Tags: c database data-structures encoding compression

Problem description

I have the very common problem of creating an index for an in-disk array of strings. In short, I need to store the position of each string in the in-disk representation. For example, a very naive solution would be an index array as follows:

uint64 idx[] = { 0, 20, 500, 1024, ..., 103434 };

Which says that the first string is at position 0, the second at position 20, the third at position 500 and the nth at position 103434.

The positions are always non-negative 64-bit integers in sequential order. Although the numbers could vary by any difference, in practice I expect the typical difference to be inside the range from 2^8 to 2^20. I expect this index to be mmap'ed in memory, and the positions will be accessed randomly (assume uniform distribution).
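To put rough numbers on that range: assuming the differences were stored as base-128 varints (7 payload bits per byte, which is an assumption for illustration and not something settled in this question), the cost per difference could be estimated with a small C sketch like the one below. `varint_len` is a hypothetical helper name.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical helper: number of bytes needed to store v as a
 * base-128 varint (7 payload bits per byte, 1 continuation bit). */
static size_t varint_len(uint64_t v) {
    size_t n = 1;
    while (v >= 0x80) {   /* more than 7 bits left: one more byte */
        v >>= 7;
        n++;
    }
    return n;
}
```

Under that assumption a difference of 2^8 takes 2 bytes and a difference of 2^20 takes 3 bytes, compared with 8 bytes for each raw 64-bit entry.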

I was thinking about writing my own code for doing some sort of block delta encoding or other more sophisticated encoding, but there are so many different trade-offs between encoding/decoding speed and space that I would rather get a working library as a starting point and maybe even settle for something without any customizations.

Any hints? A C library would be ideal, but a C++ one would also allow me to run some initial benchmarks.

A few more details if you are still following. This will be used to build a library similar to cdb (http://cr.yp.to/cdb/cdbmake.html) on top of the library cmph (http://cmph.sf.net). In short, it is for a large disk-based read-only associative map with a small index in memory.

Since it is a library, I don't have control over the input, but the typical use case that I want to optimize has hundreds of millions of values, an average value size in the range of a few kilobytes, and a maximum of 2^31.

For the record, if I don't find a ready-to-use library, I intend to implement delta encoding in blocks of 64 integers, with the initial bytes of each block specifying the block offset so far. The blocks themselves would be indexed with a tree, giving me O(log (n/64)) access time. There are way too many other options and I would prefer not to discuss them. I am really looking for ready-to-use code rather than ideas on how to implement the encoding. I will be glad to share with everyone what I did once I have it working.
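For illustration only, the fallback plan described above could be sketched in C roughly as follows. This is a minimal sketch under stated assumptions, not a tested library: differences are stored as base-128 varints, each block of 64 starts with its first position stored as an absolute value, and a flat array of per-block byte offsets stands in for the tree (so locating a block is a direct array lookup rather than a tree descent). All names (`delta_index`, `delta_index_build`, `delta_index_get`) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define BLOCK 64  /* positions per block, as in the plan above */

typedef struct {
    uint8_t *data;       /* concatenated varint-encoded blocks */
    size_t  *block_off;  /* byte offset of each block inside data */
    size_t   n;          /* number of positions stored */
} delta_index;

/* Write v as a base-128 varint; returns the number of bytes written. */
static size_t put_varint(uint8_t *p, uint64_t v) {
    size_t i = 0;
    while (v >= 0x80) { p[i++] = (uint8_t)(v | 0x80); v >>= 7; }
    p[i++] = (uint8_t)v;
    return i;
}

/* Read a base-128 varint into *v; returns the number of bytes consumed. */
static size_t get_varint(const uint8_t *p, uint64_t *v) {
    uint64_t r = 0;
    unsigned shift = 0;
    size_t i = 0;
    for (;;) {
        uint8_t b = p[i++];
        r |= (uint64_t)(b & 0x7F) << shift;
        if (!(b & 0x80)) break;
        shift += 7;
    }
    *v = r;
    return i;
}

/* Build the index from n non-decreasing positions. Returns 0 on success. */
static int delta_index_build(delta_index *ix, const uint64_t *pos, size_t n) {
    size_t nblocks = (n + BLOCK - 1) / BLOCK;
    ix->n = n;
    ix->data = malloc(n * 10 + 1);  /* a 64-bit varint takes at most 10 bytes */
    ix->block_off = malloc((nblocks + 1) * sizeof *ix->block_off);
    if (!ix->data || !ix->block_off) {
        free(ix->data);
        free(ix->block_off);
        return -1;
    }
    size_t w = 0;
    for (size_t i = 0; i < n; i++) {
        if (i % BLOCK == 0) {
            ix->block_off[i / BLOCK] = w;
            w += put_varint(ix->data + w, pos[i]);  /* block header: absolute position */
        } else {
            w += put_varint(ix->data + w, pos[i] - pos[i - 1]);  /* delta to previous */
        }
    }
    ix->block_off[nblocks] = w;  /* total encoded size in bytes */
    return 0;
}

/* Random access: decode the block header, then sum up to 63 deltas. */
static uint64_t delta_index_get(const delta_index *ix, size_t i) {
    assert(i < ix->n);
    const uint8_t *p = ix->data + ix->block_off[i / BLOCK];
    uint64_t v, d;
    p += get_varint(p, &v);
    for (size_t k = 0; k < i % BLOCK; k++) {
        p += get_varint(p, &d);
        v += d;
    }
    return v;
}
```

A lookup decodes at most 63 deltas after the block header, so `BLOCK` trades space against random-access cost. The caller is responsible for freeing `data` and `block_off`.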

I appreciate your help; let me know if you have any questions.
