记录和丛集 [英] Records and Clusters

查看:58
本文介绍了记录和丛集的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

出于多种原因,我决定编写自己的平衡二叉树库.我的库的一个功能是群集"和重新生成"功能,在其他免费提供的库中没有见过.

我要攻击的问题是,当元素随机添加到树中时,为了保持树平衡而自动发生的树旋转趋向于将边缘指针散布到整个位置.最终结果是,在一棵大树中,典型的遍历可能需要从整个物理内存(或磁盘页面)中读取节点:缓存的使用效率非常低.通过在物理内存中将几个靠近根节点"的子树聚集在一起,可以通过始终将始终被访问的节点保持在物理内存中(或在同一磁盘页面上)彼此相邻来显着加快树遍历.例如,在适当聚类的树中,其簇大到足以容纳4个级别的记录(1 + 2 + 4 + 8 = 15个记录),高度为16的树(> 45000个记录)只能在4个簇中遍历读取而不是非群集树可能需要的16次读取.

我不知道这是一种衡量树群集外"程度的好方法.我希望能够确定何时需要重新整理一棵树.我敢肯定有很好的理论可以解决这个问题.这个想法似乎并不那么困难,但是我还没有遇到解决方案.我花了一些时间来寻找一种方法,大概花了几个小时来思考它.还没有.

有什么想法或有用的链接吗?

For many varied reasons, I decided to write my own balanced binary tree library. One feature of my library that I haven''t seen in other freely available libraries is "clusters", and a "recluster" function.

The problem that I am attacking is that as elements are randomly added to the tree, the tree rotations that automatically happen to keep the tree balanced tend to spread the edge pointers all over the place. The end result is that in a large tree, a typical traversal can require reading nodes from all over physical memory (or disk pages): a very inefficient use of cache. By clustering several subtree "near the root nodes" together in physical memory, tree traversals can be sped-up significantly by keeping nodes that always get visited near one another within physical memory (or on the same disk page). For example, in a properly clustered tree, with clusters large enough to hold 4 levels of records (1+2+4+8 = 15 records), a tree of height 16 (>45000 records) can be traversed in only 4 cluster reads instead of 16 reads that might be required of a non-clustered tree.

What I don''t know is a good way to measure how "out-of-cluster" a tree is. I''d like to be able to determine when a tree needs to be reclustered. I''m sure there is good theory out there for this; and the idea doesn''t seem to be that difficult, but I haven''t run onto the solution yet. I''ve spent a little bit of time googling for a method, and probably a couple of hours thinking about it. Nothing yet.

Any ideas or helpful links?

推荐答案

您是否考虑过Mix软件"The C Database Tool Chest"实现的B +树. Mix仍在网络上-Google混合软件".注意,术语数据库"实际上只是一个B +树.数据库是一棵包含多个键和记录的内存或文件块树.

我在85年来就得到了它,并且从此以后就一直使用它.我得到了源软件(不知道这是否仍然可用).不是开放源代码,但是可以使用已发布的任何程序而不会产生任何使用费,唯一的限制是必须将其链接到对象中,没有DLL,并且源代码和库都不能被释放-仅包含嵌入式的可执行文件代码.使用该源代码,您可以进行所需的任何修改.

我对其进行了修改(最初是16位-使用Borland C进行编译),现在可以使用VS2008从而达到32位并支持> 4GB文件.进行重大重写以释放现在为32位的INT使用量-在16位中,INT为16位.与结构布局存在许多冲突,并且32位版本无法读取旧的16位数据库,反之亦然.我建立了我的版本,可以在其中声明所需的INT大小,因此只能使用一个转换器.我创建了两个转换器(16位和32位),它们可以读取适当的数据库(以升序排列)并输出包含其数据的平面文件,然后读入平面文件并写入另一个位大小的数据库. br>
戴夫.
Have you ever considered B+ trees as implemented by "The C Database Tool Chest", Mix Software. Mix is still on the web - Google "Mix Software". Note that the term "database" is, in fact, just a B+ tree. The database is a tree of memory or file blocks which contain multiple keys and records.

I got this in ''85 and have used it ever since. I got the source software (don''t know if this is still available). Not Open Source, but any programs that are released can be used without any royalty, only restriction is that it must be linked into an object, no DLLs, and the neither the source nor the library can be released - only the executable with the embedded code. With the source, you can make any modifications you want.

I modified it (originally 16bit - used Borland C to compile) to now use VS2008 thus 32 bit and supports >4GB files. Major rewrite to unwind the INT usage which now is 32 bit - in 16 bit an INT was 16 bit. Many conflicts with structure layout, and old 16 bit databases couldn''t be read with the 32 bit version, nor the other way around. I built my version where I could declare which size of INT I wanted so I had only one converter to work with. I created two converters (16 bit and 32 bit) which could read the appropriate data base (in ascending order) and output a flat file with the data it contained, and then read in the flat file and write the other bit sized database.

Dave.


这篇关于记录和丛集的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆