DBMS中的阻塞因子 [英] Blocking factor in a DBMS

查看:178
本文介绍了DBMS中的阻塞因子的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

DBMS中的阻塞因素是什么,

What is the blocking factor in a DBMS,

我看到的是,这是每个记录的块的所有值(因此B / R floor)其中B是块大小,R是记录。我只是想知道,有人可以告诉我使用它的主要原因,以及它是否是实际FLOORED?

The bit I looked at said it was the floored value of blocks per record (so B/R floor), where B is block size and R is records. I was just wondering, can someone tell me the main reason its used, and also whether it is actually FLOORED?

我对FLOORED的理解是1.5得到1.0,

My understanding of FLOORED is 1.5 gets floored to 1.0, for anyone that is wondering.

尊敬的,

James

推荐答案

是的,这意味着整个记录

Yes, it means how many whole records fit into a block.

(块是基础存储系统(hdd,san fs等)的最小数据单元, )

(A block is the smallest unit of data that the underlying storage system (hdd, san fs, etc) is willing to deal in. It's size is traditionally 512 bytes for hard drives.)

这是因为如果100和一半的记录适合,一个只存储100条记录

It is floored because if 100 and a half record would fit, one only stores 100 records per block.

阻塞因子在许多与dbms相关的计算中使用得相当频繁。

Blocking factor is pretty heavily used in many dbms related calculations.

例如:

我们有10 000 000条记录。
每条记录的长度为80字节。
每条记录包含一个唯一键(Lets说社会保险号)。
我们希望通过他们的社会安全号码查找某人快。

We have 10 000 000 records. Each record is 80 bytes long. Each record contains an unique key (Lets say social security numbers). We want looking up someone by their social security number to be fast.

我们需要一些东西来衡量效果。
需要最多时间的事情是从硬盘上询问一个块。
你知道,这是一个机械设备。它必须重新定位其头部和blabla,
,所以它真的是一个慢的操作相比,CPU的速度,
或与操作内存(RAM)访问速度有多快。
好​​吧,让我们说,我们通过多少磁盘访问来衡量一个操作的性能。我们想要最小化磁盘访问的数量。
好​​,现在我们知道如何判断某事是否慢或快。

We need something to measure performance by. The thing that takes the most time is asking a block from the harddisk. You know, it is a mechanical device. It has to reposition its head, and blabla, so it really a slow operation when compared to how fast the CPU is, or compared to how fast operative memory(RAM) access is. Okay, lets say that we measure the performance of an operation by how many disk accesses it takes. We want to minimize the number of disk accesses. Okay, now we know how to tell if something is slow or fast.

很多磁盘访问 - >坏

Many disk accesses -> bad

很少的磁盘访问 - >良好

Very few disk accesses -> good

说在我们的假想hw,每个块是5000字节。我们想计算我们需要多少块。首先,我们需要知道单个块中有多少个记录:

Lets say that on our imaginary hw, each block is 5000 byte. We want calculate how many blocks we need. First, we need to know how many records fit into a single block:

阻塞因子 = floored((Block size)/(Record size)) = floored(5000/80) = 62.5) = 62记录/块

Blocking factor = floored((Block size)/(Record size)) = floored(5000/80) = floored(62.5) = 62 record/block

需要 ceiled(10000000/62)= ceiled(161290.32)= 161291 块才能存储所有数据。

And we have 10000000 records, so we need ceiled(10000000/62)=ceiled(161290.32)=161291 blocks to store all that data.

如果读取所有块以通过密钥(社会保险号)查找单个记录,那么将需要161291磁盘访问。不好。

If one were to read all the blocks to find a single record by the key (social security number), then that would take 161291 disk accesses. Not good.

我们可以做得更好。让我们构建一个索引文件。我们将构建一个稀疏索引

We can do better. Lets build an index file. We will build a sparse index.


数据库中的稀疏索引是数据文件中每个块的具有键对和指针
的文件。此文件中的每个键都与
相关联,并将一个特定的指针指向排序数据文件中的块。在
具有重复键的聚类索引中,稀疏索引指向每个块中的
最低搜索键。

A sparse index in databases is a file with pairs of keys and pointers for every block in the data file. Every key in this file is associated with a particular pointer to the block in the sorted data file. In clustered indices with duplicate keys, the sparse index points to the lowest search key in each block.

好的,所以我们将在每个块的索引文件中有一个指针和一个键。
我们假设在我们的假想hw,一个指针是4个字节长,在我们的假想世界中,一个社会保障号(key)占用6个字节。

Okay, so we will have a pointer and a key in our index file for each block. Lets say that on our imaginary hw, a pointer is 4 bytes long, and in our imaginary world a social security number (key) takes up 6 bytes.

因此,我们将为索引中的每个块存储一个10字节长的键指针对。
这些对中有多少适合单个块?

So we are going to store one 10 byte long key-pointer pair for each block in our index. How many of these pairs fit into a single block?

Blocking factor of the index file = floored(5000/10) = 500

...所以这意味着500个键指针对适合单个块。我们需要存储161291这些,因此索引文件将占用 ceiled(161291/500)= 323

... so this means that 500 key-pointer pairs fit into a single block. And we need to store 161291 of these, so the index file will take up ceiled(161291/500)=323 blocks

索引文件按键排序,因此我们可以在其中进行二分查找,以找到包含记录的块的指针。在索引文件中执行二进制搜索最多需要 ceiled(log2(323))= 9 磁盘访问权限。我们还需要 +1 磁盘访问来实际读取索引记录所指向的数据块。

The index file is ordered by key, so we can do binary search in it to find the pointer to the block which contains the record. Doing binary search in the index file costs at most ceiled(log2(323))=9 disk acceses. We also need +1 disk access to actually read the data block which the index record points to.

哇,我们得到我们的查找工作在10磁盘访问。这真的很棒。我们甚至可以做得更好。 :)

好的,因此您可以看到在此计算中例如使用阻塞因子。

Okay, so you can see that blocking factor is heavy used for example in this calculation.

这篇关于DBMS中的阻塞因子的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆