B-trees, databases, sequential vs. random inserts, and speed. Random is winning

Question

EDIT

@Remus corrected my test pattern. You can see the corrected version in his answer below.

I took the suggestion of replacing the INT with DECIMAL(29,0) and the results were:

Decimal: 2133 ms
GUID: 1836 ms

Random inserts are still winning, even with a fractionally bigger row.

Despite explanations that random inserts are slower than sequential ones, these benchmarks show them to be apparently faster. The explanations I'm getting don't agree with the benchmarks, so my question remains focused on b-trees, sequential inserts, and speed.

...

I know from experience that b-trees have awful performance when data is added to them sequentially (regardless of the direction). However, when data is added randomly, best performance is obtained.

This is easy to demonstrate with the likes of a red-black tree: sequential inserts cause the maximum number of rebalancing operations to be performed.

I know very few databases use binary trees; they use n-order balanced trees instead. I logically assume they suffer a similar fate to binary trees when it comes to sequential inputs.

This sparked my curiosity.

If this is so, then one could deduce that writing sequential IDs (such as in IDENTITY(1,1)) would cause multiple re-balances of the tree to occur. I have seen many posts argue against GUIDs as "these will cause random writes". I never use GUIDs, but it struck me that this "bad" point was in fact a good point.
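
Side note: in the paged B-trees that databases actually use, the analogue of a rebalance is a page split. If you want to check that claim directly on SQL Server, splits can be counted from the active transaction log. fn_dblog is undocumented, so treat this as a diagnostic sketch rather than a supported technique:

-- Count page-split records currently in the active log; run this right
-- after one of the insert loops (fn_dblog is undocumented)
SELECT COUNT(*) AS page_splits
FROM fn_dblog(NULL, NULL)
WHERE [Operation] = N'LOP_DELETE_SPLIT';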

So I decided to test it. Here is my code:

SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
-- The [Txt] filler column is an assumption restored from context: the
-- listing as posted omitted it, but the INSERT statements and the
-- @c char(300) variable both reference a second column.
CREATE TABLE [dbo].[T1](
    [ID] [int] NOT NULL,
    [Txt] [char](300) NOT NULL,
 CONSTRAINT [T1_1] PRIMARY KEY CLUSTERED ([ID] ASC) 
)
GO

CREATE TABLE [dbo].[T2](
    [ID] [uniqueidentifier] NOT NULL,
    [Txt] [char](300) NOT NULL,
 CONSTRAINT [T2_1] PRIMARY KEY CLUSTERED ([ID] ASC)
)
GO

declare @i int, @t1 datetime, @t2 datetime, @t3 datetime, @c char(300)

set @c = REPLICATE('x', 300) -- fixed 300-byte filler so every row has the same width
set @t1 = GETDATE()
set @i = 1

while @i < 2000 begin
    insert into T2 values (NEWID(), @c)
    set @i = @i + 1
end

set @t2 = GETDATE()
WAITFOR delay '0:0:10'
set @t3 = GETDATE()
set @i = 1

while @i < 2000 begin
    insert into T1 values (@i, @c)
    set @i = @i + 1
end

-- @t1..@t2 times the NEWID() loop; @t3..getdate() times the integer loop
select DATEDIFF(ms, @t1, @t2) AS [GUID], DATEDIFF(ms, @t3, getdate()) AS [Int]

drop table T1
drop table T2

Note that I am not subtracting any time for the creation of the GUIDs, nor for the considerably larger row size. The results on my machine were as follows:

Int: 17,340 ms
GUID: 6,746 ms

This means that in this test, random inserts of 16 bytes were almost 3 times faster than sequential inserts of 4 bytes.
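
One way to see how differently the two indexes were physically built is to inspect them before the DROP TABLE statements with sys.dm_db_index_physical_stats; the column selection below is my own, not part of the original test:

-- Compare depth, page count, page fullness and fragmentation of both
-- clustered indexes (run before dropping the tables)
SELECT OBJECT_NAME(ips.object_id) AS table_name,
       ips.index_level, ips.page_count,
       ips.avg_page_space_used_in_percent,
       ips.avg_fragmentation_in_percent
FROM sys.dm_db_index_physical_stats(
         DB_ID(), OBJECT_ID(N'dbo.T1'), NULL, NULL, 'DETAILED') AS ips
UNION ALL
SELECT OBJECT_NAME(ips.object_id),
       ips.index_level, ips.page_count,
       ips.avg_page_space_used_in_percent,
       ips.avg_fragmentation_in_percent
FROM sys.dm_db_index_physical_stats(
         DB_ID(), OBJECT_ID(N'dbo.T2'), NULL, NULL, 'DETAILED') AS ips;

A GUID-keyed clustered index will typically show more pages, lower average page fullness, and heavy fragmentation; that extra I/O and wasted space is the cost the anti-GUID posts refer to.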

Would anyone like to comment on this?

PS: I get that this isn't strictly a question. It's an invitation to discussion, and that is relevant to learning optimal programming.

Solution

You are not measuring INSERT speed. You are measuring your log flush performance. Since you commit after each INSERT, all those tests are doing is sitting around waiting for the commit to harden the log. That is hardly relevant for INSERT performance. And please don't post 'performance' measurements when SET NOCOUNT is OFF...
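
A quick way to confirm that the time is going into log flushes rather than B-tree maintenance is to snapshot the WRITELOG wait type before and after each run and compare; a sketch (sys.dm_os_wait_stats is cumulative since server start):

-- Snapshot WRITELOG waits; diff two snapshots taken around the test
SELECT wait_type, waiting_tasks_count, wait_time_ms
FROM sys.dm_os_wait_stats
WHERE wait_type = N'WRITELOG';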

So let's try this without unnecessary server-client chatter, with properly sized data, batched commits and a pre-grown database:

-- The :setvar lines are SQLCMD directives defining script variables
:setvar dbname testdb
:setvar testsize 1000000
:setvar batchsize 1000

use master;
go

if db_id('$(dbname)') is not null
begin
    drop database [$(dbname)];
end
go

create database [$(dbname)] 
    on (name='test_data', filename='c:\temp\test_data.mdf', size=10gb)
    log on (name='test_log', filename='c:\temp\test_log.ldf', size=100mb);
go

use [$(dbname)];
go  

CREATE TABLE [dbo].[T1](
    [ID] [int] NOT NULL
 CONSTRAINT [T1_1] PRIMARY KEY CLUSTERED ([ID] ASC) 
)
GO

CREATE TABLE [dbo].[T2](
    [ID] [uniqueidentifier] NOT NULL
 CONSTRAINT [T2_1] PRIMARY KEY CLUSTERED ([ID] ASC)
)
GO

set nocount on;
go

declare @i int, @t1 datetime, @t2 datetime

set @t1 = GETDATE()
set @i = 1

begin transaction;
while @i < $(testsize) begin
    insert into T1 values (@i)
    set @i = @i + 1
    -- commit every $(batchsize) rows so a log flush is not paid per row
    if @i % $(batchsize) = 0
    begin
        commit;
        begin transaction;
    end
end
commit

set @t2 = GETDATE()
set @i = 1
begin transaction
while @i < $(testsize) begin
    insert into T2 values (NEWID())
    set @i = @i + 1
    if @i % $(batchsize) = 0
    begin
        commit;
        begin transaction;
    end
end
commit

select DATEDIFF(ms, @t1, @t2) AS [Int], DATEDIFF(ms, @t2, getdate()) AS [UID]

drop table T1
drop table T2
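
Because of the :setvar directives, this script has to run in SQLCMD mode (in SSMS: Query > SQLCMD Mode) or through the sqlcmd utility, for example (file name hypothetical):

sqlcmd -S localhost -E -i guid_vs_int.sql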

INTS: 18s
GUIDS: 23s

QED
