索引性能 BigInt 与 VarChar [英] Indexing Performance BigInt vs VarChar

查看:99
本文介绍了索引性能 BigInt 与 VarChar的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

这是一个数据仓库中的 FACT 表

This is a FACT Table in a Data Warehouse

它有一个复合索引如下

ALTER TABLE [dbo].[Fact_Data] 
ADD  CONSTRAINT [PK_Fact_Data] 
PRIMARY KEY CLUSTERED 
(
    [Column1_VarChar_10] ASC,
    [Column2_VarChar_10] ASC,
    [Column3_Int] ASC,
    [Column4_Int] ASC,
    [Column5_VarChar_10] ASC,
    [Column6_VarChar_10] ASC,
    [Column7_DateTime] ASC,
    [Column8_DateTime] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, 
SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, ONLINE = OFF, 
ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON
) ON [PRIMARY]
GO

在这个结构中,所有的 varchar 10 列都只有数字值.在查询和索引方面,更改这个 7800 万行结构以保存 BIGINT 而不是 VARCHAR 对我有好处吗?

In this structure, all of the varchar 10 columns have numeric values only. Is it going to be beneficial for me to change this 78 million row structure to hold BIGINT instead of VARCHAR in terms of querying and indexing?

我应该考虑的任何其他好处/缺点吗?

Any other benefits / drawbacks that I should consider?

推荐答案

你应该DEFINITELY引入一个代理INT IDENTITY()主键!!INT 已经为您提供了多达 20 亿行——这还不够吗??

You should DEFINITELY introduce a surrogate INT IDENTITY() primary key!! INT already gives you potentially up to 2 billion rows - isn't that enough??

SQL Server 上的这个主键/聚集键的大小最多为 64 字节(而不是 4,对于 INT) - 这将使您的聚集索引和所有非聚集索引膨胀得无法识别.整个集群键(您的所有 8 列)将包含在该表的每个非集群索引的每一页中 - 肯定会浪费大量空间.

This primary key / clustered key on SQL Server will be up to 64 bytes in size (instead of 4, for a INT) - which will make your clustered index AND all your non-clustered index be bloated beyond recognition. The whole clustering key (all your 8 columns) will be included on every single page of every single non-clustered index on that table - wasting lots and lots of space for sure.

因此,在任何给定的索引表上,您将拥有最多 16 倍的具有代理 INT 聚集键的条目——这意味着 I/O 更少,阅读索引页浪费的时间更少.

So on any given index table, you would have up to 16 times more entries with a surrogate INT clustered key - that means lots less I/O, lots less time wasted reading index pages.

想象一下试图建立与该表的外键关系......任何子表都必须将主键的所有8列作为外键列,并指定所有每个连接中有 8 列 - 真是一场噩梦!

And just imagine trying to establish a foreign-key relationship to that table.... any child table would have to have all 8 columns of your primary key as foreign key columns, and specify all 8 columns in every join - what a nightmare!!

在 7800 万行时,即使只是将集群键更改为 INT IDENTITY 也可以为您每行节省多达 60 个字节 - 仅此一项就可以节省多达 4 GB 的磁盘空间(以及服务器中的 RAM 使用量).这甚至还没有开始计算非聚集索引的节省......

At 78 million rows, even just changing the clustering key to INT IDENTITY will save you up to 60 bytes per row - that alone would come out to be up to 4 GByte of disk space (and RAM usage in your server). And that's not even beginning to calculate the savings on the non-clustered indices.......

当然,是的,我也会将 VARCHAR(10) 更改为 INT 或 BIGINT - 如果它是数字,请将字段类型设为数字 - 将其保留在 VARCHAR(10) 真的没有意义.但仅此一项不会在速度或性能方面产生巨大差异 - 它只会使处理数据变得更加容易(例如,在比较值等时不必总是转换为数字类型).

And of course, yes, I would also change the VARCHAR(10) to INT or BIGINT - if it's a number, make the field type numeric - no point in leaving it at VARCHAR(10), really. But that alone is not going to make a huge difference in terms of speed or performance - it just makes working with the data that much easier (don't have to always cast around to numeric types when e.g. comparing values and so forth).

马克

这篇关于索引性能 BigInt 与 VarChar的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆