压缩，碎片整理，回收空间，收缩数据库与收缩文件 [英] compression, defragmentation, reclaiming space, shrinkdatabase vs. shrinkfile

查看：109 发布时间：2020/5/8 21:55:34 sql sql-server tsql memory-management storage

本文介绍了压缩，碎片整理，回收空间，收缩数据库与收缩文件的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

[1]声明:

从堆中删除数据时，页面上的数据不会被压缩(回收).并且如果堆页面的所有行都被删除，则通常无法回收整个页面"
"ALTER INDEX重建和重组选项不能用于对堆中的空间进行碎片整理和回收(但是它们可以用于对堆中的非聚集索引进行碎片整理).
如果要在SQL Server 2005中对堆进行碎片整理，则有以下三种选择:
- 1)在堆上创建一个聚集索引，然后删除该聚集索引；
- 2)使用SELECT INTO将旧表复制到新表；或
- 3)使用BCP或SSIS将数据从旧表移动到新表.
  在SQL Server 2008中，更改了ALTER TABLE命令，因此它现在具有重建堆的能力."

请解释一下:

MS SQL Server 2005中的压缩，碎片整理，回收空间，收缩文件和收缩数据库之间有什么区别?
收缩文件和收缩数据库在MS SQL Server 2005中完成什么工作?

更新:
该问题的灵感来自于[2]中的讨论-如何在MS SQL Server 2005中缩小数据库?

Update2 :@PerformanceDBA，
ga！您在短短一周内就获得了500多种收益.这太了不起了！

您的图表
再次感谢您抽出宝贵的时间.
我待会再问，不在这里.
内在并不是我最主要的事情，也不是最简单的事情.

它非常简洁，通常颂歌不会引起任何疑问或疑问.

我希望使用一些工具，描述/说明，技巧来提出自己的疑问，问题和讨论.
例如，Plz看到了我的问题:

它们基本上是我所要求的内容的重复内容，但不能在stackoverflow.com中进行讨论

Update3 :@PerformanceDBA，
再次感谢，我的问题的主要目的是确定解决具体问题的方法(基于和避免什么)，这些文档，文章，讨论，答案等相互矛盾，有助于您发现问题.

目前，我在这一领域没有其他问题(无法解决并阻止我).

[1]
布拉德·麦基(Brad McGehee).布拉德肯定的索引指南 (2009年6月11日)
http://www.simple-talk .com/sql/database-administration/brads-sure-guide-to-indexes/
[2]
对问题的答案和反馈将数据库缩小到其初始大小以下"
https://stackoverflow.com/questions/3543884/shrink-一个数据库低于其初始大小/3774639#3774639

解决方案

一个多月没有人碰过.

前三个答案实际上在

回收空间

这是正确的术语. MS不会从页面中删除已删除的行，也不会从扩展区中删除已删除的页面.回收空间是一项通过堆操作并删除未使用的(a)行和(b)页面的操作.当然，这会更改RowId，因此必须重新构建所有非聚集索引.

压缩

在粘贴的文本中:与回收空间相同.

碎片整理

完全清除未使用空间的操作.共有三个级别:

I.所有对象之间的数据库(AllocationUnits)

II.对象(范围和页面)，页面链，拆分页面，溢出页面

III.仅堆(无聚簇索引)，帖子的主题

收缩文件

完全不同的操作，以减少在设备(文件)上分配的空间.这会删除未使用的AllocationUnits(因此会缩小")，但与进行碎片整理的AllocationUnits不同.

收缩数据库

对数据库执行相同操作；数据库在所有设备上使用的所有设备分配.

回复评论

SSC的发布者毫无头绪，不会直接解决您的问题.

没有集群表(CREATE CLUSTERED TABLE失败)

有诸如聚集索引之类的东西(CREATE CLUSTERED INDEX成功)

根据我的图表，它是一个单一的物理结构；聚集索引包含行，从而消除了堆

没有聚集索引的地方，有两个物理结构:堆和单独的非聚集索引

现在，在使用DBCC进行深入研究之前，DBCC的水平太低，无知的人们无法识别，更不用说解释其原因和理由了，您需要了解并确认上述内容:

创建一个Table_CI(我们打算添加一个CI，但仍然没有诸如簇表之类的东西)

为其添加唯一的聚集索引UC_PK

添加几行

创建表堆

向其添加唯一的非聚集索引NC_PK

添加几行

SELECT * FROM sysindexes，其中id = OBJECT_ID("Table_CI")

SELECT * FROM sysindexes WHERE id = OBJECT_ID("Heap")

请注意，每个sysindexes条目都是完整的，独立的数据存储结构(请参阅各列)

考虑输出

与我的图表比较

与宇宙中的垃圾相比

将来，我将不再回答有关宇宙中混乱的垃圾以及其他网站上不正确和错误消息的帖子的问题(我不在乎他们是否是MS认证专家，他们已经证明自己无能力检查他们的数据库并确定正确的信息)

有一个原因我一直不愿意创建准确的图表(手册，图片以及MS的所有可用信息都是垃圾；请您不要从:authority查找准确的信息，因为"authority" "在技术上已经破产).

甚至盖尔终于到处走走 我怀疑在摆弄底层内部知识之前，您将从更多的索引总体体系结构中受益.

除了，没有任何东西.这不会造成混乱，非技术性和不一致.

我有一个烦恼来创建准确的图表的原因.

返回到DBCC.盖尔是完全不正确的.在聚集索引(包括行)中，单个页面包含行.是的，行.那就是索引的叶子级别.有一个B树，它位于页面顶部，但是它很小，您看不到它.查看sysindexes输出.根和首页指针指向页面；这是聚簇索引的根.当您潜入大海时，您需要知道要寻找的东西以及在哪里可以找到它，否则您将找不到想要的东西，并且您会因自己偶然发现的漂浮物和喷射物质而分心

现在查看NCI和堆的两种独立结构.
噢，MS已从使用OAM术语更改为IAM，其中数据结构是索引.这带来了混乱.就数据结构(sysindexes中的条目)而言，它们都是对象.它们可能是也可能不是索引).关键是，谁在乎，我们知道它是什么，它是一个ObjectAllocationMap……如果您在NCI上看，gee，它是一个IndexObjectAllocationMap；如果您正在查看堆，则为HeapObjectAllocMap.我将考虑在CI情况下的情况.在追踪或使用它时(找到属于该对象的页面并不重要，它们都是对象.这样做时，您需要知道，某些对象具有PageChain，而其他对象则没有PageChain(另一个您的问题).配置项拥有它们； NCI和堆没有.

Gail Shaw:我怀疑这些内部结构是否在任何地方都有记录.毕竟，我们使用的是未记录的功能.索引的定义取决于您询问的人和外观.

ROTFLMAO.我的侧面受伤了，我看不清其后的帖子.这些应该是聪明的人吗?在IT世界中工作?定义CHANGE?一天中的温度或时间如何?那就是SQL Server Central?不是偏僻地区吗?

当MS从Sybase窃取SQL Server时，该文档非常可靠.当然，在每个主要发行版中，他们都重写"了该文档，文档变得越来越虚弱和蓬松(回想一下我们在另一篇文章中的讨论).现在，我们拥有可爱的图片，从技术上来说，这些图片会让人们感觉很好，但不准确.这就是为什么像你这样认真的人有问题的原因.图片甚至与手册中的文字不符.

无论如何，定义不会改变.那就是定义的定义.它们在任何情况下都是正确的.嗯，您正在使用的um功能是已记录的普通功能.自1987年以来一直如此.除非MS在某个地方丢失了它，而且没人能找到它.您必须问一个过去的Sybase Guru，谁记得他们获取的代码中确切的数据结构.而且，如果您真的很幸运，他将及时了解MoronSociety在2000年，2005年和2008年引入的差异.他甚至可能只有一个准确的图表，可以与您的机器上的sysindexes和DBCC的输出相匹配.如果找到他，请亲吻他的戒指，并给他淋浴.锁上你的女儿.

(不严重，我的身边正在杀死我，欢乐满溢).

现在您知道为什么我不回答有关宇宙中混乱垃圾的问题了吗??在MoronSociety中，只有这么多白痴.

-----

再次锁定:

扫描:
索引扫描是对索引中所有叶子页的完整读取.当对聚集索引进行索引扫描时，是除名称之外的所有表扫描.
当查询处理器完成索引扫描时，无论是否返回所有行，它始终是索引中所有叶子页的完整读取.这绝不是部分扫描.
扫描不仅涉及读取索引的叶级别，还读取较高级别的页面作为索引扫描的一部分."

一定有一个原因使她以快风而得名.她写书"吗?是的，幻想小说.热风是给气球爱好者而不是IT专业人员的.

完整和完整的摘要.索引扫描的全部要点以及为什么它适合表扫描，因为它试图避免表扫描，因此: -引擎(执行查询树)可以直接转到索引(此时为聚集或非聚集) -导航到B树以找到开始的地方(到现在为止，这与获得几行(即不扫描)大体相同) -B树(从任何良好的TECHNICAL图表来看)只有几页，每页包含很多索引条目，因此它非常快 -这是根加非叶水平 -直到找到符合条件的叶级条目 -从那时起，它会依次通过该索引的LEAF级别进行扫描(胖蓝色箭头)

现在是NCI了，如果您还记得功课，那意味着叶子级页面上充满了index_leaf_level_entry + CI_key

所以它在NCI叶级别上顺序扫描(这就是为什么仅在NCI的叶级别上有一个PageChain，以便它可以浏览的原因)

但是在HEAP上到处跳转以获取数据行

，但对于CI，叶级是数据行(数据页，只有数据行，这就是为什么您看不到其中的索引"的原因；非叶级CI页是纯索引)仅包含index_entries的页面)

因此，当它使用PageChain顺序地对索引leaf_level进行扫描时，它会顺序地扫描数据，它们是相同的操作(绿色绿色箭头)

没有堆

不跳来跳去

为进行比较，然后进行表扫描(仅适用于MS): -堆上没有PageChain -别无选择，只能从头开始 -并读取每个数据页 -其中许多将分散(包含已删除或转发的行留下的未使用空间) -而其他人将完全是空的

整个意图是，优化器已经决定不进行表(堆)扫描，而是可以进行索引扫描(因为它需要的数据少于整个数据范围，并且可以找到数据通过某个索引的起点).如果您查看SHOWPLAN，即使是检索一个唯一的PK行，它也会显示"INDEX SCAN".这意味着，它将首先导航B-树，以找到至少一行.然后，它可能会扫描叶级别，直到找到终点为止.如果是覆盖查询，则永远不会转到数据行.

集群索引无可替代.
[1] states:

"When data is deleted from a heap, the data on the page is not compressed (reclaimed). And should all of the rows of a heap page are deleted, often the entire page cannot be reclaimed"

"The ALTER INDEX rebuild and reorganize options cannot be used to defragment and reclaim space in a heap (but they can used to defragment non-clustered indexes on a heap).
If you want to defragment a heap in SQL Server 2005, you have three options:

1) create a clustered index on the heap, then drop the clustered index;

2) Use SELECT INTO to copy the old table to a new table; or

3) use BCP or SSIS to move the data from the old table to a new table.
In SQL Server 2008, the ALTER TABLE command has been changed so that it now has the ability to rebuild heap"

Plz explain me:
What are the difference between compression, (de)fragmentation, reclaiming the space, shrinkfile and shrinkdatabase in MS SQL Server 2005?
What does shrinkfile and shrinkdatabase accomplish in MS SQL Server 2005?

Update:
The question was inspired by discussion in [2] - how to shrink database in MS SQL Server 2005?

Update2: @PerformanceDBA,
Congats! You've gained over 500+ in just a week. This is remarkable!

Your diagram
Thanks, once more, for your time.
I shall ask later and not here.
Internals is not my primary preoccupation and not easiest one.

It is very succinct and generally odes not invoke any doubts or questions.

I'd prefer some tool, descriptions/instructions, technique around which to develop my doubts, question and discussion.
Plz see, for ex., my questions:

http://www.sqlservercentral.com/Forums/Topic1013693-373-2.aspx#bm1014385

http://www.sqlservercentral.com/Forums/Topic1013975-373-1.aspx

They are basically duplicates of what I asked but cannot discuss in stackoverflow.com

Update3: @PerformanceDBA,
thanks, once more, the main purpose of my questions was to determine the ways how to resolve concrete questions (basing on as well as avoiding what ) having contradictory docs, articles, discussions, answers, etc. which you helped to detect.

Currently I do not have further (unresolvable and blocking me) questions in this area.

[1]
Brad McGehee. Brad's Sure Guide to Indexes (11 June 2009)
http://www.simple-talk.com/sql/database-administration/brads-sure-guide-to-indexes/
[2]
Answers and feedback to question "Shrink a database below its initial size"
https://stackoverflow.com/questions/3543884/shrink-a-database-below-its-initial-size/3774639#3774639
解决方案
No one touched this in over one month.

The answers to the first three are actually in the Diagram I Made for You, which you have not bothered to digest and ask questions about ... it is often used as a platform for discussion.

(That is a condensed version of my much more elaborate Sybase diagrams, which I have butchered for the MS context. There is a link at the bottom of that doc, if you want the full Sybase set.)

Therefore I am not going to spend much time on you either. And please do not ask for links to "reference sites", there ain't no such thing (what is available is non-technical rubbish), which is precisely why I have my own diagrams; there are very few people who understand MS SQL Internals.

reclaiming the space

That is the correct term. MS does not remove deleted rows from the page, or deleted pages from the extent. Reclaiming the space is an operation that goes through the Heap and removes the unused (a) rows and (b) pages. Of course that changes the RowIds, so all Nonclustered indices have to be rebuilt.

compression

In the context of the pasted text: same as Reclaiming space.

defragmentation

the operation of full scale removal of unused space. There are three Levels:

I. Database (AllocationUnits), across all objects

II. Object (Extent & Page), Page Chains, Split Pages, Overflow Pages

III. Heap Only (No Clustered index), the subject of the post

shrinkfile

Quite a different operation, to reduce the space allocated on a Device (File). This removes unused AllocationUnits (hence 'shrink') but it is not the same a de-fragmenting AllocationUnits.

shrinkdatabase

To do the same for a Database; All Devices Allocations used by the database across all Devices.

Response to Comments

The poster at SSC is clueless and does not address your question directly.

there is no such thing as a Clustered table (CREATE CLUSTERED TABLE fails)

there is such a thing as a Clustered index (CREATE CLUSTERED INDEX succeeds)

as per my diagrams, it is a single physical structure; the clustered index INCLUDES the rows and thus the Heap is eliminated

where there is no Clustered index, there are two physical structures: a Heap and a separate Nonclustered Index

Now before you go diving into them with DBCC, which is too low level, and clueless folks cannot identify, let alone explain, the whys and wherefores, you need to understand and confirm the above:

create a Table_CI (we are intending to add a CI, there is still no such thing as a Clustered Table)

add an unique clustered index to it UC_PK

add a few rows

create a table Heap

add an unique Nonclustered index to it NC_PK

add a few rows

SELECT * FROM sysindexes WHERE id = OBJECT_ID("Table_CI")

SELECT * FROM sysindexes WHERE id = OBJECT_ID("Heap")

note that each sysindexes entry is a complete, independent, data storage structure (look at the columns)

contemplate the output

compare with my diagram

compare with the rubbish in the universe

In future, I will not answer questions about the confused rubbish in the universe, and the incorrect and misinformed posts on other sites (I do not care if they are MS Certified Professionals, they have proved that they are incapable of inspecting their databases and determining the correct information)

There is a reason I have bothered to create accurate diagrams (the manuals, pictures, and all available info fro MS, is all rubbish; no use for you to look for accurate info from the :authority", because the "authority" is technically bankrupt).

Even Gail finally gets around to I suspect you'd benefit from more reading on overall architecture of indexes before fiddling with the low level internals.

Except, there isn't any. That are not confusing, non-technical, and inconsistent.

There is a reason I have bothered to create accurate diagrams.

Back to the DBCCs. Gail is simply incorrect. In a Clustered Index (which includes the rows), the single page contains rows. Yes, rows. that is the leaf level of the index. There is a B-tree, it lives in the top of the page, but it is so small and you can't see it. Look at the sysindexes output. The root and firstpage pointer IS pointing to the page; that is the root of the Clustered Index. When you dive into the ocean, you need to know what to look for, AND where to find it, otherwise you won't find what you are looking for, and you will get distracted by the flotsam and jetsam that you do find by accident.

Now look at the TWO SEPARATE STRUCTURES for the NCI and the Heap.

Oh, and MS has changed from using the OAM terminology to the IAM where the data structure is an index. That introduces confusion. In terms of data structures (entries in sysindexes), they are all Objects; they might or might not be Indices). The point is, who cares, we know what it is, it is an ObjectAllocationMap ... if you are looking at at NCI, gee, it is an IndexObjectAllocationMap; if you are looking at a Heap, it is a HeapObjectAllocMap. I will let you ponder what it is in the case of a CI. In chasing it down, or in using it (finding the pages that belong to the OBJECT, it does not matter, they are all Objects. When doing that, you need to know, some objects have a PageChain and others do not (another of your questions). CIs have them; NCIs and Heaps do not.

Gail Shaw: "I doubt these kinds of internals are documented anywhere. After all, we're using undocumented features. Definition of index depends who you ask and where you look.

ROTFLMAO. My sides hurt, I could not read the posts that followed it. These are supposed to be intelligent human beings ? Working in the IT world ? Definitions CHANGE ? What with the temperature or the time of day ? And that was SQL Server Central ? Not the backwoods ?

When MS stole SQL Server from Sybase, the documentation was rock solid. Of coure, with each major release, they "rewrite" it, and the docs get weaker and more fluffy (recall our discussion in another post). Now we have cute pictures that make people feel good but are inaccurate, technically. Which is why earnest people like you have problems. the pictures do not even match the text in the manuals.

Anyway, DEFINITIONS do not change. That's the definition of definitions. They are true in any context. And Um, the um feature you are using is an ordinary, documented feature. Has been since 1987. Except MS lost it somewhere and no one can find it. You'll have to ask a Sybase Guru who was around in the old days, who remembers what exact data structures were in the code that they acquired. And if you are really lucky, he will be up to date with the differences that MoronSociety has introduced in 2000, 2005, 2008. He might even have a single accurate diagram that matches the output of sysindexes and DBCC on your box. If you find him, kiss his ring and shower him with gold. Lock up your daughters.

(not serious, my sides are killing me, the mirth is overflowing).

Now do you see why I will not answer questions about the confused rubbish in the universe ? There are just SO MANY morons out there in MoronSociety.

-----

Gail again:

"Scans:
An index scan is a complete read of all of the leaf pages in the index. When an index scan is done on the clustered index, it’s a table scan in all but name.
When an index scan is done by the query processor, it is always a full read of all of the leaf pages in the index, regardless of whether all of the rows are returned. It is never a partial scan.
A scan does not only involve reading the leaf levels of the index, the higher level pages are also read as part of the index scan."

There must be a reason she is named after fast wind. She writes "books" ? Yeah, fantasy novels. Hot air is for balloonists not IT professionals.

Complete and total drivel. The whole point of an Index Scan AND WHY IT IS PREFERABLE TO A TABLE SCAN, because it is trying to AVOID A TABLE SCAN, is that: - the engine (executing the query tree) can go directly to the Index (Clustered or Nonclustered, at this point) - navigate the B-Tree to find the place to start (which up to this point, is much the same as when it is getting a few rows, ie. not scanning) - the B-Tree (from any good TECHNICAL diagram) is a few pages, containing many, many index entries per page, so it is very fast - that's the root plus non-leaf levels - until it find a leaf-level entry that qualifies - from that point on, it does a SCAN, sequentially, through the LEAF level of said index (fat blue arrow)

now for NCIs, if you remember your homework, that means the leaf level pages are full of index_leaf_level_entry + CI_key

so it is scanning sequentially across the NCI Leaf level (that's why there is a PageChain only at the leaf level of NCIs, so that it can navigate across)

but jumping all over the place on the HEAP, to get the data rows

but for a CI, the leaf level IS the data row (data pages, with only data rows, that's why you cannot see an "index" in them; the non-leaf-level CI pages are pure index pages containing index_entries only)

so when it SCANS the index leaf_level sequentially, using the PageChain, it is SCANNING the data sequentially, they are the same operation (fat green arrow)

no Heap

no jumping around

For comparison, then, a TABLE SCAN (MS Only): - has no PageChain on the Heap - has no choice, but to start at the beginning - and read every data page - of which many will be fragmented (contain unused space left by deleted or forwarded rows) - and others will be completely empty

The whole intent is, the optimiser had already decided, not to go for a table (heap) scan, that it could go for an Index scan (because it required LESS than the full range of data, and it could find the starting point of that data via some index). If you look at your SHOWPLAN, even for retrieving a single unique PK row, it says "INDEX SCAN". All that means is, it will navigate the B-Tree first, to find at least one row. And then it may scan the leaf level, until it finds an end point. If it is a covered query, it never goes to the data rows.

There is no substitute for a Clustered Index.

这篇关于压缩，碎片整理，回收空间，收缩数据库与收缩文件的文章就介绍到这了，希望我们推荐的答案对大家有所帮助，也希望大家多多支持IT屋！

查看全文

压缩，碎片整理，回收空间，收缩数据库与收缩文件 [英] compression, defragmentation, reclaiming space, shrinkdatabase vs. shrinkfile

问题描述

相关文章

数据库最新文章

热门教程

热门工具

登录关闭

压缩，碎片整理，回收空间，收缩数据库与收缩文件 [英] compression, defragmentation, reclaiming space, shrinkdatabase vs. shrinkfile

问题描述

相关文章

数据库最新文章

热门教程

热门工具

登录 关闭

登录关闭