压缩,碎片整理,回收空间,收缩数据库与收缩文件 [英] compression, defragmentation, reclaiming space, shrinkdatabase vs. shrinkfile
问题描述
[1]声明:
- 从堆中删除数据时,页面上的数据不会被压缩(回收).并且如果堆页面的所有行都被删除,则通常无法回收整个页面"
- "ALTER INDEX重建和重组选项不能用于对堆中的空间进行碎片整理和回收(但是它们可以用于对堆中的非聚集索引进行碎片整理).
如果要在SQL Server 2005中对堆进行碎片整理,则有以下三种选择:- 1)在堆上创建一个聚集索引,然后删除该聚集索引;
- 2)使用SELECT INTO将旧表复制到新表;或
- 3)使用BCP或SSIS将数据从旧表移动到新表.
在SQL Server 2008中,更改了ALTER TABLE命令,因此它现在具有重建堆的能力."
请解释一下:
MS SQL Server 2005中的压缩,碎片整理,回收空间,收缩文件和收缩数据库之间有什么区别?
收缩文件和收缩数据库在MS SQL Server 2005中完成什么工作?
更新:
该问题的灵感来自于[2]中的讨论-如何在MS SQL Server 2005中缩小数据库?
Update2 :@PerformanceDBA,
ga!您在短短一周内就获得了500多种收益.这太了不起了!
您的图表
再次感谢您抽出宝贵的时间.
我待会再问,不在这里.
内在并不是我最主要的事情,也不是最简单的事情.
它非常简洁,通常颂歌不会引起任何疑问或疑问.
我希望使用一些工具,描述/说明,技巧来提出自己的疑问,问题和讨论.
例如,Plz看到了我的问题:
- http://www.sqlservercentral.com/Forums/Topic1013693 -373-2.aspx#bm1014385
- http://www.sqlservercentral.com/Forums/Topic1013975-373 -1.aspx
它们基本上是我所要求的内容的重复内容,但不能在stackoverflow.com中进行讨论
Update3 :@PerformanceDBA,
再次感谢,我的问题的主要目的是确定解决具体问题的方法(基于和避免什么),这些文档,文章,讨论,答案等相互矛盾,有助于您发现问题.
目前,我在这一领域没有其他问题(无法解决并阻止我).
[1]
布拉德·麦基(Brad McGehee).布拉德肯定的索引指南
(2009年6月11日)
http://www.simple-talk .com/sql/database-administration/brads-sure-guide-to-indexes/
[2]
对问题的答案和反馈
将数据库缩小到其初始大小以下"
https://stackoverflow.com/questions/3543884/shrink-一个数据库低于其初始大小/3774639#3774639
一个多月没有人碰过.
Gail Shaw:我怀疑这些内部结构是否在任何地方都有记录.毕竟,我们使用的是未记录的功能.索引的定义取决于您询问的人和外观.
ROTFLMAO.我的侧面受伤了,我看不清其后的帖子.这些应该是聪明的人吗?在IT世界中工作?定义CHANGE?一天中的温度或时间如何?那就是SQL Server Central?不是偏僻地区吗?
当MS从Sybase窃取SQL Server时,该文档非常可靠.当然,在每个主要发行版中,他们都重写"了该文档,文档变得越来越虚弱和蓬松(回想一下我们在另一篇文章中的讨论).现在,我们拥有可爱的图片,从技术上来说,这些图片会让人们感觉很好,但不准确.这就是为什么像你这样认真的人有问题的原因.图片甚至与手册中的文字不符.
无论如何,定义不会改变.那就是定义的定义.它们在任何情况下都是正确的.嗯,您正在使用的um功能是已记录的普通功能.自1987年以来一直如此.除非MS在某个地方丢失了它,而且没人能找到它.您必须问一个过去的Sybase Guru,谁记得他们获取的代码中确切的数据结构.而且,如果您真的很幸运,他将及时了解MoronSociety在2000年,2005年和2008年引入的差异.他甚至可能只有一个准确的图表,可以与您的机器上的sysindexes和DBCC的输出相匹配.如果找到他,请亲吻他的戒指,并给他淋浴.锁上你的女儿.
(不严重,我的身边正在杀死我,欢乐满溢).
现在您知道为什么我不回答有关宇宙中混乱垃圾的问题了吗??在MoronSociety中,只有这么多白痴.
-----
再次锁定:
扫描:
索引扫描是对索引中所有叶子页的完整读取.当对聚集索引进行索引扫描时,是除名称之外的所有表扫描.
当查询处理器完成索引扫描时,无论是否返回所有行,它始终是索引中所有叶子页的完整读取.这绝不是部分扫描.
扫描不仅涉及读取索引的叶级别,还读取较高级别的页面作为索引扫描的一部分."
一定有一个原因使她以快风而得名.她写书"吗?是的,幻想小说.热风是给气球爱好者而不是IT专业人员的.
完整和完整的摘要.索引扫描的全部要点以及为什么它适合表扫描,因为它试图避免表扫描,因此: -引擎(执行查询树)可以直接转到索引(此时为聚集或非聚集) -导航到B树以找到开始的地方(到现在为止,这与获得几行(即不扫描)大体相同) -B树(从任何良好的TECHNICAL图表来看)只有几页,每页包含很多索引条目,因此它非常快 -这是根加非叶水平 -直到找到符合条件的叶级条目 -从那时起,它会依次通过该索引的LEAF级别进行扫描(胖蓝色箭头)
- 现在是NCI了,如果您还记得功课,那意味着叶子级页面上充满了index_leaf_level_entry + CI_key
- 所以它在NCI叶级别上顺序扫描(这就是为什么仅在NCI的叶级别上有一个PageChain,以便它可以浏览的原因)
-
但是在HEAP上到处跳转以获取数据行
-
,但对于CI,叶级是数据行(数据页,只有数据行,这就是为什么您看不到其中的索引"的原因;非叶级CI页是纯索引)仅包含index_entries的页面)
- 因此,当它使用PageChain顺序地对索引leaf_level进行扫描时,它会顺序地扫描数据,它们是相同的操作(绿色绿色箭头)
- 没有堆
- 不跳来跳去
为进行比较,然后进行表扫描(仅适用于MS): -堆上没有PageChain -别无选择,只能从头开始 -并读取每个数据页 -其中许多将分散(包含已删除或转发的行留下的未使用空间) -而其他人将完全是空的
整个意图是,优化器已经决定不进行表(堆)扫描,而是可以进行索引扫描(因为它需要的数据少于整个数据范围,并且可以找到数据通过某个索引的起点).如果您查看SHOWPLAN,即使是检索一个唯一的PK行,它也会显示"INDEX SCAN".这意味着,它将首先导航B-树,以找到至少一行.然后,它可能会扫描叶级别,直到找到终点为止.如果是覆盖查询,则永远不会转到数据行.
集群索引无可替代.
[1] states:
- "When data is deleted from a heap, the data on the page is not compressed (reclaimed). And should all of the rows of a heap page are deleted, often the entire page cannot be reclaimed"
- "The ALTER INDEX rebuild and reorganize options cannot be used to defragment and reclaim space in a heap (but they can used to defragment non-clustered indexes on a heap).
If you want to defragment a heap in SQL Server 2005, you have three options:- 1) create a clustered index on the heap, then drop the clustered index;
- 2) Use SELECT INTO to copy the old table to a new table; or
- 3) use BCP or SSIS to move the data from the old table to a new table.
In SQL Server 2008, the ALTER TABLE command has been changed so that it now has the ability to rebuild heap"
Plz explain me:
What are the difference between compression, (de)fragmentation, reclaiming the space, shrinkfile and shrinkdatabase in MS SQL Server 2005?
What does shrinkfile and shrinkdatabase accomplish in MS SQL Server 2005?
Update:
The question was inspired by discussion in [2] - how to shrink database in MS SQL Server 2005?
Update2: @PerformanceDBA,
Congats! You've gained over 500+ in just a week. This is remarkable!
Your diagram
Thanks, once more, for your time.
I shall ask later and not here.
Internals is not my primary preoccupation and not easiest one.
It is very succinct and generally odes not invoke any doubts or questions.
I'd prefer some tool, descriptions/instructions, technique around which to develop my doubts, question and discussion.
Plz see, for ex., my questions:
- http://www.sqlservercentral.com/Forums/Topic1013693-373-2.aspx#bm1014385
- http://www.sqlservercentral.com/Forums/Topic1013975-373-1.aspx
They are basically duplicates of what I asked but cannot discuss in stackoverflow.com
Update3: @PerformanceDBA,
thanks, once more, the main purpose of my questions was to determine the ways how to resolve concrete questions (basing on as well as avoiding what ) having contradictory docs, articles, discussions, answers, etc. which you helped to detect.
Currently I do not have further (unresolvable and blocking me) questions in this area.
[1]
Brad McGehee. Brad's Sure Guide to Indexes
(11 June 2009)
http://www.simple-talk.com/sql/database-administration/brads-sure-guide-to-indexes/
[2]
Answers and feedback to question
"Shrink a database below its initial size"
https://stackoverflow.com/questions/3543884/shrink-a-database-below-its-initial-size/3774639#3774639
No one touched this in over one month.
The answers to the first three are actually in the Diagram I Made for You, which you have not bothered to digest and ask questions about ... it is often used as a platform for discussion.
(That is a condensed version of my much more elaborate Sybase diagrams, which I have butchered for the MS context. There is a link at the bottom of that doc, if you want the full Sybase set.)
Therefore I am not going to spend much time on you either. And please do not ask for links to "reference sites", there ain't no such thing (what is available is non-technical rubbish), which is precisely why I have my own diagrams; there are very few people who understand MS SQL Internals.
reclaiming the space
That is the correct term. MS does not remove deleted rows from the page, or deleted pages from the extent. Reclaiming the space is an operation that goes through the Heap and removes the unused (a) rows and (b) pages. Of course that changes the RowIds, so all Nonclustered indices have to be rebuilt.
compression
In the context of the pasted text: same as Reclaiming space.
defragmentation
the operation of full scale removal of unused space. There are three Levels:
I. Database (AllocationUnits), across all objects
II. Object (Extent & Page), Page Chains, Split Pages, Overflow Pages
III. Heap Only (No Clustered index), the subject of the post
shrinkfile
Quite a different operation, to reduce the space allocated on a Device (File). This removes unused AllocationUnits (hence 'shrink') but it is not the same a de-fragmenting AllocationUnits.
shrinkdatabase
To do the same for a Database; All Devices Allocations used by the database across all Devices.
Response to Comments
The poster at SSC is clueless and does not address your question directly.
- there is no such thing as a Clustered table (CREATE CLUSTERED TABLE fails)
- there is such a thing as a Clustered index (CREATE CLUSTERED INDEX succeeds)
- as per my diagrams, it is a single physical structure; the clustered index INCLUDES the rows and thus the Heap is eliminated
- where there is no Clustered index, there are two physical structures: a Heap and a separate Nonclustered Index
Now before you go diving into them with DBCC, which is too low level, and clueless folks cannot identify, let alone explain, the whys and wherefores, you need to understand and confirm the above:
- create a Table_CI (we are intending to add a CI, there is still no such thing as a Clustered Table)
- add an unique clustered index to it UC_PK
add a few rows
create a table Heap
- add an unique Nonclustered index to it NC_PK
add a few rows
SELECT * FROM sysindexes WHERE id = OBJECT_ID("Table_CI")
SELECT * FROM sysindexes WHERE id = OBJECT_ID("Heap")
note that each sysindexes entry is a complete, independent, data storage structure (look at the columns)
- contemplate the output
- compare with my diagram
- compare with the rubbish in the universe
In future, I will not answer questions about the confused rubbish in the universe, and the incorrect and misinformed posts on other sites (I do not care if they are MS Certified Professionals, they have proved that they are incapable of inspecting their databases and determining the correct information)
There is a reason I have bothered to create accurate diagrams (the manuals, pictures, and all available info fro MS, is all rubbish; no use for you to look for accurate info from the :authority", because the "authority" is technically bankrupt).
Even Gail finally gets around to I suspect you'd benefit from more reading on overall architecture of indexes before fiddling with the low level internals.
Except, there isn't any. That are not confusing, non-technical, and inconsistent.
There is a reason I have bothered to create accurate diagrams.
Back to the DBCCs. Gail is simply incorrect. In a Clustered Index (which includes the rows), the single page contains rows. Yes, rows. that is the leaf level of the index. There is a B-tree, it lives in the top of the page, but it is so small and you can't see it. Look at the sysindexes output. The root and firstpage pointer IS pointing to the page; that is the root of the Clustered Index. When you dive into the ocean, you need to know what to look for, AND where to find it, otherwise you won't find what you are looking for, and you will get distracted by the flotsam and jetsam that you do find by accident.
Now look at the TWO SEPARATE STRUCTURES for the NCI and the Heap.
Oh, and MS has changed from using the OAM terminology to the IAM where the data structure is an index. That introduces confusion. In terms of data structures (entries in sysindexes), they are all Objects; they might or might not be Indices). The point is, who cares, we know what it is, it is an ObjectAllocationMap ... if you are looking at at NCI, gee, it is an IndexObjectAllocationMap; if you are looking at a Heap, it is a HeapObjectAllocMap. I will let you ponder what it is in the case of a CI. In chasing it down, or in using it (finding the pages that belong to the OBJECT, it does not matter, they are all Objects. When doing that, you need to know, some objects have a PageChain and others do not (another of your questions). CIs have them; NCIs and Heaps do not.
Gail Shaw: "I doubt these kinds of internals are documented anywhere. After all, we're using undocumented features. Definition of index depends who you ask and where you look.
ROTFLMAO. My sides hurt, I could not read the posts that followed it. These are supposed to be intelligent human beings ? Working in the IT world ? Definitions CHANGE ? What with the temperature or the time of day ? And that was SQL Server Central ? Not the backwoods ?
When MS stole SQL Server from Sybase, the documentation was rock solid. Of coure, with each major release, they "rewrite" it, and the docs get weaker and more fluffy (recall our discussion in another post). Now we have cute pictures that make people feel good but are inaccurate, technically. Which is why earnest people like you have problems. the pictures do not even match the text in the manuals.
Anyway, DEFINITIONS do not change. That's the definition of definitions. They are true in any context. And Um, the um feature you are using is an ordinary, documented feature. Has been since 1987. Except MS lost it somewhere and no one can find it. You'll have to ask a Sybase Guru who was around in the old days, who remembers what exact data structures were in the code that they acquired. And if you are really lucky, he will be up to date with the differences that MoronSociety has introduced in 2000, 2005, 2008. He might even have a single accurate diagram that matches the output of sysindexes and DBCC on your box. If you find him, kiss his ring and shower him with gold. Lock up your daughters.
(not serious, my sides are killing me, the mirth is overflowing).
Now do you see why I will not answer questions about the confused rubbish in the universe ? There are just SO MANY morons out there in MoronSociety.
-----
Gail again:
"Scans:
An index scan is a complete read of all of the leaf pages in the index. When an index scan is done on the clustered index, it’s a table scan in all but name.
When an index scan is done by the query processor, it is always a full read of all of the leaf pages in the index, regardless of whether all of the rows are returned. It is never a partial scan.
A scan does not only involve reading the leaf levels of the index, the higher level pages are also read as part of the index scan."
There must be a reason she is named after fast wind. She writes "books" ? Yeah, fantasy novels. Hot air is for balloonists not IT professionals.
Complete and total drivel. The whole point of an Index Scan AND WHY IT IS PREFERABLE TO A TABLE SCAN, because it is trying to AVOID A TABLE SCAN, is that: - the engine (executing the query tree) can go directly to the Index (Clustered or Nonclustered, at this point) - navigate the B-Tree to find the place to start (which up to this point, is much the same as when it is getting a few rows, ie. not scanning) - the B-Tree (from any good TECHNICAL diagram) is a few pages, containing many, many index entries per page, so it is very fast - that's the root plus non-leaf levels - until it find a leaf-level entry that qualifies - from that point on, it does a SCAN, sequentially, through the LEAF level of said index (fat blue arrow)
- now for NCIs, if you remember your homework, that means the leaf level pages are full of index_leaf_level_entry + CI_key
- so it is scanning sequentially across the NCI Leaf level (that's why there is a PageChain only at the leaf level of NCIs, so that it can navigate across)
but jumping all over the place on the HEAP, to get the data rows
but for a CI, the leaf level IS the data row (data pages, with only data rows, that's why you cannot see an "index" in them; the non-leaf-level CI pages are pure index pages containing index_entries only)
- so when it SCANS the index leaf_level sequentially, using the PageChain, it is SCANNING the data sequentially, they are the same operation (fat green arrow)
- no Heap
- no jumping around
For comparison, then, a TABLE SCAN (MS Only): - has no PageChain on the Heap - has no choice, but to start at the beginning - and read every data page - of which many will be fragmented (contain unused space left by deleted or forwarded rows) - and others will be completely empty
The whole intent is, the optimiser had already decided, not to go for a table (heap) scan, that it could go for an Index scan (because it required LESS than the full range of data, and it could find the starting point of that data via some index). If you look at your SHOWPLAN, even for retrieving a single unique PK row, it says "INDEX SCAN". All that means is, it will navigate the B-Tree first, to find at least one row. And then it may scan the leaf level, until it finds an end point. If it is a covered query, it never goes to the data rows.
There is no substitute for a Clustered Index.
这篇关于压缩,碎片整理,回收空间,收缩数据库与收缩文件的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!