Query on Table containing 10,000,000 records and size more than 4GB


Problem Description


Hello,

I am using SQL Server 2008 R2. One of my tables contains more than 10,000,000 records, with a data size of around 4.2 GB. This table contains data about inventories and their properties.
A SELECT * query on this table takes 27 minutes to retrieve the data.

I am joining this table (inner join) with another table that contains daily transactions. The goal is to find the average usage over historical data, the consumption per item, and the trend, so for business reasons I cannot use filters in the query (although if filters are used, the query returns data in less time).

The table has a clustered index on the primary key as well as non-clustered indexes on the columns used in the inner join; I have also added additional indexes as suggested by the SQL execution plan.

My question is: is there any way we can load large data (greater than 3 GB) within a minute? Should I partition my table? It would ultimately still select the same data.
(Sorry for the dumb question; I do understand the indexing and query-optimization part, but my problem is with the huge volume of data.)


My machine is dual-core with 8 GB RAM.
Table stats are as follows (using sp_spaceused):
rows: 10157697
reserved: 4268752 KB
data: 4026432 KB
index_size: 240720 KB


Thanks for taking the time to read my question.

Recommended Answer

There are a few things you can do (I deal with historical data of over 560,000,000 records so far).
1. Partition your table, probably on some date column. That can help SQL Server scan index pages only for a specific time frame.
2. Choose your clustered index carefully. What is special about a clustered index is that all the data representing the index is stored on the index pages, so no additional lookup is needed when scanning them. So if you do a lot of joins using a defined sequence of columns, make that sequence into the clustered index (but remember to keep the index as small as possible).
3. Create an index for every foreign key you use on the table.
4. Add more columns to the index pages (using CREATE INDEX ... INCLUDE). This can help you retrieve the minimal (and most important) data without even reading a single data page.
5. Consider doing your query in steps. That will reduce the complexity of the execution plan of your SQL query and can save time... (points 1 and 4 are sketched right after this list)
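As a rough sketch of points 1 and 4, assuming a hypothetical inventory table dbo.Inventory with invented columns ItemID, Qty and TransDate (the boundary dates are examples too):

-- Point 1: partition by month on a date column
-- (the table is then rebuilt on psByMonth via its clustered index).
CREATE PARTITION FUNCTION pfByMonth (datetime)
    AS RANGE RIGHT FOR VALUES ('2011-01-01', '2011-02-01', '2011-03-01');
CREATE PARTITION SCHEME psByMonth
    AS PARTITION pfByMonth ALL TO ([PRIMARY]);

-- Point 4: a covering index; the join key is in the key, the output
-- columns in INCLUDE, so the query never touches a data page.
CREATE NONCLUSTERED INDEX IX_Inventory_ItemID
    ON dbo.Inventory (ItemID)
    INCLUDE (Qty, TransDate);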

[some more]
Another idea I just got (I haven't used it before) is to run daily summaries that create the necessary usage data for you. I believe in most cases the average does not have to be up to date to the last second...
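A minimal sketch of that idea, assuming a hypothetical transaction table dbo.DailyTransaction (ItemID, Qty, TransDate) and an invented summary table:

CREATE TABLE dbo.DailyUsageSummary
(
    ItemID    int    NOT NULL,
    UsageDate date   NOT NULL,
    TotalQty  bigint NOT NULL,
    TxCount   int    NOT NULL,
    CONSTRAINT PK_DailyUsageSummary PRIMARY KEY (ItemID, UsageDate)
);

-- Run once per day (e.g. from a SQL Agent job): summarize yesterday only,
-- so each run touches a small slice of the big table.
INSERT INTO dbo.DailyUsageSummary (ItemID, UsageDate, TotalQty, TxCount)
SELECT ItemID, CAST(TransDate AS date), SUM(Qty), COUNT(*)
FROM dbo.DailyTransaction
WHERE TransDate >= DATEADD(day, -1, CAST(GETDATE() AS date))
  AND TransDate <  CAST(GETDATE() AS date)
GROUP BY ItemID, CAST(TransDate AS date);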


Amol_B wrote:


My question is: is there any way we can load large data (greater than 3 GB) within a minute?
(...)
A SELECT * query on this table takes 27 minutes to retrieve the data.






No, there is no method to load 10 million records within one minute, even if you add indexes, partition your table, etc.

The main sin of database programmers is to use

SELECT *
FROM TableName




Never use it!

Use:

SELECT <field_list>
FROM TableName



or even better:

SELECT <field_list>
FROM TableName
WHERE <condition>
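For instance, in the scenario from the question this might look something like the following (the table and column names are invented for illustration):

SELECT ItemID, ItemName, Qty            -- only the columns you need
FROM dbo.Inventory                      -- hypothetical table name
WHERE TransDate >= '2011-01-01';        -- restrict the scanned range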








Please read about Query performance[^] and Database performance, etc. Use the SearchBox in the top-right corner of this site.


I'm seeing two errors here.
Firstly, average usage and consumption per item are aggregations of data.
Filtering and aggregation are what the database excels at. So why on earth would you want to move the whole bunch of data out of the database? Do your aggregation in the query.
We have a database where several tables have more than 200 million rows, and an aggregate on a pivot of 6 million rows is done in less than 6 seconds. But the result set is only a few KB.
Shuffling data takes linear time, and moving 4.2 GB of data is not a process you can optimize without upgrading your hardware.
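As an illustration of aggregating in the query (the tables and columns are hypothetical, echoing the question's setup):

SELECT i.ItemID, i.ItemName,
       AVG(t.Qty) AS AvgUsage,           -- average usage per item
       SUM(t.Qty) AS TotalConsumption    -- consumption per item
FROM dbo.Inventory AS i
INNER JOIN dbo.DailyTransaction AS t
        ON t.ItemID = i.ItemID
GROUP BY i.ItemID, i.ItemName;
-- Only the aggregated rows cross the wire: a few KB instead of 4.2 GB.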

Secondly, never use a clustered primary key if you have secondary indexes. I'm not going to explain why, as Marcus Winand does it[^] so much better.

<edit>Also consider using an indexed view; most aggregates can be used in an indexed view. The exceptions I can think of at the moment are Min() and Max().</edit>
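A rough sketch of such an indexed view, with invented names. Note that an indexed view requires SCHEMABINDING and a COUNT_BIG(*) column, and AVG is not allowed directly, so the average is computed from SUM and COUNT_BIG at query time:

CREATE VIEW dbo.vUsagePerItem
WITH SCHEMABINDING
AS
SELECT ItemID,
       SUM(Qty)     AS TotalQty,  -- Qty assumed NOT NULL, as indexed views require
       COUNT_BIG(*) AS TxCount    -- mandatory in an indexed view with GROUP BY
FROM dbo.DailyTransaction
GROUP BY ItemID;
GO
-- The unique clustered index is what materializes the view.
CREATE UNIQUE CLUSTERED INDEX IX_vUsagePerItem ON dbo.vUsagePerItem (ItemID);
-- Average usage is then TotalQty * 1.0 / TxCount, read from the tiny view.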

