Transform/transfer 50Gb - how to do it fast?




Hello!
I have a big table with 50Gb of data and have written some functions
that clean up the data. I want to do something like this:

insert into newtable
select id, func1(col1), func2(col2)
from oldtable;

I'm also planning to make newtable partitioned (before the insert).

But how can I make the insert as fast as possible?
Greetings
Bjorn D. Jensen

Solution

Hi,

The quickest way to do this is to use INSERT INTO...SELECT FROM as it
is a non-logged operation

regards,

Malc

B D Jensen wrote:

Hello!
I have a big table with 50Gb of data and have written some functions
that clean up the data. I want to do something like this:

insert into newtable
select id, func1(col1), func2(col2)
from oldtable;

I'm also planning to make newtable partitioned (before the insert).

But how can I make the insert as fast as possible?
Greetings
Bjorn D. Jensen


B D Jensen (bj************@gmail.com) writes:

I have a big table with 50Gb of data and have written some functions
that clean up the data. I want to do something like this:

insert into newtable
select id, func1(col1), func2(col2)
from oldtable;

I'm also planning to make newtable partitioned (before the insert).

But how can I make the insert as fast as possible?

Exactly what is in those functions? Do they perform data access? Are
they written in T-SQL or in the CLR? I ask because they could have a
great impact on performance.
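
For illustration, here is a purely hypothetical cleanup function of the
kind the poster might have written; the name, signature, and logic are
assumptions, not from the thread. A scalar T-SQL UDF like this is
invoked once per row, so over 50Gb the call overhead alone can dominate
the run time; if the logic is this simple, inlining the expression
directly into the SELECT is usually faster:

-- Hypothetical cleanup function (name and logic assumed).
-- Scalar UDFs run once per row, which is costly at this scale.
CREATE FUNCTION dbo.func1 (@s varchar(200))
RETURNS varchar(200)
AS
BEGIN
   -- Example cleanup: trim whitespace, turn empty strings into NULL.
   RETURN NULLIF(LTRIM(RTRIM(@s)), '')
END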

Apart from that, there are a couple of possible strategies for this
situation. One is SELECT INTO, but since you plan to make the new
table partitioned, I don't think SELECT INTO is good for this. (SELECT
INTO creates a new table.)
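
For comparison, a minimal sketch of the SELECT INTO route, using the
function names from the question. The statement creates newtable
itself, which is exactly why it cannot target a table you have already
created as partitioned:

-- SELECT INTO creates newtable as part of the statement, so it
-- cannot load into a pre-created (partitioned) table.
SELECT id, dbo.func1(col1) AS col1, dbo.func2(col2) AS col2
INTO   newtable
FROM   oldtable;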

Another is to use BCP to first unload the table to a file. You would
then use queryout or a view with your functions, so that what goes into
the file is the cleaned-up version. Then you use BCP to load the data
into the new table. The key here is that there should be no indexes on
the table and it should be empty. In this case the bulk load is
minimally logged. Of course, you also need to account for the time it
takes to create the indexes.
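
A sketch of that route, with assumed placeholders for the database,
server, and file path. The view bakes the cleanup functions in, so the
exported file already holds the cleaned-up rows; -n uses native format,
-T a trusted connection, and the TABLOCK hint helps the load qualify
for minimal logging against an empty target with no indexes:

-- View that applies the cleanup functions on the way out.
CREATE VIEW dbo.oldtable_cleaned AS
   SELECT id, dbo.func1(col1) AS col1, dbo.func2(col2) AS col2
   FROM   dbo.oldtable;

-- From the command line (mydb, myserver and the path are placeholders):
--   bcp mydb.dbo.oldtable_cleaned out D:\cleaned.dat -n -S myserver -T
--   bcp mydb.dbo.newtable in D:\cleaned.dat -n -S myserver -T -h "TABLOCK"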

And the final option is to use a plain INSERT. But a single INSERT
statement will not be good for your transaction log. It's better to
batch the insert, say 100000 rows at a time, preferably with the
database set to simple recovery. You should batch on the clustered
index of the old table:

DECLARE @start int
SELECT @start = 1
WHILE EXISTS (SELECT * FROM oldtable WHERE clustercol >= @start)
BEGIN
   -- Copy one batch of 100000 rows, keyed on the clustered index.
   INSERT newtable (...)
   SELECT ...
   FROM oldtable
   WHERE clustercol >= @start AND clustercol < @start + 100000
   SELECT @start = @start + 100000
END

Here the actual increment would depend on the nature of your clustered
key. If it's a date, maybe taking one month at a time is a good idea.
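
As a sketch of that variant, assuming clustercol is a datetime and an
assumed starting date; the simple-recovery setting mentioned above is
shown as a comment with a placeholder database name:

-- ALTER DATABASE mydb SET RECOVERY SIMPLE   -- placeholder db name
DECLARE @start datetime
SELECT @start = '20040101'   -- assumed earliest date in oldtable
WHILE EXISTS (SELECT * FROM oldtable WHERE clustercol >= @start)
BEGIN
   -- Copy one month per batch to keep each transaction small.
   INSERT newtable (...)
   SELECT ...
   FROM oldtable
   WHERE clustercol >= @start AND clustercol < DATEADD(month, 1, @start)
   SELECT @start = DATEADD(month, 1, @start)
END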

If the new table will have the same clustered index as the old table,
have the clustered index in place when you run the above, but wait to
add nonclustered indexes until you have the data in place.
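
That ordering, as a minimal sketch with placeholder index and column
names:

-- Clustered index in place before the load...
CREATE CLUSTERED INDEX ix_newtable ON newtable (clustercol)

-- ...run the batched INSERT loop above...

-- ...and only then the nonclustered indexes, once the data is loaded.
CREATE NONCLUSTERED INDEX ix_newtable_col1 ON newtable (col1)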

--
Erland Sommarskog, SQL Server MVP, es****@sommarskog.se

Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/pro...ads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodinf...ons/books.mspx


Mork69 (ml****@bigfoot.com) writes:

The quickest way to do this is to use INSERT INTO...SELECT FROM as it
is a non-logged operation

This is not correct. INSERT SELECT FROM is a fully logged operation.
You are thinking of SELECT INTO, which is a minimally logged operation.
That is, all that is logged are the extent allocations. There are no
write operations in SQL Server that are entirely non-logged.
--
Erland Sommarskog, SQL Server MVP, es****@sommarskog.se

Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/pro...ads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodinf...ons/books.mspx

