在SQL Server 2008中插入/更新大量数据的最佳做法 [英] Best practices for inserting/updating large amount of data in SQL Server 2008
问题描述
我正在建立一个通过各种CSV Feed更新大量数据的系统。通常我只是循环通过feed中的每一行,做一个select查询来检查项目是否已经存在,并插入/更新项目,如果它存在与否。
我觉得这种方法不是很可扩展,可以在更大的Feed上敲击服务器。我的解决方案是像正常循环项目,但将它们存储在内存中。然后对于每100个左右的项目做一个选择在这100个项目,并得到一个列表在数据库中匹配的现有项目。然后将insert / update语句连接在一起,并将它们运行到数据库中。这将基本上减少对数据库的访问。
这是一个可扩展性足够的解决方案,是否有任何有关将大型Feed导入生产环境的示例教程?
感谢
看到您正在使用SQL Server 2008,我会推荐这种方法:
- 首先将CSV文件批量复制到暂存表中
- 使用MERGE命令从暂存表更新目标表
基本上,您创建
- 行匹配,例如该行同时存在于源表和目标表中 - >通常您会更新一些字段,或者只是将其全部忽略。
- 源中的行不存在目标 - >通常是INSERT的情况
您将有 MERGE
语句如下:
MERGE TargetTable AS t
使用SourceTable AS src
ON t。 PrimaryKey = src.PrimaryKey
当不匹配时
INSERT(字段列表)
VALUES(值列表)
当匹配时
UPDATE
SET(SET语句列表)
;
当然, ON
如果需要更多的参与。当然, WHEN
语句也可以更复杂,例如
WHEN MATCHED AND(some other condition)THEN ......
等等。 / p>
MERGE
是一个非常强大和非常有用的新命令在SQL Server 2008 - 使用它,如果你可以! / p>
I'm building a system for updating large amounts of data through various CSV feeds. Normally I would just loop though each row in the feed, do a select query to check if the item already exists and insert/update an item depending if it exists or not.
I feel this method isn't very scalable and could hammer the server on larger feeds. My solution is to loop through the items as normal but store them in memory. Then for every 100 or so items do a select on those 100 items and get a list of existing items in the database that match. Then concatenate the insert/update statements together and run them into the database. This would essentially cut down on the trips to the database.
Is this a scalable enough solution and are there any example tutorials on importing large feeds into a productive environment?
Thanks
Seeing that you're using SQL Server 2008, I would recommend this approach:
- first bulkcopy your CSV files into a staging table
- update your target table from that staging table using the MERGE command
Check out the MSDN docs and a great blog post on how to use the MERGE command.
Basically, you create a link between your actual data table and the staging table on a common criteria (e.g. a common primary key), and then you can define what to do when
- the rows match, e.g. the row exists in both the source and the target table --> typically you'd either update some fields, or just ignore it all together
- the row from the source doesn't exist in the target --> typically a case for an INSERT
You would have a MERGE
statement something like this:
MERGE TargetTable AS t
USING SourceTable AS src
ON t.PrimaryKey = src.PrimaryKey
WHEN NOT MATCHED THEN
INSERT (list OF fields)
VALUES (list OF values)
WHEN MATCHED THEN
UPDATE
SET (list OF SET statements)
;
Of course, the ON
clause can be much more involved if needed. And of course, your WHEN
statements can also be more complex, e.g.
WHEN MATCHED AND (some other condition) THEN ......
and so forth.
MERGE
is a very powerful and very useful new command in SQL Server 2008 - use it, if you can!
这篇关于在SQL Server 2008中插入/更新大量数据的最佳做法的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!