Upload a Massive CSV File to SQL Server Database


Problem Description

I need to upload a massive (16GB, 65+ million records) CSV file to a single table in a SQL Server 2005 database. Does anyone have any pointers on the best way to do this?



Details



I am currently using a C# console application (.NET Framework 2.0) to split the import file into files of 50,000 records each, then process each file. I upload the records to the database from the console application using the SqlBulkCopy class in batches of 5,000. Splitting the files takes approximately 30 minutes, and uploading the entire data set (65+ million records) takes approximately 4.5 hours. The generated file size and the batch upload size are both configuration settings, and I am investigating increasing both values to improve performance. To run the application, we use a quad-core server with 16GB of RAM. This server is also the database server.
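
The splitting step described above can be implemented by streaming the source file line by line and starting a new output file every 50,000 records. Here is a minimal C# sketch of that step; the file names are placeholders, and it assumes one record per line with no header row, which may not match the actual import file.

```csharp
using System.IO;

class CsvSplitter
{
    static void Main()
    {
        const string inputPath = "input.csv";   // hypothetical source file
        const int recordsPerChunk = 50000;      // matches the 50,000-record split files

        int chunkIndex = 0;
        int lineCount = 0;
        StreamWriter writer = null;

        using (StreamReader reader = new StreamReader(inputPath))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                // Start a new chunk file every 50,000 lines (assumes one record per line, no header).
                if (lineCount % recordsPerChunk == 0)
                {
                    if (writer != null) writer.Close();
                    writer = new StreamWriter("chunk_" + chunkIndex + ".csv");
                    chunkIndex++;
                }
                writer.WriteLine(line);
                lineCount++;
            }
        }

        if (writer != null) writer.Close();
    }
}
```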



Update



Given the answers so far, please note that prior to the import:




  • The database table is truncated, and all indexes and constraints are dropped.

  • The database is shrunk and disk space is reclaimed.



After the import has completed:




  • The indexes are recreated (these pre- and post-import steps are sketched in code below).
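
For reference, here is a hedged sketch of this pre- and post-import housekeeping driven from C# with plain SQL commands. The database, table, and index names are placeholders, and only a single index is shown; constraints would be dropped and recreated the same way.

```csharp
using System.Data.SqlClient;

class ImportHousekeeping
{
    // Placeholder connection string; the post notes the application runs on the database server itself.
    const string ConnectionString = "Server=.;Database=ImportDb;Integrated Security=SSPI";

    static void Execute(string sql)
    {
        using (SqlConnection conn = new SqlConnection(ConnectionString))
        using (SqlCommand cmd = new SqlCommand(sql, conn))
        {
            cmd.CommandTimeout = 0;   // index rebuilds over 65+ million rows can run for a long time
            conn.Open();
            cmd.ExecuteNonQuery();
        }
    }

    public static void BeforeImport()
    {
        Execute("TRUNCATE TABLE dbo.ImportTarget");                      // empty the target table
        Execute("DROP INDEX IX_ImportTarget_Key ON dbo.ImportTarget");   // drop indexes (and constraints) before loading
        Execute("DBCC SHRINKDATABASE (ImportDb)");                       // reclaim disk space, as described above
    }

    public static void AfterImport()
    {
        Execute("CREATE INDEX IX_ImportTarget_Key ON dbo.ImportTarget (KeyColumn)");  // rebuild indexes afterwards
    }
}
```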



If you can suggest any different approaches, or ways we can improve the existing import application, I would appreciate it. Thanks.



Related Question



The following question may be of use to others dealing with this problem:




  • … from a flat file



Solution



I have investigated the effect of altering the batch size and the size of the split files, and found that batches of 500 records and split files of 200,000 records work best for my application. Using SqlBulkCopyOptions.TableLock also helped. See the answer to this question for further details.
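
As an illustration of those settings, here is a minimal sketch of a SqlBulkCopy call using TableLock and a batch size of 500. How each split file is parsed into the DataTable is left out, and the destination table name is an assumption.

```csharp
using System.Data;
using System.Data.SqlClient;

class BulkCopyUpload
{
    // rows: one DataTable per split file, populated elsewhere by the CSV parsing code.
    public static void Upload(DataTable rows, string connectionString)
    {
        using (SqlConnection conn = new SqlConnection(connectionString))
        {
            conn.Open();

            // TableLock takes a bulk update lock for the duration of the copy,
            // which avoids per-batch locking overhead on a table nobody else is using.
            using (SqlBulkCopy bulkCopy =
                new SqlBulkCopy(conn, SqlBulkCopyOptions.TableLock, null))
            {
                bulkCopy.DestinationTableName = "dbo.ImportTarget";  // placeholder table name
                bulkCopy.BatchSize = 500;                            // the batch size that worked best here
                bulkCopy.BulkCopyTimeout = 0;                        // 0 = no timeout
                bulkCopy.WriteToServer(rows);
            }
        }
    }
}
```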



I also looked at using an SSIS DTS package and a BULK INSERT SQL script. The SSIS package appeared quicker, but did not offer me the ability to record invalid records, etc. The BULK INSERT SQL script, whilst slower than the SSIS package, was considerably faster than the C# application. It did allow me to record errors, and for this reason I am accepting the BULK INSERT answer from ConcernedOfTunbridgeWells as the solution. I'm aware that this may not be the best answer for everyone facing this issue, but it answers my immediate problem.
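
For illustration, below is a hedged example of the kind of BULK INSERT statement that could be issued from the existing console application; the ERRORFILE clause is what provides the error logging mentioned above. The table name, file paths, terminators, and error threshold are assumptions, not details from the original script.

```csharp
using System.Data.SqlClient;

class BulkInsertRunner
{
    public static void Run(string connectionString, string csvPath, string errorPath)
    {
        // The paths must be visible to the SQL Server instance; in this case the
        // application runs on the database server, so local paths are fine.
        string sql =
            "BULK INSERT dbo.ImportTarget " +
            "FROM '" + csvPath + "' " +
            "WITH (FIELDTERMINATOR = ',', " +
            "      ROWTERMINATOR = '\\n', " +     // '\n' in T-SQL; escaped once for the C# string
            "      BATCHSIZE = 500, " +
            "      TABLOCK, " +
            "      MAXERRORS = 1000, " +
            "      ERRORFILE = '" + errorPath + "')";   // rejected rows are written here

        using (SqlConnection conn = new SqlConnection(connectionString))
        using (SqlCommand cmd = new SqlCommand(sql, conn))
        {
            cmd.CommandTimeout = 0;   // allow the load to run as long as it needs
            conn.Open();
            cmd.ExecuteNonQuery();
        }
    }
}
```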



Thanks to everyone who replied.



Regards, MagicAndi


Solution (accepted answer from ConcernedOfTunbridgeWells)

BULK INSERT runs from the DBMS itself, reading files described by a bcp format file from a directory on the server (or one mounted on it). Write an application that splits the file into smaller chunks, places them in an appropriate directory, and executes a wrapper that runs a series of BULK INSERTs. You can run several threads in parallel if necessary.



This is probably about as fast as a bulk load gets. Also, if a suitable partitioning key is available in the bulk load file, put the staging table on a partition scheme.



Also, if you are bulk loading into a table with a clustered index, make sure the data is sorted in the same order as the index. Merge sort is your friend for large data sets.
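
Below is a sketch of the wrapper this answer describes: the chunk files are queued, and a few worker threads each issue BULK INSERT statements until the queue is empty. The connection string, directory, table name, and degree of parallelism are all assumptions for illustration.

```csharp
using System.Collections.Generic;
using System.Data.SqlClient;
using System.IO;
using System.Threading;

class ParallelBulkInsert
{
    // Placeholder connection string and chunk directory.
    const string ConnectionString = "Server=.;Database=ImportDb;Integrated Security=SSPI";
    const string ChunkDirectory = @"C:\import\chunks";

    static readonly Queue<string> Files = new Queue<string>();
    static readonly object SyncRoot = new object();

    static void Main()
    {
        // Chunk files produced by the splitting step, in a directory the server can read.
        foreach (string path in Directory.GetFiles(ChunkDirectory, "*.csv"))
            Files.Enqueue(path);

        // A handful of workers; past a point more threads just contend for disk and log I/O.
        Thread[] workers = new Thread[4];
        for (int i = 0; i < workers.Length; i++)
        {
            workers[i] = new Thread(Worker);
            workers[i].Start();
        }
        foreach (Thread t in workers)
            t.Join();
    }

    static void Worker()
    {
        while (true)
        {
            string path;
            lock (SyncRoot)
            {
                if (Files.Count == 0) return;
                path = Files.Dequeue();
            }

            // With indexes dropped and TABLOCK specified, SQL Server allows several bulk
            // loads into the same heap to run in parallel. If loading into a table that
            // keeps its clustered index, pre-sort each chunk and use an ORDER (...) hint instead.
            string sql = "BULK INSERT dbo.ImportTarget FROM '" + path + "' " +
                         "WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\\n', TABLOCK)";

            using (SqlConnection conn = new SqlConnection(ConnectionString))
            using (SqlCommand cmd = new SqlCommand(sql, conn))
            {
                cmd.CommandTimeout = 0;
                conn.Open();
                cmd.ExecuteNonQuery();
            }
        }
    }
}
```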


