Exporting SQL Server Table to multiple part files


Problem Description



I need to export a fairly large SQL Server table ~100GB to a CSV file. But rather than the output be a single csv file, it should ideally be multiple files say 10 files each 10GB.

I see BCP has a batch_size argument, but this still writes all the data to a single file. Are there other free utilities for doing what I require, where the size of each file can be specified either in bytes or in number of rows?

For a bit of context, this is so the data can be combined with other sources in a Hive/Hadoop platform, so if there are better ways of exporting the data I'm open to suggestions.

Solution

BCP's batch_size argument does not control the output, unfortunately.

Ways I've done this type of splitting:

1 - Simple but non-repeatable: Create a command file (.cmd) that runs multiple BCPs over the table for specific row ranges. This probably requires an IDENTITY(1,1)-based primary key on the table.

bcp "SELECT * FROM MyTable WHERE Id BETWEEN 0 AND 10000000" queryout …  
bcp "SELECT * FROM MyTable WHERE Id BETWEEN 10000000 AND 20000000" queryout …   
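Rather than hand-writing each line, the .cmd file can be generated. A minimal Python sketch, where the table name, `Id` column, ID range, and output paths are illustrative placeholders:

```python
# Generate a .cmd file containing one bcp command per ID range.
# MyTable, the Id column, and the paths are hypothetical placeholders.
CHUNK = 10_000_000    # IDs per output file
MAX_ID = 100_000_000  # approximate highest ID in the table

lines = []
for i, low in enumerate(range(0, MAX_ID, CHUNK)):
    high = low + CHUNK
    # Note: BETWEEN is inclusive, so (mirroring the answer above) the
    # boundary row appears in two files; use high - 1 to avoid that.
    query = f"SELECT * FROM MyTable WHERE Id BETWEEN {low} AND {high}"
    lines.append(f'bcp "{query}" queryout C:\\Export\\MyFile_{i}.csv -T -w')

with open("export.cmd", "w") as f:
    f.write("\n".join(lines) + "\n")
```

Running the resulting export.cmd then produces one CSV per ID range.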

2 - Simple and repeatable, uses a lot of disk: BCP out the entire table to a single file and use split to create as many new files as needed, each with a given number of bytes (note: splitting by lines would be a better idea IMO). Use Cygwin (GnuWin32 is no longer maintained) to install split and any other utilities you want.

 bcp MyDb.MySchema.MyTable out C:\MyFile.csv -T -w  
 split -b 10737418240 C:\MyFile.csv C:\MySplitFile_  

This generates the following files:

 C:\MySplitFile_aaa
 C:\MySplitFile_aab
 …
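Since a byte-based split can cut a CSV row in half, the line-based split the answer prefers is safer (`split -l <lines>` does this directly). The same idea as a minimal Python sketch, with illustrative file names:

```python
# Split a text file into parts of at most rows_per_file lines each,
# so no row is ever cut in half (unlike a byte-based split).
def split_by_lines(src, prefix, rows_per_file):
    part, out, written = 0, None, 0
    names = []
    with open(src, "r", encoding="utf-8") as f:
        for line in f:
            # Open the next part file when needed.
            if out is None or written == rows_per_file:
                if out:
                    out.close()
                name = f"{prefix}{part:03d}.csv"
                names.append(name)
                out = open(name, "w", encoding="utf-8")
                part, written = part + 1, 0
            out.write(line)
            written += 1
    if out:
        out.close()
    return names
```

For a 100GB export this streams one line at a time, so memory use stays flat regardless of file size.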

3 - Complex but repeatable, requires possibly insecure T-SQL: Use xp_cmdshell to call BCP inside a stored procedure that iterates through the table.

 DECLARE @loop AS INT = 0;
 DECLARE @sql AS VARCHAR(MAX);
 DECLARE @bcp AS VARCHAR(MAX);
 --Loop over ID ranges as needed--
 WHILE @loop < 10
 BEGIN
     SET @sql = 'SELECT * FROM MyTable WHERE Id BETWEEN ' + CAST(@loop * 10000000 AS VARCHAR(20)) + ' AND ' + CAST((@loop + 1) * 10000000 AS VARCHAR(20));
     SET @bcp = 'BCP "' + @sql + '" queryout C:\MyFolder\MyFile_' + CAST(@loop AS VARCHAR(10)) + '.csv -T -w';
     EXEC xp_cmdshell @bcp;
     SET @loop = @loop + 1;
 END

FINAL NOTE: If you are using any NVARCHAR fields in your data then you need to use the -w flag and be aware that the output will be in UTF-16LE. I would strongly recommend converting that to UTF-8 using iconv (from 'Cygwin' again) before trying to do anything with it in Hadoop.
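The iconv step would be along the lines of `iconv -f UTF-16LE -t UTF-8 MyFile.csv > MyFile_utf8.csv`. Where Cygwin isn't available, the same re-encoding can be sketched in Python (file names are illustrative):

```python
# Re-encode a bcp -w output file (UTF-16LE) to UTF-8 for Hadoop/Hive.
# File names are hypothetical placeholders.
def utf16le_to_utf8(src, dst):
    with open(src, "r", encoding="utf-16-le") as fin, \
         open(dst, "w", encoding="utf-8", newline="") as fout:
        for line in fin:  # stream line by line; the file may be huge
            fout.write(line)
```

Streaming line by line keeps memory use constant even for a 100GB export.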
