MergeFiles copy behavior when moving data from a relational database to Blob service

Problem description

Hi all,

I have a requirement to copy data from PostgreSQL to Blob service storage. Some tables are too large (about 25,000,000 records) to copy within a single copy activity: the ADF copy activity fails with a java.lang.OutOfMemoryError:Java heap space message. To avoid this, I created a pipeline that loads data from the table in batches. The downside of this approach is that a separate file is created for each batch copy (for example, with two batches, two files, tablename_part1.parquet and tablename_part2.parquet, are created). The problem is that the requirement calls for all the data to end up in a single file. To meet it, I tried the MergeFiles copy behavior, but I got the following error:

"ErrorCode=UserErrorFormatRequiredWithCopyBehaviorMergeFiles,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=Format setting is required on both source and sink for 'MergeFiles' copy behavior.,Source=Microsoft.DataTransfer.ClientLibrary, "


Could someone tell me which format ADF is asking for? (The source type is specified as RelationSource in CopyActivity.)

Are there any other ways to merge results from several copy activities into one file?

Thanks in advance!

Recommended answer

The MergeFiles copy behavior requires a format setting on both the source and the sink, because the merge operates on tabular data. ADF reads and deserializes the data according to the source format settings, serializes it according to the sink format settings, and then writes it into a single target file.

Did you specify a sink format setting for the batched copy activity from PostgreSQL to Azure Blob? If so, specify the same format setting for the second (merge) copy; if not, you can specify the default format setting, TextFormat, to proceed.

Please refer to the Azure Data Factory documentation for more details on TextFormat.
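To make the advice above more concrete, here is a rough Python sketch that builds the JSON for the second (merge) copy, with a format setting on both the Blob source dataset and the Blob sink dataset. It assumes the legacy-style AzureBlob dataset and BlobSource/BlobSink definitions, and it assumes the batch files were written with ParquetFormat (swap in TextFormat if no format was set). The dataset names, linked service name, folder paths and the datasets_missing_format helper are all hypothetical and only illustrate the shape of the configuration; they are not taken from the original pipeline.

```python
import json


def blob_dataset(name, folder_path, format_type, file_name=None):
    """Build a legacy-style AzureBlob dataset definition with an explicit format."""
    type_props = {"folderPath": folder_path, "format": {"type": format_type}}
    if file_name is not None:
        type_props["fileName"] = file_name
    return {
        "name": name,
        "properties": {
            "type": "AzureBlob",
            # Hypothetical linked service name; replace with your own.
            "linkedServiceName": {
                "referenceName": "BlobStorageLinkedService",
                "type": "LinkedServiceReference",
            },
            "typeProperties": type_props,
        },
    }


def merge_copy_activity(source_dataset, sink_dataset):
    """Build a Blob-to-Blob copy activity that uses the MergeFiles copy behavior."""
    return {
        "name": "MergeBatchFiles",  # hypothetical activity name
        "type": "Copy",
        "inputs": [{"referenceName": source_dataset["name"], "type": "DatasetReference"}],
        "outputs": [{"referenceName": sink_dataset["name"], "type": "DatasetReference"}],
        "typeProperties": {
            "source": {"type": "BlobSource", "recursive": True},
            "sink": {"type": "BlobSink", "copyBehavior": "MergeFiles"},
        },
    }


def datasets_missing_format(activity, datasets):
    """Local sanity check (not an ADF API): list referenced datasets that lack
    a format setting when the sink requests MergeFiles."""
    sink = activity["typeProperties"]["sink"]
    if sink.get("copyBehavior") != "MergeFiles":
        return []
    by_name = {d["name"]: d for d in datasets}
    refs = activity["inputs"] + activity["outputs"]
    return [
        ref["referenceName"]
        for ref in refs
        if "format" not in by_name[ref["referenceName"]]["properties"]["typeProperties"]
    ]


# Folder holding tablename_part1.parquet, tablename_part2.parquet, ... (assumed layout).
source = blob_dataset("BatchPartsFolder", "container/tablename_parts", "ParquetFormat")
# Single merged output file; same format as the batch files, as the answer suggests.
sink = blob_dataset("MergedFile", "container/merged", "ParquetFormat",
                    file_name="tablename.parquet")
activity = merge_copy_activity(source, sink)

print(json.dumps(activity, indent=2))
print("Datasets missing a format:", datasets_missing_format(activity, [source, sink]))
```

The point of the sketch is that the format belongs to the datasets on both sides of the merge copy, not to the PostgreSQL source of the batch copies, which is why the error mentions both source and sink.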