How to push big file data in Talend?
Question
I have created a table, and I have a text input file that is 7.5 GB in size and contains 65 million records. I now want to push that data into an Amazon Redshift table.
But after processing 5.6 million records, it stops making progress.
What could be the issue? Is there any limitation with tFileOutputDelimited? The job has already been running for 3 hours.
Below is the job I have created to push the data into the Redshift table.
tFileInputDelimited(.text) ---tMap---> tFileOutputDelimited(csv)
                                              |
tS3Put(copy output file to S3) ------> tRedshiftRow(createTempTable) --> tRedshiftRow(COPY to Temp)
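
For reference, the tRedshiftRow(COPY to Temp) step boils down to running a Redshift COPY statement against the file staged in S3; COPY reads from S3 in parallel, which is why staging through S3 is preferred over row-by-row inserts. Below is a minimal JDBC sketch of such a statement, not this job's actual configuration; the cluster endpoint, credentials, bucket, key, table name, and IAM role are all hypothetical placeholders.

// Minimal sketch: run a Redshift COPY over JDBC. All names are placeholders.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class RedshiftCopySketch {
    public static void main(String[] args) throws Exception {
        // Hypothetical cluster endpoint and credentials
        String url = "jdbc:redshift://example-cluster.us-east-1.redshift.amazonaws.com:5439/dev";
        try (Connection conn = DriverManager.getConnection(url, "user", "password");
             Statement stmt = conn.createStatement()) {
            // COPY loads the staged S3 file into the temp table in parallel
            stmt.execute(
                "COPY temp_table "
              + "FROM 's3://example-bucket/output_file.csv' "
              + "IAM_ROLE 'arn:aws:iam::123456789012:role/ExampleRedshiftRole' "
              + "CSV");
        }
    }
}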
Answer
It looks like tFileOutputDelimited(csv) is causing the problem; a single output file may not cope beyond a certain amount of data, though I'm not sure. Try to find a way to load only a portion of the parent input file and commit it in Redshift, then repeat the process until the parent input file has been completely processed. See the sketch below.
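
Here is a minimal sketch of that suggestion in plain Java (the language Talend jobs compile to), assuming hypothetical file names and chunk size: split the 7.5 GB delimited file into fixed-size chunks, so each chunk can be uploaded with tS3Put and COPY'd into Redshift as a separate, committed batch.

// Minimal sketch: split a large delimited file into line-based chunks.
// File names and chunk size are hypothetical.
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class SplitInputFile {
    public static void main(String[] args) throws IOException {
        Path input = Paths.get("big_input.txt"); // hypothetical 7.5 GB source file
        long linesPerChunk = 5_000_000L;         // ~13 chunks for 65 million records
        long count = 0;
        int chunk = 0;
        try (BufferedReader reader = Files.newBufferedReader(input)) {
            BufferedWriter writer = Files.newBufferedWriter(Paths.get("chunk_0.csv"));
            String line;
            while ((line = reader.readLine()) != null) {
                // Rotate to a new chunk file every linesPerChunk lines
                if (count > 0 && count % linesPerChunk == 0) {
                    writer.close();
                    chunk++;
                    writer = Files.newBufferedWriter(Paths.get("chunk_" + chunk + ".csv"));
                }
                writer.write(line);
                writer.newLine();
                count++;
            }
            writer.close();
        }
    }
}

Each chunk can then be processed end to end (S3 upload, COPY, commit) before the next one starts, so a failure partway through costs one chunk rather than the whole multi-hour run.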