How to push big file data in Talend?


Problem Description

I have created a table, and I have a text input file which is 7.5 GB in size and contains 65 million records; now I want to push that data into an Amazon Redshift table.

But after processing 5.6 million records, it stops making progress.

What could be the issue? Is there any limitation with tFileOutputDelimited? The job has been running for 3 hours.

Below is the job I created to push the data into the Redshift table.

tFileInputDelimited(.text)---tMap--->tFileOutputDelimited(csv)
                                            |
                                            |
tS3Put(copy output file to S3) ------> tRedShiftRow(createTempTable)--> tRedShiftRow(COPY to Temp)
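
For reference, here is a minimal sketch of what the two tRedShiftRow steps roughly amount to, issued over plain JDBC. The connection URL, credentials, table names, S3 path, and IAM role below are hypothetical placeholders inferred from the component labels, not values from the original job.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class RedshiftCopySketch {
    public static void main(String[] args) throws Exception {
        // Placeholder connection details; the Redshift JDBC driver must be on the classpath.
        String url = "jdbc:redshift://example-cluster.abc123.us-east-1.redshift.amazonaws.com:5439/dev";
        try (Connection conn = DriverManager.getConnection(url, "user", "password");
             Statement stmt = conn.createStatement()) {

            // tRedShiftRow(createTempTable): stage the load in a temp table
            // shaped like the (hypothetical) target table.
            stmt.execute("CREATE TEMP TABLE staging (LIKE target_table)");

            // tRedShiftRow(COPY to Temp): bulk-load the CSV that tS3Put uploaded.
            stmt.execute(
                "COPY staging FROM 's3://my-bucket/output.csv' "
                + "IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftCopyRole' "
                + "CSV");
        }
    }
}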

Answer

It looks like tFileOutputDelimited(csv) is causing the problem; a single file may not be able to handle more than a certain amount of data (not sure, though). Try to find a way to load only a portion of the parent input file at a time and commit it in Redshift, repeating the process until the parent input file has been completely processed.
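
A minimal sketch of that chunking idea, assuming a plain line-delimited input file and an illustrative chunk size of 5 million records (both are assumptions, not from the original job). Each finished chunk would then go through the existing tS3Put / COPY / commit steps instead of writing one 7.5 GB CSV.

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class ChunkedLoadSketch {
    // Hypothetical chunk size; tune so each chunk loads and commits comfortably.
    private static final long RECORDS_PER_CHUNK = 5_000_000L;

    public static void main(String[] args) throws IOException {
        Path input = Paths.get("input.txt"); // placeholder for the 7.5 GB source file
        long lineNo = 0;
        int chunkNo = 0;
        BufferedWriter out = null;
        try (BufferedReader in = Files.newBufferedReader(input)) {
            String line;
            while ((line = in.readLine()) != null) {
                if (lineNo % RECORDS_PER_CHUNK == 0) {
                    if (out != null) {
                        out.close();
                        // Here the finished chunk would be pushed:
                        // tS3Put(chunk) -> COPY chunk into the temp table -> commit.
                    }
                    out = Files.newBufferedWriter(Paths.get("chunk_" + (chunkNo++) + ".csv"));
                }
                out.write(line);
                out.newLine();
                lineNo++;
            }
        } finally {
            if (out != null) {
                out.close();
                // Push the final chunk the same way.
            }
        }
    }
}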

