Using AWS Glue to convert very big csv.gz (30-40 GB each) to parquet


Problem Description

There are lots of questions like this, but nothing seems to help. I am trying to convert quite large csv.gz files to Parquet and keep getting various errors such as

'Command failed with exit code 1'

An error occurred while calling o392.pyWriteDynamicFrame. Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 3, ip-172-31-5-241.eu-central-1.compute.internal, executor 4): ExecutorLostFailure (executor 4 exited caused by one of the running tasks) Reason: Container marked as failed

In the metrics monitoring I don't see much CPU or memory load. There is ETL data movement, but that shouldn't trigger any errors when working with S3.

Another problem is that such a job runs for 4-5 hours before failing. Is that expected behavior? The CSV files have around 30-40 columns.

I don't know which direction to go. Can Glue handle such large files at all?

Recommended Answer

I don't think the problem is directly connected to the number of DPUs. Your files are large and you are using the GZIP format, which is not splittable, and that is why you are running into this problem.

I suggest converting your files from GZIP to bzip2 or LZ4. Additionally, you should consider partitioning the output data for better performance in the future.
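One way to act on the re-compression suggestion is a small plain-PySpark job that rewrites the CSVs with a splittable codec before the Glue conversion runs. This is a minimal sketch, not code from the answer; the bucket paths and the repartition count are placeholders.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("recompress-csv").getOrCreate()

    # A single .csv.gz file is read by one task, because gzip is not splittable.
    df = spark.read.option("header", "true").csv("s3://my-bucket/raw/")

    # bzip2 output is splittable, so a downstream Glue job can read it with
    # many parallel tasks instead of one long-running task per file.
    (df.repartition(200)                      # spread rows across many output files
       .write
       .option("header", "true")
       .option("compression", "bzip2")
       .csv("s3://my-bucket/staged-bzip2/"))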

http://comphadoop.weebly.com/
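To illustrate the output-partitioning suggestion, here is a minimal Glue ETL sketch. It assumes the staged CSVs have been crawled into a Data Catalog database called "mydb" as table "staged_csv" and that the data has a "year" column to partition by; all of those names are assumptions, not details from the question.

    import sys

    from awsglue.context import GlueContext
    from awsglue.job import Job
    from awsglue.utils import getResolvedOptions
    from pyspark.context import SparkContext

    args = getResolvedOptions(sys.argv, ["JOB_NAME"])
    sc = SparkContext()
    glueContext = GlueContext(sc)
    job = Job(glueContext)
    job.init(args["JOB_NAME"], args)

    # Read the staged (splittable) CSV data through the Glue Data Catalog.
    dyf = glueContext.create_dynamic_frame.from_catalog(
        database="mydb",
        table_name="staged_csv",
    )

    # Write partitioned Parquet; each partition key becomes an S3 prefix
    # such as .../year=2019/, which lets later queries prune data.
    glueContext.write_dynamic_frame.from_options(
        frame=dyf,
        connection_type="s3",
        connection_options={
            "path": "s3://my-bucket/parquet/",
            "partitionKeys": ["year"],
        },
        format="parquet",
    )

    job.commit()

Parquet written this way is compressed with Snappy by default, so the non-splittable-gzip issue does not reappear on the output side.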
