Multipart uploads to Amazon S3 from Apache Spark


Question

How can I make Apache Spark use multipart uploads when saving data to Amazon S3? Spark writes data using the RDD.saveAs...File methods. When the destination starts with s3n://, Spark automatically uses JetS3t to do the upload, but this fails for files larger than 5 GB. Large files need to be uploaded to S3 using multipart upload, which is supposed to be beneficial for smaller files as well. Multipart uploads are supported in JetS3t via MultipartUtils, but Spark does not use this in the default configuration. Is there a way to make it use this functionality?
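For context, a multipart upload splits an object into independently uploaded, numbered parts; S3 requires every part except the last to be at least 5 MiB. A minimal, self-contained sketch of just the chunking step, independent of Spark or any AWS SDK (the part size and helper name here are illustrative, not from any library):

```python
import io

# Illustrative part size; S3 requires parts (except the last) to be >= 5 MiB.
PART_SIZE = 5 * 1024 * 1024

def split_into_parts(stream, part_size=PART_SIZE):
    """Yield (part_number, chunk) pairs, numbered from 1 as S3 expects."""
    part_number = 1
    while True:
        chunk = stream.read(part_size)
        if not chunk:
            break
        yield part_number, chunk
        part_number += 1

# Example: a 12 MiB payload becomes three parts (5 + 5 + 2 MiB).
payload = io.BytesIO(b"x" * (12 * 1024 * 1024))
parts = list(split_into_parts(payload))
print([(n, len(c)) for n, c in parts])
```

Each part can then be uploaded (and retried) independently, which is why multipart avoids the single-PUT 5 GB ceiling that s3n runs into.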

Answer

This is a limitation of s3n. You can use the newer s3a protocol to access your files in S3 instead. s3a is based on the AWS SDK library and supports most of its features, including multipart upload. More details in this link:
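A sketch of how switching to s3a might look in spark-defaults.conf, assuming a Spark deployment with the hadoop-aws module (and the AWS SDK it depends on) on the classpath; the property names are the standard fs.s3a.* options, but the values below are illustrative examples, not recommendations:

```
# Illustrative spark-defaults.conf entries for the s3a connector.
spark.hadoop.fs.s3a.access.key          YOUR_ACCESS_KEY
spark.hadoop.fs.s3a.secret.key          YOUR_SECRET_KEY
# Object size (bytes) above which a multipart upload is started.
spark.hadoop.fs.s3a.multipart.threshold 134217728
# Size (bytes) of each uploaded part; must be at least 5 MiB.
spark.hadoop.fs.s3a.multipart.size      67108864
```

With this in place, writing to an `s3a://` URI (e.g. `rdd.saveAsTextFile("s3a://my-bucket/output")`) goes through the s3a connector rather than JetS3t, so files beyond the 5 GB single-PUT limit can be written.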

