Apache Spark read from S3 Exception: Premature end of Content-Length delimited message body (expected: 2,250,236; received: 16,360)


Problem description

I want to create an Apache Spark DataFrame from an S3 resource. I've tried on AWS and on IBM Cloud Object Storage; both fail with

org.apache.spark.util.TaskCompletionListenerException: Premature end of Content-Length delimited message body (expected: 2,250,236; received: 16,360)

I'm running pyspark with

./pyspark --packages com.amazonaws:aws-java-sdk-pom:1.11.828,org.apache.hadoop:hadoop-aws:2.7.0
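As an aside, the same dependencies can be requested from a standalone PySpark script instead of the pyspark shell. A minimal sketch, assuming a local Spark installation (the app name is arbitrary; the package coordinates are the ones from the command above):

from pyspark.sql import SparkSession

# Equivalent to passing --packages on the pyspark command line:
# Spark resolves and downloads the listed artifacts at session start-up.
spark = (
    SparkSession.builder
    .appName("s3a-read")
    .config(
        "spark.jars.packages",
        "com.amazonaws:aws-java-sdk-pom:1.11.828,org.apache.hadoop:hadoop-aws:2.7.0",
    )
    .getOrCreate()
)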

I'm setting the S3 configuration for IBM with

sc._jsc.hadoopConfiguration().set("fs.s3a.access.key", "xx")
sc._jsc.hadoopConfiguration().set("fs.s3a.secret.key", "xx")
sc._jsc.hadoopConfiguration().set("fs.s3a.endpoint", "s3.eu-de.cloud-object-storage.appdomain.cloud")

or for AWS with

sc._jsc.hadoopConfiguration().set("fs.s3a.access.key", "xx")
sc._jsc.hadoopConfiguration().set("fs.s3a.secret.key", " xx ")
sc._jsc.hadoopConfiguration().set("fs.s3a.endpoint", "s3.us-west-2.amazonaws.com")

In both cases the following code:

df = spark.read.csv("s3a://drill-test/cases.csv")

fails with

org.apache.spark.util.TaskCompletionListenerException: Premature end of Content-Length delimited message body (expected: 2,250,236; received: 16,360)

Answer

This is probably very confusing for you.

The following error:

org.apache.spark.util.TaskCompletionListenerException: Premature end of Content-Length delimited message body (expected: 2,250,236; received: 16,360)

is S3 telling you that there is an error in the communication with S3. My guess is that you are on an older version of Spark that does not recognize the exception, so it tries to bring the file back as the XML error message instead.
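One way to see what the service is actually returning is to fetch the object outside of Spark and inspect the raw bytes. A minimal sketch using boto3 (boto3 is an assumption, not something the original answer uses; the endpoint, bucket, and key below are the placeholders from the question):

import boto3

# Hypothetical debugging helper: fetch the object directly and inspect the
# payload. If the request is being rejected, the body is typically a short
# XML error document rather than the CSV you expect.
s3 = boto3.client(
    "s3",
    endpoint_url="https://s3.us-west-2.amazonaws.com",  # or the IBM COS endpoint
    aws_access_key_id="xx",
    aws_secret_access_key="xx",
)

resp = s3.get_object(Bucket="drill-test", Key="cases.csv")
body = resp["Body"].read()
print(resp["ContentLength"], len(body))  # compare expected vs. received bytes
print(body[:500])                        # an XML error message would show up here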

Please see the updates below, which should help with your situation; place them above your read call and fill in <aws_key>, <aws_secret>, and <aws_region>:

hadoop_conf = spark.sparkContext._jsc.hadoopConfiguration()

# s3a:// reads its credentials from fs.s3a.access.key / fs.s3a.secret.key
hadoop_conf.set("fs.s3a.access.key", "<aws_key>")
hadoop_conf.set("fs.s3a.secret.key", "<aws_secret>")

# Use the S3A filesystem; NativeS3FileSystem belongs to the older s3n://
# scheme and would break s3a:// paths if it overrode this setting
hadoop_conf.set("fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
hadoop_conf.set("fs.s3a.aws.credentials.provider", "org.apache.hadoop.fs.s3a.BasicAWSCredentialsProvider")

# V4 request signing; note this is usually enabled through the JVM system
# property -Dcom.amazonaws.services.s3.enableV4=true rather than Hadoop conf
hadoop_conf.set("com.amazonaws.services.s3.enableV4", "true")

hadoop_conf.set("fs.s3a.endpoint", "s3.<aws_region>.amazonaws.com")

Good luck!

