AWS Bad Request (400) with Spark
Problem description
I'm trying to read an ORC file from S3. I'm able to read it from spark-shell, as you can see below.
scala> val df = spark.read.format("orc").load("s3a://bucketname/testorc/people/")
df: org.apache.spark.sql.DataFrame = [name: string, age: int ... 1 more field]
I ran spark-shell with the following configuration:
--master spark://ipaddress \
--packages datastax:spark-cassandra-connector:2.0.7-s_2.11,org.apache.hadoop:hadoop-aws:2.7.4,org.apache.hadoop:hadoop-client:2.7.4,com.typesafe:config:1.2.1 \
--conf "spark.driver.memory=4g" \
--conf spark.hadoop.fs.s3a.endpoint=s3.ap-south-1.amazonaws.com \
--conf spark.hadoop.fs.s3a.impl=org.apache.hadoop.fs.s3a.S3AFileSystem \
--conf spark.executor.extraJavaOptions=-Dcom.amazonaws.services.s3.enableV4=true \
--conf spark.driver.extraJavaOptions=-Dcom.amazonaws.services.s3.enableV4=true \
--conf spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version=2 \
--conf spark.speculation=false \
--conf "spark.executor.memory=3g" \
But when I try to read the same file from S3 using Spark with Hydrosphere Mist, I get the error below:
Status Code: 400, AWS Service: Amazon S3, AWS Request ID: 123456ABDGS, AWS Error Code: null, AWS Error Message: Bad Request,
And below is my Spark configuration with Mist:
mist.context-defaults.spark-conf = {
spark.master = "spark://ipaddress"
spark.default.parallelism = 3
spark.cores.max = 4
spark.executor.cores = 1
spark.driver.memory = "1g"
spark.executor.memory = "1g"
spark.cassandra.connection.host = "cassandrahost"
spark.eventLog.enabled = false
spark.sql.crossJoin.enabled = true
spark.sql.shuffle.partitions = 50
spark.hadoop.fs.s3a.endpoint=s3.ap-south-1.amazonaws.com
spark.hadoop.fs.s3a.impl=org.apache.hadoop.fs.s3a.S3AFileSystem
spark.executor.extraJavaOptions="-Dcom.amazonaws.services.s3.enableV4=true"
spark.driver.extraJavaOptions="-Dcom.amazonaws.services.s3.enableV4=true"
spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version=2
spark.speculation=false
}
Scala code to read the file:
val df = spark.read.format("orc").load("s3a://bucketname/testorc/people/")
What am I missing here? Please help.
Edited question
Passing external dependencies through the context run-options:
mist.context.abc.run-options = "--packages org.apache.hadoop:hadoop-aws:2.7.4,org.apache.hadoop:hadoop-client:2.7.4,com.typesafe:config:1.2.1"
Answer
You need to add the same --packages settings to your context as you used in the first example with spark-shell.
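Concretely, the context's run-options would need to list every package from the spark-shell invocation, not just a subset. A sketch, reusing the context name abc and the exact package coordinates from the question (versions are the questioner's, not verified here):

```
mist.context.abc.run-options = "--packages datastax:spark-cassandra-connector:2.0.7-s_2.11,org.apache.hadoop:hadoop-aws:2.7.4,org.apache.hadoop:hadoop-client:2.7.4,com.typesafe:config:1.2.1"
```

Note that the run-options shown in the edited question omit the spark-cassandra-connector package that the working spark-shell command included; matching the two lists exactly is the point of the answer.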