How to read input from S3 in a Spark Streaming EC2 cluster application


Question


I'm trying to make my Spark Streaming application read its input from an S3 directory, but I keep getting this exception after launching it with the spark-submit script:

Exception in thread "main" java.lang.IllegalArgumentException: AWS Access Key ID and Secret Access Key must be specified as the username or password (respectively) of a s3n URL, or by setting the fs.s3n.awsAccessKeyId or fs.s3n.awsSecretAccessKey properties (respectively).
    at org.apache.hadoop.fs.s3.S3Credentials.initialize(S3Credentials.java:66)
    at org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.initialize(Jets3tNativeFileSystemStore.java:49)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
    at org.apache.hadoop.fs.s3native.$Proxy6.initialize(Unknown Source)
    at org.apache.hadoop.fs.s3native.NativeS3FileSystem.initialize(NativeS3FileSystem.java:216)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1386)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1404)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:254)
    at org.apache.hadoop.fs.Path.getFileSystem(Path.java:187)
    at org.apache.spark.streaming.StreamingContext.checkpoint(StreamingContext.scala:195)
    at MainClass$.main(MainClass.scala:1190)
    at MainClass.main(MainClass.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:292)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:55)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

I'm setting these properties through this block of code, as suggested at http://spark.apache.org/docs/latest/ec2-scripts.html (bottom of the page):

val ssc = new org.apache.spark.streaming.StreamingContext(
  conf,
  Seconds(60))
ssc.sparkContext.hadoopConfiguration.set("fs.s3n.awsAccessKeyId", args(2))
ssc.sparkContext.hadoopConfiguration.set("fs.s3n.awsSecretAccessKey", args(3))

args(2) and args(3) are, of course, my AWS Access Key ID and Secret Access Key.

Why does it keep saying they are not set?

EDIT: I also tried this way, but I get the same exception:

val lines = ssc.textFileStream("s3n://"+ args(2) +":"+ args(3) + "@<mybucket>/path/")
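One detail worth checking with the inline-credentials form: AWS secret keys frequently contain characters such as `/` or `+`, which break the `s3n://key:secret@bucket/path` syntax unless the secret is URL-encoded first. A minimal sketch of this (the `s3nUrl` helper is my own, not from the original post):

```scala
import java.net.URLEncoder

// Hypothetical helper: URL-encode the secret key before embedding it in an
// s3n URL, since '/' and '+' in a raw secret corrupt the key:secret@bucket form.
def s3nUrl(accessKey: String, secretKey: String, bucket: String, path: String): String = {
  val encodedSecret = URLEncoder.encode(secretKey, "UTF-8")
  s"s3n://$accessKey:$encodedSecret@$bucket/$path"
}
```

It would be used in place of the manual concatenation above, e.g. `s3nUrl(args(2), args(3), "<mybucket>", "path/")`.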

Solution

Odd. Try also doing a .set on the sparkContext. Also try exporting the env variables before you start the application:

export AWS_ACCESS_KEY_ID=<your access>
export AWS_SECRET_ACCESS_KEY=<your secret>

^^this is how we do it.
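If the env-variable route is used, the application could also fail fast at startup when they are missing, rather than hitting the opaque IllegalArgumentException deep inside Hadoop. A sketch, with a helper name of my own invention:

```scala
// Hypothetical startup check: surface missing AWS credentials with a clear
// message instead of the Hadoop IllegalArgumentException seen above.
def resolveAwsCreds(env: Map[String, String]): Either[String, (String, String)] =
  (env.get("AWS_ACCESS_KEY_ID"), env.get("AWS_SECRET_ACCESS_KEY")) match {
    case (Some(id), Some(secret)) => Right((id, secret))
    case _ =>
      Left("AWS_ACCESS_KEY_ID and/or AWS_SECRET_ACCESS_KEY not set; " +
           "export them before running spark-submit")
  }
```

Calling `resolveAwsCreds(sys.env)` before constructing the StreamingContext would turn a misconfigured launch into an immediate, readable error.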
