Scala code doesn't fetch S3 file

Question

I am trying to run a Scalding job on EMR, and the Scala code is supposed to fetch the contents of a text file located in an S3 bucket. The scala.io.Source library is mangling the S3 path.

I pass the parameter runidfile to the EMR job:

--runidfile s3://my-bucket/input.txt

The Scala code does the following:

val runid_path = args("runidfile")
val runid = Source.fromFile(runid_path).getLines().mkString

The code somehow drops one of the slashes in the "//" of the S3 path, and I get an error:

Caused by: java.io.FileNotFoundException: s3:/my-bucket/input.txt (No such file or directory)
    at java.io.FileInputStream.open(Native Method)
    at java.io.FileInputStream.<init>(FileInputStream.java:146)
    at scala.io.Source$.fromFile(Source.scala:90)
    at scala.io.Source$.fromFile(Source.scala:75)
    at scala.io.Source$.fromFile(Source.scala:53)
    at com.move.scalding.userEvents.RecommenderValidator.<init>(RecommenderValidator.scala:37)

Is there a solution or workaround for this? I tried using Source.fromURL, but s3 is not a recognized URL protocol, so it is rejected.

Recommended answer

The scala.io.Source library is not meant to access files directly on Amazon S3. Source.fromFile treats its argument as a local filesystem path, which is why "s3://" ends up as "s3:/" and no such file is found. You need another library to read from S3.
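To see where the missing slash goes, here is a minimal sketch, assuming a Unix-like filesystem such as the one on EMR nodes. Source.fromFile hands the string to java.io.File, which normalizes the path and collapses repeated slashes. The names FilePathDemo and asLocalPath are made up for illustration.

```scala
import java.io.File

object FilePathDemo {
  // Returns the path as java.io.File sees it after normalization.
  def asLocalPath(path: String): String = new File(path).getPath

  def main(args: Array[String]): Unit = {
    // On Unix-like systems the doubled slash is collapsed, which
    // matches the path reported in the FileNotFoundException.
    println(asLocalPath("s3://my-bucket/input.txt"))
  }
}
```

So the "s3:" prefix is never interpreted as a protocol; it is simply taken as the first directory name of a relative local path.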

You can use the official Amazon S3 Java library (part of the AWS SDK for Java). Here is some sample code (pieced together from this question and its answers):

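import com.amazonaws.auth.BasicAWSCredentials
import com.amazonaws.services.s3.AmazonS3Client
import com.amazonaws.services.s3.model.GetObjectRequest
import scala.io.Source

// "myKey" and "mySecretKey" are placeholders for your own credentials
val credentials = new BasicAWSCredentials("myKey", "mySecretKey")
val s3Client = new AmazonS3Client(credentials)
val s3Object = s3Client.getObject(new GetObjectRequest("my-bucket", "input.txt"))
val myData = Source.fromInputStream(s3Object.getObjectContent())

val runid = myData.getLines().mkString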
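
The sample above hardcodes the bucket and key, while the job receives a single s3:// URI in runidfile. A minimal sketch of splitting that URI with java.net.URI follows. S3Location and parseS3Uri are hypothetical names, and the sketch assumes the s3://bucket/key form shown in the question, with a key that needs no URL escaping.

```scala
import java.net.URI

object S3Location {
  // Splits "s3://bucket/key" into (bucket, key).
  // Assumes the s3://bucket/key form; other S3 URL styles are not handled.
  def parseS3Uri(s3Uri: String): (String, String) = {
    val uri = new URI(s3Uri)
    require(uri.getScheme == "s3", s"not an s3:// URI: $s3Uri")
    val bucket = uri.getAuthority          // the part between "s3://" and the next "/"
    val key = uri.getPath.stripPrefix("/") // object key without the leading slash
    (bucket, key)
  }

  def main(args: Array[String]): Unit = {
    val (bucket, key) = parseS3Uri("s3://my-bucket/input.txt")
    println(s"bucket=$bucket key=$key")
  }
}
```

The resulting pair could then be passed to new GetObjectRequest(bucket, key) in place of the string literals.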
