Scala code doesn't fetch S3 file
Question
I am trying to run an EMR Scalding job, and the Scala code is supposed to fetch the content of a text file located in an S3 bucket. The scala.io.Source library is mangling the S3 path.
I pass the parameter runidfile to the EMR job:
--runidfile s3://my-bucket/input.txt
The Scala code does the following:
val runid_path = args("runidfile")
val runid = Source.fromFile(runid_path).getLines().mkString
The code somehow doesn't accept the "//" in the S3 path and I get an error:
Caused by: java.io.FileNotFoundException: s3:/my-bucket/input.txt (No such file or directory)
at java.io.FileInputStream.open(Native Method)
at java.io.FileInputStream.<init>(FileInputStream.java:146)
at scala.io.Source$.fromFile(Source.scala:90)
at scala.io.Source$.fromFile(Source.scala:75)
at scala.io.Source$.fromFile(Source.scala:53)
at com.move.scalding.userEvents.RecommenderValidator.<init>(RecommenderValidator.scala:37)
Is there a solution or workaround for this? I tried using Source.fromURL, but s3 is not a valid URL protocol, so it is rejected.
Recommended answer
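The scala.io.Source library is not meant to access files directly from Amazon S3. Source.fromFile treats its argument as a path on the local file system, which is why the "//" is collapsed to "/" and the file is not found. To read from S3 you need another library.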
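The path mangling can be reproduced without S3 at all. The sketch below assumes that Source.fromFile hands the string to java.io.File, as the stack trace suggests; the object name PathNormalizationExample is made up for illustration. On a Unix-like system it should print the same s3:/my-bucket/input.txt that appears in the exception message.

```scala
import java.io.File

// Hypothetical demo object, not part of the original job.
object PathNormalizationExample {
  def main(args: Array[String]): Unit = {
    // java.io.File interprets the string as a local path and normalizes it,
    // so the double slash after "s3:" does not survive.
    val path = new File("s3://my-bucket/input.txt").getPath
    println(path)
  }
}
```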
You can use the official Amazon S3 Java library. Here is some sample code (pieced together from this question and its answers):
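import com.amazonaws.auth.BasicAWSCredentials
import com.amazonaws.services.s3.AmazonS3Client
import com.amazonaws.services.s3.model.GetObjectRequest
import scala.io.Source
// Placeholder credentials; the S3 object's content stream is read line by line.
val credentials = new BasicAWSCredentials("myKey", "mySecretKey")
val s3Client = new AmazonS3Client(credentials)
val s3Object = s3Client.getObject(new GetObjectRequest("my-bucket", "input.txt"))
val myData = Source.fromInputStream(s3Object.getObjectContent())
val runid = myData.getLines().mkString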
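Since the job receives a full s3://my-bucket/input.txt argument while GetObjectRequest takes the bucket and key separately, the argument has to be split first. A minimal sketch using java.net.URI follows; the object S3UriExample and the helper splitS3Uri are hypothetical names, not part of the original answer or of the AWS SDK, and keys containing characters that are illegal in a URI (such as spaces) would need extra handling.

```scala
import java.net.URI

// Hypothetical helper, not from the original answer or the AWS SDK.
object S3UriExample {
  // Splits "s3://bucket/some/key" into (bucket, key).
  def splitS3Uri(uri: String): (String, String) = {
    val parsed = new URI(uri)
    require(parsed.getScheme == "s3", s"not an s3:// URI: $uri")
    val bucket = parsed.getAuthority          // the part between "s3://" and the next "/"
    val key = parsed.getPath.stripPrefix("/") // object key without the leading slash
    (bucket, key)
  }
}
```

With the question's argument, S3UriExample.splitS3Uri(args("runidfile")) would yield the bucket and key to pass to GetObjectRequest in place of the hard-coded strings.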