Spark: read csv file from s3 using scala


Problem description

I am writing a Spark job that reads a text file using Scala. The following works fine on my local machine:

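  import scala.io.Source
  import scala.collection.mutable

  // Assumed definition: maps the first CSV column to the number in the second
  val myHashMap = mutable.HashMap.empty[String, Double]
  val myFile = "myLocalPath/myFile.csv"
  for (line <- Source.fromFile(myFile).getLines()) {
    val data = line.split(",")
    myHashMap.put(data(0), data(1).toDouble)
  }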

Then I tried to make it work on AWS with the code below, but it does not seem to read the entire file properly. What is the proper way to read a text file like this from S3? Thanks a lot!

import java.io.{BufferedReader, InputStreamReader}
import com.amazonaws.auth.BasicAWSCredentials
import com.amazonaws.services.s3.AmazonS3Client
import com.amazonaws.services.s3.model.GetObjectRequest

val credentials = new BasicAWSCredentials("myKey", "mySecretKey");
val s3Client = new AmazonS3Client(credentials);
val s3Object = s3Client.getObject(new GetObjectRequest("myBucket", "myFile.csv"));

val reader = new BufferedReader(new InputStreamReader(s3Object.getObjectContent()));

var line = ""
while ((line = reader.readLine()) != null) {
      val data = line.split(",")
      myHashMap.put(data(0), data(1).toDouble)
      println(line);
}

Solution

I think I got it to work as shown below:

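    import scala.io.Source

    // The object key here includes the path prefix inside the bucket
    val s3Object = s3Client.getObject(new GetObjectRequest("myBucket", "myPath/myFile.csv"))

    // Wrap the S3 object's content stream and iterate over its lines
    val myData = Source.fromInputStream(s3Object.getObjectContent()).getLines()
    for (line <- myData) {
        val data = line.split(",")
        myMap.put(data(0), data(1).toDouble)
    }

    println(" my map : " + myMap.toString())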
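
One likely reason the original `BufferedReader` loop misbehaved: in Scala an assignment evaluates to `Unit`, so `(line = reader.readLine()) != null` is always true and the loop never stops at the end of the stream. Below is a minimal sketch of the same parsing written against a plain `InputStream`, so it can be tried without S3. The object name `CsvStreamExample`, the helper `readPairs`, and the sample data are hypothetical and not part of the original post. In the real job the stream would come from `s3Object.getObjectContent()`.

```scala
import java.io.{BufferedReader, ByteArrayInputStream, InputStream, InputStreamReader}
import java.nio.charset.StandardCharsets
import scala.collection.mutable

object CsvStreamExample {
  // Hypothetical helper: reads "key,value" lines from any InputStream into a mutable map.
  def readPairs(in: InputStream): mutable.HashMap[String, Double] = {
    val result = mutable.HashMap.empty[String, Double]
    val reader = new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))
    try {
      // Read first, then compare the value itself with null.
      // The Java idiom `while ((line = reader.readLine()) != null)` does not
      // work in Scala because the assignment expression has type Unit.
      var line = reader.readLine()
      while (line != null) {
        val data = line.split(",")
        // Skip blank or malformed lines instead of failing on data(1)
        if (data.length >= 2) result.put(data(0).trim, data(1).trim.toDouble)
        line = reader.readLine()
      }
    } finally {
      reader.close() // also closes the underlying stream
    }
    result
  }

  def main(args: Array[String]): Unit = {
    // Sample data standing in for the S3 object content (assumption for illustration)
    val sample = "a,1.5\nb,2.0\n"
    val in = new ByteArrayInputStream(sample.getBytes(StandardCharsets.UTF_8))
    println(readPairs(in))
  }
}
```

Closing the reader in `finally` also releases the underlying S3 connection when the stream comes from `getObjectContent()`, which the snippets above leave open.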
