Spark: Read a CSV file from S3 using Scala
Problem description
I am writing a Spark job and trying to read a text file using Scala. The following works fine on my local machine:
import scala.io.Source
val myFile = "myLocalPath/myFile.csv"
for (line <- Source.fromFile(myFile).getLines()) {
  val data = line.split(",")
  myHashMap.put(data(0), data(1).toDouble)
}
Then I tried to make it work on AWS. I did the following, but it did not seem to read the entire file properly. What is the proper way to read such a text file on S3? Thanks a lot!
val credentials = new BasicAWSCredentials("myKey", "mySecretKey");
val s3Client = new AmazonS3Client(credentials);
val s3Object = s3Client.getObject(new GetObjectRequest("myBucket", "myFile.csv"));
val reader = new BufferedReader(new InputStreamReader(s3Object.getObjectContent()));
var line = ""
while ((line = reader.readLine()) != null) {
  val data = line.split(",")
  myHashMap.put(data(0), data(1).toDouble)
  println(line);
}
Solution
I think I got it working as shown below:
val s3Object = s3Client.getObject(new GetObjectRequest("myBucket", "myPath/myFile.csv"));
val myData = Source.fromInputStream(s3Object.getObjectContent()).getLines()
for (line <- myData) {
  val data = line.split(",")
  myMap.put(data(0), data(1).toDouble)
}
println(" my map : " + myMap.toString())
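One likely reason the Java-style loop in the question misbehaves: in Scala an assignment evaluates to Unit, so a condition such as (line = reader.readLine()) != null does not test the line that was just read. Iterating over getLines() avoids that. Below is a minimal, self-contained sketch of the same approach that also closes the stream when done. The names S3CsvSketch and readPairs are hypothetical, and an in-memory stream stands in for s3Object.getObjectContent() so the example runs without AWS credentials.

```scala
import java.io.{ByteArrayInputStream, InputStream}
import java.nio.charset.StandardCharsets
import scala.collection.mutable
import scala.io.Source

// Hypothetical helper object, not part of the original answer.
object S3CsvSketch {

  // Parses "key,value" lines from any InputStream into a mutable map.
  // With the AWS SDK, `in` would be s3Object.getObjectContent().
  def readPairs(in: InputStream): mutable.HashMap[String, Double] = {
    val result = mutable.HashMap.empty[String, Double]
    val source = Source.fromInputStream(in, "UTF-8")
    try {
      // getLines() yields each line once and stops at end of stream,
      // so no explicit null check is needed. Blank lines are skipped.
      for (line <- source.getLines() if line.trim.nonEmpty) {
        val data = line.split(",")
        result.put(data(0).trim, data(1).trim.toDouble)
      }
    } finally {
      source.close() // releases the source and the stream it wraps
    }
    result
  }

  def main(args: Array[String]): Unit = {
    // Assumption: sample data standing in for the contents of myFile.csv.
    val sample = "a,1.5\nb,2.0\n"
    val in = new ByteArrayInputStream(sample.getBytes(StandardCharsets.UTF_8))
    val myMap = readPairs(in)
    println(" my map : " + myMap.toString())
  }
}
```

To use it against S3, pass s3Object.getObjectContent() to readPairs in place of the in-memory stream. This sketch assumes every non-blank line has at least two comma-separated fields and that the second field is numeric.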