Reading data from HDFS using Scala
Question
I am new to Scala. How can I read a file from HDFS using Scala (without Spark)? When I googled it, I only found how to write to HDFS, for example:
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.FileSystem
import org.apache.hadoop.fs.Path
import java.io.PrintWriter

/**
 * @author ${user.name}
 */
object App {
  def main(args: Array[String]): Unit = {
    println("Trying to write to HDFS...")
    val conf = new Configuration()
    //conf.set("fs.defaultFS", "hdfs://quickstart.cloudera:8020")
    conf.set("fs.defaultFS", "hdfs://192.168.30.147:8020")
    val fs = FileSystem.get(conf)
    val output = fs.create(new Path("/tmp/mySample.txt"))
    val writer = new PrintWriter(output)
    try {
      writer.write("this is a test")
      writer.write("\n")
    }
    finally {
      writer.close()
      println("Closed!")
    }
    println("Done!")
  }
}
Please help me: how can I read or load a file from HDFS using Scala?
Answer
One way to do it, in a somewhat functional style:
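import java.io.{BufferedReader, InputStreamReader}
import java.net.URI
import java.nio.charset.StandardCharsets
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}
val hdfs = FileSystem.get(new URI("hdfs://yourUrl:port/"), new Configuration())
val path = new Path("/path/to/file")
val reader = new BufferedReader(new InputStreamReader(hdfs.open(path), StandardCharsets.UTF_8))
// Read lines lazily until readLine returns null (end of file), printing each line in order
try Iterator.continually(reader.readLine()).takeWhile(_ != null).foreach(line => println(line))
finally reader.close()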
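The same idea can be packaged as a reusable function. Because `FileSystem.open` returns an `FSDataInputStream`, which is an ordinary `java.io.InputStream`, the line-reading logic can be written against plain `InputStream` and tried out locally without a cluster. This is a minimal sketch assuming Scala 2.13 or later (for `scala.util.Using`) and UTF-8 text files; the names `LineReader` and `readAllLines` are hypothetical, not part of any library.

```scala
import java.io.{BufferedReader, ByteArrayInputStream, InputStream, InputStreamReader}
import java.nio.charset.StandardCharsets
import scala.util.{Try, Using}

// Hypothetical helper object, not part of Hadoop or the Scala standard library
object LineReader {
  // Reads every line of `in` as UTF-8 text and always closes the reader (and thus `in`),
  // even if reading fails; any IOException is captured in the returned Try.
  def readAllLines(in: InputStream): Try[List[String]] =
    Using(new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) { reader =>
      // readLine returns null at end of stream, which ends the iteration
      Iterator.continually(reader.readLine()).takeWhile(_ != null).toList
    }

  def main(args: Array[String]): Unit = {
    // Local stand-in for hdfs.open(path), so the sketch runs without a cluster
    val sample = new ByteArrayInputStream("first line\nsecond line\n".getBytes(StandardCharsets.UTF_8))
    readAllLines(sample).foreach(lines => lines.foreach(println))
  }
}
```

With the `hdfs` and `path` values from the answer, the call would be `LineReader.readAllLines(hdfs.open(path))`. Note that this collects the whole file into memory; for large files, streaming each line with `foreach` as in the answer is the safer choice.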
You could also take a look at this article, or here and here: these questions look related to yours and contain working (but more Java-like) code examples.
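For comparison, a common Java-style approach skips line splitting entirely and copies the file's raw bytes to an output stream with Hadoop's `org.apache.hadoop.io.IOUtils.copyBytes`. This is a sketch assuming `hadoop-common` is on the classpath; `HdfsCat` and `copyTo` are hypothetical names, and the URI in the comment is a placeholder.

```scala
import java.io.OutputStream
import java.net.URI
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.hadoop.io.IOUtils

// Hypothetical helper object; run as: HdfsCat hdfs://yourUrl:port/tmp/mySample.txt
object HdfsCat {
  // Copies the raw bytes of the file at `uri` to `out`, using a 4096-byte buffer.
  // The URI's scheme (hdfs://, file://, ...) selects which FileSystem implementation is used.
  def copyTo(uri: String, out: OutputStream): Unit = {
    val fs = FileSystem.get(URI.create(uri), new Configuration())
    val in = fs.open(new Path(uri))
    // close = false: close only `in` ourselves, leaving `out` (e.g. System.out) open
    try IOUtils.copyBytes(in, out, 4096, false)
    finally IOUtils.closeStream(in)
  }

  def main(args: Array[String]): Unit = {
    copyTo(args(0), System.out)
    System.out.flush()
  }
}
```

Because the file system is chosen from the URI scheme, the same code can be tried against a local file (a `file:///...` URI) before pointing it at the cluster.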