How to write to HDFS using Scala


Problem description

I am learning Scala and I need to write a custom file to HDFS. I have my own HDFS instance running on a Cloudera image using VMware Fusion on my laptop.

This is my current code:

package org.glassfish.samples

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import java.io.PrintWriter;

/**
 * @author ${user.name}
 */
object App {

  def main(args: Array[String]) {
    println("Trying to write to HDFS...")
    val conf = new Configuration()
    val fs = FileSystem.get(conf)
    val output = fs.create(new Path("hdfs://quickstart.cloudera:8020/tmp/mySample.txt"))
    val writer = new PrintWriter(output)
    try {
      writer.write("this is a test")
      writer.write("\n")
    }
    finally {
      writer.close()
    }
    print("Done!")
  }

}

And I am getting this exception:

Caused by: java.lang.IllegalArgumentException: Wrong FS: hdfs://quickstart.cloudera:8020/tmp, expected: file:///
at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:645)
at org.apache.hadoop.fs.RawLocalFileSystem.pathToFile(RawLocalFileSystem.java:80)
at org.apache.hadoop.fs.RawLocalFileSystem.mkdirs(RawLocalFileSystem.java:414)
at org.apache.hadoop.fs.ChecksumFileSystem.mkdirs(ChecksumFileSystem.java:588)
at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:439)
at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:426)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:908)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:889)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:786)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:775)
at org.glassfish.samples.App$.main(App.scala:19)
at org.glassfish.samples.App.main(App.scala)
... 6 more

I can access HDFS using the terminal and Hue:

[cloudera@quickstart ~]$ hdfs dfs -ls /tmp
Found 3 items
drwxr-xr-x   - hdfs     supergroup          0 2015-06-09 17:54 /tmp/hadoop-yarn
drwx-wx-wx   - hive     supergroup          0 2015-08-17 15:24 /tmp/hive
drwxr-xr-x   - cloudera supergroup          0 2015-08-17 16:50 /tmp/labdata

This is my pom.xml:

I ran the project using the command:

mvn clean package scala:run

What am I doing wrong? Thank you in advance!

EDIT after @jeroenr's advice

This is the current code:

package org.glassfish.samples

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import java.io.PrintWriter;

/**
 * @author ${user.name}
 */
object App {

  //def foo(x: Array[String]) = x.foldLeft("")((a, b) => a + b)

  def main(args: Array[String]) {
    println("Trying to write to HDFS...")
    val conf = new Configuration()
    //conf.set("fs.defaultFS", "hdfs://quickstart.cloudera:8020")
    conf.set("fs.defaultFS", "hdfs://192.168.30.147:8020")
    val fs = FileSystem.get(conf)
    val output = fs.create(new Path("/tmp/mySample.txt"))
    val writer = new PrintWriter(output)
    try {
      writer.write("this is a test")
      writer.write("\n")
    }
    finally {
      writer.close()
      println("Closed!")
    }
    println("Done!")
  }

}

Solution

Have a look at this example. I think the problem is that you don't configure the default file system using

conf.set("fs.defaultFS", "hdfs://quickstart.cloudera:8020")

and pass a relative path, like so:

fs.create(new Path("/tmp/mySample.txt"))

To write to the file, call 'write' directly on the output stream returned by fs.create, like so:

val os = fs.create(new Path("/tmp/mySample.txt"))
os.write("This is a test".getBytes)
os.close()
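
Putting the two fixes together, here is a minimal end-to-end sketch (the quickstart.cloudera hostname, port 8020, and the /tmp/mySample.txt path are taken from the question; adjust them to your cluster):

package org.glassfish.samples

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

object App {

  def main(args: Array[String]) {
    println("Trying to write to HDFS...")

    // Point the client at the NameNode; without this, a bare
    // Configuration falls back to the local file system (file:///)
    val conf = new Configuration()
    conf.set("fs.defaultFS", "hdfs://quickstart.cloudera:8020")

    val fs = FileSystem.get(conf)

    // The path is resolved against fs.defaultFS, so no scheme or host here
    val os = fs.create(new Path("/tmp/mySample.txt"))
    try {
      os.write("This is a test\n".getBytes("UTF-8"))
    } finally {
      os.close()
    }
    println("Done!")
  }

}

Alternatively, if you would rather not set fs.defaultFS, you can bind to an explicit URI with FileSystem.get(java.net.URI.create("hdfs://quickstart.cloudera:8020"), conf) and keep the scheme-less path. Either way avoids the "Wrong FS ... expected: file:///" error, which occurs because a plain new Configuration() without core-site.xml on the classpath defaults to the local file system.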
