无法使用 Spark 从 HDFS 读取文件 [英] Cannot Read a file from HDFS using Spark

查看:53
本文介绍了无法使用 Spark 从 HDFS 读取文件的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我已经使用 cloudera manager 安装了 cloudera CDH 5.

I have installed cloudera CDH 5 by using cloudera manager.

我可以轻松做到

hadoop fs -ls /input/war-and-peace.txt
hadoop fs -cat /input/war-and-peace.txt

上面的命令将在控制台上打印整个 txt 文件.

this above command will print the whole txt file on the console.

现在我启动 spark shell 并说

now I start the spark shell and say

val textFile = sc.textFile("hdfs://input/war-and-peace.txt")
textFile.count

现在出现错误

Spark 上下文可用作 sc.

Spark context available as sc.

scala> val textFile = sc.textFile("hdfs://input/war-and-peace.txt")
2014-12-14 15:14:57,874 INFO  [main] storage.MemoryStore (Logging.scala:logInfo(59)) - ensureFreeSpace(177621) called with curMem=0, maxMem=278302556
2014-12-14 15:14:57,877 INFO  [main] storage.MemoryStore (Logging.scala:logInfo(59)) - Block broadcast_0 stored as values in memory (estimated size 173.5 KB, free 265.2 MB)
textFile: org.apache.spark.rdd.RDD[String] = hdfs://input/war-and-peace.txt MappedRDD[1] at textFile at <console>:12

scala> textFile.count
2014-12-14 15:15:21,791 INFO  [main] ipc.Client (Client.java:handleConnectionTimeout(814)) - Retrying connect to server: input/92.242.140.21:8020. Already tried 0 time(s); maxRetries=45
2014-12-14 15:15:41,905 INFO  [main] ipc.Client (Client.java:handleConnectionTimeout(814)) - Retrying connect to server: input/92.242.140.21:8020. Already tried 1 time(s); maxRetries=45
2014-12-14 15:16:01,925 INFO  [main] ipc.Client (Client.java:handleConnectionTimeout(814)) - Retrying connect to server: input/92.242.140.21:8020. Already tried 2 time(s); maxRetries=45
2014-12-14 15:16:21,983 INFO  [main] ipc.Client (Client.java:handleConnectionTimeout(814)) - Retrying connect to server: input/92.242.140.21:8020. Already tried 3 time(s); maxRetries=45
2014-12-14 15:16:42,001 INFO  [main] ipc.Client (Client.java:handleConnectionTimeout(814)) - Retrying connect to server: input/92.242.140.21:8020. Already tried 4 time(s); maxRetries=45
2014-12-14 15:17:02,062 INFO  [main] ipc.Client (Client.java:handleConnectionTimeout(814)) - Retrying connect to server: input/92.242.140.21:8020. Already tried 5 time(s); maxRetries=45
2014-12-14 15:17:22,082 INFO  [main] ipc.Client (Client.java:handleConnectionTimeout(814)) - Retrying connect to server: input/92.242.140.21:8020. Already tried 6 time(s); maxRetries=45
2014-12-14 15:17:42,116 INFO  [main] ipc.Client (Client.java:handleConnectionTimeout(814)) - Retrying connect to server: input/92.242.140.21:8020. Already tried 7 time(s); maxRetries=45
2014-12-14 15:18:02,138 INFO  [main] ipc.Client (Client.java:handleConnectionTimeout(814)) - Retrying connect to server: input/92.242.140.21:8020. Already tried 8 time(s); maxRetries=45
2014-12-14 15:18:22,298 INFO  [main] ipc.Client (Client.java:handleConnectionTimeout(814)) - Retrying connect to server: input/92.242.140.21:8020. Already tried 9 time(s); maxRetries=45
2014-12-14 15:18:42,319 INFO  [main] ipc.Client (Client.java:handleConnectionTimeout(814)) - Retrying connect to server: input/92.242.140.21:8020. Already tried 10 time(s); maxRetries=45
2014-12-14 15:19:02,354 INFO  [main] ipc.Client (Client.java:handleConnectionTimeout(814)) - Retrying connect to server: input/92.242.140.21:8020. Already tried 11 time(s); maxRetries=45
2014-12-14 15:19:22,373 INFO  [main] ipc.Client (Client.java:handleConnectionTimeout(814)) - Retrying connect to server: input/92.242.140.21:8020. Already tried 12 time(s); maxRetries=45
2014-12-14 15:19:42,424 INFO  [main] ipc.Client (Client.java:handleConnectionTimeout(814)) - Retrying connect to server: input/92.242.140.21:8020. Already tried 13 time(s); maxRetries=45
2014-12-14 15:20:02,446 INFO  [main] ipc.Client (Client.java:handleConnectionTimeout(814)) - Retrying connect to server: input/92.242.140.21:8020. Already tried 14 time(s); maxRetries=45
2014-12-14 15:20:22,512 INFO  [main] ipc.Client (Client.java:handleConnectionTimeout(814)) - Retrying connect to server: input/92.242.140.21:8020. Already tried 15 time(s); maxRetries=45
2014-12-14 15:20:42,515 INFO  [main] ipc.Client (Client.java:handleConnectionTimeout(814)) - Retrying connect to server: input/92.242.140.21:8020. Already tried 16 time(s); maxRetries=45
2014-12-14 15:21:02,550 INFO  [main] ipc.Client (Client.java:handleConnectionTimeout(814)) - Retrying connect to server: input/92.242.140.21:8020. Already tried 17 time(s); maxRetries=45
2014-12-14 15:21:22,558 INFO  [main] ipc.Client (Client.java:handleConnectionTimeout(814)) - Retrying connect to server: input/92.242.140.21:8020. Already tried 18 time(s); maxRetries=45
2014-12-14 15:21:42,683 INFO  [main] ipc.Client (Client.java:handleConnectionTimeout(814)) - Retrying connect to server: input/92.242.140.21:8020. Already tried 19 time(s); maxRetries=45
2014-12-14 15:22:02,702 INFO  [main] ipc.Client (Client.java:handleConnectionTimeout(814)) - Retrying connect to server: input/92.242.140.21:8020. Already tried 20 time(s); maxRetries=45
2014-12-14 15:22:22,832 INFO  [main] ipc.Client (Client.java:handleConnectionTimeout(814)) - Retrying connect to server: input/92.242.140.21:8020. Already tried 21 time(s); maxRetries=45
2014-12-14 15:22:42,852 INFO  [main] ipc.Client (Client.java:handleConnectionTimeout(814)) - Retrying connect to server: input/92.242.140.21:8020. Already tried 22 time(s); maxRetries=45
2014-12-14 15:23:02,974 INFO  [main] ipc.Client (Client.java:handleConnectionTimeout(814)) - Retrying connect to server: input/92.242.140.21:8020. Already tried 23 time(s); maxRetries=45
2014-12-14 15:23:22,995 INFO  [main] ipc.Client (Client.java:handleConnectionTimeout(814)) - Retrying connect to server: input/92.242.140.21:8020. Already tried 24 time(s); maxRetries=45
2014-12-14 15:23:43,109 INFO  [main] ipc.Client (Client.java:handleConnectionTimeout(814)) - Retrying connect to server: input/92.242.140.21:8020. Already tried 25 time(s); maxRetries=45
2014-12-14 15:24:03,128 INFO  [main] ipc.Client (Client.java:handleConnectionTimeout(814)) - Retrying connect to server: input/92.242.140.21:8020. Already tried 26 time(s); maxRetries=45
2014-12-14 15:24:23,250 INFO  [main] ipc.Client (Client.java:handleConnectionTimeout(814)) - Retrying connect to server: input/92.242.140.21:8020. Already tried 27 time(s); maxRetries=45
java.net.ConnectException: Call From dn1home/192.168.1.21 to input:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
        at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:783)
        at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:730)
        at org.apache.hadoop.ipc.Client.call(Client.java:1415)

为什么我会收到这个错误?我可以使用 hadoop 命令读取同一个文件吗?

Why did I get this error? I am able to read the same file by using hadoop commands?

推荐答案

这里是解决方案

sc.textFile("hdfs://nn1home:8020/input/war-and-peace.txt")

我是如何发现 nn1home:8020 的?

How did I find out nn1home:8020?

只需搜索文件 core-site.xml 并查找 xml 元素 fs.defaultFS

Just search for the file core-site.xml and look for xml element fs.defaultFS

这篇关于无法使用 Spark 从 HDFS 读取文件的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆