How to resolve 'file could only be replicated to 0 nodes, instead of 1' in hadoop?


Problem description


I have a simple Hadoop job that crawls websites and caches them in HDFS. The mapper checks whether a URL already exists in HDFS and, if so, uses the cached copy; otherwise it downloads the page and saves it to HDFS.

If a network error (404, etc.) is encountered while downloading a page, the URL is skipped entirely and nothing is written to HDFS. Whenever I run it on a small list of ~1000 websites, I always seem to hit this error, which crashes the job repeatedly on my pseudo-distributed installation. What could the problem be?
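
The question doesn't include the job's source, but the caching behaviour described above maps onto a pattern like the following. This is a minimal sketch against the Hadoop 0.20 FileSystem API; the class and method names are illustrative, not the actual job's code:

    import java.io.IOException;
    import java.io.InputStream;
    import java.net.URL;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IOUtils;

    public class PageCache {
        // Returns the cache path, downloading the page first if it is not
        // already in HDFS. A 404 or other network error propagates as an
        // IOException, so the caller can skip the URL entirely.
        public static Path fetch(Configuration conf, String url, Path cacheFile)
                throws IOException {
            FileSystem fs = FileSystem.get(conf);
            if (fs.exists(cacheFile)) {
                return cacheFile;                // already cached, reuse it
            }
            InputStream in = null;
            FSDataOutputStream out = null;
            try {
                in = new URL(url).openStream();  // throws on 404 etc.
                out = fs.create(cacheFile);
                IOUtils.copyBytes(in, out, conf, false);
            } finally {
                IOUtils.closeStream(in);         // release both streams no matter what
                IOUtils.closeStream(out);
            }
            return cacheFile;
        }
    }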

I'm running Hadoop 0.20.2-cdh3u3.

org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /user/raj/cache/9b4edc6adab6f81d5bbb84fdabb82ac0 could only be replicated to 0 nodes, instead of 1
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1520)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:665)
    at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:616)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:557)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1434)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1430)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:416)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1157)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1428)

Solution

The problem was an unclosed FileSystem InputStream instance in the mapper, used when caching input to the file system. Each leaked stream ties up a file descriptor and a DataNode connection; once enough of them accumulate, the NameNode can no longer find a DataNode able to accept the block, which surfaces as the 'could only be replicated to 0 nodes, instead of 1' error above.
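
In practice the fix is to guarantee the stream is closed on every code path, e.g. in a finally block. A minimal sketch, assuming the mapper read cached files with FileSystem.open() (identifiers are illustrative):

    import java.io.ByteArrayOutputStream;
    import java.io.IOException;
    import java.io.InputStream;

    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IOUtils;

    public class CacheReader {
        // Reads a cached page from HDFS, closing the stream even if the
        // read fails. Leaving such streams open leaks file descriptors
        // and DataNode connections until writes start failing with
        // "could only be replicated to 0 nodes".
        public static byte[] readCached(FileSystem fs, Path cacheFile)
                throws IOException {
            InputStream in = fs.open(cacheFile);
            try {
                ByteArrayOutputStream buf = new ByteArrayOutputStream();
                IOUtils.copyBytes(in, buf, 4096, false);
                return buf.toByteArray();
            } finally {
                IOUtils.closeStream(in);  // the call that was missing
            }
        }
    }

Alternatively, passing true as the last argument of IOUtils.copyBytes makes it close both streams for you.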
