Spark Streaming: java.io.FileNotFoundException: File does not exist: <input_filename>._COPYING_


Problem Description



I am writing a Spark Streaming application which reads its input from HDFS. I submit the Spark application to YARN and then run a script which copies data from the local filesystem to HDFS.

But the Spark application starts throwing a FileNotFoundException. I believe this is happening because Spark picks up files before they have been copied fully onto HDFS: hdfs dfs -put writes to a temporary <filename>._COPYING_ file and renames it only once the copy finishes, so a directory listing taken mid-copy can return a file that no longer exists a moment later.

Following is part of the exception trace:

java.io.FileNotFoundException: File does not exist: <filename>._COPYING_
        at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:66)
        at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:56)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsUpdateTimes(FSNamesystem.java:1932)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1873)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1853)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1825)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:559)
        at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.getBlockLocations(AuthorizationProviderProxyClientProtocol.java:87)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:363)
        at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1060)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2040)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2038)

        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
        at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
        at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:73)

Any suggestions on how to resolve this?

Thanks

Solution

You need to have your data producer name the file currently being copied differently from a completely copied file, and then add a filter on the file DStream so that it picks up only completely copied files. E.g.:

File still being copied: prefix _copying*
Completely copied file: prefix data*
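Below is a minimal Scala sketch of that pattern, under stated assumptions: a hypothetical input directory hdfs:///user/example/input and the _copying_/data_ prefixes from the example above. The producer stages each file under an in-progress name and renames it when the copy completes (renames in HDFS are atomic), and the streaming job passes a path filter to StreamingContext.fileStream so only fully copied files are processed.

import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object FilteredFileStream {

  // Producer side (sketch): stage the file under an in-progress name,
  // then rename once the copy completes; the rename is atomic in HDFS.
  def publish(fs: FileSystem, local: Path, inputDir: Path): Unit = {
    val staged = new Path(inputDir, "_copying_" + local.getName)
    val done   = new Path(inputDir, "data_" + local.getName)
    fs.copyFromLocalFile(local, staged)
    fs.rename(staged, done)
  }

  def main(args: Array[String]): Unit = {
    val ssc = new StreamingContext(
      new SparkConf().setAppName("FilteredFileStream"), Seconds(30))

    // Accept only fully copied files: require our "done" prefix and skip
    // HDFS's transient "._COPYING_" marker files for good measure.
    def fullyCopied(path: Path): Boolean = {
      val name = path.getName
      name.startsWith("data_") && !name.endsWith("._COPYING_")
    }

    val lines = ssc
      .fileStream[LongWritable, Text, TextInputFormat](
        "hdfs:///user/example/input", // hypothetical input directory
        fullyCopied _,
        newFilesOnly = true)
      .map { case (_, text) => text.toString }

    lines.print()
    ssc.start()
    ssc.awaitTermination()
  }
}

If the producer is a shell script rather than JVM code, the same pattern works with hdfs dfs -put to the _copying_ name followed by hdfs dfs -mv to the data_ name; the DStream filter stays the same.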
