Crashed HDFS client - how to close remaining open files?


Problem Description

I am experiencing some problems with my Hadoop application.

Whenever my client exits without closing the files (e.g. due to a crash), there are open files in Hadoop that are never closed.

When I then try to restart the client it fails when re-opening those files to append data. (See below for Exception message)

Is there a good way to close those files manually or even better, a way to check and close them directly before reopening them?

I am using Cloudera CDH5 (2.3.0-cdh5.0.0).

These are the files that were left open after my client exited unexpectedly:

$ hadoop fsck -openforwrite /

[root@cloudera ~]# su hdfs -c'hadoop fsck -openforwrite /'
Connecting to namenode via http://cloudera:50070
FSCK started by hdfs (auth:SIMPLE) from /127.0.0.1 for path / at Fri May 23 08:04:20 PDT 2014
../tmp/event_consumer_test/game=game1/month=201405/day=20140521/events_2014052100 11806743 bytes, 1 block(s), OPENFORWRITE: ../tmp/event_consumer_test/game=game1/month=201405/day=20140521/events_2014052103 11648439 bytes, 1 block(s), OPENFORWRITE: ..../tmp/event_consumer_test/game=game1/month=201405/day=20140521/events_2014052108 11953116 bytes, 1 block(s), OPENFORWRITE: /tmp/event_consumer_test/game=game1/month=201405/day=20140521/events_2014052109 12047982 bytes, 1 block(s), OPENFORWRITE: .../tmp/event_consumer_test/game=game1/month=201405/day=20140521/events_2014052113 12010734 bytes, 1 block(s), OPENFORWRITE: ........../tmp/event_consumer_test/game=game1/month=201405/day=20140521/events_2014052124 11674047 bytes, 1 block(s), OPENFORWRITE: /tmp/event_consumer_test/game=game2/month=201405/day=20140521/events_2014052100 11995602 bytes, 1 block(s), OPENFORWRITE: /tmp/event_consumer_test/game=game2/month=201405/day=20140521/events_2014052101 12257502 bytes, 1 block(s), OPENFORWRITE: ../tmp/event_consumer_test/game=game2/month=201405/day=20140521/events_2014052104 11964174 bytes, 1 block(s), OPENFORWRITE: .../tmp/event_consumer_test/game=game2/month=201405/day=20140521/events_2014052108 11777061 bytes, 1 block(s), OPENFORWRITE: /tmp/event_consumer_test/game=game2/month=201405/day=20140521/events_2014052109 12000840 bytes, 1 block(s), OPENFORWRITE: ......./tmp/event_consumer_test/game=game2/month=201405/day=20140521/events_2014052117 12041871 bytes, 1 block(s), OPENFORWRITE: .../tmp/event_consumer_test/game=game2/month=201405/day=20140521/events_2014052121 12129462 bytes, 1 block(s), OPENFORWRITE: ../tmp/event_consumer_test/game=game2/month=201405/day=20140521/events_2014052124 11856213 bytes, 1 block(s), OPENFORWRITE: ....../tmp/event_consumer_test/game=game3/month=201405/day=20140521/events_2014052106 11863488 bytes, 1 block(s), OPENFORWRITE: ....../tmp/event_consumer_test/game=game3/month=201405/day=20140521/events_2014052113 11707803 bytes, 1 block(s), OPENFORWRITE: ./tmp/event_consumer_test/game=game3/month=201405/day=20140521/events_2014052115 11690052 bytes, 1 block(s), OPENFORWRITE: ../tmp/event_consumer_test/game=game3/month=201405/day=20140521/events_2014052118 11898117 bytes, 1 block(s), OPENFORWRITE: ........../tmp/logs/hdfs/logs/application_1400845529689_0013/cloudera_8041 0 bytes, 0 block(s), OPENFORWRITE: ..................
......................................../user/history/done_intermediate/hdfs/job_1400845529689_0007.summary_tmp 0 bytes, 0 block(s), OPENFORWRITE: ...........................................................
....................................................................................................
................................................Status: HEALTHY
 Total size:    1080902001 B
 Total dirs:    68
 Total files:   348
 Total symlinks:        0
 Total blocks (validated):  344 (avg. block size 3142156 B)
 Minimally replicated blocks:   344 (100.0 %)
 Over-replicated blocks:    0 (0.0 %)
 Under-replicated blocks:   0 (0.0 %)
 Mis-replicated blocks:     0 (0.0 %)
 Default replication factor:    1
 Average block replication: 1.0
 Corrupt blocks:        0
 Missing replicas:      0 (0.0 %)
 Number of data-nodes:      1
 Number of racks:       1
FSCK ended at Fri May 23 08:04:20 PDT 2014 in 25 milliseconds


The filesystem under path '/' is HEALTHY


The code that creates and writes to the files (reduced to the relevant part):

Path path = new Path(filename);

// Create an empty file first if it does not exist yet, so that it can be
// opened in append mode afterwards.
if (!this.fs.exists(path)) {
    this.fs.create(path).close();
}

// Re-open the existing file for append and write the message.
OutputStream out = this.fs.append(path);
out.write(... message ...);

IOUtils.closeStream(out);
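
As an aside, if out.write() throws, the snippet above never reaches IOUtils.closeStream, so the stream (and its lease on the file) can leak even without a hard crash. Below is a minimal sketch of a safer variant using try-with-resources; the helper name appendMessage is my own, and FSDataOutputStream (the concrete type returned by append) implements Closeable:

import java.io.IOException;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Hypothetical helper: appends one message and always closes the stream,
// even when write() throws. This does not help against a JVM crash, but it
// avoids leaking leases on ordinary exceptions.
void appendMessage(FileSystem fs, Path path, byte[] message) throws IOException {
    try (FSDataOutputStream out = fs.append(path)) {
        out.write(message);
    }
}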


The exception I get when trying to append to one of the open files:

Exception in thread "main" org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException): failed to create file /tmp/event_consumer_test/game=game1/month=201405/day=20140521/events_2014052124 for DFSClient_NONMAPREDUCE_-1420767882_1 on client 127.0.0.1 because current leaseholder is trying to recreate file.
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.recoverLeaseInternal(FSNamesystem.java:2458)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFileInternal(FSNamesystem.java:2340)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFileInt(FSNamesystem.java:2569)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFile(FSNamesystem.java:2532)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.append(NameNodeRpcServer.java:522)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.append(ClientNamenodeProtocolServerSideTranslatorPB.java:373)
    at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1986)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1982)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1980)
    at org.apache.hadoop.ipc.Client.call(Client.java:1409)
    at org.apache.hadoop.ipc.Client.call(Client.java:1362)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
    at com.sun.proxy.$Proxy9.append(Unknown Source)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
    at com.sun.proxy.$Proxy9.append(Unknown Source)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.append(ClientNamenodeProtocolTranslatorPB.java:276)
    at org.apache.hadoop.hdfs.DFSClient.callAppend(DFSClient.java:1558)
    at org.apache.hadoop.hdfs.DFSClient.append(DFSClient.java:1598)
    at org.apache.hadoop.hdfs.DFSClient.append(DFSClient.java:1586)
    at org.apache.hadoop.hdfs.DistributedFileSystem$4.doCall(DistributedFileSystem.java:320)
    at org.apache.hadoop.hdfs.DistributedFileSystem$4.doCall(DistributedFileSystem.java:316)
    at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
    at org.apache.hadoop.hdfs.DistributedFileSystem.append(DistributedFileSystem.java:316)
    at org.apache.hadoop.fs.FileSystem.append(FileSystem.java:1161)
    at com.cmp.eventconsumer.io.HdfsOutputManager.get(HdfsOutputManager.java:46)
    at com.cmp.eventconsumer.EventConsumer.fetchEvents(EventConsumer.java:68)
    at com.cmp.eventconsumer.EventConsumer.main(EventConsumer.java:112)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:212)

Recommended Answer

I had the same problem. Here is what I do:

try {
    // ... open the file for append and write, as in the code above ...
} catch (IOException e) {
    // The lease still held by the crashed client blocks the append; ask
    // the NameNode to recover it, then poll until the file is closed.
    // Note: recoverLease() and isFileClosed() are methods of
    // DistributedFileSystem, not of the generic FileSystem API.
    logger.info("Trying to recover file lease: " + hdfspath);
    fileSystem.recoverLease(hdfspath);

    boolean isClosed = fileSystem.isFileClosed(hdfspath);
    long start = System.currentTimeMillis();
    while (!isClosed) {
        // Give up after one minute and rethrow the original exception.
        if (System.currentTimeMillis() - start > 60 * 1000) {
            throw e;
        }
        try {
            Thread.sleep(1000);
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();
        }
        isClosed = fileSystem.isFileClosed(hdfspath);
    }
}
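
The same recovery can also be done proactively, before reopening a file, which covers the "check and close them directly before reopening" part of the question. A minimal sketch, assuming the FileSystem instance is actually a DistributedFileSystem (recoverLease and isFileClosed are not part of the generic FileSystem API; the helper name recoverLeaseIfNeeded and the one-minute timeout are my own choices):

import java.io.IOException;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.DistributedFileSystem;

// Hypothetical helper: recover a dangling lease before reopening for append.
// recoverLease() returns true when the file is already closed; otherwise we
// poll isFileClosed() until the NameNode reports the file as closed.
void recoverLeaseIfNeeded(DistributedFileSystem fs, Path path)
        throws IOException, InterruptedException {
    if (fs.recoverLease(path)) {
        return; // lease released, safe to append
    }
    long deadline = System.currentTimeMillis() + 60 * 1000;
    while (!fs.isFileClosed(path)) {
        if (System.currentTimeMillis() > deadline) {
            throw new IOException("Timed out recovering lease for " + path);
        }
        Thread.sleep(1000);
    }
}

For closing such files manually from the command line, later Hadoop versions (2.7 and up, so not the CDH5.0 release used in the question) ship an "hdfs debug recoverLease -path <path> [-retries <n>]" command that triggers the same lease recovery without any custom code.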
