Too many small files HDFS Sink Flume


Problem description

agent.sinks = hpd
agent.sinks.hpd.type = hdfs
agent.sinks.hpd.channel = memoryChannel
agent.sinks.hpd.hdfs.path = hdfs://master:9000/user/hduser/gde
agent.sinks.hpd.hdfs.fileType = DataStream
agent.sinks.hpd.hdfs.writeFormat = Text
agent.sinks.hpd.hdfs.rollSize = 0
agent.sinks.hpd.hdfs.batchSize = 1000
agent.sinks.hpd.hdfs.fileSuffix = .i
agent.sinks.hpd.hdfs.rollCount = 1000
agent.sinks.hpd.hdfs.rollInterval = 0

I'm trying to use the HDFS sink to write events to HDFS. I have tried size-, count-, and time-based rolling, but none works as expected: it generates too many small files in HDFS, like:

-rw-r--r--   2 hduser supergroup      11617 2016-03-05 19:37 hdfs://master:9000/user/hduser/gde/FlumeData.1457186832879.i
-rw-r--r--   2 hduser supergroup       1381 2016-03-05 19:37 hdfs://master:9000/user/hduser/gde/FlumeData.1457186832880.i
-rw-r--r--   2 hduser supergroup        553 2016-03-05 19:37 hdfs://master:9000/user/hduser/gde/FlumeData.1457186832881.i
-rw-r--r--   2 hduser supergroup       2212 2016-03-05 19:37 hdfs://master:9000/user/hduser/gde/FlumeData.1457186832882.i
-rw-r--r--   2 hduser supergroup       1379 2016-03-05 19:37 hdfs://master:9000/user/hduser/gde/FlumeData.1457186832883.i
-rw-r--r--   2 hduser supergroup       2762 2016-03-05 19:37 hdfs://master:9000/user/hduser/gde/FlumeData.1457186832884.i.tmp

Please assist in resolving this problem. I'm using Flume 1.6.0.

~Thanks

Solution

The configuration I provided was all correct. The cause of this behavior was HDFS itself: I had two data nodes, one of which was down, so files were not reaching the minimum required replication. Because the HDFS sink rotates its output file whenever it detects an under-replicated block, events ended up scattered across many small files. The Flume log also showed the following warning:

"Block Under-replication detected. Rotating file."

To eliminate this problem, choose either of the following solutions:

• Bring the downed data node back up so blocks reach the required replication, or
• Set the property hdfs.minBlockReplicas accordingly (a minimal sketch follows the list).
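
A minimal sketch of the second option, assuming the agent configuration from the question; the value 1 here is an assumption matching a cluster where only one data node is currently live, so choose a value no greater than the number of healthy data nodes:

# Hypothetical addition to the same agent properties file: tell the sink
# to expect only 1 replica per block, so under-replication no longer
# triggers a file rotation and count-based rolling takes over again.
agent.sinks.hpd.hdfs.minBlockReplicas = 1

With rotation suppressed, only rollCount remains active (rollSize and rollInterval are both 0), so each file should again hold 1000 events. You can confirm how many data nodes are live with hdfs dfsadmin -report before picking a value.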



~Thanks

