Block Size in Hadoop


Question

I am currently working on a four-node cluster. Can anyone suggest an appropriate block size for a 22 GB input file? Thanks in advance.

Here are my performance results:
64M - 32 min
128M - 19.4 min
256M - 15 min

Now, should I consider making it much larger, such as 1GB or 2GB? Kindly explain whether doing so would cause any issues.

Edit: Also, if performance increases with block size for a 20GB input file, why is the default block size 64MB or 128MB? Kindly also answer the similar question over here.

Solution

What split size are you going to use to process this file? If it is slightly larger than the default block size, I'd suggest changing the block size to match the split size. This should increase the chances of data locality for mappers, thereby improving job throughput.

The split size is computed by the input format.

    protected long computeSplitSize(long blockSize, long minSize,
                                    long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }
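
To make the formula concrete, here is a minimal standalone sketch (not Hadoop code) that applies it to the three block sizes from the question. The class name `SplitSizeSketch`, the helper `approxSplitCount`, and the `minSize`/`maxSize` values are assumptions for illustration; the values stand in for a job where neither limit has been configured, and the split count is only a rough ceiling division.

```java
// Standalone illustration; SplitSizeSketch and approxSplitCount are hypothetical names.
class SplitSizeSketch {
    static final long MB = 1024L * 1024L;

    // Same formula as the computeSplitSize method quoted in the answer.
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    // Rough split count: ceiling of fileSize / splitSize.
    // This ignores any slack the real input format may allow on the last split.
    static long approxSplitCount(long fileSize, long splitSize) {
        return (fileSize + splitSize - 1) / splitSize;
    }

    public static void main(String[] args) {
        long fileSize = 22L * 1024L * MB; // the 22 GB input file from the question
        long minSize = 1L;                // assumption: no minimum split size configured
        long maxSize = Long.MAX_VALUE;    // assumption: no maximum split size configured

        for (long blockMb : new long[] {64, 128, 256}) {
            long split = computeSplitSize(blockMb * MB, minSize, maxSize);
            System.out.println(blockMb + " MB block -> split of " + (split / MB)
                    + " MB, about " + approxSplitCount(fileSize, split) + " splits");
        }
    }
}
```

Under these assumptions the split size simply equals the block size, so the 22 GB file yields roughly 352, 176, and 88 splits for 64 MB, 128 MB, and 256 MB blocks respectively.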

minSize and maxSize can be controlled with the following configuration parameters:

mapreduce.input.fileinputformat.split.minsize

mapreduce.input.fileinputformat.split.maxsize
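
The effect of these two parameters can be seen by feeding different limits into the same formula. The sketch below is plain Java rather than a Hadoop job; the class name `SplitBoundsSketch` and the chosen sizes are hypothetical, and the numeric arguments stand in for whatever values the two properties would be set to.

```java
// Standalone illustration; SplitBoundsSketch is a hypothetical name.
class SplitBoundsSketch {
    static final long MB = 1024L * 1024L;

    // Same formula as the computeSplitSize method quoted in the answer.
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    public static void main(String[] args) {
        long block = 128 * MB; // assumed block size for this example

        // No limits in effect: the split size equals the block size.
        long unbounded = computeSplitSize(block, 1L, Long.MAX_VALUE);

        // A minimum above the block size forces splits larger than one block.
        long raisedMin = computeSplitSize(block, 256 * MB, Long.MAX_VALUE);

        // A maximum below the block size forces splits smaller than one block.
        long loweredMax = computeSplitSize(block, 1L, 64 * MB);

        System.out.println("no limits:      " + (unbounded / MB) + " MB");
        System.out.println("minsize 256 MB: " + (raisedMin / MB) + " MB");
        System.out.println("maxsize 64 MB:  " + (loweredMax / MB) + " MB");
    }
}
```

When the minimum pushes the split size above the block size, a single split spans more than one block, and those blocks may not all live on the same node. That is presumably why the answer suggests matching the block size to the intended split size instead.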

You can find the detailed data flow in the FileInputFormat class.
