Hadoop replication factor confusion


Problem Description


We have three settings for Hadoop replication, namely:

dfs.replication.max = 10
dfs.replication.min = 1
dfs.replication     = 2
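These are normally declared in hdfs-site.xml. A minimal sketch with the values from the question (note that recent Hadoop releases spell the min key dfs.namenode.replication.min):

    <property>
      <name>dfs.replication</name>
      <value>2</value>
    </property>
    <property>
      <name>dfs.replication.min</name>
      <value>1</value>
    </property>
    <property>
      <name>dfs.replication.max</name>
      <value>10</value>
    </property>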

So dfs.replication is the default replication factor for files in the Hadoop cluster until a Hadoop client sets it manually using "setrep", and a Hadoop client can raise replication up to dfs.replication.max.

dfs.replication.min is used in two cases:

  1. During safe mode, the namenode checks whether each block's replication has reached dfs.replication.min or not.
  2. The first dfs.replication.min replicas are written synchronously, while the remaining dfs.replication - dfs.replication.min replicas are created asynchronously. (With the values above, a write returns as soon as 1 replica is on disk; the second replica is filled in in the background.)

So do we have to set these configurations on each node (namenode + datanodes), or only on the client node?

And what if the values of these three settings vary across different datanodes?

Solution

The replication factor can't be set for any specific node in the cluster; you can set it for the entire cluster, a directory, or a file. dfs.replication can be updated on a running cluster.
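Since the default replication is taken from the client's configuration when a file is created, it can also be overridden per command via the generic -D option; a quick sketch (the file and path here are illustrative):

    hadoop fs -D dfs.replication=3 -put localfile.txt /user/data/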

Set the replication factor for a file - hadoop fs -setrep -w 3 <file-path>

Or set it recursively for a directory or for the entire cluster - hadoop fs -setrep -R -w 1 /
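To verify, the current replication factor appears in the second column of hadoop fs -ls output, and hdfs fsck reports under- or over-replicated blocks; for example (path illustrative):

    hadoop fs -ls /user/data/file.txt
    hdfs fsck /user/data/file.txt -files -blocks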

Use of the min and max replication factors -

1 - While writing data to datanodes, it is possible that many datanodes fail. If dfs.replication.min replicas are written, the write operation succeeds. After the write operation, the blocks are replicated asynchronously until they reach the dfs.replication level.

2 - The max replication factor dfs.replication.max is used to set the replication limit for blocks. A user can't set block replication higher than this limit when creating a file.
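For example, with dfs.replication.max = 10 as above, a sketch of a request the namenode would reject (path illustrative):

    hadoop fs -setrep 20 /user/data/file.txt    # fails: requested replication exceeds dfs.replication.max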

3 - You can set a high replication factor for the blocks of a popular file to distribute the read load across the cluster.
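As a sketch of point 3, a frequently read file could be pushed toward the configured maximum (path illustrative):

    hadoop fs -setrep -w 10 /shared/popular-dataset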
