Distcp - Container is running beyond physical memory limits


Problem Description


I've been struggling with distcp for several days and I swear I have googled enough. Here is my use case:

USE CASE

I have a main folder in a certain location, say /hdfs/root, with a lot of subdirs (depth is not fixed) and files.

Volume: 200,000 files ~= 30 GB

I need to copy only a subset of /hdfs/root for a client into another location, say /hdfs/dest. This subset is defined by a list of absolute paths that can be updated over time.

Volume: 50,000 files ~= 5 GB

You understand that I can't use a simple hdfs dfs -cp /hdfs/root /hdfs/dest because it is not optimized, it would copy every file, and it has no -update mode.

SOLUTION POC

I ended up using hadoop distcp in two ways:

Algo 1 (simplified):
# I start up to N distcp jobs in parallel, one per subdir, with N=MAX_PROC (~30)

foreach subdir in subdirs:
    # filelist = /hdfs/root/dirX/file1 /hdfs/root/dirX/file2 ...
    filelist = buildList(subdir)
    hadoop distcp -i -pct -update filelist /hdfs/dest/subdir &

and

Algo 2
# I start one distcp that has a blacklist
blacklist = buildBlackList()
hadoop distcp -numListstatusThreads 10 -filters blacklist -pct -update /hdfs/root /hdfs/dest
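
For reference, the -filters option expects a local file with one regular expression per line; any source path matching one of the patterns is excluded from the copy. A blacklist file of that kind might look like this (the directory names here are just placeholders):

.*/dirA/.*
.*/dirB/tmp/.*
.*/\.Trash/.*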

Algo 2 does not even start; it seems that building a diff between the source and the blacklist is too hard for it, so I use Algo 1, and it works.
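
For what it's worth, a rough bash sketch of Algo 1 with the parallelism capped could look like this (buildList and subdirs.txt are placeholders for whatever produces the list of subdirs and the per-directory file lists):

#!/bin/bash
MAX_PROC=30                                  # cap on concurrent distcp jobs

while read -r subdir; do
    # throttle: wait while MAX_PROC distcp jobs are already running in the background
    while [ "$(jobs -rp | wc -l)" -ge "$MAX_PROC" ]; do
        sleep 5
    done
    filelist=$(buildList "$subdir")          # placeholder: expands to the absolute paths to copy
    hadoop distcp -i -pct -update $filelist "/hdfs/dest/$subdir" &
done < subdirs.txt

wait    # block until every background distcp has finished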

OOZIE WORKFLOW

Now I need to schedule the whole thing in an Oozie workflow. I have put algo 1 in a shell action, since I have a lot of distcp commands and I haven't mastered recursion or loops in Oozie.

Once started, after a while, I get the following error: Container runs beyond physical memory limits. Current usage: 17.2 GB of 16 GB physical memory used

Alright then, I'm gonna add more memory:

<configuration>
    <property>
        <name>oozie.launcher.mapreduce.map.memory.mb</name>
        <value>32768</value>
    </property>
    <property>
        <name>oozie.launcher.mapreduce.map.java.opts</name>
        <value>-Xmx512m</value>
    </property>
</configuration>

And still I get: Container runs beyond physical memory limits. Current usage: 32.8 GB of 32 GB physical memory used. But the job lived twice as long as the previous one.

The RAM on my cluster is not infinite, so I can't go further. Here are my hypotheses:

  1. A distcp job does not release memory (JVM garbage collector ?)
  2. Oozie sees the addition of all distcp jobs as the current memory usage, which is stupid
  3. This is not the right way to do this (well I know, but still)

Also, there are a lot of things I don't understand about memory management; it's pretty foggy to me (YARN, Oozie, JVM, MapReduce).

While googling, I noticed few people talk about real distcp use cases. This post is 4 days old: https://community.hortonworks.com/articles/71775/managing-hadoop-dr-with-distcp-and-snapshots.html and explains snapshot usage, which I can't use in my case.

I've also heard about http://atlas.incubator.apache.org, which would eventually solve my problem by "tagging" files and granting access to specific users, so we could avoid copying to a certain location. My admin team is working on it, but we won't get it to production now.

I'm quite desperate. Help me.

Solution

YARN containers are built on top of Linux "cgroups". These "cgroups" are used to put soft limits on CPU, but not on RAM...
Therefore YARN uses a clumsy workaround: it periodically checks how much RAM each container uses, and kills brutally anything that got over quota. So you lose the execution logs, and only get that dreadful message you have seen.

In most cases, you are running some kind of JVM binary (i.e. a Java/Scala utility or custom program), so you can get away with setting your own JVM quotas (especially -Xmx) so that you always stay under the YARN limit. That means some wasted RAM because of the safety margin. But the worst case is then a clean failure of the JVM when it runs out of memory: you get the execution logs in extenso and can start adjusting the quotas -- or fixing your memory leaks :-/
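
As a rough illustration of that margin (the split below is a common rule of thumb, not an official formula):

# container limit : mapreduce.map.memory.mb = 32768   -> 32 GB, enforced by YARN's periodic check-and-kill
# JVM heap        : -Xmx28g                           -> roughly 85-90% of the container
# remaining ~4 GB : metaspace, thread stacks, native buffers, and the shell process itself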

So what happens in your specific case? You are using Oozie to start a shell -- then the shell starts a hadoop command, which runs in a JVM. It is on that embedded JVM that you must set the Max Heap Size.


Long story short: if you allocate 32GB to the YARN container that runs your shell (via oozie.launcher.mapreduce.map.memory.mb) then you must ensure that the Java commands inside the shell do not consume more than, say, 28GB of Heap (to stay on the safe side).

If you are lucky, setting a single env variable will do the trick:

export HADOOP_OPTS=-Xmx28G
hadoop distcp ...........
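
Depending on the Hadoop version and on how hadoop-env.sh is written on your cluster, the client-side heap may instead be driven by HADOOP_CLIENT_OPTS or HADOOP_HEAPSIZE; if the first export has no visible effect, it may be worth trying:

export HADOOP_CLIENT_OPTS=-Xmx28G
hadoop distcp ...........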

If you are not lucky, you will have to unwrap the whole mess of hadoop-env.sh mixing different env variables with different settings (set by people that visibly hate you, in init scripts that you cannot even know about) to be interpreted by the JVM using complex precedence rules. Have fun. You may peek at that very old post for hints about where to dig.
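
If you end up doing that digging, one pragmatic approach (assuming a fairly standard layout under /etc/hadoop/conf -- adjust the path for your distro) is to grep the env scripts for heap-related settings, and to inspect the live client JVM while a distcp is running to see which -Xmx actually won:

# find every place that touches heap-related variables
grep -n "Xmx\|HADOOP_HEAPSIZE\|HADOOP_OPTS\|HADOOP_CLIENT_OPTS" /etc/hadoop/conf/hadoop-env.sh

# while a distcp is running, check the flags on the actual client JVM (main class org.apache.hadoop.tools.DistCp)
ps -ef | grep '[D]istCp' | tr ' ' '\n' | grep -i xmx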
