How to change memory settings for Hive Activity running in AWS Data Pipeline?
Question
While running a Hive Activity using AWS Data Pipeline, my Hive activity is failing with the following error:
Diagnostics: Container [pid=,containerID=] is running beyond physical memory limits.
Current usage: 1.0 GB of 1 GB physical memory used;
2.8 GB of 5 GB virtual memory used. Killing container.
When I ran the Hive script (the one being executed by the Hive Activity) manually, I had to execute it as shown below:
hive \
-hiveconf tez.am.resource.memory.mb=16000 \
-hiveconf mapreduce.map.memory.mb=10240 \
-hiveconf mapreduce.map.java.opts=-Xmx8192m \
-hiveconf mapreduce.reduce.memory.mb=10240 \
-hiveconf mapreduce.reduce.java.opts=-Xmx8192m \
-hiveconf hive.exec.parallel=true \
-f <hive script file path>
With these settings the Hive script executes perfectly.
Now the question is: how do I pass these settings to the Hive Activity of AWS Data Pipeline? I can't seem to find any way to pass -hiveconf to the Hive activity.
Answer
How are you calling your Hive script within Data Pipeline? If you use ShellCommandActivity, you should be able to pass these -hiveconf options just as you would on the command line, and it should run fine.
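As a sketch of that approach, a ShellCommandActivity object in the pipeline definition could carry the full hive invocation in its `command` field. The object ids (`RunHiveScript`, `EmrClusterObj`) and the S3 script path below are placeholders, not values from the original question:

```json
{
  "id": "RunHiveScript",
  "name": "RunHiveScript",
  "type": "ShellCommandActivity",
  "runsOn": { "ref": "EmrClusterObj" },
  "command": "hive -hiveconf tez.am.resource.memory.mb=16000 -hiveconf mapreduce.map.memory.mb=10240 -hiveconf mapreduce.map.java.opts=-Xmx8192m -hiveconf mapreduce.reduce.memory.mb=10240 -hiveconf mapreduce.reduce.java.opts=-Xmx8192m -hiveconf hive.exec.parallel=true -f s3://your-bucket/scripts/your-script.hql"
}
```

Because ShellCommandActivity runs the command verbatim on the EMR master node, every -hiveconf flag reaches Hive exactly as in the manual invocation, which is what HiveActivity does not expose.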