Heap Space Issue while Running a Pig Script

Problem Description

I am trying to execute a Pig script on around 30 million records and I am getting the heap space error below:

> ERROR 2998: Unhandled internal error. Java heap space
> 
> java.lang.OutOfMemoryError: Java heap space
>         at java.util.Arrays.copyOf(Arrays.java:2367)
>         at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:130)
>         at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:114)
>         at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:415)
>         at java.lang.StringBuilder.append(StringBuilder.java:132)
>         at org.apache.pig.newplan.logical.optimizer.LogicalPlanPrinter.shiftStringByTabs(LogicalPlanPrinter.java:223)
>         at org.apache.pig.newplan.logical.optimizer.LogicalPlanPrinter.depthFirst(LogicalPlanPrinter.java:108)
>         at org.apache.pig.newplan.logical.optimizer.LogicalPlanPrinter.depthFirst(LogicalPlanPrinter.java:102)
>         at org.apache.pig.newplan.logical.optimizer.LogicalPlanPrinter.depthFirst(LogicalPlanPrinter.java:102)
>         at org.apache.pig.newplan.logical.optimizer.LogicalPlanPrinter.depthFirst(LogicalPlanPrinter.java:102)
>         at org.apache.pig.newplan.logical.optimizer.LogicalPlanPrinter.depthFirst(LogicalPlanPrinter.java:102)
>         at org.apache.pig.newplan.logical.optimizer.LogicalPlanPrinter.depthFirst(LogicalPlanPrinter.java:102)
>         at org.apache.pig.newplan.logical.optimizer.LogicalPlanPrinter.depthFirst(LogicalPlanPrinter.java:102)
>         at org.apache.pig.newplan.logical.optimizer.LogicalPlanPrinter.depthFirst(LogicalPlanPrinter.java:102)
>         at org.apache.pig.newplan.logical.optimizer.LogicalPlanPrinter.depthFirst(LogicalPlanPrinter.java:102)
>         at org.apache.pig.newplan.logical.optimizer.LogicalPlanPrinter.depthFirstLP(LogicalPlanPrinter.java:83)
>         at org.apache.pig.newplan.logical.optimizer.LogicalPlanPrinter.visit(LogicalPlanPrinter.java:69)
>         at org.apache.pig.newplan.logical.relational.LogicalPlan.getLogicalPlanString(LogicalPlan.java:148)
>         at org.apache.pig.newplan.logical.relational.LogicalPlan.getSignature(LogicalPlan.java:133)
>         at org.apache.pig.PigServer.execute(PigServer.java:1295)
>         at org.apache.pig.PigServer.executeBatch(PigServer.java:375)
>         at org.apache.pig.PigServer.executeBatch(PigServer.java:353)
>         at org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:140)
>         at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:202)
>         at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:173)
>         at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:84)
>         at org.apache.pig.Main.run(Main.java:607)
>         at org.apache.pig.Main.main(Main.java:156)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:606)
> ================================================================================

I ran the same code with 10 million records and it ran fine.

So what are the possible ways I can avoid the above issue?
Does compression help in avoiding the heap space issue?
I have tried splitting the code into multiple fragments, but I still get the error. So even if we increase the heap memory allocation, does that guarantee the script will hold up when we run it against a larger volume of data?

Solution

You can increase the number of mappers by setting mapred.map.tasks to any number you want, and then run your script.
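
Below is a minimal sketch of how that property might be passed, assuming a stock Apache Pig/Hadoop setup; the value 100 and the file name my_script.pig are placeholders chosen for illustration, not something from the original answer:

    # Sketch only: pass the property when launching the script.
    # "100" and "my_script.pig" are placeholder values.
    pig -Dmapred.map.tasks=100 -f my_script.pig

    # Equivalent setting from inside the Pig script itself:
    #   SET mapred.map.tasks '100';

Note that Hadoop generally treats mapred.map.tasks as a hint; the actual number of map tasks is driven by the input splits. Also, the stack trace above is thrown in the Pig frontend (LogicalPlanPrinter, while PigServer builds the plan signature), not inside a map or reduce task, so if the error persists it is the client JVM heap that needs more room; on a stock installation the pig launcher reads the PIG_HEAPSIZE environment variable (a value in MB) to size that heap.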
