Spark SQL fails because "Constant pool has grown past JVM limit of 0xFFFF"

Question

I am running this code on EMR 4.6.0 + Spark 1.6.1:

import org.apache.spark.sql.SQLContext

val sqlContext = SQLContext.getOrCreate(sc)
val inputRDD = sqlContext.read.json(input)

try {
    inputRDD.filter("`first_field` is not null OR `second_field` is not null").toJSON.coalesce(10).saveAsTextFile(output)
    logger.info("DONE!")
} catch {
    case e : Throwable => logger.error("ERROR" + e.getMessage)
}

In the last stage of saveAsTextFile, it fails with this error:

16/07/15 08:27:45 ERROR codegen.GenerateUnsafeProjection: failed to compile: org.codehaus.janino.JaninoRuntimeException: Constant pool has grown past JVM limit of 0xFFFF
/* 001 */
/* 002 */ public java.lang.Object generate(org.apache.spark.sql.catalyst.expressions.Expression[] exprs) {
/* 003 */   return new SpecificUnsafeProjection(exprs);
/* 004 */ }
(...)

What could be the reason? Thanks.

Answer

Solved this problem by dropping all the unused columns in the DataFrame, or by selecting only the columns you actually need.

It turns out a Spark DataFrame cannot handle super-wide schemas. There is no specific number of columns at which Spark breaks with "Constant pool has grown past JVM limit of 0xFFFF" - it depends on the kind of query, but reducing the number of columns can help to work around this issue.
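
A minimal sketch of that workaround, reusing the code from the question (the two selected columns come from the original filter; in a real job you would list every column the downstream query actually needs):

// Keep only the columns the job actually uses before filtering and writing,
// so the generated projection code stays small. The column list here is
// illustrative; select whichever fields your query really needs.
val narrowed = inputRDD.select("first_field", "second_field")

narrowed.filter("`first_field` is not null OR `second_field` is not null")
  .toJSON
  .coalesce(10)
  .saveAsTextFile(output)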

The underlying root cause is a hard JVM limit on generated Java classes: a single class file's constant pool cannot exceed 0xFFFF entries; see also Andrew's answer.
