How to set pivotMaxValues in pyspark?
Problem description
I am trying to pivot a column which has more than 10000 distinct values. The default limit in Spark for the maximum number of distinct values is 10000, and I am receiving this error:
The pivot column COLUMN_NUM_2 has more than 10000 distinct values, this could indicate an error. If this was intended, set spark.sql.pivotMaxValues to at least the number of distinct values of the pivot column
How do I set this in PySpark?
Recommended answer
You have to add/set this parameter in the Spark interpreter configuration.
I am working with Zeppelin notebooks on an EMR (AWS) cluster, had the same error message as you, and it worked after I added the parameter in the interpreter settings.
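If you are not on Zeppelin, the same setting can be applied directly in PySpark code. A minimal sketch, assuming a standalone PySpark script (the app name and the limit of 20000 are arbitrary example values; `spark.sql.pivotMaxValues` is a runtime SQL conf, so it can also be changed on an existing session):

```python
from pyspark.sql import SparkSession

# Option 1: set the limit when building the session
spark = (SparkSession.builder
         .appName("pivot-example")                    # hypothetical app name
         .config("spark.sql.pivotMaxValues", "20000")  # raise the 10000 default
         .getOrCreate())

# Option 2: change it on an already-running session
spark.conf.set("spark.sql.pivotMaxValues", 20000)
```

Note that if you can enumerate the pivot values yourself, passing an explicit list to `pivot(col, values)` skips the distinct-value scan and this limit entirely.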
Hope this helps...