pyspark Column is not iterable
Problem description
With the DataFrame below, I get "Column is not iterable" when I try to groupBy and take the max:
linesWithSparkDF
+---+-----+
| id|cycle|
+---+-----+
| 31|   26|
| 31|   28|
| 31|   29|
| 31|   97|
| 31|   98|
| 31|  100|
| 31|  101|
| 31|  111|
| 31|  112|
| 31|  113|
+---+-----+
only showing top 10 rows
<ipython-input-41-373452512490> in runlgmodel2(model, data)
65 linesWithSparkDF.show(10)
66
---> 67 linesWithSparkGDF = linesWithSparkDF.groupBy(col("id")).agg(max(col("cycle")))
68 print "linesWithSparkGDF"
69
/usr/hdp/current/spark-client/python/pyspark/sql/column.py in __iter__(self)
241
242 def __iter__(self):
--> 243 raise TypeError("Column is not iterable")
244
245 # string methods
TypeError: Column is not iterable
Recommended answer
This happens because the name max in your code refers to Python's built-in max rather than the max definition provided by apache-spark (pyspark.sql.functions.max). It is easy to spot: the built-in max expects an iterable, so it tries to iterate over the Column and fails.
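The mechanism can be reproduced without a Spark installation. The sketch below is only an illustration: FakeColumn is a hypothetical stand-in, not the real pyspark class, and try_builtin_max is a helper invented for this example. It assumes only what the traceback above shows, namely that Column.__iter__ raises TypeError("Column is not iterable").

```python
# Hypothetical stand-in for pyspark.sql.Column. Like the real class shown in
# the traceback, its __iter__ raises TypeError("Column is not iterable").
class FakeColumn:
    def __init__(self, name):
        self.name = name

    def __iter__(self):
        raise TypeError("Column is not iterable")


def try_builtin_max(value):
    """Call Python's built-in max with a single argument and report the outcome."""
    try:
        # With one argument, the built-in max treats it as an iterable
        # and tries to iterate over it.
        max(value)
    except TypeError as exc:
        return str(exc)
    return "no error"


message = try_builtin_max(FakeColumn("cycle"))
print(message)
```

Because the built-in max is handed a single argument, it iterates over it, which triggers the same TypeError as in the question. An ordinary list passed to the same helper raises no error.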
To fix this, you can use a different syntax, and it should work:
linesWithSparkGDF = linesWithSparkDF.groupBy(col("id")).agg({"cycle": "max"})
Or, alternatively:
from pyspark.sql.functions import max as sparkMax
linesWithSparkGDF = linesWithSparkDF.groupBy(col("id")).agg(sparkMax(col("cycle")))