pyspark Column is not iterable

Views: 30 Posted: 2021/12/22 21:28:40 Tags: apache-spark pyspark

Problem description

With the DataFrame below, I get "Column is not iterable" when I try to groupBy and take the max:

linesWithSparkDF
+---+-----+
| id|cycle|
+---+-----+
| 31|   26|
| 31|   28|
| 31|   29|
| 31|   97|
| 31|   98|
| 31|  100|
| 31|  101|
| 31|  111|
| 31|  112|
| 31|  113|
+---+-----+
only showing top 10 rows


<ipython-input-41-373452512490> in runlgmodel2(model, data)
     65     linesWithSparkDF.show(10)
     66 
---> 67     linesWithSparkGDF = linesWithSparkDF.groupBy(col("id")).agg(max(col("cycle")))
     68     print "linesWithSparkGDF"
     69 

/usr/hdp/current/spark-client/python/pyspark/sql/column.py in __iter__(self)
    241 
    242     def __iter__(self):
--> 243         raise TypeError("Column is not iterable")
    244 
    245     # string methods

TypeError: Column is not iterable

Recommended answer

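This happens because the name max in your code no longer refers to the max function provided by apache-spark (pyspark.sql.functions.max); the call resolves to Python's built-in max instead. When given a single argument, the built-in max expects an iterable, so it tries to iterate over the Column, and Column.__iter__ raises the TypeError shown in the traceback.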
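To make the mechanism concrete, here is a minimal sketch in plain Python that needs no Spark installation. FakeColumn is a hypothetical stand-in, not the real pyspark.sql.Column; it only copies the __iter__ behavior visible in the traceback.

```python
class FakeColumn:
    """Hypothetical stand-in for pyspark.sql.Column (not the real class)."""

    def __iter__(self):
        # Mirrors the __iter__ method shown in the traceback.
        raise TypeError("Column is not iterable")


try:
    # Built-in max with a single argument tries to iterate over that argument.
    max(FakeColumn())
    message = None
except TypeError as exc:
    message = str(exc)

print(message)
```

The same error appears whenever the built-in max, rather than Spark's max, receives a Column.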

To fix this, you can use the dictionary form of agg, which names the aggregate as a string and therefore does not depend on what max is bound to:

linesWithSparkGDF = linesWithSparkDF.groupBy(col("id")).agg({"cycle": "max"})

Alternatively, import Spark's max under a different name:

from pyspark.sql.functions import max as sparkMax

linesWithSparkGDF = linesWithSparkDF.groupBy(col("id")).agg(sparkMax(col("cycle")))
