Pyspark - create new column from operations of DataFrame columns gives error "Column is not iterable"
Problem Description
I have a PySpark DataFrame and I have tried many examples showing how to create a new column based on operations with existing columns, but none of them seem to work.
So I have a couple of questions:
1- Why doesn't this code work?
from pyspark import SparkContext, SparkConf
from pyspark.sql import SQLContext
import pyspark.sql.functions as F
sc = SparkContext()
sqlContext = SQLContext(sc)
a = sqlContext.createDataFrame([(5, 5, 3)], ['A', 'B', 'C'])
a.withColumn('my_sum', F.sum(a[col] for col in a.columns)).show()
I get the error:
TypeError: Column is not iterable
I found out how to make this work. I have to use the native Python sum function instead of F.sum: a.withColumn('my_sum', sum(a[col] for col in a.columns)).show(). It works, but I have no idea why.
2- If there is a way to make this sum work, how can I write a udf function to do this (and add the result to a new column of a DataFrame)?
import numpy as np

def my_dif(row):
    d = np.diff(row)  # creates an array of element-by-element differences
    return d.mean()   # returns the mean of the array
I am using Python 3.6.1 and Spark 2.1.1.
Thanks!
Recommended Answer
from pyspark.sql.types import IntegerType

a = sqlContext.createDataFrame([(5, 5, 3)], ['A', 'B', 'C'])
a = a.withColumn('my_sum', F.UserDefinedFunction(lambda *args: sum(args), IntegerType())(*a.columns))
a.show()
+---+---+---+------+
| A| B| C|my_sum|
+---+---+---+------+
| 5| 5| 3| 13|
+---+---+---+------+