Updating a dataframe column in Spark

Question

Looking at the new spark dataframe api, it is unclear whether it is possible to modify dataframe columns.

How would I go about changing a value in row x column y of a dataframe?

In pandas this would be df.ix[x, y] = new_value

Recommended answer

While you cannot modify a column as such, you may operate on a column and return a new DataFrame reflecting that change. For that you'd first create a UserDefinedFunction implementing the operation to apply and then selectively apply that function to the targeted column only. In Python:

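from pyspark.sql.functions import UserDefinedFunction
from pyspark.sql.types import StringType

name = 'target_column'
# UDF that ignores its input and always returns the string 'new_value'
udf = UserDefinedFunction(lambda x: 'new_value', StringType())
# Apply the UDF to the target column only; every other column passes through unchanged
new_df = old_df.select(*[udf(column).alias(name) if column == name else column for column in old_df.columns])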

new_df now has the same schema as old_df (assuming that old_df.target_column was of type StringType as well) but all values in column target_column will be new_value.
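
The answer replaces every value in the column, while the question asks about a single cell. The sketch below models the same "return a new table instead of mutating the old one" idea in plain Python, so it runs without a Spark installation. It is an analogy, not Spark API: the `Row` alias, the `with_column` helper and the `id` key column are all hypothetical names introduced for illustration. It assumes the row to change can be identified by a key value, since a distributed dataframe has no positional row index like pandas does.

```python
from typing import Any, Callable, Dict, List

# Hypothetical stand-in for a dataframe row: column name -> value.
Row = Dict[str, Any]


def with_column(rows: List[Row], name: str, fn: Callable[[Row], Any]) -> List[Row]:
    """Return new rows in which column `name` is recomputed by `fn`.

    The input rows are left untouched, mirroring how the answer builds
    new_df from old_df rather than modifying old_df.
    """
    return [{**row, name: fn(row)} for row in rows]


old_rows = [
    {"id": 1, "target_column": "a"},
    {"id": 2, "target_column": "b"},
    {"id": 3, "target_column": "c"},
]

# Replace every value in the column, as the answer's UDF does.
all_replaced = with_column(old_rows, "target_column", lambda row: "new_value")

# Replace only the cell in the row whose (assumed) key `id` equals 2;
# every other row keeps its existing value.
one_replaced = with_column(
    old_rows,
    "target_column",
    lambda row: "new_value" if row["id"] == 2 else row["target_column"],
)
```

The second call is the closer match to "row x, column y": the function receives the whole row, so it can decide per row whether to return the new value or the existing one.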
