Spark dataframe add new column with random data


Question

I want to add a new column to the dataframe whose values consist of either 0 or 1. I used the 'randint' function:

from random import randint

df1 = df.withColumn('isVal',randint(0,1))

but I got the following error:

/spark/python/pyspark/sql/dataframe.py",第 1313 行,在 withColumn 中assert isinstance(col, Column), "col 应该是 Column"断言错误:col 应该是 Column

/spark/python/pyspark/sql/dataframe.py", line 1313, in withColumn assert isinstance(col, Column), "col should be Column" AssertionError: col should be Column

How can I use a custom function, or the randint function, to generate random values for the column?

Answer

You are using Python's builtin random module. randint(0, 1) is evaluated once, on the driver, and returns a plain int, a constant, not a column expression.

As the error message shows, withColumn expects a Column that represents an expression.

To do this:

from pyspark.sql.functions import rand,when
df1 = df.withColumn('isVal', when(rand() > 0.5, 1).otherwise(0))

rand() draws from a uniform distribution on [0.0, 1.0), so the expression above assigns 0 or 1 with equal probability. See the functions documentation for more options (http://spark.apache.org/docs/latest/api/python/pyspark.sql.html#module-pyspark.sql.functions)
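A plain-Python sketch of what that expression computes per row; here random.random stands in for Spark's rand(), which is an assumption for illustration only, since both draw uniformly from [0.0, 1.0).

```python
import random

# Sketch of what when(rand() > 0.5, 1).otherwise(0) computes for each row:
# a uniform draw from [0.0, 1.0) thresholded at 0.5 yields 1 for roughly
# half the rows and 0 for the rest.
random.seed(0)  # fixed seed only to make the demo reproducible
n = 100_000
is_val = [1 if random.random() > 0.5 else 0 for _ in range(n)]
print(sum(is_val) / n)  # prints a value close to 0.5
```

To get a different split, change the threshold: rand() > 0.8 would assign 1 to roughly 20% of rows.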

