How to create a BinaryType column using multiple columns of a pySpark DataFrame?
Problem description
I recently started working with pySpark, so I don't know many of the details yet.
I am trying to create a BinaryType column in a DataFrame, but I am struggling to do it.
For example, take a simple df:
df.show(2)
+----+----+
|col1|col2|
+----+----+
| "1"|null|
| "2"|"20"|
+----+----+
Now I want to add a third column, "col3", of BinaryType, like this:
+----+----+--------+
|col1|col2|    col3|
+----+----+--------+
| "1"|null|[1 null]|
| "2"|"20"|  [2 20]|
+----+----+--------+
How should I do this?
Recommended answer
Try this:
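from pyspark.sql import SparkSession, functions as F
# Reuse the active session if there is one (e.g. in the pyspark shell)
spark = SparkSession.builder.getOrCreate()
a = [('1', None), ('2', '20')]
df = spark.createDataFrame(a, ['col1', 'col2'])
df.show()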
+----+----+
|col1|col2|
+----+----+
|   1|null|
|   2|  20|
+----+----+
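# F.array combines the listed columns into one array<string> column
# (this is an ArrayType column, not a BinaryType one)
df = df.withColumn('col3', F.array(['col1', 'col2']))
df.show()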
+----+----+-------+
|col1|col2|   col3|
+----+----+-------+
|   1|null|   [1,]|
|   2|  20|[2, 20]|
+----+----+-------+
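If col3 really has to be BinaryType rather than an array of strings, one possible approach is to pack the values into bytes yourself. The question does not say how the bytes should be laid out, so the sketch below makes some assumptions: each value is UTF-8 encoded, fields are separated by a single space, and a missing value is written as the literal text null. The names pack_values, add_binary_column, SEPARATOR and NULL_MARKER are made up for this illustration and are not part of pySpark.

```python
from typing import Optional

SEPARATOR = b" "        # assumption: one space between packed fields
NULL_MARKER = b"null"   # assumption: how a missing value is spelled in the bytes


def pack_values(*values: Optional[str]) -> bytes:
    """Encode each string as UTF-8 and join the pieces with SEPARATOR."""
    parts = [NULL_MARKER if v is None else v.encode("utf-8") for v in values]
    return SEPARATOR.join(parts)


def add_binary_column(df, source_cols, target_col="col3"):
    """Return df with an extra BinaryType column built from source_cols."""
    # Imported here so pack_values can be tried out without Spark installed.
    from pyspark.sql import functions as F
    from pyspark.sql.types import BinaryType

    # Declaring BinaryType as the return type makes Spark store the bytes
    # as a binary column; bytearray is the Python type Spark maps to it.
    pack_udf = F.udf(lambda *vals: bytearray(pack_values(*vals)), BinaryType())
    return df.withColumn(target_col, pack_udf(*[F.col(c) for c in source_cols]))
```

With the df from the answer, df = add_binary_column(df, ['col1', 'col2']) followed by df.printSchema() should report col3 as binary. A Python UDF is slower than built-in column functions, so for large data it may be worth checking whether a built-in string-to-binary conversion fits your layout instead.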