How to delete columns in a PySpark DataFrame
Question
>>> a
DataFrame[id: bigint, julian_date: string, user_id: bigint]
>>> b
DataFrame[id: bigint, quan_created_money: decimal(10,0), quan_created_cnt: bigint]
>>> a.join(b, a.id==b.id, 'outer')
DataFrame[id: bigint, julian_date: string, user_id: bigint, id: bigint, quan_created_money: decimal(10,0), quan_created_cnt: bigint]
There are two id: bigint columns and I want to delete one. How can I do this?
Answer
Reading the Spark documentation, I found an easier solution.
Since Spark 1.4 there is a drop(col) function that can be used on a PySpark DataFrame.
You can use it in two ways:
- df.drop('age').collect()
- df.drop(df.age).collect()