How to convert a string-type column to int in a PySpark DataFrame?

Views: 952 Posted: 2020/10/16 22:00:49 Tags: python dataframe pyspark

Problem description

I have a DataFrame in PySpark. Some of its numerical columns contain 'nan', so when I read the data and check the schema of the DataFrame, those columns have 'string' type. How can I change them to int type? I replaced the 'nan' values with 0 and checked the schema again, but it still shows string type for those columns. I am using the code below:

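# Read the CSV with a header row and let Spark infer the column types
data_df = sqlContext.read.format("csv").load("data.csv", header=True, inferSchema=True)
data_df.printSchema()
# fillna(0) only fills nulls in numeric columns, so string columns are left unchanged
data_df = data_df.fillna(0)
data_df.printSchema()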

My data looks like this:

Here the columns 'Plays' and 'drafts' contain integer values, but because 'nan' is present in these columns, they are treated as string type.

Recommended answer

from pyspark.sql.types import IntegerType
data_df = data_df.withColumn("Plays", data_df["Plays"].cast(IntegerType()))
data_df = data_df.withColumn("drafts", data_df["drafts"].cast(IntegerType()))

You could also loop over the columns, but casting each one explicitly like this is the simplest way to convert a string column to an integer.
