Pyspark changing type of column from date to string
Problem Description
I have the following dataframe:
corr_temp_df
[('vacationdate', 'date'),
('valueE', 'string'),
('valueD', 'string'),
('valueC', 'string'),
('valueB', 'string'),
('valueA', 'string')]
Now I would like to change the datatype of the column vacationdate to String, so that the dataframe takes this new type and the datatype is overwritten for all of the entries. E.g., after running:
corr_temp_df.dtypes
The datatype of vacationdate should be overwritten.
I have already tried functions like cast, StringType and astype, but without success. Do you know how to do this?
Recommended Answer
Let's create some dummy data:
import datetime
from pyspark.sql import Row
from pyspark.sql.functions import col
row = Row("vacationdate")
df = sc.parallelize([
row(datetime.date(2015, 10, 7)),
row(datetime.date(1971, 1, 1))
]).toDF()
If you use Spark >= 1.5.0 you can use the date_format function:
from pyspark.sql.functions import date_format
(df
.select(date_format(col("vacationdate"), "dd-MM-yyyy")
.alias("date_string"))
.show())
In Spark < 1.5.0 it can be done using a Hive UDF:
df.registerTempTable("df")
sqlContext.sql(
"SELECT date_format(vacationdate, 'dd-MM-yyyy') AS date_string FROM df")
It is of course still available in Spark >= 1.5.0.
If you don't use a HiveContext you can mimic date_format using a UDF:
from pyspark.sql.functions import udf, lit
my_date_format = udf(lambda d, fmt: d.strftime(fmt))
df.select(
my_date_format(col("vacationdate"), lit("%d-%m-%Y")).alias("date_string")
).show()
Please note that this uses C standard strftime format codes, not Java SimpleDateFormat patterns.
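Because the lambda inside the UDF is plain Python strftime, its format behavior can be checked without Spark at all:

```python
import datetime

# C/strftime codes: %d = zero-padded day, %m = month, %Y = four-digit year
# (equivalent to the Java pattern dd-MM-yyyy used above)
formatted = datetime.date(2015, 10, 7).strftime("%d-%m-%Y")
print(formatted)  # 07-10-2015
```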