Create PySpark dataframe: sequence of months with year
Question
Complete newbie here.
I would like to create a dataframe using PySpark that lists month and year, starting from the current date and producing x rows.
If I choose x=5, the dataframe should look like this:
Calendar_Entry
August 2019
September 2019
October 2019
November 2019
December 2019
Recommended answer
Spark is not a tool for generating rows in a distributed way, but rather for processing them in a distributed way.
Since your data is small anyway, the best solution is probably to create the data in pure Python and, if required, create a Spark dataframe out of it.
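import datetime
from dateutil.relativedelta import relativedelta
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()  # reuse the active session if one exists

def create_months_df(n_months):
    # Today, then one month earlier for each further row (this counts backwards)
    date_list = [datetime.datetime.today() - relativedelta(months=i) for i in range(n_months)]
    # One (month name, year) tuple per row, e.g. ("August", 2019)
    dates_formatted = [(d.strftime("%B"), d.year) for d in date_list]
    return spark.createDataFrame(dates_formatted, ["month", "year"])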
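Note that the function above counts backwards from today and returns two columns, whereas the question asks for a forward sequence in a single Calendar_Entry column. A minimal sketch of that variant follows, using only the standard library for the date arithmetic. The helper names month_labels and to_calendar_df are hypothetical, and to_calendar_df assumes you pass in an existing SparkSession.

```python
import datetime


def month_labels(start, n_months):
    """Return n_months labels such as 'August 2019', beginning with start's month."""
    labels = []
    for i in range(n_months):
        # Count months as one running integer so December rolls over into the next year.
        year, month_index = divmod(start.year * 12 + (start.month - 1) + i, 12)
        # Month names come from strftime and follow the active locale (English by default).
        labels.append(datetime.date(year, month_index + 1, 1).strftime("%B %Y"))
    return labels


def to_calendar_df(spark, labels):
    # `spark` is assumed to be an existing SparkSession; each label becomes one row.
    return spark.createDataFrame([(label,) for label in labels], ["Calendar_Entry"])


print(month_labels(datetime.date(2019, 8, 15), 5))
# ['August 2019', 'September 2019', 'October 2019', 'November 2019', 'December 2019']
```

To start from the current date, call month_labels(datetime.date.today(), x) and hand the result to to_calendar_df together with your session.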