How can I define an empty dataframe in PySpark and append the corresponding dataframes to it?

Views: 21  Published: 2021/11/14 22:38:26  Tags: pyspark pyspark-sql

Question

I want to read the CSV files in a directory as PySpark dataframes and then append them into a single dataframe. I can't find a PySpark equivalent of the way we do this in pandas.

For example, in pandas we do:

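import glob
import pandas as pd

# path is the directory prefix, e.g. 'data/'
files = glob.glob(path + '*.csv')

df = pd.DataFrame()

for f in files:
    dff = pd.read_csv(f, delimiter=',')
    # Appending returns a new DataFrame instead of modifying df in place, so
    # reassign the result (DataFrame.append was removed in pandas 2.0).
    df = pd.concat([df, dff], ignore_index=True)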

In PySpark I have tried this, without success:

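from pyspark.sql.types import StructType

# An empty DataFrame with no columns at all
schema = StructType([])
union_df = sqlContext.createDataFrame(sc.emptyRDD(), schema)

for f in files:
    dff = sqlContext.read.load(f, format='com.databricks.spark.csv', header='true', inferSchema='true', delimiter=',')
    # Raises an AnalysisException: union requires both DataFrames to have
    # the same number of columns, and union_df has none.
    union_df = union_df.unionAll(dff)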

Any help is much appreciated.

Thanks!

Recommended answer

One way to do this in Spark 2.1:

import glob

files = glob.glob(path + '*.csv')

for idx, f in enumerate(files):
    df = spark.read.csv(f, header=True, inferSchema=True)
    if idx == 0:
        # The first file becomes the accumulated DataFrame
        dff = df
    else:
        # union() matches columns by position; unionAll() is an alias of union() since Spark 2.0
        dff = dff.union(df)
