Add new rows to pyspark Dataframe


Question

I am very new to pyspark but familiar with pandas. I have a pyspark DataFrame:

from pyspark.sql import SparkSession

# instantiate Spark
spark = SparkSession.builder.getOrCreate()

# make some test data
columns = ['id', 'dogs', 'cats']
vals = [
     (1, 2, 0),
     (2, 0, 1)
]

# create DataFrame
df = spark.createDataFrame(vals, columns)

I want to add a new Row (4, 5, 7) so that it will output:

df.show()
+---+----+----+
| id|dogs|cats|
+---+----+----+
|  1|   2|   0|
|  2|   0|   1|
|  4|   5|   7|
+---+----+----+

Recommended answer

As thebluephantom has already said, union is the way to go. I'm just answering your question to give you a pyspark example:

from pyspark.sql import SparkSession

# if not already created automatically (e.g. in the pyspark shell), get or create a SparkSession
spark = SparkSession.builder.getOrCreate()

columns = ['id', 'dogs', 'cats']
vals = [(1, 2, 0), (2, 0, 1)]

df = spark.createDataFrame(vals, columns)

# build a one-row DataFrame with the same columns, then append it to df
newRow = spark.createDataFrame([(4, 5, 7)], columns)
appended = df.union(newRow)
appended.show()

Please also have a look at the Databricks FAQ: https://kb.databricks.com/data/append-a-row-to-rdd-or-dataframe.html
