spark - scala - save dataframe to a table with overwrite mode

Problem description

I would like to know what exactly "overwrite" does here. Let's say I have a table "tb1" with the following records:

driver  vin  make       model
martin  abc  ford       escape
john    abd  toyota     camry
amy     abe  chevrolet  malibu
carlos  abf  honda      civic

Now I have the following dataframe (mydf) with the same columns but with the following rows:

driver  vin  make    model
martin  abf  toyota  corolla
carlos  abg  nissan  versa

After saving the above dataframe to "tb1" with overwrite mode, will it entirely delete the contents of "tb1" and write only the data of mydf (the two records above)?

However, I would like the overwrite mode to overwrite only those rows that have the same value in the column "driver". In this case, of the 4 records in "tb1", mydf would overwrite only the 2 records above, and the resulting table would be as follows:

driver  vin  make       model
martin  abf  toyota     corolla
john    abd  toyota     camry
amy     abe  chevrolet  malibu
carlos  abg  nissan     versa

Can I achieve this functionality using overwrite mode?

import org.apache.spark.sql.SaveMode

mydf.write.mode(SaveMode.Overwrite).saveAsTable("tb1")

Recommended answer

What you mean is merging 2 dataframes on a primary key: you want to replace the old rows with the new rows that have the same key, and append any extra new rows that have no match.

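This can't be achieved with SaveMode.Overwrite or SaveMode.Append alone. With saveAsTable, SaveMode.Overwrite replaces the entire contents of "tb1" with the rows of mydf, and SaveMode.Append only adds rows without replacing any.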
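To make the difference concrete, here is a plain-Scala sketch (no Spark involved) that models only the row-level outcome of each mode on the question's data. The case class `Car` and the functions `overwrite` and `append` are hypothetical names for illustration, not Spark APIs, and this is not how Spark implements the modes internally.

```scala
// Plain-Scala model (no Spark) of how each save mode changes a table's rows.
// Car, overwrite and append are illustrative names, not Spark APIs.
final case class Car(driver: String, vin: String, make: String, model: String)

object SaveModeModel {
  // Models SaveMode.Overwrite: the old contents are discarded, only the new rows remain
  def overwrite(table: Seq[Car], df: Seq[Car]): Seq[Car] = df

  // Models SaveMode.Append: the new rows are added next to all the old ones
  def append(table: Seq[Car], df: Seq[Car]): Seq[Car] = table ++ df

  val tb1: Seq[Car] = Seq(
    Car("martin", "abc", "ford", "escape"),
    Car("john", "abd", "toyota", "camry"),
    Car("amy", "abe", "chevrolet", "malibu"),
    Car("carlos", "abf", "honda", "civic")
  )

  val mydf: Seq[Car] = Seq(
    Car("martin", "abf", "toyota", "corolla"),
    Car("carlos", "abg", "nissan", "versa")
  )

  def main(args: Array[String]): Unit = {
    println(overwrite(tb1, mydf).size) // 2: john and amy are gone
    println(append(tb1, mydf).size)    // 6: martin and carlos now appear twice
  }
}
```

Neither result matches the desired four-row table, which is why a key-based merge is needed.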

To do this, you need to implement the merge of the 2 dataframes on the primary key yourself.

Something like this:

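 import org.apache.spark.sql.SaveMode

 val parentDF = spark.table("tb1") // actual dataframe (current contents of tb1)
 val deltaDF = mydf                // new delta to be merged
 // Register temp views so spark.sql can refer to both dataframes by name
 parentDF.createOrReplaceTempView("parentDF")
 deltaDF.createOrReplaceTempView("deltaDF")
 // Old rows whose key (driver) also appears in the delta: these get replaced
 val updateDF = spark.sql("select parentDF.* from parentDF join deltaDF on parentDF.driver = deltaDF.driver")
 // Drop the replaced rows, then add every row from the delta
 val totalDF = parentDF.except(updateDF).union(deltaDF)
 // Spark refuses to overwrite tb1 while totalDF still reads from it, so write to a staging table (hypothetical name)
 totalDF.write.mode(SaveMode.Overwrite).saveAsTable("tb1_merged")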
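The row-level effect of the code above can be checked without Spark using a plain-Scala sketch on the question's data, with driver as the key. `Car`, `MergeOnKey` and `mergeOnDriver` are hypothetical names; the function drops the old rows whose key appears in the delta (the `updateDF` rows) and then appends every delta row, mirroring the `except`/`union` step.

```scala
// Plain-Scala model (no Spark) of merging a delta into a table on a key column.
// Car, MergeOnKey and mergeOnDriver are hypothetical names for illustration.
final case class Car(driver: String, vin: String, make: String, model: String)

object MergeOnKey {
  val tb1: Seq[Car] = Seq(
    Car("martin", "abc", "ford", "escape"),
    Car("john", "abd", "toyota", "camry"),
    Car("amy", "abe", "chevrolet", "malibu"),
    Car("carlos", "abf", "honda", "civic")
  )

  val mydf: Seq[Car] = Seq(
    Car("martin", "abf", "toyota", "corolla"),
    Car("carlos", "abg", "nissan", "versa")
  )

  // Drop old rows whose driver also appears in the delta (the rows to replace),
  // then append every delta row; old rows without a match are kept unchanged.
  def mergeOnDriver(old: Seq[Car], delta: Seq[Car]): Seq[Car] = {
    val deltaKeys: Set[String] = delta.map(_.driver).toSet
    old.filterNot(row => deltaKeys.contains(row.driver)) ++ delta
  }

  def main(args: Array[String]): Unit =
    // Prints john and amy unchanged, then the new martin and carlos rows
    mergeOnDriver(tb1, mydf).foreach(println)
}
```

In Spark itself, the same effect can be written as a left anti join, for example `parentDF.join(deltaDF, Seq("driver"), "left_anti").union(deltaDF)`. Unlike `except`, which behaves like SQL `EXCEPT DISTINCT`, the anti join does not also collapse duplicate rows already in the table. In both versions `union` matches columns by position, so the two dataframes must list their columns in the same order.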
