PySpark SQL: compare records on each day and report the differences


Question

So the problem I have is that I have this dataset:

It shows which businesses are doing business on specific days. What I want to achieve is to report which businesses are added on which day. Perhaps I'm looking for an answer like:

I managed to tidy up all the records using this SQL:

select [Date]
,Mnemonic
,securityDesc
,sum(cast(TradedVolume as money)) as TradedVolumSum
FROM SomeTable
group by [Date],Mnemonic,securityDesc

But I don't know how to compare each day's records with the previous day's and export the records that do not exist on the following day to another table. I tried SQL with an OVER (PARTITION BY ...) clause, but it gets complex. I can use either SQL or a PySpark SQL/Python combination.

Could you tell me how I can resolve this problem?

Solution

Below is the DataFrame operation for your question. You might need to tweak it a little, since I don't have your sample data and wrote the code just by looking at your columns. Please let me know if it solves your problem:

import pyspark.sql.functions as F
from pyspark.sql import Window

# Window per business, ordered by date, so the first value in each
# partition is the earliest day on which that business appears.
# (The column is assumed to be named "Date" in the DataFrame; adjust if needed.)
some_win = Window.partitionBy("securityDesc").orderBy(F.col("Date").asc())

some_table.withColumn(
    "business_added_day",
    F.first(F.col("Date")).over(some_win)   # earliest date per securityDesc
).select(
    "business_added_day",
    "securityDesc",
    "TradedVolumSum",
    "Mnemonic"
).distinct().orderBy("business_added_day").show()
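Since the question says either SQL or a PySpark/Python combination is acceptable, here is a minimal sketch of the same idea expressed through Spark SQL. The temp view name some_table and the column name Date are assumptions, not taken from the original post; adjust them to your actual schema.

# Sketch: same logic via Spark SQL on a temp view (spark is the active SparkSession).
some_table.createOrReplaceTempView("some_table")

spark.sql("""
    SELECT DISTINCT
           MIN(Date) OVER (PARTITION BY securityDesc) AS business_added_day,
           securityDesc,
           Mnemonic
    FROM some_table
    ORDER BY business_added_day
""").show()

MIN(Date) OVER (PARTITION BY securityDesc) plays the same role as F.first over the ordered window above: it tags every row of a business with the earliest date on which that business appears, so the distinct output tells you which businesses were added on which day.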

