使用SparkSQL从sql中检索SQL的最大日期以JSON格式导出 [英] Retrieve maximum date from sql using SparkSQL to export in JSON

查看:396
本文介绍了使用SparkSQL从sql中检索SQL的最大日期以JSON格式导出的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我在远程SQL数据库中有一个表.

I have a table in remote SQL database.

CUSTOMERID  ACCOUNTNO   VEHICLENUMBER   TAGSTARTEFFDATE
20000000    10003014    MH43AJ411       2013-06-07 13:07:13.210 
20000001    10003014    MH43AJ411       2014-08-08.19:10:11.519
20029961    10003019    GJ15CD7387      2016-07-28 19:21:54.173
20009020    10003019    GJ15CF7747      2016-05-25 18:46:55.947
20001866    10003019    GJ15CD7657      2015-07-11 15:17:14.503
20001557    10003019    GJ15CB9601      2016-05-05 16:45:58.247
20001223    10003019    GJ15CA7837      2014-06-06 14:57:42.583
20000933    10003019    MH02DG7774      2014-02-12 13:49:31.427
20001690    10003019    GJ15CD7477      2015-01-03 16:12:59.000
20000008    10003019    GJ15CB727       2013-06-17 12:36:01.190
20001865    10003019    GJ15CA7387      2015-06-24 15:01:14.000
20000005    10003019    MH02BY7774      2013-06-15 12:29:10.000

我想导出为JSON,这是代码段.

I want to export as JSON and this is the code snippet.

  val jdbcSqlConnStr = s"jdbc:sqlserver://192.168.70.15;databaseName=$db;user=bhaskar;password=welcome123;"      
  val jdbcDbTable = table1
  val jdbcDF = sqlContext.read.format("jdbc").options(Map("url" -> jdbcSqlConnStr,"dbtable" -> jdbcDbTable)).load()
//jdbcDF.show(10)
//jdbcDF.printSchema

val query = "SELECT ACCOUNTNO, collect_set(struct(`VEHICLENUMBER`, `CUSTOMERID`, `TAGSTARTEFFDATE`)) as VEHICLE FROM tp_customer_account GROUP BY ACCOUNTNO ORDER BY ACCOUNTNO"




 jdbcDF.registerTempTable("tp_customer_account")
 val res00 = sqlContext.sql(query.toString) 
// res00.show(10)
 res00.coalesce(1).write.json("D:/res15")  

问题: 但是这里的问题是我得到多个VEHICLENUMBER,因为表中存在多个TAGSTARTEFFDATE和同一个VEHICLENUMBER.

Issue: But here the problem is that I am getting multiple VEHICLENUMBER because more than one TAGSTARTEFFDATE along with the same VEHICLENUMBER is present in the table.

想要实现: 所以我想检索TAGSTARTEFFDATE,这是同一VEHICLENUMBER的最长日期.我想使用我在代码片段中给出的使用SQLContext的SparkSQL查询.

Want to achieve: So I want to retrieve the TAGSTARTEFFDATE which is maximum date for the same VEHICLENUMBER. I want to use SparkSQL query using SQLContext as I have given in the code snippet.

请帮助.在此致以最衷心的感谢.

Please help. Heartiest thanks in advanced.

推荐答案

您可以将Window函数与dense_rank()一起使用

You can use Window functions with dense_rank() that goes something like this

val windowSpec = Window.partitionBy(col("VEHICLENUMBER")).orderBy(col("TAGSTARTEFFDATE").desc)
jdbcDF.withColumn("rank", dense_rank().over(windowSpec)).filter(col("rank") === 1).drop(col("rank"))

目前,我还不确定如何使用纯SQL语法表达此逻辑,但是如果您不限于仅使用SQL,则可以使用此代码段.

At the moment I'm not really sure how to express this logic with pure SQL syntax but if you are not restricted to using just SQL you can utilize this snippet.

修改

请朋友帮忙获得上面的SQL等效项.试试看是否有效.

Took help of a friend to get a SQL equivalent of above. Try if it works.

SELECT * FROM tp_customer_account WHERE dense_rank() over(partition by VEHICLENUMBER order by TAGSTARTEFFDATE) = 1

这篇关于使用SparkSQL从sql中检索SQL的最大日期以JSON格式导出的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆