Retrieve maximum date from SQL using SparkSQL to export in JSON
Question
I have a table in a remote SQL database:
CUSTOMERID ACCOUNTNO VEHICLENUMBER TAGSTARTEFFDATE
20000000 10003014 MH43AJ411 2013-06-07 13:07:13.210
20000001 10003014 MH43AJ411 2014-08-08 19:10:11.519
20029961 10003019 GJ15CD7387 2016-07-28 19:21:54.173
20009020 10003019 GJ15CF7747 2016-05-25 18:46:55.947
20001866 10003019 GJ15CD7657 2015-07-11 15:17:14.503
20001557 10003019 GJ15CB9601 2016-05-05 16:45:58.247
20001223 10003019 GJ15CA7837 2014-06-06 14:57:42.583
20000933 10003019 MH02DG7774 2014-02-12 13:49:31.427
20001690 10003019 GJ15CD7477 2015-01-03 16:12:59.000
20000008 10003019 GJ15CB727 2013-06-17 12:36:01.190
20001865 10003019 GJ15CA7387 2015-06-24 15:01:14.000
20000005 10003019 MH02BY7774 2013-06-15 12:29:10.000
I want to export it as JSON, and this is the code snippet:
val jdbcSqlConnStr = s"jdbc:sqlserver://192.168.70.15;databaseName=$db;user=bhaskar;password=welcome123;"
val jdbcDbTable = "table1"  // source table on the SQL Server side
val jdbcDF = sqlContext.read.format("jdbc")
  .options(Map("url" -> jdbcSqlConnStr, "dbtable" -> jdbcDbTable))
  .load()
//jdbcDF.show(10)
//jdbcDF.printSchema

jdbcDF.registerTempTable("tp_customer_account")
val query = "SELECT ACCOUNTNO, collect_set(struct(`VEHICLENUMBER`, `CUSTOMERID`, `TAGSTARTEFFDATE`)) AS VEHICLE FROM tp_customer_account GROUP BY ACCOUNTNO ORDER BY ACCOUNTNO"
val res00 = sqlContext.sql(query)
// res00.show(10)
res00.coalesce(1).write.json("D:/res15")
Issue:
But the problem here is that I am getting multiple entries for the same VEHICLENUMBER, because more than one TAGSTARTEFFDATE exists in the table for the same VEHICLENUMBER.
Want to achieve:
So for each VEHICLENUMBER I want to retrieve only the row with the maximum TAGSTARTEFFDATE. I want to do this with a SparkSQL query using SQLContext, as in the code snippet above.
Please help. Thanks in advance.
Answer
You can use a Window function with dense_rank(); it goes something like this:

import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, dense_rank}

val windowSpec = Window.partitionBy(col("VEHICLENUMBER")).orderBy(col("TAGSTARTEFFDATE").desc)
jdbcDF.withColumn("rank", dense_rank().over(windowSpec))
  .filter(col("rank") === 1)
  .drop("rank")
At the moment I'm not really sure how to express this logic in pure SQL syntax, but if you are not restricted to SQL only, you can use this snippet.
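Putting the window approach together with the original JSON export, here is a minimal self-contained sketch. The sample rows and the temp output path are made up for illustration; in the question, `jdbcDF` comes from the JDBC read instead. Note that lexical ordering on the `yyyy-MM-dd HH:mm:ss.SSS` strings matches chronological ordering, so no cast is strictly required for the ranking:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, collect_set, dense_rank, struct}

object LatestTagPerVehicle {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("latest-tag").getOrCreate()
    import spark.implicits._

    // Stand-in for jdbcDF; in the question this comes from the JDBC source.
    val jdbcDF = Seq(
      (20000000, "10003014", "MH43AJ411",  "2013-06-07 13:07:13.210"),
      (20000001, "10003014", "MH43AJ411",  "2014-08-08 19:10:11.519"),
      (20029961, "10003019", "GJ15CD7387", "2016-07-28 19:21:54.173")
    ).toDF("CUSTOMERID", "ACCOUNTNO", "VEHICLENUMBER", "TAGSTARTEFFDATE")

    // Keep only the newest TAGSTARTEFFDATE per VEHICLENUMBER.
    val windowSpec = Window.partitionBy(col("VEHICLENUMBER")).orderBy(col("TAGSTARTEFFDATE").desc)
    val latest = jdbcDF
      .withColumn("rank", dense_rank().over(windowSpec))
      .filter(col("rank") === 1)
      .drop("rank")

    // Re-apply the original grouping so each ACCOUNTNO gets one VEHICLE array.
    val res = latest
      .groupBy("ACCOUNTNO")
      .agg(collect_set(struct(col("VEHICLENUMBER"), col("CUSTOMERID"), col("TAGSTARTEFFDATE"))).as("VEHICLE"))
      .orderBy("ACCOUNTNO")

    // Write to a temp directory here; the question uses "D:/res15".
    val out = java.nio.file.Files.createTempDirectory("res15").toString
    res.coalesce(1).write.mode("overwrite").json(out)
    spark.stop()
  }
}
```

With `dense_rank`, ties on the maximum date are all kept; if you want exactly one row per vehicle regardless of ties, `row_number()` is the usual substitute.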
Edit:
Took the help of a friend to get a SQL equivalent of the above. Try whether it works:
SELECT CUSTOMERID, ACCOUNTNO, VEHICLENUMBER, TAGSTARTEFFDATE FROM (
  SELECT *, dense_rank() OVER (PARTITION BY VEHICLENUMBER ORDER BY TAGSTARTEFFDATE DESC) AS rnk
  FROM tp_customer_account
) ranked WHERE rnk = 1
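For reference, here is a self-contained sketch of how the pure-SQL variant could run against the registered temp table. Two details matter: window functions are not allowed directly in a WHERE clause, so the ranking has to live in a subquery, and the ORDER BY must be descending so that rank 1 is the maximum date. The sample rows are made up; the table and column names follow the question:

```scala
import org.apache.spark.sql.SparkSession

object SqlMaxDate {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("sql-max-date").getOrCreate()
    import spark.implicits._

    // Toy stand-in for the JDBC table from the question.
    Seq(
      (20000000, "10003014", "MH43AJ411", "2013-06-07 13:07:13.210"),
      (20000001, "10003014", "MH43AJ411", "2014-08-08 19:10:11.519")
    ).toDF("CUSTOMERID", "ACCOUNTNO", "VEHICLENUMBER", "TAGSTARTEFFDATE")
      .createOrReplaceTempView("tp_customer_account")

    // Rank inside a subquery (window functions cannot appear in WHERE);
    // DESC order makes rank 1 the maximum TAGSTARTEFFDATE per vehicle.
    val latest = spark.sql("""
      SELECT CUSTOMERID, ACCOUNTNO, VEHICLENUMBER, TAGSTARTEFFDATE
      FROM (
        SELECT *, dense_rank() OVER (PARTITION BY VEHICLENUMBER
                                     ORDER BY TAGSTARTEFFDATE DESC) AS rnk
        FROM tp_customer_account
      ) ranked
      WHERE rnk = 1
    """)
    latest.show(false)
    spark.stop()
  }
}
```

The same query string can be passed to `sqlContext.sql(...)` in the question's pipeline before the `collect_set` grouping and JSON export.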