How to enable Postgis Query in Spark SQL


Question

I have a PostgreSQL database with the Postgis extension, so I can run queries like:

SELECT *
FROM poi_table
WHERE (ST_DistanceSphere(the_geom, ST_GeomFromText('POINT(121.37796 31.208297)', 4326)) < 6000)

And with Spark SQL, I can query the table in my Spark application (in Scala) like:

spark.sql("select the_geom from poi_table where the_geom is not null").show

The problem is that Spark SQL doesn't support the Postgis extension. For example, when I query the table using the Postgis function ST_DistanceSphere, I get an error like this:

scala> spark.sql("select * FROM poi_table WHERE (ST_DistanceSphere(the_geom, ST_GeomFromText('POINT(121.37796 31.208297)', 4326)) < 60)")
org.apache.spark.sql.AnalysisException: Undefined function: 'ST_DistanceSphere'. This function is neither a registered temporary function nor a permanent function registered in the database 'default'.; line 1 pos 65
  at
...

With Python, I can create a PostgreSQL connection and send this query to the PostgreSQL server to execute it.

So, is there any similar workaround in Spark/Scala?
Or even better, is there any jar I can use to make Spark SQL support the Postgis extension?

Answer

> With Python, I can create a PostgreSQL connection and send this query to the PostgreSQL server to execute it.

You can do the same in Scala: use JDBC (java.sql.{Connection, DriverManager}) and fetch the result set.
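A minimal sketch of that approach follows. The connection URL, user, and password are placeholders; substitute your own. The full PostGIS query is sent as-is to the PostgreSQL server, which does all the work:

```scala
import java.sql.DriverManager

// Placeholder connection details -- replace with your own.
val conn = DriverManager.getConnection(
  "jdbc:postgresql://localhost:5432/mydb", "user", "password")
try {
  val stmt = conn.createStatement()
  // Postgres (with PostGIS) evaluates ST_DistanceSphere; Spark is not involved.
  val rs = stmt.executeQuery(
    """SELECT *
      |FROM poi_table
      |WHERE ST_DistanceSphere(the_geom,
      |        ST_GeomFromText('POINT(121.37796 31.208297)', 4326)) < 6000
      |""".stripMargin)
  while (rs.next()) {
    // Process each row, e.g. read a column by name or index.
    println(rs.getString(1))
  }
} finally {
  conn.close()
}
```

This bypasses Spark entirely, so it fits when you just need the rows, not a distributed DataFrame.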

> Or even better, is there any jar I can use to make Spark SQL support the Postgis extension?

You cannot, because this is not a Postgres query. What you execute with spark.sql is a Spark query. What you can do is use a subquery:
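The idea can be sketched like this: pass the PostGIS query as a parenthesized subquery in the JDBC source's dbtable option, so Postgres evaluates ST_DistanceSphere and Spark only loads the filtered result. The URL and credentials are placeholders:

```scala
// Placeholder connection details -- replace with your own.
// The subquery (with an alias) is pushed down and executed by Postgres.
val poiDF = spark.read
  .format("jdbc")
  .option("url", "jdbc:postgresql://localhost:5432/mydb")
  .option("user", "user")
  .option("password", "password")
  .option("dbtable",
    """(SELECT *
      |FROM poi_table
      |WHERE ST_DistanceSphere(the_geom,
      |        ST_GeomFromText('POINT(121.37796 31.208297)', 4326)) < 6000
      |) AS tmp""".stripMargin)
  .load()
```

Note that if the result still contains a raw geometry column, loading may fail for the reason given below; selecting only non-geometry columns (or e.g. ST_AsText(the_geom)) in the subquery avoids that.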

Maybe it will fit your requirements (if the query doesn't have to be dynamic). Unfortunately, Spark SQL doesn't support geometry types either, so you may have to cast them to something Spark can consume, or define your own dialect.
