SQL LIKE in Spark SQL

Question

I'm trying to implement a join in Spark SQL using a LIKE condition.

The column I am joining on is called 'revision' and looks like this:

Table A:

8NXDPVAE

Table B:

[4,8]NXD_V%

Performing the join on SQL Server (A.revision LIKE B.revision) works just fine, but the same join in Spark SQL returns no rows (with an inner join) or null values for Table B (with an outer join).

This is the query I am running:

val joined = spark.sql("SELECT A.revision, B.revision FROM RAWDATA A LEFT JOIN TPTYPE B ON A.revision LIKE B.revision")

The plan is as follows:

== Physical Plan ==
BroadcastNestedLoopJoin BuildLeft, LeftOuter, revision#15 LIKE revision#282, false
:- BroadcastExchange IdentityBroadcastMode
:  +- *Project [revision#15]
:     +- *Scan JDBCRelation(RAWDATA) [revision#15] PushedFilters: [EqualTo(bulk_id,2016092419270100198)], ReadSchema: struct<revision>
+- *Scan JDBCRelation(TPTYPE) [revision#282] ReadSchema: struct<revision>

Is it possible to perform a LIKE join like this, or am I way off?

Answer

You are only a little bit off. Spark SQL and Hive follow the SQL standard convention, in which the LIKE operator accepts only two special characters:

  • _ (underscore), which matches any single character.
  • % (percent), which matches any sequence of characters.

Square brackets have no special meaning here (bracketed character sets are a SQL Server extension to LIKE), so [4,8] matches only the literal string [4,8]:

spark.sql("SELECT '[4,8]' LIKE '[4,8]'").show

+----------------+
|[4,8] LIKE [4,8]|
+----------------+
|            true|
+----------------+

To match more complex patterns you can use the RLIKE operator, which supports Java regular expressions:

spark.sql("SELECT '8NXDPVAE' RLIKE '^[4,8]NXD.V.*$'").show

+-----------------------------+
|8NXDPVAE RLIKE ^[4,8]NXD.V.*$|
+-----------------------------+
|                         true|
+-----------------------------+
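
To apply this to the join in the question, the SQL Server style patterns stored in Table B would first have to be rewritten as regular expressions. The following is a minimal sketch in plain Scala, assuming the patterns use only %, _ and non-empty bracketed character sets, with no ESCAPE clause. LikePatterns and likeToRegex are hypothetical names introduced for illustration; they are not part of Spark.

```scala
object LikePatterns {
  // Characters with special meaning in a Java regex outside a character class.
  private val RegexMeta = "\\.[]{}()*+-?^$|"

  // Hypothetical helper: rewrites a SQL Server style LIKE pattern as an
  // anchored Java regular expression.
  def likeToRegex(like: String): String = {
    val sb = new StringBuilder("^")
    var i = 0
    while (i < like.length) {
      val c = like.charAt(i)
      // Position of the closing bracket, or -1 if c does not open a set.
      val close = if (c == '[') like.indexOf(']', i + 1) else -1
      if (c == '%') {
        sb.append(".*") // any sequence of characters
        i += 1
      } else if (c == '_') {
        sb.append('.') // exactly one character
        i += 1
      } else if (close > i + 1) {
        // Non-empty bracketed set such as [4,8] or [^a-f]:
        // copy it through, escaping '\' and '[' for the regex engine.
        val body = like.substring(i + 1, close)
          .replace("\\", "\\\\")
          .replace("[", "\\[")
        sb.append('[').append(body).append(']')
        i = close + 1
      } else {
        // Literal character (including an unclosed '['): escape if needed.
        if (RegexMeta.indexOf(c) >= 0) sb.append('\\')
        sb.append(c)
        i += 1
      }
    }
    sb.append('$').toString
  }
}
```

Under these assumptions, LikePatterns.likeToRegex("[4,8]NXD_V%") yields ^[4,8]NXD.V.*$, the same pattern used in the RLIKE example above. One option would be to register the function with spark.udf.register under a name of your choosing (say like_to_regex, a hypothetical name) and join on A.revision RLIKE like_to_regex(B.revision); the result should be checked against the actual tables, since collation and case sensitivity may differ between SQL Server and Spark.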
