SQL LIKE in Spark SQL
Question
I'm trying to implement a join in Spark SQL using a LIKE condition.
The column I am joining on is called 'revision', and its values look like this:
Table A:
8NXDPVAE
Table B:
[4,8]NXD_V%
Performing the join in SQL Server (A.revision LIKE B.revision) works fine, but when I do the same in Spark SQL, the join returns no rows (with an inner join) or null values for Table B (with an outer join).
This is the query I am running:
val joined = spark.sql("SELECT A.revision, B.revision FROM RAWDATA A LEFT JOIN TPTYPE B ON A.revision LIKE B.revision")
The physical plan is:
== Physical Plan ==
BroadcastNestedLoopJoin BuildLeft, LeftOuter, revision#15 LIKE revision#282, false
:- BroadcastExchange IdentityBroadcastMode
: +- *Project [revision#15]
: +- *Scan JDBCRelation(RAWDATA) [revision#15] PushedFilters: [EqualTo(bulk_id,2016092419270100198)], ReadSchema: struct<revision>
+- *Scan JDBCRelation(TPTYPE) [revision#282] ReadSchema: struct<revision>
Is it possible to perform a LIKE join like this or am I way off?
Answer
You are only a little bit off. Spark SQL and Hive follow the SQL standard convention, in which the LIKE operator accepts only two special characters:
- _ (underscore): matches any single character.
- % (percent): matches an arbitrary sequence of characters, including an empty one.
Square brackets have no special meaning, so the pattern [4,8] matches only the literal string [4,8]:
spark.sql("SELECT '[4,8]' LIKE '[4,8]'").show
+----------------+
|[4,8] LIKE [4,8]|
+----------------+
|            true|
+----------------+
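To make the difference concrete, here is a minimal pure-Scala sketch (no Spark dependency) that translates a standard LIKE pattern into a Java regex the way the rules above describe: only % and _ are wildcards, and every other character, brackets included, is quoted as a literal. The names StandardLike, sqlLikeToRegex and like are hypothetical, and the sketch ignores the escape character that Spark's LIKE also supports. It shows why 8NXDPVAE never matches [4,8]NXD_V%.

```scala
import java.util.regex.Pattern

// Hypothetical helper that mimics standard SQL LIKE semantics (as described above):
// only '%' and '_' are wildcards; every other character, including '[' and ']',
// matches itself. The LIKE escape character is deliberately not handled here.
object StandardLike {
  def sqlLikeToRegex(pattern: String): String =
    pattern.iterator.map {
      case '%' => ".*"                      // any sequence of characters
      case '_' => "."                       // exactly one character
      case c   => Pattern.quote(c.toString) // everything else is literal
    }.mkString

  // String.matches requires the whole value to match, as LIKE does.
  def like(value: String, pattern: String): Boolean =
    value.matches(sqlLikeToRegex(pattern))

  def main(args: Array[String]): Unit = {
    println(like("8NXDPVAE", "[4,8]NXD_V%"))     // brackets are literal, so no match
    println(like("[4,8]NXDPVAE", "[4,8]NXD_V%")) // only a literal "[4,8]" prefix matches
  }
}
```

Running main prints false and then true: the pattern from Table B only matches values that literally start with "[4,8]".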
To match more complex patterns, you can use the RLIKE operator, which supports Java regular expressions:
spark.sql("SELECT '8NXDPVAE' RLIKE '^[4,8]NXD.V.*$'").show
+-----------------------------+
|8NXDPVAE RLIKE ^[4,8]NXD.V.*$|
+-----------------------------+
|                         true|
+-----------------------------+
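Because Table B stores SQL Server style patterns, one option is to translate each pattern into a regex before using RLIKE in the join. The sketch below is an assumption, not part of the original answer: the object TsqlLike and the function tsqlLikeToRegex are hypothetical names. It maps %, _ and bracket sets ([set], [^set], ranges) to their regex equivalents, escapes other regex metacharacters, and anchors the result with ^ and $, since RLIKE succeeds when the regex matches anywhere in the string. It does not handle ESCAPE clauses or unusual bracket edge cases such as "[]]". Note that in both T-SQL and the regex, [4,8] is a set of the three characters 4, comma and 8.

```scala
// Hypothetical helper: translate a T-SQL (SQL Server) style LIKE pattern, which also
// supports [set] and [^set] character classes, into an anchored Java regex for RLIKE.
// Sketch only: ESCAPE clauses and bracket edge cases such as "[]]" are not handled.
object TsqlLike {
  // Regex metacharacters that must be escaped when they appear as literals.
  private val metaChars = "\\.^$|?*+()[]{}"

  def tsqlLikeToRegex(pattern: String): String = {
    val sb = new StringBuilder("^") // RLIKE matches substrings, so anchor both ends
    var i = 0
    while (i < pattern.length) {
      val c = pattern.charAt(i)
      if (c == '%') sb.append(".*")
      else if (c == '_') sb.append('.')
      else if (c == '[' && pattern.indexOf(']', i + 1) > i + 1) {
        // Non-empty bracket set: copy it as a Java character class.
        val close = pattern.indexOf(']', i + 1)
        val body = pattern.substring(i + 1, close)
        // Keep ranges (a-z) and a leading '^' as-is; escape characters that are
        // special inside a Java character class but not in T-SQL.
        val escaped = body.map(ch => if ("\\[&".indexOf(ch) >= 0) "\\" + ch else ch.toString).mkString
        sb.append('[').append(escaped).append(']')
        i = close
      } else if (metaChars.indexOf(c) >= 0) sb.append('\\').append(c) // literal metachar
      else sb.append(c)
      i += 1
    }
    sb.append('$').toString
  }
}
```

For the data in the question, tsqlLikeToRegex("[4,8]NXD_V%") returns ^[4,8]NXD.V.*$, the same regex used in the RLIKE example. In a Spark job, one untested way to apply it is to register it as a UDF, e.g. spark.udf.register("tsqlLikeToRegex", (p: String) => TsqlLike.tsqlLikeToRegex(p)), and join with ON A.revision RLIKE tsqlLikeToRegex(B.revision).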