SQL LIKE in Spark SQL

Question

I'm trying to implement a join in Spark SQL using a LIKE condition.

The column I am joining on is called 'revision'; sample values look like this:

Table A:

8NXDPVAE

Table B:

[4,8]NXD_V%

Performing the join on SQL Server (A.revision LIKE B.revision) works just fine, but when I do the same in Spark SQL, the join returns no rows (with an inner join) or null values for Table B (with an outer join).

This is the query I am running:

val joined = spark.sql("SELECT A.revision, B.revision FROM RAWDATA A LEFT JOIN TPTYPE B ON A.revision LIKE B.revision")

The physical plan is as follows:

== Physical Plan ==
BroadcastNestedLoopJoin BuildLeft, LeftOuter, revision#15 LIKE revision#282, false
:- BroadcastExchange IdentityBroadcastMode
:  +- *Project [revision#15]
:     +- *Scan JDBCRelation(RAWDATA) [revision#15] PushedFilters: [EqualTo(bulk_id,2016092419270100198)], ReadSchema: struct<revision>
+- *Scan JDBCRelation(TPTYPE) [revision#282] ReadSchema: struct<revision>

Is it possible to perform a LIKE join like this or am I way off?

Recommended answer

You are only a little bit off. Spark SQL and Hive follow the SQL standard convention, in which the LIKE operator accepts only two special characters:

  • _ (underscore), which matches any single character.
  • % (percent), which matches any sequence of zero or more characters.

Square brackets have no special meaning and [4,8] matches only a [4,8] literal:

spark.sql("SELECT '[4,8]' LIKE '[4,8]'").show

+----------------+
|[4,8] LIKE [4,8]|
+----------------+
|            true|
+----------------+

To match more complex patterns, you can use the RLIKE operator, which supports Java regular expressions:

spark.sql("SELECT '8NXDPVAE' RLIKE '^[4,8]NXD.V.*$'").show

+-----------------------------+
|8NXDPVAE RLIKE ^[4,8]NXD.V.*$|
+-----------------------------+
|                         true|
+-----------------------------+
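
If the patterns in Table B have to stay in SQL Server LIKE syntax, one option is to translate them into Java regular expressions before joining with RLIKE. The following is a minimal sketch in plain Scala, not part of the original answer: the object name LikeToRegex and the method toRegex are hypothetical, and the translation assumes that bracket expressions such as [4,8] or [^a-f] can be copied verbatim into the regex, which holds for simple sets and ranges but not necessarily for brackets containing regex metacharacters.

```scala
import java.util.regex.Pattern

// Hypothetical helper (name and behavior are assumptions for illustration).
object LikeToRegex {
  // Translate a SQL Server-style LIKE pattern into an anchored Java regex:
  //   %      -> .*   (any sequence of characters)
  //   _      -> .    (any single character)
  //   [...]  -> copied as-is as a regex character class
  //   other  -> matched literally (quoted unless it is a letter or digit)
  def toRegex(like: String): String = {
    val sb = new StringBuilder("^")
    var i = 0
    while (i < like.length) {
      like.charAt(i) match {
        case '%' =>
          sb.append(".*"); i += 1
        case '_' =>
          sb.append('.'); i += 1
        case '[' =>
          val close = like.indexOf(']', i + 1)
          if (close < 0) {
            // Unterminated bracket: treat '[' as a literal character.
            sb.append(Pattern.quote("[")); i += 1
          } else {
            sb.append(like.substring(i, close + 1)); i = close + 1
          }
        case c =>
          if (c.isLetterOrDigit) sb.append(c)
          else sb.append(Pattern.quote(c.toString))
          i += 1
      }
    }
    sb.append('$').toString
  }
}
```

For the pattern from the question, LikeToRegex.toRegex("[4,8]NXD_V%") yields ^[4,8]NXD.V.*$, the same expression used in the RLIKE example above. In Spark, such a function could presumably be registered as a UDF (for example with spark.udf.register under a name of your choosing) and applied to B.revision in the join condition, or used once to add a precomputed regex column to Table B; either way the join still runs as a nested loop, since RLIKE is not an equality condition.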
