BigQuery check for array overlap


Question

So I'm writing a BigQuery query and basically just need to be able to check if any of a number of strings are present as elements in one of the columns of the table, where the cared-about column itself contains arrays of strings. Just for context, I'm writing the query as part of a little automated Python job and am using standard SQL.

I couldn't find anything that would explicitly check for array inclusion here: https://cloud.google.com/bigquery/docs/reference/standard-sql/functions-and-operators

So I came up with a solution that employs a pretty hacky regex, specifically:

...other query stuff...

WHERE
    REGEXP_CONTAINS((LOWER(ARRAY_TO_STRING(column, '-'))), r"({joined_string})")

...where column is the column I care about in the table, and joined_string is a long string composed of all the strings I need to check for, joined by | (where | serves as the regex OR operator).
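To make the behaviour of this regex approach concrete, here is a minimal Python sketch of what the WHERE clause does, as it might be reproduced inside the automated Python job. It is an approximation under stated assumptions: Python's re module stands in for BigQuery's RE2 engine, the helper names build_joined_string, regex_hack_matches and exact_overlap are hypothetical, and the re.escape call is an addition not present in the original query.

```python
import re


def build_joined_string(needles):
    """Join the search strings with | so each one is a regex alternative.

    re.escape is an addition (not in the original query) so that characters
    such as '.', '+' or '(' are matched literally.
    """
    return "|".join(re.escape(s) for s in needles)


def regex_hack_matches(column, needles, sep="-"):
    """Approximate REGEXP_CONTAINS(LOWER(ARRAY_TO_STRING(column, sep)), r'(...)').

    Python's re stands in for BigQuery's RE2 engine here (an assumption);
    for an alternation of literal strings both should behave alike.
    """
    haystack = sep.join(column).lower()  # ARRAY_TO_STRING, then LOWER
    pattern = "(" + build_joined_string(needles) + ")"
    return re.search(pattern, haystack) is not None


def exact_overlap(column, needles):
    """True only if some array element is exactly equal to some search string."""
    return not set(column).isdisjoint(needles)


row = ["abc", "def", "xyz"]
print(regex_hack_matches(row, ["abc"]), exact_overlap(row, ["abc"]))  # True True
print(regex_hack_matches(row, ["ab"]), exact_overlap(row, ["ab"]))    # True False (substring hit)
print(regex_hack_matches(row, ["c-d"]), exact_overlap(row, ["c-d"]))  # True False (spans the separator)
print(regex_hack_matches(row, ["ABC"]), exact_overlap(row, ["ABC"]))  # False False (only the column is lowercased)
```

The last three lines illustrate why the regex is fragile: it matches substrings, it can match across the '-' separator, and LOWER is applied to the column but not to the search strings. An exact element comparison avoids all three.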

Does there exist some kind of built-in functionality in BigQuery standard SQL that allows one to do this more sanely?

Solution

Below are two examples.

The first assumes you have your strings in another table, strings:

#standardSQL
WITH yourTable AS (
  SELECT 1 AS id, ['abc', 'def', 'xyz'] AS column UNION ALL
  SELECT 2, ['123', '456', '789'] UNION ALL
  SELECT 3, ['135', '246', '369'] 
),
strings AS (
  SELECT 'abc' AS str UNION ALL
  SELECT '123' UNION ALL
  SELECT '456'
)
SELECT *
FROM yourTable
WHERE (SELECT COUNT(1) FROM UNNEST(column) AS col JOIN strings ON col = str) > 0  

You can add the following to the SELECT list if you need to see how many strings match:

(SELECT COUNT(1) FROM UNNEST(column) AS col JOIN strings ON col = str) AS cnt
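For checking results locally (for example in tests for the Python job), the first example's logic can be mirrored in plain Python. This is only a sketch, not BigQuery itself: the sample data is copied from the answer's CTEs, match_count and rows_with_overlap are hypothetical helper names, and SQL NULL semantics are not modeled. It does make one property of the join visible: COUNT(1) counts matching (element, string) pairs, so a duplicate value in strings inflates cnt while leaving the > 0 filter unchanged.

```python
# Sample data copied from the answer's CTEs.
your_table = [
    (1, ["abc", "def", "xyz"]),
    (2, ["123", "456", "789"]),
    (3, ["135", "246", "369"]),
]
strings = ["abc", "123", "456"]  # one value per row of the strings table


def match_count(column, strings_rows):
    # Like SELECT COUNT(1) FROM UNNEST(column) AS col JOIN strings ON col = str:
    # one joined row per (array element, strings row) pair with equal values.
    # (SQL NULL semantics are not modeled.)
    return sum(1 for col in column for s in strings_rows if col == s)


def rows_with_overlap(table, strings_rows):
    # Like the WHERE (...) > 0 filter, returning (id, cnt) for the kept rows.
    result = []
    for row_id, column in table:
        cnt = match_count(column, strings_rows)
        if cnt > 0:
            result.append((row_id, cnt))
    return result


print(rows_with_overlap(your_table, strings))         # [(1, 1), (2, 2)]
print(rows_with_overlap(your_table, ["abc", "abc"]))  # [(1, 2)]
```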

The second example assumes you have the list of strings packed in an array:

#standardSQL
WITH yourTable AS (
  SELECT 1 AS id, ['abc', 'def', 'xyz'] AS column UNION ALL
  SELECT 2, ['123', '456', '789'] UNION ALL
  SELECT 3, ['135', '246', '369'] 
),
strings AS (
  SELECT ['abc', 'def', '456'] AS strs
)
SELECT yourTable.*
FROM yourTable, strings
WHERE (SELECT COUNT(1) FROM UNNEST(column) AS col JOIN UNNEST(strs) AS str ON col = str) > 0   

As in the first example, you can add the following to the SELECT list to see the match count:

(SELECT COUNT(1) FROM UNNEST(column) AS col JOIN UNNEST(strs) AS str ON col = str) AS cnt
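A similar plain-Python sketch can mirror the second example. Again this is only an illustration, using the answer's sample data and a hypothetical helper name, cross_join_filter. It highlights why the strings CTE should hold a single row: FROM yourTable, strings is a cross join, so if strings contained two array rows, a table row matching both would be returned twice.

```python
# Sample data copied from the answer's CTEs.
your_table = [
    (1, ["abc", "def", "xyz"]),
    (2, ["123", "456", "789"]),
    (3, ["135", "246", "369"]),
]
strings = [["abc", "def", "456"]]  # the strings CTE: one row whose strs column is an array


def cross_join_filter(table, strings_rows):
    # Like: SELECT yourTable.*, (...) AS cnt FROM yourTable, strings WHERE (...) > 0
    # The comma is a CROSS JOIN: every table row is paired with every strings row.
    # Returns (id, cnt) pairs; the array column is left out for brevity.
    result = []
    for row_id, column in table:
        for strs in strings_rows:
            cnt = sum(1 for col in column for s in strs if col == s)
            if cnt > 0:
                result.append((row_id, cnt))
    return result


print(cross_join_filter(your_table, strings))                    # [(1, 2), (2, 1)]
print(cross_join_filter(your_table, [["abc"], ["abc", "123"]]))  # [(1, 1), (1, 1), (2, 1)]
```

The second call shows row 1 appearing twice once strings has two rows, which is why the example packs all search values into one array in one row.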
