BigQuery 检查数组重叠 [英] BigQuery check for array overlap
问题描述
所以我正在编写一个 BigQuery 查询,基本上只需要能够检查是否有许多字符串中的任何一个作为元素存在于表的一列中,其中关心的列本身包含数组字符串.仅就上下文而言,我将查询作为一个小型自动化 Python 作业的一部分编写,并且使用的是标准 SQL.
So I'm writing a BigQuery query and basically just need to be able to check if any of a number of strings are present as elements in one of the columns of the table, where the cared-about column itself contains arrays of strings. Just for context, I'm writing the query as part of a little automated Python job and am using standard SQL.
我在这里找不到任何可以明确检查数组包含的内容:https://cloud.google.com/bigquery/docs/reference/standard-sql/functions-and-operators
I couldn't find anything that would explicitly check for array inclusion here: https://cloud.google.com/bigquery/docs/reference/standard-sql/functions-and-operators
所以我想出了一个使用非常hacky的正则表达式的解决方案,特别是:
So I came up with a solution that employs a pretty hacky regex, specifically:
...other query stuff...
WHERE
REGEXP_CONTAINS((LOWER(ARRAY_TO_STRING(column, '-'))), r"({joined_string})")
...其中 column
是我在表中关心的列,而 joined_string
是一个长字符串,由我需要检查的所有字符串组成|
(其中 |
用作正则表达式 OR 运算符).
...where column
is the column I care about in the table, and joined_string
is a long string composed of all the strings I need to check for joined by |
(where |
serves as the regex OR operator).
BigQuery 标准 SQL 中是否存在某种内置功能可以让人们更明智地执行此操作?
Does there exist some kind of built-in functionality in BigQuery standard SQL that allows one to do this more sanely?
推荐答案
下面是两个例子.
首先假设你的字符串在另一个表 strings
First assuming you have your strings in another table strings
#standardSQL
WITH yourTable AS (
SELECT 1 AS id, ['abc', 'def', 'xyz'] AS column UNION ALL
SELECT 2, ['123', '456', '789'] UNION ALL
SELECT 3, ['135', '246', '369']
),
strings AS (
SELECT 'abc' AS str UNION ALL
SELECT '123' UNION ALL
SELECT '456'
)
SELECT *
FROM yourTable
WHERE (SELECT COUNT(1) FROM UNNEST(column) AS col JOIN strings ON col = str) > 0
如果您需要查看匹配的字符串数量,可以将下面的内容添加到 SELECT
列表中
You can add below to SELECT
list if you need to see how many strings are matching
(SELECT COUNT(1) FROM UNNEST(column) AS col JOIN strings ON col = str) AS cnt
第二个例子假设你有一个包含在数组中的字符串列表
Second example assumes you have list of strings packed in Array
#standardSQL
WITH yourTable AS (
SELECT 1 AS id, ['abc', 'def', 'xyz'] AS column UNION ALL
SELECT 2, ['123', '456', '789'] UNION ALL
SELECT 3, ['135', '246', '369']
),
strings AS (
SELECT ['abc', 'def', '456'] AS strs
)
SELECT yourTable.*
FROM yourTable, strings
WHERE (SELECT COUNT(1) FROM UNNEST(column) AS col JOIN UNNEST(strs) AS str ON col = str) > 0
与第一个示例相同 - 您可以将下面添加到 SELECT
列表以查看匹配次数
Same as in first example - you can add below to SELECT
list to see matches count
(SELECT COUNT(1) FROM UNNEST(column) AS col JOIN UNNEST(strs) AS str ON col = str) AS cnt
这篇关于BigQuery 检查数组重叠的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!