BigQuery 检查数组重叠 [英] BigQuery check for array overlap

查看:19
本文介绍了BigQuery 检查数组重叠的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

所以我正在编写一个 BigQuery 查询,基本上只需要能够检查是否有许多字符串中的任何一个作为元素存在于表的一列中,其中关心的列本身包含数组字符串.仅就上下文而言,我将查询作为一个小型自动化 Python 作业的一部分编写,并且使用的是标准 SQL.

So I'm writing a BigQuery query and basically just need to be able to check if any of a number of strings are present as elements in one of the columns of the table, where the cared-about column itself contains arrays of strings. Just for context, I'm writing the query as part of a little automated Python job and am using standard SQL.

我在这里找不到任何可以明确检查数组包含的内容:https://cloud.google.com/bigquery/docs/reference/standard-sql/functions-and-operators

I couldn't find anything that would explicitly check for array inclusion here: https://cloud.google.com/bigquery/docs/reference/standard-sql/functions-and-operators

所以我想出了一个使用非常hacky的正则表达式的解决方案,特别是:

So I came up with a solution that employs a pretty hacky regex, specifically:

...other query stuff...

WHERE
    REGEXP_CONTAINS((LOWER(ARRAY_TO_STRING(column, '-'))), r"({joined_string})")

...其中 column 是我在表中关心的列,而 joined_string 是一个长字符串,由我需要检查的所有字符串组成|(其中 | 用作正则表达式 OR 运算符).

...where column is the column I care about in the table, and joined_string is a long string composed of all the strings I need to check for joined by | (where | serves as the regex OR operator).

BigQuery 标准 SQL 中是否存在某种内置功能可以让人们更明智地执行此操作?

Does there exist some kind of built-in functionality in BigQuery standard SQL that allows one to do this more sanely?

推荐答案

下面是两个例子.

首先假设你的字符串在另一个表 strings

First assuming you have your strings in another table strings

#standardSQL
WITH yourTable AS (
  SELECT 1 AS id, ['abc', 'def', 'xyz'] AS column UNION ALL
  SELECT 2, ['123', '456', '789'] UNION ALL
  SELECT 3, ['135', '246', '369'] 
),
strings AS (
  SELECT 'abc' AS str UNION ALL
  SELECT '123' UNION ALL
  SELECT '456'
)
SELECT *
FROM yourTable
WHERE (SELECT COUNT(1) FROM UNNEST(column) AS col JOIN strings ON col = str) > 0  

如果您需要查看匹配的字符串数量,可以将下面的内容添加到 SELECT 列表中

You can add below to SELECT list if you need to see how many strings are matching

(SELECT COUNT(1) FROM UNNEST(column) AS col JOIN strings ON col = str) AS cnt

第二个例子假设你有一个包含在数组中的字符串列表

Second example assumes you have list of strings packed in Array

#standardSQL
WITH yourTable AS (
  SELECT 1 AS id, ['abc', 'def', 'xyz'] AS column UNION ALL
  SELECT 2, ['123', '456', '789'] UNION ALL
  SELECT 3, ['135', '246', '369'] 
),
strings AS (
  SELECT ['abc', 'def', '456'] AS strs
)
SELECT yourTable.*
FROM yourTable, strings
WHERE (SELECT COUNT(1) FROM UNNEST(column) AS col JOIN UNNEST(strs) AS str ON col = str) > 0   

与第一个示例相同 - 您可以将下面添加到 SELECT 列表以查看匹配次数

Same as in first example - you can add below to SELECT list to see matches count

(SELECT COUNT(1) FROM UNNEST(column) AS col JOIN UNNEST(strs) AS str ON col = str) AS cnt

这篇关于BigQuery 检查数组重叠的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆