How to get the count of matches in a table field for a list of phrases from another table in BigQuery?

Views: 22 | Published: 2021/5/12 18:41:11 | Tag: google-bigquery

Problem description

Given an arbitrary list of phrases phrase1, phrase2*, ... phraseN (say these are in another table, Phrase_Table), how would one get the count of matches for each phrase in a field F of a BigQuery table?

Here, "*" means there must be some non-empty/non-blank string after the phrase.

Let's say you have a table with an ID field and two string fields, Field1 and Field2.

Output would look something like

id, CountOfPhrase1InField1, CountOfPhrase2InField1, CountOfPhrase1InField2, CountOfPhrase2InField2

Or I guess, instead of all of those output fields, maybe there's a single JSON object field:

id, [{"fieldName": Field1, "counts": {phrase1: m, phrase2: mm, ...}}, {"fieldName": Field2, "counts": {phrase1: m2, phrase2: mm2, ...}}, ...]

Thanks!

Solution

The example below is for BigQuery Standard SQL:

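#standardSQL
-- Sample data standing in for the real table and the phrase table
WITH `project.dataset.table` AS (
  SELECT 'foo1 foo foo40' str UNION ALL
  SELECT 'test1 test test2 test'
), `project.dataset.keywords` AS (
  SELECT 'foo' key UNION ALL
  SELECT 'test'
)
-- Pair every row with every keyword, then count the occurrences of the
-- keyword that are immediately followed by a non-whitespace character
SELECT
  str,
  ARRAY_AGG(STRUCT(
    key,
    ARRAY_LENGTH(REGEXP_EXTRACT_ALL(str, CONCAT(key, r'[^\s]'))) AS matches
  )) all_matches
FROM `project.dataset.table`
CROSS JOIN `project.dataset.keywords`
GROUP BY str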

with the result:

Row str                     all_matches.key all_matches.matches  
1   foo1 foo foo40          foo             2    
                            test            0    
2   test1 test test2 test   foo             0    
                            test            2
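Note that the query above treats every keyword as if it carried the "*" from the question. A minimal sketch of one way to honor the distinction, assuming (hypothetically) that the keywords table stores the trailing "*" literally and that phrases contain no regex metacharacters: the `patterns` CTE and its `pattern` column are names introduced here for illustration only.

```sql
#standardSQL
-- Hypothetical sample data: a trailing '*' marks a phrase that must be
-- followed by a non-whitespace character; other phrases match as-is.
WITH `project.dataset.table` AS (
  SELECT 'foo1 foo foo40' str UNION ALL
  SELECT 'test1 test test2 test'
), `project.dataset.keywords` AS (
  SELECT 'foo' key UNION ALL
  SELECT 'test*'
), patterns AS (
  -- Turn each keyword into a regex: strip the '*' and append \S for
  -- starred phrases, use the phrase unchanged otherwise.
  -- Assumption: phrases contain no regex metacharacters.
  SELECT
    key,
    IF(ENDS_WITH(key, '*'),
       CONCAT(SUBSTR(key, 1, LENGTH(key) - 1), r'\S'),
       key) AS pattern
  FROM `project.dataset.keywords`
)
SELECT
  str,
  ARRAY_AGG(STRUCT(
    key,
    ARRAY_LENGTH(REGEXP_EXTRACT_ALL(str, pattern)) AS matches
  )) all_matches
FROM `project.dataset.table`
CROSS JOIN patterns
GROUP BY str
```

With this sample data, `foo` should be counted three times in the first string (plain occurrences), and `test*` twice in the second (only `test1` and `test2` are followed by a non-whitespace character). Matches are counted without overlap.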

If you prefer the output as JSON, you can add TO_JSON_STRING(), as in the example below:

#standardSQL
-- Same sample data as in the previous example
WITH `project.dataset.table` AS (
  SELECT 'foo1 foo foo40' str UNION ALL
  SELECT 'test1 test test2 test'
), `project.dataset.keywords` AS (
  SELECT 'foo' key UNION ALL
  SELECT 'test'
)
-- Same counting logic; the aggregated array is serialized to a JSON string
SELECT
  str,
  TO_JSON_STRING(ARRAY_AGG(STRUCT(
    key,
    ARRAY_LENGTH(REGEXP_EXTRACT_ALL(str, CONCAT(key, r'[^\s]'))) AS matches
  ))) all_matches
FROM `project.dataset.table`
CROSS JOIN `project.dataset.keywords`
GROUP BY str

with the output:

Row str                     all_matches  
1   foo1 foo foo40          [{"key":"foo","matches":2},{"key":"test","matches":0}]   
2   test1 test test2 test   [{"key":"foo","matches":0},{"key":"test","matches":2}]

There are endless ways of presenting output like the above; hope you can adjust it to exactly what you need :o)
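As one possible adjustment toward the layout in the question (an id plus two string fields), the sketch below unpivots Field1 and Field2 into rows before counting. The sample rows, the alias `f`, and the `field_value` name are assumptions made for illustration, and the JSON it produces is a flat list of {fieldName, key, matches} objects rather than the nested "counts" object from the question.

```sql
#standardSQL
-- Hypothetical sample data with an id and two string fields
WITH `project.dataset.table` AS (
  SELECT 1 id, 'foo1 foo foo40' Field1, 'test1 test' Field2 UNION ALL
  SELECT 2, 'test1 test test2 test', 'foo2'
), `project.dataset.keywords` AS (
  SELECT 'foo' key UNION ALL
  SELECT 'test'
)
SELECT
  id,
  TO_JSON_STRING(ARRAY_AGG(STRUCT(
    f.fieldName,
    key,
    ARRAY_LENGTH(REGEXP_EXTRACT_ALL(f.field_value, CONCAT(key, r'[^\s]'))) AS matches
  ))) all_matches
FROM `project.dataset.table`
-- Unpivot the two string fields into (fieldName, field_value) rows
CROSS JOIN UNNEST([
  STRUCT('Field1' AS fieldName, Field1 AS field_value),
  STRUCT('Field2' AS fieldName, Field2 AS field_value)
]) AS f
-- Pair every (row, field) with every keyword
CROSS JOIN `project.dataset.keywords`
GROUP BY id
```

Each id then gets one JSON array with an entry per field and keyword. The order of entries inside the array is not guaranteed unless an ORDER BY is added inside ARRAY_AGG.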
