BigQuery: use conditions to create a table from other tables (managing a large number of columns)

Views: 162 | Published: 2020/7/29 21:41:01 | Tags: google-bigquery, google-cloud-storage, bq

Problem description

I am facing an issue related to a project of mine. Here is a summary of what I would like to do:

I have a large daily file (100 GB) that looks like the following extract (no header):

ID_A|segment_1
ID_A|segment_2
ID_B|segment_2
ID_B|segment_3
ID_B|segment_4
ID_B|segment_5
ID_C|segment_1
ID_D|segment_2
ID_D|segment_4

Every ID (from A to D) can be linked to one or more segments (from 1 to 5).

I would like to process this file to get the following result (the result file contains a header):

ID|segment_1|segment_2|segment_3|segment_4|segment_5
ID_A|1|1|0|0|0
ID_B|0|1|1|1|1
ID_C|1|0|0|0|0
ID_D|0|1|0|1|0

1 means that the ID is included in the segment; 0 means that it is not.
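The target layout can be stated precisely with a small in-memory sketch. It is only an illustration on the nine sample rows above, not something that would scale to the 100 GB file; the function name `pivot` and the `RAW` constant are hypothetical. Note that it emits strict 0/1 flags, whereas a COUNTIF-based query would return a count greater than 1 if the same ID/segment pair appeared more than once.

```python
from collections import defaultdict

# Sample rows copied from the extract above (pipe-delimited, no header).
RAW = """\
ID_A|segment_1
ID_A|segment_2
ID_B|segment_2
ID_B|segment_3
ID_B|segment_4
ID_B|segment_5
ID_C|segment_1
ID_D|segment_2
ID_D|segment_4
"""


def pivot(raw):
    """Return (sorted segment names, {id: 0/1 flags aligned with those names})."""
    members = defaultdict(set)
    for line in raw.splitlines():
        if not line.strip():
            continue  # tolerate blank lines
        id_, segment = line.split("|")
        members[id_].add(segment)
    # Column order is plain string order; with 900+ names, "segment_10"
    # would sort before "segment_2".
    segments = sorted({s for segs in members.values() for s in segs})
    flags = {
        id_: [int(s in segs) for s in segments]
        for id_, segs in sorted(members.items())
    }
    return segments, flags


segments, flags = pivot(RAW)
print("|".join(["ID", *segments]))
for id_, row in flags.items():
    print("|".join([id_, *map(str, row)]))
```

Each printed line has the same pipe-delimited shape as the desired result file: a header row followed by one row per ID.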

I am using the following query to get the result:

select id,
       countif(segment = 'segment_1') as segment_1,
       countif(segment = 'segment_2') as segment_2,
       countif(segment = 'segment_3') as segment_3,
       countif(segment = 'segment_4') as segment_4,
       countif(segment = 'segment_5') as segment_5
from staging s cross join
     unnest(split(segments, ',')) as segment
group by id;

This solution worked for me until the number of segments became much higher (900+ segments instead of the 5 in my first example). The resulting query is so large that it cannot be passed as an argument through the bq CLI.

Is there any workaround that I can use?

Thanks, everyone, for your help.

Recommended answer

The following is for BigQuery Standard SQL:

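-- Build the SELECT list dynamically: one COUNTIF column per distinct segment.
EXECUTE IMMEDIATE '''
SELECT id, ''' || (
  -- Comma-separated "COUNTIF(segment = '<name>') AS <name>" items, ordered by name.
  SELECT STRING_AGG("COUNTIF(segment = '" || segment || "') AS " || segment ORDER BY segment)
  FROM (SELECT DISTINCT segment FROM staging)
) || '''
FROM staging
GROUP BY 1
ORDER BY 1
''';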
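To make it easier to see what statement the script above ends up executing, here is a minimal Python sketch that assembles the same text outside BigQuery. It is an illustration only: the helper names `build_select_list` and `build_statement` are hypothetical, whitespace is simplified, and Python's `sorted` is assumed to match `ORDER BY segment` for plain ASCII names.

```python
def build_select_list(segments):
    """Mimic STRING_AGG(... ORDER BY segment) over the DISTINCT segment names.

    STRING_AGG joins with "," when no delimiter is given, so the items are
    joined the same way here.
    """
    return ",".join(
        f"COUNTIF(segment = '{name}') AS {name}"
        for name in sorted(set(segments))
    )


def build_statement(segments):
    """Assemble the text that EXECUTE IMMEDIATE would receive (whitespace simplified)."""
    return (
        "SELECT id, "
        + build_select_list(segments)
        + "\nFROM staging\nGROUP BY 1\nORDER BY 1"
    )


# The five segment names from the sample data in the question.
sample = [f"segment_{i}" for i in range(1, 6)]
print(build_statement(sample))

# The generated text grows with the number of segments, but the script that
# produces it stays the same size.
many = [f"segment_{i}" for i in range(1, 901)]
print(len(build_statement(many)), "characters for 900 segments")
```

Because each segment value is used directly as a column alias, this approach assumes the segment names are valid column identifiers and contain no quote characters.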

Applied to the sample data in your question, the output is:

Row id      segment_1   segment_2   segment_3   segment_4   segment_5    
1   ID_A    1           1           0           0           0    
2   ID_B    0           1           1           1           1    
3   ID_C    1           0           0           0           0    
4   ID_D    0           1           0           1           0

As you can see, you don't need to worry about the number or naming of segments; the query above takes care of both.
