How to create a crosstab from two fields in BigQuery with Standard or Legacy SQL


Question

I want to take two columns from a table and build a crosstab showing how many products each customer bought in each product category. Here is some example data from my table:

Row     Customer_ID     Style    
 1      MEM014          BLS87    
 2      KAR810          DR126    
 3      NIKE61          MMQ5     
 4      NIKE61          MMQ5     
 5      STT019          BLS83    
 6      STT019          BLS84    
 7      STT019          BLS87    

I want a result table like this:

Customer    DR126   MMQ5    BLS83   BLS84   BLS87
MEM014      0       0       0       0       1
KAR810      1       0       0       0       0
NIKE61      0       2       0       0       0
STT019      0       0       1       1       1

Recommended answer

The following is for BigQuery Standard SQL.

Step #1 - generate the pivot query

  #standardSQL
  SELECT CONCAT(
  "SELECT Customer_ID,", 
  STRING_AGG(CONCAT("COUNTIF(Style='", Style, "') ", Style)), 
  " FROM `project.dataset.your_table` GROUP BY Customer_ID ORDER BY Customer_ID")
  FROM (
    SELECT DISTINCT Style
    FROM `project.dataset.your_table`
    ORDER BY Style
  )    

If you run it with the dummy data from your question, like this:

  #standardSQL
  WITH `project.dataset.your_table` AS (
    SELECT 'MEM014' Customer_ID, 'BLS87' Style UNION ALL    
    SELECT 'KAR810', 'DR126' UNION ALL    
    SELECT 'NIKE61', 'MMQ5' UNION ALL     
    SELECT 'NIKE61', 'MMQ5' UNION ALL     
    SELECT 'STT019', 'BLS83' UNION ALL    
    SELECT 'STT019', 'BLS84' UNION ALL    
    SELECT 'STT019', 'BLS87' 
  )
  SELECT CONCAT(
  "SELECT Customer_ID,", 
  STRING_AGG(CONCAT("COUNTIF(Style='", Style, "') ", Style)), 
  " FROM `project.dataset.your_table` GROUP BY Customer_ID")
  FROM (
    SELECT DISTINCT Style
    FROM `project.dataset.your_table`
    ORDER BY Style
  )

you will get the following pivot query:

SELECT Customer_ID,COUNTIF(Style='BLS83') BLS83,COUNTIF(Style='BLS84') BLS84,COUNTIF(Style='BLS87') BLS87,COUNTIF(Style='DR126') DR126,COUNTIF(Style='MMQ5') MMQ5 FROM `project.dataset.your_table` GROUP BY Customer_ID
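
The string that Step 1 assembles in SQL can also be reproduced client-side, which may help when checking what the generated query should look like. The Python below is only an illustrative sketch and not part of the original answer: the helper name `build_pivot_query` is hypothetical, it assumes the distinct styles have already been fetched, and it does no escaping (see Note 1).

```python
def build_pivot_query(styles, table="project.dataset.your_table"):
    """Return the pivot query text that Step 1 generates in SQL.

    `styles` is assumed to hold the Style values read from the table.
    Hypothetical helper: values are inserted as-is, with no escaping.
    """
    # One COUNTIF column per distinct style, joined with "," as
    # STRING_AGG does by default, in the order given by ORDER BY Style.
    columns = ",".join(
        f"COUNTIF(Style='{style}') {style}" for style in sorted(set(styles))
    )
    return f"SELECT Customer_ID,{columns} FROM `{table}` GROUP BY Customer_ID"


# Style column of the dummy data from the question.
styles = ["BLS87", "DR126", "MMQ5", "MMQ5", "BLS83", "BLS84", "BLS87"]
query = build_pivot_query(styles)
print(query)
```

For the dummy data this prints the same single-line query shown above.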

Step #2 - run the generated pivot query

If you run it against your dummy data, you get the expected result:

Row Customer_ID BLS83   BLS84   BLS87   DR126   MMQ5     
1   KAR810      0       0       0       1       0    
2   MEM014      0       0       1       0       0    
3   NIKE61      0       0       0       0       2    
4   STT019      1       1       1       0       0      
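
As a local sanity check on that result, the same counts can be computed without BigQuery. This Python sketch is not part of the original answer; the `crosstab` helper is a hypothetical name, and it only mimics what COUNTIF combined with GROUP BY Customer_ID computes on the dummy data.

```python
from collections import Counter

# Dummy data from the question: (Customer_ID, Style) pairs.
rows = [
    ("MEM014", "BLS87"),
    ("KAR810", "DR126"),
    ("NIKE61", "MMQ5"),
    ("NIKE61", "MMQ5"),
    ("STT019", "BLS83"),
    ("STT019", "BLS84"),
    ("STT019", "BLS87"),
]


def crosstab(rows):
    """Count rows per (customer, style) and lay them out as a pivot table."""
    pair_counts = Counter(rows)
    customers = sorted({customer for customer, _ in rows})
    styles = sorted({style for _, style in rows})
    # A Counter returns 0 for pairs that never occur, like COUNTIF does.
    table = {c: [pair_counts[(c, s)] for s in styles] for c in customers}
    return styles, table


styles, table = crosstab(rows)
print("Customer_ID", *styles)
for customer, row_counts in table.items():
    print(customer, *row_counts)
```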

Note 1: The above assumes your Style names are valid column names (those in your example are). If they are not, you will need to escape the unsupported characters (an easy adjustment to Step 1).
Note 2: The maximum unresolved query length is 256 KB. Each style adds roughly 30 characters to the generated query, so with Style names like those in your example the above solution supports around 8500 styles, which should be below the limit (10K?) on the number of columns in a table.
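
Both notes can be illustrated with a small client-side sketch. It is not from the original answer and rests on several assumptions: the helper names are hypothetical, aliases are restricted to letters, digits and underscores, string values are escaped with backslashes, and the 256 KB figure is simply the one quoted in Note 2. Two different styles can map to the same alias (for example `A-1` and `A_1`); this sketch does not handle that case.

```python
import re

MAX_QUERY_BYTES = 256 * 1024  # figure quoted in Note 2; check current quotas


def column_alias(style):
    """Map a style name to a plain identifier (letters, digits, underscore)."""
    alias = re.sub(r"[^A-Za-z0-9_]", "_", style)
    # Prefix "_" when the alias is empty or does not start with a letter
    # or underscore, since a plain identifier cannot start with a digit.
    return alias if re.match(r"[A-Za-z_]", alias) else "_" + alias


def string_literal(value):
    """Quote a value as a single-quoted string with backslash escapes."""
    return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"


def build_pivot_query(styles, table="project.dataset.your_table"):
    """Build the pivot query with escaped values and sanitized aliases."""
    columns = ",".join(
        f"COUNTIF(Style={string_literal(s)}) {column_alias(s)}"
        for s in sorted(set(styles))
    )
    query = f"SELECT Customer_ID,{columns} FROM `{table}` GROUP BY Customer_ID"
    # Fail early instead of submitting a query that is too long.
    if len(query.encode("utf-8")) > MAX_QUERY_BYTES:
        raise ValueError("generated query exceeds the assumed length limit")
    return query


# Hypothetical style names that are not valid column names as-is.
print(build_pivot_query(["BLS-87", "9XL", "O'Neil"]))
```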
