How to create a crosstab from two fields in BigQuery with Standard or Legacy SQL
Question
I want to take two columns from a table and create a crosstab that shows how many products each customer bought in each product category. Here is some example data from my table:
Row Customer_ID Style
1 MEM014 BLS87
2 KAR810 DR126
3 NIKE61 MMQ5
4 NIKE61 MMQ5
5 STT019 BLS83
6 STT019 BLS84
7 STT019 BLS87
I want a result table like this:
Customer - DR126 - MMQ5 - BLS83 - BLS84 - BLS87
MEM014 0 0 0 0 1
KAR810 1 0 0 0 0
NIKE61 0 2 0 0 0
STT019 0 0 1 1 1
Recommended answer
Below is for BigQuery Standard SQL
Step #1 - generate the pivot query
#standardSQL
SELECT CONCAT(
  "SELECT Customer_ID,",
  STRING_AGG(CONCAT("COUNTIF(Style='", Style, "') ", Style)),
  " FROM `project.dataset.your_table` GROUP BY Customer_ID ORDER BY Customer_ID")
FROM (
  SELECT DISTINCT Style
  FROM `project.dataset.your_table`
  ORDER BY Style
)
If you run it with the dummy data from your question, like below
#standardSQL
WITH `project.dataset.your_table` AS (
  SELECT 'MEM014' Customer_ID, 'BLS87' Style UNION ALL
  SELECT 'KAR810', 'DR126' UNION ALL
  SELECT 'NIKE61', 'MMQ5' UNION ALL
  SELECT 'NIKE61', 'MMQ5' UNION ALL
  SELECT 'STT019', 'BLS83' UNION ALL
  SELECT 'STT019', 'BLS84' UNION ALL
  SELECT 'STT019', 'BLS87'
)
SELECT CONCAT(
  "SELECT Customer_ID,",
  STRING_AGG(CONCAT("COUNTIF(Style='", Style, "') ", Style)),
  " FROM `project.dataset.your_table` GROUP BY Customer_ID ORDER BY Customer_ID")
FROM (
  SELECT DISTINCT Style
  FROM `project.dataset.your_table`
  ORDER BY Style
)
you will get the following pivot query
SELECT Customer_ID,COUNTIF(Style='BLS83') BLS83,COUNTIF(Style='BLS84') BLS84,COUNTIF(Style='BLS87') BLS87,COUNTIF(Style='DR126') DR126,COUNTIF(Style='MMQ5') MMQ5 FROM `project.dataset.your_table` GROUP BY Customer_ID ORDER BY Customer_ID
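The same query text can also be assembled on the client before it is sent to BigQuery. The following is only a minimal Python sketch of that idea, assuming the distinct Style values have already been fetched into a list; `build_pivot_query` and its `table` parameter are hypothetical names used for illustration and are not part of any BigQuery library.

```python
def build_pivot_query(styles, table="project.dataset.your_table"):
    """Build the pivot query text from a list of style names.

    Assumes every style is already a valid column name and contains
    no quote characters (as in the example data).
    """
    # One COUNTIF column per distinct style, sorted like ORDER BY Style.
    columns = ",".join(
        f"COUNTIF(Style='{style}') {style}" for style in sorted(set(styles))
    )
    return (
        f"SELECT Customer_ID,{columns}"
        f" FROM `{table}` GROUP BY Customer_ID ORDER BY Customer_ID"
    )


# Style values from the example table, duplicates included.
styles = ["BLS87", "DR126", "MMQ5", "MMQ5", "BLS83", "BLS84", "BLS87"]
pivot_sql = build_pivot_query(styles)
print(pivot_sql)
```

The sketch only produces the query string; running it against BigQuery is left to whatever client you already use.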
Step #2 - run the generated pivot query
If you run it against your dummy data, you get the expected result
Row Customer_ID BLS83 BLS84 BLS87 DR126 MMQ5
1 KAR810 0 0 0 1 0
2 MEM014 0 0 1 0 0
3 NIKE61 0 0 0 0 2
4 STT019 1 1 1 0 0
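To make clear what the COUNTIF columns compute, the same counts can be reproduced outside BigQuery. This is a small Python sketch over the example rows only; the `crosstab` helper is a hypothetical name, and the sketch is meant as a cross-check of the logic, not as a replacement for the query.

```python
from collections import Counter

# The seven example rows as (Customer_ID, Style) pairs.
rows = [
    ("MEM014", "BLS87"),
    ("KAR810", "DR126"),
    ("NIKE61", "MMQ5"),
    ("NIKE61", "MMQ5"),
    ("STT019", "BLS83"),
    ("STT019", "BLS84"),
    ("STT019", "BLS87"),
]


def crosstab(rows):
    """Return (styles, table), where table maps each customer to a list of counts."""
    counts = Counter(rows)  # (customer, style) -> number of matching rows
    styles = sorted({style for _, style in rows})
    customers = sorted({customer for customer, _ in rows})
    # Missing (customer, style) pairs count as 0, like COUNTIF with no matches.
    table = {c: [counts[(c, s)] for s in styles] for c in customers}
    return styles, table


styles, table = crosstab(rows)
print("Customer_ID", *styles)
for customer, row in table.items():
    print(customer, *row)
```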
Note 1: The above assumes your Style names comply with column naming conventions (those in your example do). If they do not, you will need to escape the unsupported characters and so on (an easy adjustment to step 1).
Note 2: The maximum unresolved query length is 256 KB. Each style in your example adds roughly 30 characters to the generated query, so the above solution will support around 8,500 styles, which should be below the limit on the number of columns in a table (10K?).
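Both notes can be illustrated with a short Python sketch. It assumes column names may contain only letters, digits, and underscores and may not start with a digit, and it uses the 256 KB figure from Note 2 as the size limit; `column_alias`, `countif_column`, `fits_limit`, and the `overhead` estimate are hypothetical names and values chosen for illustration. Note that two different style names can map to the same alias, which this sketch does not resolve.

```python
import re

MAX_QUERY_BYTES = 256 * 1024  # limit quoted in Note 2 (assumed here)


def column_alias(style):
    """Turn a style name into an identifier of letters, digits, and underscores."""
    alias = re.sub(r"[^A-Za-z0-9_]", "_", style)
    # Assumed rule: a column name cannot start with a digit.
    if not alias or alias[0].isdigit():
        alias = "_" + alias
    return alias


def countif_column(style):
    """Build one COUNTIF column, escaping backslashes and single quotes."""
    literal = style.replace("\\", "\\\\").replace("'", "\\'")
    return f"COUNTIF(Style='{literal}') {column_alias(style)}"


def fits_limit(styles, overhead=100):
    """Rough check that the generated query stays under MAX_QUERY_BYTES.

    overhead is a guess for the fixed SELECT/FROM/GROUP BY text;
    the +1 accounts for the comma separating the columns.
    """
    size = overhead + sum(len(countif_column(s).encode("utf-8")) + 1 for s in styles)
    return size <= MAX_QUERY_BYTES


print(countif_column("BLS-87 (new)"))
print(fits_limit(["BLS83", "BLS84", "BLS87", "DR126", "MMQ5"]))
```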