BigQuery Pivot Data Rows Columns

Question

I'm currently processing data in BigQuery and then exporting it to Excel to build the final pivot table. I was hoping to create the same result with a PIVOT option in BigQuery.

My dataset in BigQuery looks like this:

Transaction_Month || ConsumerId || CUST_createdMonth
01/01/2015        || 1          || 01/01/2015
01/01/2015        || 1          || 01/01/2015
01/02/2015        || 1          || 01/01/2015
01/01/2015        || 2          || 01/01/2015
01/02/2015        || 3          || 01/02/2015
01/02/2015        || 4          || 01/02/2015
01/02/2015        || 5          || 01/02/2015
01/03/2015        || 5          || 01/02/2015
01/03/2015        || 6          || 01/03/2015
01/04/2015        || 6          || 01/03/2015
01/06/2015        || 6          || 01/03/2015
01/03/2015        || 7          || 01/03/2015
01/04/2015        || 8          || 01/04/2015
01/05/2015        || 8          || 01/04/2015
01/04/2015        || 9          || 01/04/2015

It is essentially an order table with customer information appended.

When I put this data into Excel, I add it to a pivot table with CUST_createdMonth as the rows, Transaction_Month as the columns, and the distinct count of ConsumerId as the values.

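The output has one row per CUST_createdMonth, one column per Transaction_Month, and the number of distinct ConsumerIds in each cell.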
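To make the target concrete, here is a minimal Python sketch of what the Excel pivot computes, assuming the sample rows above are loaded as tuples (the names ROWS and distinct_count_pivot are hypothetical). It collects a set of ConsumerIds per (CUST_createdMonth, Transaction_Month) cell and reports each set's size. The distinct count matters: consumer 1 has two transactions in 01/01/2015, so a plain row count would report 3 for the first cell instead of 2.

```python
from collections import defaultdict

# Sample rows from the question: (Transaction_Month, ConsumerId, CUST_createdMonth)
ROWS = [
    ("01/01/2015", 1, "01/01/2015"),
    ("01/01/2015", 1, "01/01/2015"),
    ("01/02/2015", 1, "01/01/2015"),
    ("01/01/2015", 2, "01/01/2015"),
    ("01/02/2015", 3, "01/02/2015"),
    ("01/02/2015", 4, "01/02/2015"),
    ("01/02/2015", 5, "01/02/2015"),
    ("01/03/2015", 5, "01/02/2015"),
    ("01/03/2015", 6, "01/03/2015"),
    ("01/04/2015", 6, "01/03/2015"),
    ("01/06/2015", 6, "01/03/2015"),
    ("01/03/2015", 7, "01/03/2015"),
    ("01/04/2015", 8, "01/04/2015"),
    ("01/05/2015", 8, "01/04/2015"),
    ("01/04/2015", 9, "01/04/2015"),
]


def distinct_count_pivot(rows):
    """Cross-tab with rows = CUST_createdMonth, columns = Transaction_Month,
    cell = number of distinct ConsumerIds (0 when there are none)."""
    cells = defaultdict(set)
    for txn_month, consumer_id, created_month in rows:
        cells[(created_month, txn_month)].add(consumer_id)
    # Plain string sort is enough for this sample: only the middle field varies.
    row_keys = sorted({r[2] for r in rows})
    col_keys = sorted({r[0] for r in rows})
    table = {
        rk: [len(cells.get((rk, ck), ())) for ck in col_keys]
        for rk in row_keys
    }
    return col_keys, table


if __name__ == "__main__":
    cols, table = distinct_count_pivot(ROWS)
    print("CUST_createdMonth", *cols)
    for created_month, counts in table.items():
        print(created_month, *counts)
```

Real date values should be sorted as dates rather than as strings; the string sort here only works because every sample value shares the same first field and year.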

Is this sort of pivot possible in BigQuery?

Solution

There is no elegant way to do this in BigQuery, but you can use the following two-step approach.

Step 1

Run the query below:

SELECT 'SELECT CUST_createdMonth, ' + 
   GROUP_CONCAT_UNQUOTED(
      'EXACT_COUNT_DISTINCT(IF(Transaction_Month = "' + Transaction_Month + '", ConsumerId, NULL)) as [m_' + REPLACE(Transaction_Month, '/', '_') + ']'
   ) 
   + ' FROM yourTable GROUP BY CUST_createdMonth ORDER BY CUST_createdMonth'
FROM (
  SELECT Transaction_Month 
  FROM yourTable
  GROUP BY Transaction_Month
  ORDER BY Transaction_Month
) 
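If you would rather build the query outside BigQuery, the same string generation can be sketched in Python. This is an illustrative helper (build_pivot_query is a hypothetical name, and yourTable is the placeholder from the answer) that emits one EXACT_COUNT_DISTINCT(IF(...)) column per distinct month. It sorts the months explicitly instead of relying on the subquery's ORDER BY carrying through the aggregation.

```python
def build_pivot_query(months, table="yourTable"):
    """Build a legacy-SQL pivot query with one EXACT_COUNT_DISTINCT(IF(...))
    column per distinct Transaction_Month, mirroring what Step 1 generates."""
    columns = []
    for month in sorted(set(months)):
        # "01/02/2015" -> "m_01_02_2015": slashes are not allowed in column names.
        alias = "m_" + month.replace("/", "_")
        # Month values are spliced in unescaped, as in Step 1; a value
        # containing a double quote would break the generated query.
        columns.append(
            f'EXACT_COUNT_DISTINCT(IF(Transaction_Month = "{month}", '
            f"ConsumerId, NULL)) AS [{alias}]"
        )
    return (
        "SELECT CUST_createdMonth, "
        + ", ".join(columns)
        + " FROM " + table
        + " GROUP BY CUST_createdMonth ORDER BY CUST_createdMonth"
    )


if __name__ == "__main__":
    print(build_pivot_query(["01/02/2015", "01/01/2015", "01/02/2015"]))
```

The month list would typically come from running the inner SELECT ... GROUP BY Transaction_Month query; duplicates are removed here anyway.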

As a result, you will get a string like the one below (formatted here for readability):

SELECT
  CUST_createdMonth,
  EXACT_COUNT_DISTINCT(IF(Transaction_Month = "01/01/2015", ConsumerId, NULL)) AS [m_01_01_2015],
  EXACT_COUNT_DISTINCT(IF(Transaction_Month = "01/02/2015", ConsumerId, NULL)) AS [m_01_02_2015],
  EXACT_COUNT_DISTINCT(IF(Transaction_Month = "01/03/2015", ConsumerId, NULL)) AS [m_01_03_2015],
  EXACT_COUNT_DISTINCT(IF(Transaction_Month = "01/04/2015", ConsumerId, NULL)) AS [m_01_04_2015],
  EXACT_COUNT_DISTINCT(IF(Transaction_Month = "01/05/2015", ConsumerId, NULL)) AS [m_01_05_2015],
  EXACT_COUNT_DISTINCT(IF(Transaction_Month = "01/06/2015", ConsumerId, NULL)) AS [m_01_06_2015]
FROM yourTable
GROUP BY
  CUST_createdMonth
ORDER BY
  CUST_createdMonth

Step 2

Run the composed query from Step 1.

The result will look like this:

CUST_createdMonth   m_01_01_2015    m_01_02_2015    m_01_03_2015    m_01_04_2015    m_01_05_2015    m_01_06_2015     
01/01/2015          2               1               0               0               0               0    
01/02/2015          0               3               1               0               0               0    
01/03/2015          0               0               2               1               0               1    
01/04/2015          0               0               0               2               1               0   
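To sanity-check this result locally, the sketch below swaps BigQuery for Python's built-in sqlite3 module and uses standard SQL COUNT(DISTINCT CASE WHEN ... END) in place of legacy SQL's EXACT_COUNT_DISTINCT(IF(...)); the two play the same role here, since rows for other months yield NULL and are not counted. The function name pivot_with_sqlite is hypothetical. With the sample rows it should reproduce the table above.

```python
import sqlite3

# Sample rows from the question: (Transaction_Month, ConsumerId, CUST_createdMonth)
ROWS = [
    ("01/01/2015", 1, "01/01/2015"), ("01/01/2015", 1, "01/01/2015"),
    ("01/02/2015", 1, "01/01/2015"), ("01/01/2015", 2, "01/01/2015"),
    ("01/02/2015", 3, "01/02/2015"), ("01/02/2015", 4, "01/02/2015"),
    ("01/02/2015", 5, "01/02/2015"), ("01/03/2015", 5, "01/02/2015"),
    ("01/03/2015", 6, "01/03/2015"), ("01/04/2015", 6, "01/03/2015"),
    ("01/06/2015", 6, "01/03/2015"), ("01/03/2015", 7, "01/03/2015"),
    ("01/04/2015", 8, "01/04/2015"), ("01/05/2015", 8, "01/04/2015"),
    ("01/04/2015", 9, "01/04/2015"),
]


def pivot_with_sqlite(rows):
    """Load rows into an in-memory SQLite table and run a conditional-
    aggregation pivot with one column per Transaction_Month."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE yourTable ("
            "Transaction_Month TEXT, ConsumerId INTEGER, CUST_createdMonth TEXT)"
        )
        conn.executemany("INSERT INTO yourTable VALUES (?, ?, ?)", rows)

        # Like Step 1: collect the distinct months that become columns.
        months = [m for (m,) in conn.execute(
            "SELECT DISTINCT Transaction_Month FROM yourTable ORDER BY Transaction_Month"
        )]

        # Like Step 2: one conditional distinct count per month. Month values
        # are bound as ? parameters; only the derived alias is spliced in.
        select_list = []
        for m in months:
            alias = "m_" + m.replace("/", "_")
            select_list.append(
                "COUNT(DISTINCT CASE WHEN Transaction_Month = ? "
                f'THEN ConsumerId END) AS "{alias}"'
            )
        sql = (
            "SELECT CUST_createdMonth, " + ", ".join(select_list)
            + " FROM yourTable GROUP BY CUST_createdMonth ORDER BY CUST_createdMonth"
        )
        return months, conn.execute(sql, months).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    months, result = pivot_with_sqlite(ROWS)
    for row in result:
        print(*row)
```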

Note

Step 1 is helpful when you have many months to pivot and writing each column by hand would be too much manual work.
In that case, Step 1 generates the query for you.

You can see more about pivoting in my other posts.

How to scale Pivoting in BigQuery?
Please note that there is a limit of 10K columns per table, so you are limited to 10K organizations.
You can also see the following simplified examples (if the one above is too complex or verbose):
How to transpose rows to columns with large amount of the data in BigQuery/SQL?
How to create dummy variable columns for thousands of categories in Google BigQuery?
Pivot Repeated fields in BigQuery
