如何在Google BigQuery中旋转 [英] How to Pivot in Google BigQuery

查看:78
本文介绍了如何在Google BigQuery中旋转的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

假设我已将以下查询发送到BQ:

Suppose I have the following query sent to BQ:

SELECT shipmentID, category, quantity
FROM [myDataset.myTable]

此外,假设查询返回的数据如下:

Further, suppose that the query returns data such as:

shipmentID  category  quantity
1           shoes     5
1           hats      3
2           shirts    1
2           hats      2
3           toys      3
2           books     1
3           shirts    1

如何在BQ中枢转结果以产生如下输出:

How can I pivot the results, from within BQ, to produce output as follows:

 shipmentID   shoes  hats  shirts  toys  books
 1            5      3     0       0     0
 2            0      2     1       0     1
 3            0      0     1       3     0

作为其他背景知识,我实际上有2000多个需要旋转的类别,并且数据量如此之大,以至于我无法直接通过Python中的Pandas DataFrame进行操作(使用所有内存,然后减速到爬网).我尝试使用关系数据库,但是遇到了列限制,因此即使我必须通过python构建查询,我也希望能够直接在BQ中进行操作.有什么建议吗?

As some additional background, I actually have 2000+ categories that I need to pivot, and the quantity of data is such that I can't do it directly through a Pandas DataFrame in Python (uses all the memory, then slows to a crawl). I tried using a relational database, but ran into a column limit, so I'd like to be able to do it directly in BQ, even if I have to build the query itself through python. Any suggestions?

**编辑1 我应该提到,数据本身的旋转可以分块完成,因此不是问题.真正的麻烦在于后来尝试进行聚合,因此每个shipmentID只有一行.那就是吃掉所有RAM的原因.

** Edit 1 I should mention that pivoting the data itself can be done in chunks and is therefore not the issue. The real trouble comes in trying to do the aggregation afterwards, so that I have only one row for each shipmentID. That's what eats all the RAM.

**编辑2 在尝试了下面可接受的答案之后,我发现尝试使用它创建2k +列数据透视表会导致超出资源"错误.我的BQ团队能够重构查询以将其分解为较小的块并允许其通过.查询的基本结构如下:

** Edit 2 After trying out the accepted answer below, I found that trying to use it to create a 2k+ column pivot table was causing "Resources exceeded" errors. My BQ team was able to refactor the query to break it into smaller chunks and allow it to go through. The basic structure of the query is as follows:

SELECT
  SetA.*,
  SetB.*,
  SetC.*
FROM (
  SELECT
    shipmentID,
    SUM(IF (category="Rocks", qty, 0)),
    SUM(IF (category="Paper", qty, 0)),
    SUM(IF (category="Scissors", qty, 0))
  FROM (
    SELECT
      a.shipmentid shipmentid,
      a.quantity quantity,
      a.category category
    FROM
      [myDataset.myTable] a)
  GROUP EACH BY
    shipmentID ) SetA
INNER JOIN EACH (
  SELECT
    shipmentID,
    SUM(IF (category="Jello Molds", quantity, 0)),
    SUM(IF (category="Torque Wrenches", quantity, 0))
  FROM (
    SELECT
      a.shipmentID shipmentID,
      a.quantity quantity,
      a.category category
    FROM
      [myDataset.myTable] a)
  GROUP EACH BY
    shipmentID ) SetB
ON
  SetA.shipmentid = SetB.shipmentid
INNER JOIN EACH (
  SELECT
    shipmentID,
    SUM(IF (category="Deep Thoughts", qty, 0)),
    SUM(IF (category="Rainbows", qty, 0)),
    SUM(IF (category="Ponies", qty, 0))
  FROM (
    SELECT
      a.shipmentid shipmentid,
      a.quantity quantity,
      a.category category
    FROM
      [myDataset.myTable] a)
  GROUP EACH BY
    shipmentID ) SetC
ON
  SetB.shipmentID = SetC.shipmentID

可以通过依次添加INNER JOIN EACH段来无限期地继续上述模式.对于我的应用程序,BQ能够处理每个块约500列.

The above pattern can be continued indefinitely by adding INNER JOIN EACH segments one after the other. For my application, BQ was able to handle about 500 columns per chunk.

推荐答案

这是一种方法:

select shipmentID,
  sum(IF (category='shoes', quantity, 0)) AS shoes,
  sum(IF (category='hats', quantity, 0)) AS hats,
  sum(IF (category='shirts', quantity, 0)) AS shirts,
  sum(IF (category='toys', quantity, 0)) AS toys,
  sum(IF (category='books', quantity, 0)) AS books,
from
  (select 1 as shipmentID,           'shoes' as category,    5 as quantity),
  (select 1 as shipmentID,           'hats' as category,      3 as quantity),
  (select 2 as shipmentID,           'shirts' as category,    1 as quantity),
  (select 2 as shipmentID,           'hats' as category,      2 as quantity),
  (select 3 as shipmentID,           'toys' as category,      3 as quantity),
  (select 2 as shipmentID,           'books' as category,     1 as quantity),
  (select 3 as shipmentID,           'shirts' as category,    1 as quantity),
group by shipmentID

这将返回:

+-----+------------+-------+------+--------+------+-------+---+
| Row | shipmentID | shoes | hats | shirts | toys | books |   |
+-----+------------+-------+------+--------+------+-------+---+
|   1 |          1 |     5 |    3 |      0 |    0 |     0 |   |
|   2 |          2 |     0 |    2 |      1 |    0 |     1 |   |
|   3 |          3 |     0 |    0 |      1 |    3 |     0 |   |
+-----+------------+-------+------+--------+------+-------+---+

有关其他数据透视表示例,请参见手册.

See the manual for other pivot table example.

这篇关于如何在Google BigQuery中旋转的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆