如何使用Python Cubes取回2维的聚合值? [英] How to get back aggregate values across 2 dimensions using Python Cubes?

查看:94
本文介绍了如何使用Python Cubes取回2维的聚合值?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

使用Python 3,Django 1.9,Cubes 1.1和Postgres 9.5.
这些是我的图形形式的数据表:

Using Python 3, Django 1.9, Cubes 1.1, and Postgres 9.5.
These are my datatables in pictorial form:

相同的文本格式:

存储表

------------------------------
| id  | code | address       |
|-----|------|---------------|
| 1   | S1   | Kings Row     |
| 2   | S2   | Queens Street |
| 3   | S3   | Jacks Place   |
| 4   | S4   | Diamonds Alley|
| 5   | S5   | Hearts Road   |
------------------------------

产品表

------------------------------
| id  | code | name          |
|-----|------|---------------|
| 1   | P1   | Saucer 12     |
| 2   | P2   | Plate 15      |
| 3   | P3   | Saucer 13     |
| 4   | P4   | Saucer 14     |
| 5   | P5   | Plate 16      |
|  and many more ....        |
|1000 |P1000 | Bowl 25       |
|----------------------------|

销售表

----------------------------------------
| id  | product_id | store_id | amount |
|-----|------------|----------|--------|
| 1   | 1          | 1        |7.05    |
| 2   | 1          | 2        |9.00    |
| 3   | 2          | 3        |1.00    |
| 4   | 2          | 3        |1.00    |
| 5   | 2          | 5        |1.00    |
|  and many more ....                  |
| 1000| 20         | 4        |1.00    |
|--------------------------------------|

这些关系是:

  1. 销售属于商店
  2. 销售属于产品
  3. 商店有很多销售
  4. 产品有很多销量

我想实现的目标

我想使用多维数据集以下列方式通过分页进行显示:

What I want to achieve

I want to use cubes to be able to do a display by pagination in the following manner:

给商店S1-S3:

-------------------------
| product | S1 | S2 | S3 |
|---------|----|----|----|
|Saucer 12|7.05|9   | 0  |
|Plate 15 |0   |0   | 2  |
|  and many more ....    |
|------------------------|

请注意以下几点:

  1. 即使商店S3下没有Saucer 12的销售记录,我也显示0,而不是null或全无.
  2. 我希望能够按存储进行排序,例如说S3的降序.
  3. 这些单元格指示在该特定商店中花费的特定产品的总和.
  4. 我也想分页.

我尝试过的

这是我使用的配置:

What I tried

This is the configuration I used:

"cubes": [
    {
        "name": "sales",
        "dimensions": ["product", "store"],
        "joins": [
            {"master":"product_id", "detail":"product.id"},
            {"master":"store_id", "detail":"store.id"}
        ]
    }
],
"dimensions": [
    { "name": "product", "attributes": ["code", "name"] },
    { "name": "store", "attributes": ["code", "address"] }
]

这是我使用的代码:

 result = browser.aggregate(drilldown=['Store','Product'],
                               order=[("Product.name","asc"), ("Store.name","desc"), ("total_products_sale", "desc")])

我没有得到我想要的.
我是这样的:

I didn't get what I want.
I got it like this:

----------------------------------------------
| product_id | store_id | total_products_sale |
|------------|----------|---------------------|
| 1          | 1        |       7.05          |
| 1          | 2        |       9             |
| 2          | 3        |       2.00          |
|  and many more ....                         |
|---------------------------------------------|

这是整个表,没有分页,如果在该商店中未售出商品,则不会显示为零.

which is the whole table with no pagination and if the products not sold in that store it won't show up as zero.

我如何得到想要的东西?

How do I get what I want?

在使用多维数据集运行查询之前,是否需要创建另一个按商店和产品汇总所有数据的数据表?

Do I need to create another data table that aggregates everything by store and product before I use cubes to run the query?

我已经阅读了更多.我意识到我想要的就是所谓的切块,因为我需要跨越两个维度.请参阅: https://en.wikipedia.org/wiki/OLAP_cube#Operations

I have read more. I realised that what I want is called dicing as I needed to go across 2 dimensions. See: https://en.wikipedia.org/wiki/OLAP_cube#Operations

交叉发布在多维数据集GitHub问题上,以获得更多关注.

Cross-posted at Cubes GitHub issues to get more attention.

推荐答案

这是一个纯SQL解决方案,使用来自附加tablefunc模块的 crosstab() 数据透视 .它通常比任何客户端替代方案都具有更好的性能.如果您不熟悉crosstab(),请先阅读此书:

This is a pure SQL solution using crosstab() from the additional tablefunc module to pivot the aggregated data. It typically performs better than any client-side alternative. If you are not familiar with crosstab(), read this first:

关于crosstab()输出中的额外"列的信息:

And this about the "extra" column in the crosstab() output:

SELECT product_id, product
     , COALESCE(s1, 0) AS s1               --  1. ... displayed 0 instead of null
     , COALESCE(s2, 0) AS s2
     , COALESCE(s3, 0) AS s3
     , COALESCE(s4, 0) AS s4
     , COALESCE(s5, 0) AS s5
FROM   crosstab(
     'SELECT s.product_id, p.name, s.store_id, s.sum_amount
      FROM   product p
      JOIN  (
         SELECT product_id, store_id
              , sum(amount) AS sum_amount  -- 3. SUM total of product spent in store
         FROM   sales
         GROUP  BY product_id, store_id
         ) s ON p.id = s.product_id
      ORDER  BY s.product_id, s.store_id;'
   , 'VALUES (1),(2),(3),(4),(5)'          -- desired store_id's
   ) AS ct (product_id int, product text   -- "extra" column
          , s1 numeric, s2 numeric, s3 numeric, s4 numeric, s5 numeric)
ORDER  BY s3 DESC;                         -- 2. ... descending order for S3

精确地产生您想要的结果(加上product_id).

Produces your desired result exactly (plus product_id).

要包含从未销售过的产品,请将[INNER] JOIN替换为LEFT [OUTER] JOIN.

To include products that have never been sold replace [INNER] JOIN with LEFT [OUTER] JOIN.

带有基本查询的 SQL提琴 .
tablefunc模块未安装在sqlfiddle上.

SQL Fiddle with base query.
The tablefunc module is not installed on sqlfiddle.

我包括在product_id中,因为product.name几乎不是唯一的.否则,这可能会导致将两个不同的产品合并在一起的错误消息.

I am including with product_id because product.name is hardly unique. This might otherwise lead to sneaky errors conflating two different products.

如果可以确保参照完整性,则在查询中不需要store表.

You don't need the store table in the query if referential integrity is guaranteed.

ORDER BY s3 DESC起作用,因为s3引用 output 列,其中NULL值已替换为COALESCE.否则,我们将需要DESC NULLS LAST最后对NULL值进行排序:

ORDER BY s3 DESC works, because s3 references the output column where NULL values have been replaced with COALESCE. Else we would need DESC NULLS LAST to sort NULL values last:

要动态构建crosstab()查询,请考虑:

For building crosstab() queries dynamically consider:

  1. 我也想分页.

最后一项是模糊的.可以使用LIMITOFFSET进行简单的分页:

That last item is fuzzy. Simple pagination can be had with LIMIT and OFFSET:

我会考虑使用 MATERIALIZED VIEW 来实现分页前的结果.如果您的页面尺寸稳定,则可以在MV中添加页码,以便轻松快速地获得结果.

I would consider a MATERIALIZED VIEW to materialize results before pagination. If you have a stable page size I would add page numbers to the MV for easy and fast results.

要优化 big 结果集的性能,请考虑:

To optimize performance for big result sets, consider:

  • SQL syntax term for 'WHERE (col1, col2) < (val1, val2)'
  • Optimize query with OFFSET on large table

这篇关于如何使用Python Cubes取回2维的聚合值?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆