列向量和派生位向量的加权和-版本2 [英] Weighted sum of a column vector and a derived bit vector - Version 2

查看:80
本文介绍了列向量和派生位向量的加权和-版本2的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我们有一个报价表和两个买家的规模.尺寸为 s 的出价 p 表示买方愿意以价格 p 购买产品的 s 号.我们有一个表,其中包含几列(例如时间戳记,有效性标志)以及以下四列:

  • 两个买家 pA pB 的出价.
  • 出价大小, sA sB .

我们的工作是在表中添加一个新的最佳尺寸列( bS ),该列以最佳价格返回尺寸.如果两个买家的价格相同,则 bS 等于 sA + sB ,否则,我们需要采用提供更高价格的买家的出价大小. /p>

下面是带有所需输出的示例表(忽略既不是价格也不是大小的列).

该问题的简单解决方案:

SELECT *,
  CASE
    WHEN pA = pB THEN sA + sB
    WHEN pA > pB THEN sA
    ELSE sB
  END AS bS
FROM t

现在让我们将问题推广到四个买家.一个标准的SQL解决方案是

WITH t_ext AS (
SELECT *, GREATEST(pA, pB, pC, pD) as bP
FROM `t` 
)
SELECT *, (sA * CAST(pA = bP AS INT64) + 
           sB * CAST(pB = bP AS INT64) + 
           sC * CAST(pC = bP AS INT64) +
           sD * CAST(pD = bP AS INT64)) 
AS bS FROM t_ext

问题:

是否存在一个简化的查询

  • 使用函数SUM而不是手动添加四个项目
  • 避免重复投射?

请注意,我们无法按索引标识价格和尺寸列,而只能按名称标识 .否则,我们可以使用

提出的解决方案

列向量的加权和和派生的位向量

顺便说一句.我写了一个博客文章关于这个问题的重点是Python和Q的解决方案,我想知道标准sql中最好的解决方案的样子.

解决方案

以下是BigQuery标准SQL

请注意,我们无法按索引标识价格和尺寸列,而只能按名称标识

#standardSQL
WITH t_ext AS (
  SELECT * EXCEPT(arr), 
    ARRAY(SELECT CAST(val AS INT64) FROM UNNEST(arr) val WITH OFFSET WHERE OFFSET < ARRAY_LENGTH(arr) / 2) AS prices,
    ARRAY(SELECT CAST(val AS INT64) FROM UNNEST(arr) val WITH OFFSET WHERE OFFSET >= ARRAY_LENGTH(arr) / 2) AS sizes,
    (SELECT MAX(CAST(val AS INT64)) FROM UNNEST(arr) val WITH OFFSET WHERE OFFSET < ARRAY_LENGTH(arr) / 2) AS bestPrice
  FROM (
    SELECT *, REGEXP_EXTRACT_ALL(TO_JSON_STRING(T), r'(?:"(?:pA|pB|pC|pD|sA|sB|sC|sD)"):(\d+)') AS arr
    FROM `project.dataset.table` t
  )
)
SELECT * EXCEPT(prices, sizes), 
  (SELECT SUM(size)
    FROM UNNEST(prices) price WITH OFFSET
    JOIN UNNEST(sizes) size WITH OFFSET
    USING(OFFSET) 
    WHERE price = bestPrice
  ) AS bS
FROM t_ext

您将看到-唯一需要提供的是价格和尺寸列名称列表,如以下示例所示

pA|pB|pC|pD|sA|sB|sC|sD    

如果要应用于以下虚拟数据

#standardSQL
WITH `project.dataset.table` AS (
  SELECT 'a' id, 1 pA, 2 pB, 3 pC, 4 pD, 'x' extra_col1, 1 sA, 1 sB, 1 sC, 5 sD UNION ALL
  SELECT 'b', 1, 4, 2, 4, 'y', 1, 6, 1, 5 UNION ALL
  SELECT 'c', 5, 4, 2, 1, 'z', 7, 1, 1, 1
), t_ext AS (
  SELECT * EXCEPT(arr), 
    ARRAY(SELECT CAST(val AS INT64) FROM UNNEST(arr) val WITH OFFSET WHERE OFFSET < ARRAY_LENGTH(arr) / 2) AS prices,
    ARRAY(SELECT CAST(val AS INT64) FROM UNNEST(arr) val WITH OFFSET WHERE OFFSET >= ARRAY_LENGTH(arr) / 2) AS sizes,
    (SELECT MAX(CAST(val AS INT64)) FROM UNNEST(arr) val WITH OFFSET WHERE OFFSET < ARRAY_LENGTH(arr) / 2) AS bestPrice
  FROM (
    SELECT *, REGEXP_EXTRACT_ALL(TO_JSON_STRING(T), r'(?:"(?:pA|pB|pC|pD|sA|sB|sC|sD)"):(\d+)') AS arr
    FROM `project.dataset.table` t
  )
)
SELECT * EXCEPT(prices, sizes), 
  (SELECT SUM(size)
    FROM UNNEST(prices) price WITH OFFSET
    JOIN UNNEST(sizes) size WITH OFFSET
    USING(OFFSET) 
    WHERE price = bestPrice
  ) AS bS
FROM t_ext

结果是

Row id  pA  pB  pC  pD  extra_col1  sA  sB  sC  sD  bestPrice   bS   
1   a   1   2   3   4   x           1   1   1   5   4           5    
2   b   1   4   2   4   y           1   6   1   5   4           11   
3   c   5   4   2   1   z           7   1   1   1   5           7      

希望,这就是您要寻找的

We have a table of bid prices and sizes of two buyers. Bid price p with size s means that the buyer is open to buy s number of product at price p. We have a table that contains a few columns (like timestamp, validity flag) together with these four columns:

  • bid prices offered by the two buyers, pA and pB.
  • bid sizes, sA and sB.

Our job is to add a new best size column (bS) to the table, that returns the size at the best price. If the two buyers have the same price then bS is equal to sA + sB, otherwise, we need to take the bid size of the buyer that offers the higher price.

An example table (ignoring columns that are neither prices nor sizes) with the desired output is below.

A simple solution to the problem:

SELECT *,
  CASE
    WHEN pA = pB THEN sA + sB
    WHEN pA > pB THEN sA
    ELSE sB
  END AS bS
FROM t

Now let us generalize the problem to four buyers. A standard SQL solution is

WITH t_ext AS (
SELECT *, GREATEST(pA, pB, pC, pD) as bP
FROM `t` 
)
SELECT *, (sA * CAST(pA = bP AS INT64) + 
           sB * CAST(pB = bP AS INT64) + 
           sC * CAST(pC = bP AS INT64) +
           sD * CAST(pD = bP AS INT64)) 
AS bS FROM t_ext

Question:

Is there a simplified query that

  • uses function SUM instead of adding four items manually
  • avoids repeated casting?

Note that we cannot identify the price and size columns by indices but only by name. Otherwise, we could use the solution proposed at

Weighted sum of a column vector and a derived bit vector

Btw. I wrote a blog post about this problem that focuses on solutions in Python and Q and I am wondering how the best solution in standard sql looks like.

解决方案

Below is for BigQuery Standard SQL

Note that we cannot identify the price and size columns by indices but only by name

#standardSQL
WITH t_ext AS (
  SELECT * EXCEPT(arr), 
    ARRAY(SELECT CAST(val AS INT64) FROM UNNEST(arr) val WITH OFFSET WHERE OFFSET < ARRAY_LENGTH(arr) / 2) AS prices,
    ARRAY(SELECT CAST(val AS INT64) FROM UNNEST(arr) val WITH OFFSET WHERE OFFSET >= ARRAY_LENGTH(arr) / 2) AS sizes,
    (SELECT MAX(CAST(val AS INT64)) FROM UNNEST(arr) val WITH OFFSET WHERE OFFSET < ARRAY_LENGTH(arr) / 2) AS bestPrice
  FROM (
    SELECT *, REGEXP_EXTRACT_ALL(TO_JSON_STRING(T), r'(?:"(?:pA|pB|pC|pD|sA|sB|sC|sD)"):(\d+)') AS arr
    FROM `project.dataset.table` t
  )
)
SELECT * EXCEPT(prices, sizes), 
  (SELECT SUM(size)
    FROM UNNEST(prices) price WITH OFFSET
    JOIN UNNEST(sizes) size WITH OFFSET
    USING(OFFSET) 
    WHERE price = bestPrice
  ) AS bS
FROM t_ext

As you can see - the only what you should supply is the list of price and size column names as in below example

pA|pB|pC|pD|sA|sB|sC|sD    

If to apply to dummy data as below

#standardSQL
WITH `project.dataset.table` AS (
  SELECT 'a' id, 1 pA, 2 pB, 3 pC, 4 pD, 'x' extra_col1, 1 sA, 1 sB, 1 sC, 5 sD UNION ALL
  SELECT 'b', 1, 4, 2, 4, 'y', 1, 6, 1, 5 UNION ALL
  SELECT 'c', 5, 4, 2, 1, 'z', 7, 1, 1, 1
), t_ext AS (
  SELECT * EXCEPT(arr), 
    ARRAY(SELECT CAST(val AS INT64) FROM UNNEST(arr) val WITH OFFSET WHERE OFFSET < ARRAY_LENGTH(arr) / 2) AS prices,
    ARRAY(SELECT CAST(val AS INT64) FROM UNNEST(arr) val WITH OFFSET WHERE OFFSET >= ARRAY_LENGTH(arr) / 2) AS sizes,
    (SELECT MAX(CAST(val AS INT64)) FROM UNNEST(arr) val WITH OFFSET WHERE OFFSET < ARRAY_LENGTH(arr) / 2) AS bestPrice
  FROM (
    SELECT *, REGEXP_EXTRACT_ALL(TO_JSON_STRING(T), r'(?:"(?:pA|pB|pC|pD|sA|sB|sC|sD)"):(\d+)') AS arr
    FROM `project.dataset.table` t
  )
)
SELECT * EXCEPT(prices, sizes), 
  (SELECT SUM(size)
    FROM UNNEST(prices) price WITH OFFSET
    JOIN UNNEST(sizes) size WITH OFFSET
    USING(OFFSET) 
    WHERE price = bestPrice
  ) AS bS
FROM t_ext

result is

Row id  pA  pB  pC  pD  extra_col1  sA  sB  sC  sD  bestPrice   bS   
1   a   1   2   3   4   x           1   1   1   5   4           5    
2   b   1   4   2   4   y           1   6   1   5   4           11   
3   c   5   4   2   1   z           7   1   1   1   5           7      

Hope, this is what you are looking for

这篇关于列向量和派生位向量的加权和-版本2的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆