Combining multiple rows into one

Question

I have a database structure in PostgreSQL that looks something like this:

DROP TABLE IF EXISTS  medium  CASCADE;
DROP TABLE IF EXISTS  works   CASCADE;
DROP DOMAIN IF EXISTS nameVal CASCADE;
DROP DOMAIN IF EXISTS numID   CASCADE;
DROP DOMAIN IF EXISTS alphaID CASCADE;

CREATE DOMAIN alphaID   AS VARCHAR(10);
CREATE DOMAIN numID     AS INT;
CREATE DOMAIN nameVal   AS VARCHAR(40);

CREATE TABLE works (
   w_alphaID    alphaID     NOT NULL,
   w_numID      numID       NOT NULL,
   w_title      nameVal     NOT NULL,
   PRIMARY KEY(w_alphaID,w_numID));


CREATE TABLE medium (
   m_alphaID    alphaID     NOT NULL,
   m_numID      numID       NOT NULL,
   m_title      nameVal     NOT NULL,
   FOREIGN KEY(m_alphaID,m_numID) REFERENCES 
      works ON UPDATE CASCADE ON DELETE CASCADE);

INSERT INTO works VALUES('AB',1,'Sunset'),
                        ('CD',2,'Beach'),
                        ('EF',3,'Flower');

INSERT INTO medium VALUES('AB',1,'Wood'),
                         ('AB',1,'Oil'),
                         ('CD',2,'Canvas'),
                         ('CD',2,'Oil'),
                         ('CD',2,'Bronze'),
                         ('EF',3,'Paper'),
                         ('EF',3,'Pencil');
SELECT * FROM works;
SELECT * FROM medium;

SELECT w_alphaID AS alphaID, w_numID AS numID, w_title AS
       Name_of_work, m_title AS Material_used 
     FROM works, medium WHERE 
       works.w_alphaID = medium.m_alphaID 
       AND works.w_numID = medium.m_numID;

The output looks like this:

 w_alphaid | w_numid | w_title 
-----------+---------+---------
 AB        |       1 | Sunset
 CD        |       2 | Beach
 EF        |       3 | Flower
(3 rows)

 m_alphaid | m_numid | m_title 
-----------+---------+---------
 AB        |       1 | Wood
 AB        |       1 | Oil
 CD        |       2 | Canvas
 CD        |       2 | Oil
 CD        |       2 | Bronze
 EF        |       3 | Paper
 EF        |       3 | Pencil
(7 rows)

 alphaid | numid | name_of_work | material_used 
---------+-------+--------------+---------------
 AB      |     1 | Sunset       | Wood
 AB      |     1 | Sunset       | Oil
 CD      |     2 | Beach        | Canvas
 CD      |     2 | Beach        | Oil
 CD      |     2 | Beach        | Bronze
 EF      |     3 | Flower       | Paper
 EF      |     3 | Flower       | Pencil
(7 rows)

Now my question is: what query should I use so that the output of the last SELECT statement looks like this:

 alphaid | numid | name_of_work | material_used_1 | material_used_2 | material_used_3 
---------+-------+--------------+-----------------+-----------------+---------------
 AB      |     1 | Sunset       | Wood            | Oil             |
 CD      |     2 | Beach        | Canvas          | Oil             | Bronze
 EF      |     3 | Flower       | Paper           | Pencil          |
(3 rows)

I looked into string_agg(), but that puts all the values into one cell, and I want a separate cell for each value. I also tried a join to achieve such output, but with no success so far. I appreciate you taking the time to look at this question.

Recommended answer

This would be simpler with a simpler schema:

  • No domain types (what's the purpose?)
  • Add an actual PK to table medium
  • Rather use a surrogate PK (a serial column) instead of the multicolumn PK and FK over two domain types.
    Or at least use the same (simpler) column name for columns with the same content: just alpha_id instead of m_alphaID and w_alphaID etc.
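As a rough illustration of the suggestions above, a simplified schema might look like the following sketch. The column names work_id, medium_id, alpha_id, num_id and title are hypothetical, chosen only for this example; the original (alpha_id, num_id) pair is kept as a UNIQUE constraint on the assumption that it still has to identify a work.

```sql
-- Hypothetical simplified schema: plain types, surrogate keys.
-- Names such as work_id and medium_id are illustrative only.
CREATE TABLE works (
   work_id  serial      PRIMARY KEY,
   alpha_id varchar(10) NOT NULL,
   num_id   int         NOT NULL,
   title    varchar(40) NOT NULL,
   UNIQUE (alpha_id, num_id)          -- former multicolumn PK
);

CREATE TABLE medium (
   medium_id serial      PRIMARY KEY, -- an actual PK for medium
   work_id   int         NOT NULL
             REFERENCES works ON UPDATE CASCADE ON DELETE CASCADE,
   title     varchar(40) NOT NULL
);
```

With a single-column key, the join between the two tables reduces to one condition on work_id, and medium_id can serve as a sort key when a stable order of materials is wanted.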

That aside, here are solutions for your setup as is:

There are several specific difficulties for your crosstab() query:

  • No single column that can serve as row_name.
  • Multiple extra columns.
  • No category column.
  • No defined order for values (so I use arbitrary order instead).

Basics (read this first!):

For your special case:

  • Pivot on Multiple Columns using Tablefunc
  • Dynamic alternative to pivot with CASE and GROUP BY

Solution:

SELECT alphaid, numid, name_of_work, material_1, material_2, material_3
FROM   crosstab(
  'SELECT rn, w.alphaid, w.numid, w.name_of_work
        , row_number() OVER (PARTITION BY rn) AS mat_nr  -- order undefined!
        , m_title AS Material_used 
   FROM  (
      SELECT w_alphaID AS alphaid, w_numID AS numid, w_title AS name_of_work
           , row_number() OVER (ORDER BY w_alphaID, w_numID) AS rn
       FROM  works
      ) w
   JOIN   medium m ON w.alphaid = m.m_alphaID 
                  AND w.numid   = m.m_numID
   ORDER  BY rn, mat_nr'
 , 'VALUES (1), (2), (3)'  -- add more ...
)
 AS ct (
    rn bigint, alphaid text, numid int, name_of_work text
  , material_1 text, material_2 text, material_3 text  -- add more ...
   );
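crosstab() is provided by the additional module tablefunc, which has to be installed once per database before the query above can run. The sketch below shows that step, plus one possible way to replace the arbitrary material order with a defined one. Sorting alphabetically by m_title is an assumption made for this example; the question does not say which order is wanted, and this choice would list Oil before Wood for 'Sunset'.

```sql
-- Install the module that provides crosstab() (once per database).
CREATE EXTENSION IF NOT EXISTS tablefunc;

-- Number the materials of each work in a defined order.
-- ORDER BY m_title is an assumed ordering, not part of the original answer.
SELECT m_alphaID, m_numID, m_title
     , row_number() OVER (PARTITION BY m_alphaID, m_numID
                          ORDER BY m_title) AS mat_nr
FROM   medium
ORDER  BY m_alphaID, m_numID, mat_nr;
```

The same ORDER BY clause could be added to the row_number() window inside the crosstab source query to make material_1, material_2 and material_3 reproducible between runs.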

Poor man's crosstab with standard SQL

If the additional module tablefunc cannot be installed, or if top performance is not important, this simpler query produces the same result, only slower:

SELECT w_alphaid AS alphaid, w_numid AS numid, w_title AS name_of_work
     , arr[1] AS material_used_1
     , arr[2] AS material_used_2
     , arr[3] AS material_used_3 -- add more?
FROM   works w
LEFT  JOIN (
   SELECT m_alphaid, m_numid, array_agg(m_title::text) AS arr
   FROM   medium
   GROUP  BY m_alphaid, m_numid
   ) m ON w.w_alphaid = m.m_alphaid 
      AND w.w_numid   = m.m_numid;

  • The cast to text (or varchar ...) is necessary because there is no predefined array type for your custom domain. Alternatively, you could define the missing array type.

  • One subtle difference from the above: this query uses LEFT JOIN instead of just JOIN to preserve rows from works that have no related materials in medium at all.

  • Since you return the whole table, it's cheaper to aggregate rows in medium before you join. For a small selection it might be cheaper to join first and then aggregate.
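The "CASE and GROUP BY" alternative mentioned earlier can be sketched for this setup as well. The following is a minimal example under two assumptions: at most three materials per work are shown, and materials are numbered alphabetically by m_title, which the original data does not prescribe.

```sql
-- Pivot with conditional aggregation instead of crosstab() or arrays.
-- Assumptions: three material columns, alphabetical order of materials.
SELECT w.w_alphaID AS alphaid, w.w_numID AS numid, w.w_title AS name_of_work
     , max(CASE WHEN m.mat_nr = 1 THEN m.material END) AS material_used_1
     , max(CASE WHEN m.mat_nr = 2 THEN m.material END) AS material_used_2
     , max(CASE WHEN m.mat_nr = 3 THEN m.material END) AS material_used_3
FROM   works w
LEFT   JOIN (
   SELECT m_alphaID, m_numID
        , m_title::text AS material   -- cast away the domain type
        , row_number() OVER (PARTITION BY m_alphaID, m_numID
                             ORDER BY m_title) AS mat_nr
   FROM   medium
   ) m ON w.w_alphaID = m.m_alphaID
      AND w.w_numID   = m.m_numID
GROUP  BY w.w_alphaID, w.w_numID, w.w_title
ORDER  BY w.w_alphaID, w.w_numID;
```

Each CASE expression picks out at most one material per work, so max() merely collapses the group to that single value, or to NULL when the work has fewer materials.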
