Combining multiple rows into one
Question
I have a database structure in PostgreSQL that looks something like this:
DROP TABLE IF EXISTS medium CASCADE;
DROP TABLE IF EXISTS works CASCADE;
DROP DOMAIN IF EXISTS nameVal CASCADE;
DROP DOMAIN IF EXISTS numID CASCADE;
DROP DOMAIN IF EXISTS alphaID CASCADE;
CREATE DOMAIN alphaID AS VARCHAR(10);
CREATE DOMAIN numID AS INT;
CREATE DOMAIN nameVal AS VARCHAR(40);
CREATE TABLE works (
w_alphaID alphaID NOT NULL,
w_numID numID NOT NULL,
w_title nameVal NOT NULL,
PRIMARY KEY(w_alphaID,w_numID));
CREATE TABLE medium (
m_alphaID alphaID NOT NULL,
m_numID numID NOT NULL,
m_title nameVal NOT NULL,
FOREIGN KEY(m_alphaID,m_numID) REFERENCES
works ON UPDATE CASCADE ON DELETE CASCADE);
INSERT INTO works VALUES('AB',1,'Sunset'),
('CD',2,'Beach'),
('EF',3,'Flower');
INSERT INTO medium VALUES('AB',1,'Wood'),
('AB',1,'Oil'),
('CD',2,'Canvas'),
('CD',2,'Oil'),
('CD',2,'Bronze'),
('EF',3,'Paper'),
('EF',3,'Pencil');
SELECT * FROM works;
SELECT * FROM medium;
SELECT w_alphaID AS alphaID, w_numID AS numID, w_title AS
Name_of_work, m_title AS Material_used
FROM works, medium WHERE
works.w_alphaID = medium.m_alphaID
AND works.w_numID = medium.m_numID;
The output looks like this:
w_alphaid | w_numid | w_title
-----------+---------+---------
AB | 1 | Sunset
CD | 2 | Beach
EF | 3 | Flower
(3 rows)
m_alphaid | m_numid | m_title
-----------+---------+---------
AB | 1 | Wood
AB | 1 | Oil
CD | 2 | Canvas
CD | 2 | Oil
CD | 2 | Bronze
EF | 3 | Paper
EF | 3 | Pencil
(7 rows)
alphaid | numid | name_of_work | material_used
---------+-------+--------------+---------------
AB | 1 | Sunset | Wood
AB | 1 | Sunset | Oil
CD | 2 | Beach | Canvas
CD | 2 | Beach | Oil
CD | 2 | Beach | Bronze
EF | 3 | Flower | Paper
EF | 3 | Flower | Pencil
(7 rows)
Now my question is: what query should I use to make the output of the last SELECT statement look something like this:
alphaid | numid | name_of_work | material_used_1 | material_used_2 | material_used_3
---------+-------+--------------+-----------------+-----------------+---------------
AB | 1 | Sunset | Wood | Oil |
CD | 2 | Beach | Canvas | Oil | Bronze
EF | 3 | Flower | Paper | Pencil |
(3 rows)
I looked into using string_agg(), but that puts all the values into one cell, and I want a separate cell for each value. I tried using a join to see if I could achieve such output, but with no success so far. I appreciate you taking the time to look at this question.
Recommended answer
This would be simpler with a simpler schema:
- No domain types (what's the purpose?)
- Add an actual PK to table medium
- Rather use a surrogate PK (a serial column) instead of the multicolumn PK and FK over two domain types.
  Or at least use the same (simpler) column name for columns with the same content: just alpha_id instead of m_alphaID and w_alphaID etc.
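A minimal sketch of what such a simplified schema could look like. The column names work_id, medium_id, alpha_id, num_id and title are hypothetical, chosen only for illustration; the original business key is kept as a UNIQUE constraint, which is an assumption about what the data needs:

```sql
-- Hypothetical simplified schema: plain types instead of domains,
-- surrogate serial keys instead of the two-column PK / FK.
CREATE TABLE works (
   work_id  serial PRIMARY KEY
 , alpha_id varchar(10) NOT NULL
 , num_id   int         NOT NULL
 , title    varchar(40) NOT NULL
 , UNIQUE (alpha_id, num_id)       -- assumption: the old key stays unique
);

CREATE TABLE medium (
   medium_id serial PRIMARY KEY    -- the table now has an actual PK
 , work_id   int NOT NULL REFERENCES works ON UPDATE CASCADE ON DELETE CASCADE
 , title     varchar(40) NOT NULL
);
```

With a single-column key like this, work_id could serve directly as the row name in a crosstab() query, and joins need only one condition.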
That aside, here are solutions for your setup as is:
There are several specific difficulties for your crosstab() query:
- No single column that can serve as row_name.
- Multiple extra columns.
- No category column.
- No defined order for values (so I use arbitrary order instead).
Basics (read this first!):
For your special case:
- Pivot on Multiple Columns using Tablefunc
- Dynamic alternative to pivot with CASE and GROUP BY
Solution:
SELECT alphaid, numid, name_of_work, material_1, material_2, material_3
FROM crosstab(
'SELECT rn, w.alphaid, w.numid, w.name_of_work
, row_number() OVER (PARTITION BY rn) AS mat_nr -- order undefined!
, m_title AS Material_used
FROM (
SELECT w_alphaID AS alphaid, w_numID AS numid, w_title AS name_of_work
, row_number() OVER (ORDER BY w_alphaID, w_numID) AS rn
FROM works
) w
JOIN medium m ON w.alphaid = m.m_alphaID
AND w.numid = m.m_numID
ORDER BY rn, mat_nr'
, 'VALUES (1), (2), (3)' -- add more ...
)
AS ct (
rn bigint, alphaid text, numid int, name_of_work text
, material_1 text, material_2 text, material_3 text -- add more ...
);
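Two practical notes on the query above, as a hedged sketch. crosstab() is provided by the additional module tablefunc, which has to be installed once per database. And if an arbitrary order of materials is not acceptable, the inner row_number() can be given an ORDER BY. Sorting alphabetically by m_title is only an assumption here, since the table has no column that defines a real order. The source query can be run on its own to check what crosstab() will receive:

```sql
-- Install the module that provides crosstab() (once per database).
CREATE EXTENSION IF NOT EXISTS tablefunc;

-- The source query from above, standalone, with a deterministic mat_nr.
-- Assumption: alphabetical order by m_title is good enough.
SELECT rn, w.alphaid, w.numid, w.name_of_work
     , row_number() OVER (PARTITION BY rn ORDER BY m.m_title) AS mat_nr
     , m.m_title AS material_used
FROM  (
   SELECT w_alphaID AS alphaid, w_numID AS numid, w_title AS name_of_work
        , row_number() OVER (ORDER BY w_alphaID, w_numID) AS rn
   FROM   works
   ) w
JOIN   medium m ON w.alphaid = m.m_alphaID
               AND w.numid   = m.m_numID
ORDER  BY rn, mat_nr;
```

To use this variant inside crosstab(), the same ORDER BY m.m_title would go into the window definition in the quoted source query.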
Poor man's crosstab with standard SQL
If the additional module tablefunc cannot be installed, or if top performance is not important, this simpler query does the same, just more slowly:
SELECT w_alphaid AS alphaid, w_numid AS numid, w_title AS name_of_work
, arr[1] AS material_used_1
, arr[2] AS material_used_2
, arr[3] AS material_used_3 -- add more?
FROM works w
LEFT JOIN (
SELECT m_alphaid, m_numid, array_agg(m_title::text) AS arr
FROM medium
GROUP BY m_alphaid, m_numid
) m ON w.w_alphaid = m.m_alphaid
AND w.w_numid = m.m_numid;
- The cast to text (or varchar ...) is necessary because there is no predefined array type for your custom domain. Alternatively you could define the missing array type. (PostgreSQL 11 and later support arrays over domains, so the cast may be unnecessary there.)
- One subtle difference to the above: using LEFT JOIN here instead of just JOIN to preserve rows from works that have no related materials in medium at all.
- Since you return the whole table, it's cheaper to aggregate rows in medium before you join. For a small selection it might be cheaper to join first and then aggregate.
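For the small-selection case, a sketch of the "join first, then aggregate" variant might look like the following. The WHERE filter on a single work is a hypothetical example, and the ORDER BY inside array_agg() is an added assumption to make the column order repeatable:

```sql
-- Join first, then aggregate: only the selected work's materials are touched.
SELECT alphaid, numid, name_of_work
     , arr[1] AS material_used_1
     , arr[2] AS material_used_2
     , arr[3] AS material_used_3   -- add more?
FROM  (
   SELECT w.w_alphaid AS alphaid, w.w_numid AS numid, w.w_title AS name_of_work
        , array_agg(m.m_title::text ORDER BY m.m_title) AS arr
   FROM   works w
   LEFT   JOIN medium m ON m.m_alphaid = w.w_alphaid
                       AND m.m_numid   = w.w_numid
   WHERE  w.w_alphaid = 'CD'       -- hypothetical filter on one work
   AND    w.w_numid   = 2
   GROUP  BY w.w_alphaid, w.w_numid, w.w_title
   ) sub;
```

With the sample data from the question this should return a single row for Beach, with its materials listed alphabetically. A work without any materials would still be returned, with NULL in every material column.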