带有TRUE/FALSE标记的Postgresql中的数据透视 [英] Pivot in Postgresql with TRUE/FALSE markings
问题描述
我想知道如何使用TRUE/FALSE值将几个数组值放入列名中. 我给你具体的例子:
I wonder how to put several array values into the column names, with TRUE/FALSE values. I'll give you the concrete example:
我所拥有的是重复的行,由于结果不同,最后一列重复了:
What I have is repeated rows, with the last column duplicate because of different results:
DATE ID Species Illness Tag
20180101 001 Dog Asthma Mucus
20180101 001 Dog Asthma Noisy
20180101 001 Dog Asthma Respiratory
20180102 002 Cat Osteoarthritis Locomotor
20180102 002 Cat Osteoarthritis Limp
...
20180131 003 Bird Avian Pox Itchy
我想得到的是这样:
DATE ID Species Illness Mucus Noisy ... Limp Itchy
20180101 001 Dog Asthma TRUE TRUE ... FALSE FALSE
20180102 002 Cat Osteoarth. FALSE FALSE ... TRUE FALSE
...
20180131 003 Bird Avian Pox FALSE FALSE ... FALSE TRUE
我仅针对部分标签尝试了交叉表"功能,但它给了我不存在的功能错误:
I tried with "crosstab" function just for the part of tags, but it gives me error of unexisting function:
select *
from crosstab (
'select c.id, tg."name"
FROM taggings t
join consultations c
on c.id=t.taggable_id
join tags tg
on t.tag_id=tg.id
group by c.id, tg."name"'
) as final_result(dermatological BOOLEAN, behaviour BOOLEAN)
顺便说一句.我大约有350个标签,所以它不是最佳的功能:/
Btw. I have around 350 tags, so it's not the most optimal function :/
最后,我添加了tablefunc扩展名,并尝试使用crosstab(),但出现以下错误:
Finally I add tablefunc extension, and I tried with crosstab(), but I got the following error:
查询执行失败原因:SQL错误[22023]:错误:无效 源数据SQL语句详细信息:提供的SQL必须返回3 列:rowid,类别和值.
Query execution failed Reason: SQL Error [22023]: ERROR: invalid source data SQL statement Detail: The provided SQL must return 3 columns: rowid, category, and values.
我将尝试在此处找到解决方案并对其进行更新,但是与此同时,如果有人知道如何解决该问题,请分享:)谢谢!
I'll try to find a solution and update it here, but in the meantime if someone knows how to solve it, please share :) Thanks!
经过几天的阅读并尝试了建议的解决方案,这对我有用:
After days of reading and trying suggested solutions, this worked for me:
我要做的是获取3个单独的表,然后将第一个和第三个表连接起来以获取所需的信息,如果标签存在于特定ID中,则将标签作为具有值1/0的列添加. 再进行一次修改=>我实际上并不需要日期,因此我将表格基于咨询ID.
What I did is to get 3 separate tables, and then join the first and the third to get the info I need, plus the tags as columns with the value 1/0 if the tag exists in the certain ID. One more edit => I didn’t actually need the date, so I based the tables on IDs of consultations.
表1: 获取一张需要按ID分组的所有列的表格,并获取一个ID拥有的所有标签.
TABLE 1: Get a table of all the columns you need grouped by IDs, and get all the tags one ID has.
ID Species Age Illness Tag
001 Dog 2 Asthma Mucus
001 Dog 2 Asthma Noisy
001 Dog 2 Asthma Respiratory
002 Cat 5 Osteoarthritis Locomotor
002 Cat 5 Osteoarthritis Limp
...
003 Bird 1 Avian Pox Itchy
表2: 获得将跨所有咨询并带有所有不同标签的列表的笛卡尔乘积,并对它们进行crosstab()函数排序. (交叉表功能需要三列; ID,标签和值)
TABLE 2: Get the Cartesian product that will cross all the consultations with a list of all distinct tags, and order them for crosstab() function. (crosstab function needs to have 3 columns; ID, tags and values)
With consultation_tags as
(here put the query of the TABLE 1),
tag_list as
(select tags."name"
from tags
join taggings t on t.tag_id = tags.id
join consultations c on c.id = t.taggable_id a
group by 1), —-> gets the list of all possible tags in the DB
cartesian_consultations_tags as
(select consultations_tags.id, tag_list.name,
case when tag_list.name = consultations_tags.tag_name then 1
else 0 --> "case" gets the value 1/0 if the tag is present in an ID
end as tag_exists
from
consultations_tags
cross join
tag_list)
select cartesian_consul_tags.id, cartesian_consul_tags.name,
SUM(cartesian_consul_tags.tag_exists) --> for me, the values were duplicated, and so were tags
from cartesian_consul_tags
group by 1, 2
order by 1, 2
—>标记的顺序在这里非常重要,因为您是在交叉表函数中命名列的人.它不会将某些标签转换为列,而只会传递该标签位置的值,因此,如果您弄乱了命名顺序,则这些值将无法正确对应.
—> the order of tags is really important here, because you are the one who names the columns in crosstab function; it doesn’t transform certain tag to a column, it only transfers the value of that tag position, so if you mess the naming order, the values will not correspond correctly.
表3: 第二个表的交叉表—>它使笛卡尔乘积表或本表2成为透视表.
TABLE 3: Crosstab of the second table —> it pivots the cartesian product table, or in this case the TABLE 2.
SELECT *
FROM crosstab(‘ COPY THE TABLE 2 ‘) --> if you have some conditions like "where species = ‘Dogs’", you will need to put double apostrophe in the string value —> where species = ‘’Dogs’’
AS ct(id int4,"Itchy" int8,
"Limp" int8,
"Locomotor" int8,
"Mucus" int8,
"Noisy" int8) --> your tag list. You can prepare it in excel, so all the tags are in quotation marks and has corresponding datatype. The datatype of the tags has to be the same as the datatype of the "value" in the table 2
最终,我想要的最终表是将表1和3联接在一起,所以我从咨询ID中获得了我所需的信息,并获得了标记列表,列的值为0/1如果在某些咨询中有该标签.
FINALLY, the final table I wanted was to join the tables 1 and 3, so I have info I need from the consultation IDs, and a list of tags as columns with the values 0/1 if the tag is present in certain consultation.
with table1 as ( Copy the query of table1),
table3 as ( Copy the query of table3)
select *
from table1
join table3 on
table1.id=table3.id
order by 1
决赛桌如下:
ID Species Illness Mucus Noisy ... Limp Itchy
001 Dog Asthma 1 1 ... 0 0
002 Cat Osteoarth. 0 0 ... 1 0
...
003 Bird Avian Pox 0 0 ... 0 1
推荐答案
根据显示查询结果的方式,您可以考虑采用另一种方法,即在单个JSONB列中为每个标记获取所有true/false标志,而不是350个动态列.
Depending on how you display the results of the query, you might consider a different approach where you get all true/false flags per tag in a single JSONB column, rather than 350 dynamic columns.
我不确定我是否正确理解了您的数据模型,但是从我收集到的数据来看,我认为是这样的:
I am not sure if I understood your data model correctly, but from what I gathered I think it's something like this:
create table tags (id int, tag text);
create table consultations (id int, species text, illness text);
create table taggings (taggable_id int, tag_id int);
insert into tags
(id, tag)
values
(1, 'Mucus'),
(2, 'Noisy'),
(3, 'Limp'),
(4, 'Itchy'),
(5, 'Locomotor'),
(6, 'Respiratory');
insert into consultations
(id, species, illness)
values
(1, 'Dog', 'Asthma'),
(2, 'Cat', 'Osteoarthritis'),
(3, 'Bird', 'Avian Pox');
insert into taggings
(taggable_id, tag_id)
values
(1, 1), (1, 2), (1, 6), -- the dog
(2, 5), (2, 3), -- the cat
(3, 4); -- the bird
然后,您可以使用此查询获取一个JSON列:
Then you can get a single JSON column using this query:
select c.id, c.species, c.illness,
(select jsonb_object_agg(t.tag, tg.taggable_id is not null)
from tags t
left join taggings tg
on tg.tag_id = t.id
and tg.taggable_id = c.id) as tags
from consultations c;
使用上述示例数据,查询将返回:
With the above sample data the query returns:
id | species | illness | tags
---+---------+----------------+---------------------------------------------------------------------------------------------------------
1 | Dog | Asthma | {"Limp": false, "Itchy": false, "Mucus": true, "Noisy": true, "Locomotor": false, "Respiratory": true}
2 | Cat | Osteoarthritis | {"Limp": true, "Itchy": false, "Mucus": false, "Noisy": false, "Locomotor": true, "Respiratory": false}
3 | Bird | Avian Pox | {"Limp": false, "Itchy": true, "Mucus": false, "Noisy": false, "Locomotor": false, "Respiratory": false}
编写查询的另一种方法是使用横向联接:
An alternative way of writing the query is to use a lateral join:
select c.id, c.species, c.illness, ti.tags
from consultations c
left join lateral (
select jsonb_object_agg(t.tag, tg.taggable_Id is not null) as tags
from tags t
left join taggings tg on tg.tag_id = t.id and tg.taggable_id = c.id
) as ti on true
这篇关于带有TRUE/FALSE标记的Postgresql中的数据透视的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!