SQL唯一组合 [英] SQL unique combinations

查看:114
本文介绍了SQL唯一组合的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个表,该表具有三列,分别带有ID,治疗类别和通用名称.治疗类可以映射到多个通用名称.

I have a table with three columns with an ID, a therapeutic class, and then a generic name. A therapeutic class can be mapped to multiple generic names.

ID     therapeutic_class       generic_name
1           YG4                    insulin
1           CJ6                    maleate
1           MG9                    glargine
2           C4C                    diaoxy
2           KR3                    supplies
3           YG4                    insuilin
3           CJ6                    maleate
3           MG9                    glargine

我需要首先查看治疗类别和通用名称的各个组合,然后再计算有多少患者具有相同的组合.我希望我的输出有三列:一列是通用名称的组合,治疗类别的组合以及具有这种组合的患者人数的计数:

I need to first look at the individual combinations of therapeutic class and generic name and then want to count how many patients have the same combination. I want my output to have three columns: one being the combo of generic names, the combo of therapeutic classes and the count of the number of patients with the combination like this:

Count          Combination_generic                   combination_therapeutic
2              insulin, maleate, glargine                 YG4, CJ6, MG9
1              supplies, diaoxy                           C4C, KR3

推荐答案

通过对对(therapeutic_class, generic_name)匹配患者的一种方法是在所需的输出中创建逗号分隔的字符串,并按它们分组并计数.为此,您需要一种识别对的方法.请参阅原始问题下的我的评论"和对戈登答案的我的评论",以了解其中的一些问题.

One way to match patients by the sets of pairs (therapeutic_class, generic_name) is to create the comma-separated strings in your desired output, and to group by them and count. To do this right, you need a way to identify the pairs. See my Comment under the original question and my Comments to Gordon's Answer to understand some of the issues.

我在以下解决方案的一些初步工作中进行了此识别.正如我在评论中提到的那样,如果数据模型中已经存在对和唯一ID,那会更好.我可以即时创建它们.

I do this identification in some preliminary work in the solution below. As I mentioned in my Comment, it would be better if the pairs and unique ID's existed already in your data model; I create them on the fly.

重要提示:这是假定逗号分隔的列表不会太长.如果您超过4000个字符(或在Oracle 12中约为32000个字符,并且启用了某些选项),则可以将字符串聚合为CLOB,但是不能GROUP BY CLOB(通常,不仅是这种情况),因此这种方法将失败.一种更可靠的方法是匹配对的集合,而不是它们的某些集合.解决方案更为复杂,除非您的问题需要它,否则我将不予解决.

Important note: This assumes the comma-separated lists don't become too long. If you exceed 4000 characters (or approx. 32000 characters in Oracle 12, with certain options turned on), you CAN aggregate the strings into CLOBs, but you CAN'T GROUP BY CLOBs (in general, not just in this case), so this approach will fail. A more robust approach is to match the sets of pairs, not some aggregation of them. The solution is more complicated, I will not cover it unless it is needed in your problem.

with
         -- Begin simulated data (not part of the solution)
         test_data ( id, therapeutic_class, generic_name ) as (
           select 1, 'GY6', 'insulin'  from dual union all
           select 1, 'MH4', 'maleate'  from dual union all
           select 1, 'KJ*', 'glargine' from dual union all
           select 2, 'GY6', 'supplies' from dual union all
           select 2, 'C4C', 'diaoxy'   from dual union all
           select 3, 'GY6', 'insulin'  from dual union all
           select 3, 'MH4', 'maleate'  from dual union all
           select 3, 'KJ*', 'glargine' from dual
         ),
         -- End of simulated data (for testing purposes only).
         -- SQL query solution continues BELOW THIS LINE
     valid_pairs ( pair_id, therapeutic_class, generic_name ) as (
       select rownum, therapeutic_class, generic_name
       from   (
                select distinct therapeutic_class, generic_name
                from   test_data
              )
     ),
     first_agg ( id, tc_list, gn_list ) as (
       select t.id, 
              listagg(p.therapeutic_class, ',') within group (order by p.pair_id),
              listagg(p.generic_name     , ',') within group (order by p.pair_id)
       from   test_data t join valid_pairs p
                           on t.therapeutic_class = p.therapeutic_class
                          and t.generic_name      = p.generic_name
       group by t.id
     )
select   count(*) as cnt, tc_list, gn_list
from     first_agg
group by tc_list, gn_list
;

输出:

CNT TC_LIST            GN_LIST                      
--- ------------------ ------------------------------
  1 GY6,C4C            supplies,diaoxy               
  2 GY6,KJ*,MH4        insulin,glargine,maleate     

这篇关于SQL唯一组合的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆