Get counts of UNIQUE records OVERALL per value

Problem description

This is related to an earlier question: split records into buckets based on a sum of counts

The new problem is that one person can have many faxes. Since I'm trying to get a certain number of people, I usually end up with fewer than I need, because the same people land in more than one fax bucket.

Is there a way to get a count of UNIQUE people in the attached example?

Here's my table, called NR_PVO_120:

OtherID     Fax
12365092    2762364204
12005656    2762364204
12484936    2762364204
39003042    2762364204
12365597    2762364204
12635922    2762364204
12332346    2762364204
12365092    4387267572
12005656    4387267572
12365092    4422911281
12005656    4422911281
12484936    4422911281
12651239    4422911281
12388710    4422911281
12686953    4422911281
12365092    4423311213
12005656    4423311213
12709544    4423311213
12484936    4423311213
12005656    4424450542
12346839    4424450542
12365120    4424450542
12484936    4424450542
12086512    4424450542

Based on this table, I build the counts that feed the function from the linked question, using the following query:

SELECT   Fax
        ,COUNT(OtherID) CountOfPeople
    FROM NR_PVO_120
GROUP BY Fax

So the counts would look like this

Fax         CountOfPeople
4422911281  6
4387267572  2
4423311213  4
4424450542  5
2762364204  7

If you add up all the counts you get 24 people, but in reality there are only 14, because one person can have multiple faxes.

Is there a way to count, for the second fax, only those people who weren't counted in the first one, then for the third fax only those who weren't counted in the first two, and so on?

So the results would be:

2762364204  7
4387267572  0
4422911281  3
4423311213  1
4424450542  3

the first fax has 7 people
the second fax has 2, but both of those were already counted in the first fax, so no new unique people were added
the third fax has 6, but only 3 of those haven't already been counted
the fourth fax has 4, but only 1 hasn't been counted earlier
the fifth fax has 5, but only 3 weren't counted earlier
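
The walk described above can be sketched in Python as a cross-check of the desired numbers. This is not part of the original question: the `FAXES` dict is just the sample table regrouped by fax, `new_people_per_fax` is a hypothetical helper name, and the sketch assumes faxes are processed in ascending numeric order.

```python
# Sample data from NR_PVO_120, regrouped as {fax: [OtherID, ...]}.
FAXES = {
    2762364204: [12365092, 12005656, 12484936, 39003042,
                 12365597, 12635922, 12332346],
    4387267572: [12365092, 12005656],
    4422911281: [12365092, 12005656, 12484936, 12651239,
                 12388710, 12686953],
    4423311213: [12365092, 12005656, 12709544, 12484936],
    4424450542: [12005656, 12346839, 12365120, 12484936, 12086512],
}


def new_people_per_fax(faxes):
    """Map each fax to the number of people not seen on a lower fax number."""
    seen = set()
    counts = {}
    for fax in sorted(faxes):  # assumption: ascending fax order
        new = set(faxes[fax]) - seen
        counts[fax] = len(new)
        seen |= new
    return counts


counts = new_people_per_fax(FAXES)
for fax, n in counts.items():
    print(fax, n)
print("total unique people:", sum(counts.values()))
```

On the sample data this gives 7, 0, 3, 1 and 3 new people per fax, which add up to the 14 unique people.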

I know this isn't the proper way to create counts, and they're not correct data-wise, but that's OK. I'm just trying to get all the fax numbers that add up to a certain number of people. Let's say I need 10 people: I have to pick out those 10 while making sure that all the fax numbers stay together. If you look at my NR_PVO_120 table and look at the first 10 people, you see that the 9th starts another fax number that spreads until 11. I won't take that fax. I either find a fax with 1 person attached or, if there are none, I stop at 9. The point is to get 10 people but make sure all people with the same fax are grouped together.
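
One possible reading of this requirement is a greedy pass over the faxes: take a whole fax only if its not-yet-selected people still fit under the target, otherwise skip it. The sketch below assumes that reading; `pick_faxes` and `target` are hypothetical names, and a greedy pass like this is not guaranteed to reach the target exactly for every data set.

```python
# Sample data from NR_PVO_120, regrouped as {fax: [OtherID, ...]}.
FAXES = {
    2762364204: [12365092, 12005656, 12484936, 39003042,
                 12365597, 12635922, 12332346],
    4387267572: [12365092, 12005656],
    4422911281: [12365092, 12005656, 12484936, 12651239,
                 12388710, 12686953],
    4423311213: [12365092, 12005656, 12709544, 12484936],
    4424450542: [12005656, 12346839, 12365120, 12484936, 12086512],
}


def pick_faxes(faxes, target):
    """Greedily choose whole faxes without exceeding `target` unique people."""
    chosen = []
    people = set()
    for fax in sorted(faxes):  # assumption: ascending fax order
        new = set(faxes[fax]) - people
        if len(people) + len(new) <= target:
            # The whole fax fits, so everyone on it stays together.
            chosen.append(fax)
            people |= new
        if len(people) == target:
            break
    return chosen, people


chosen, people = pick_faxes(FAXES, 10)
print(chosen, len(people))
```

With a target of 10, the sample data yields the first three faxes, which together cover exactly 10 unique people.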

Or is there some other way to count only the UNIQUE providers (which should equal 14)?
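
For the overall total, standard SQL `COUNT(DISTINCT ...)` should be enough. The sketch below runs it from Python against an in-memory SQLite copy of the sample data, purely so it can be executed without an Oracle instance. The SQLite setup is an assumption and not part of the original question.

```python
import sqlite3

# Sample data from NR_PVO_120, regrouped as {fax: [OtherID, ...]}.
FAXES = {
    2762364204: [12365092, 12005656, 12484936, 39003042,
                 12365597, 12635922, 12332346],
    4387267572: [12365092, 12005656],
    4422911281: [12365092, 12005656, 12484936, 12651239,
                 12388710, 12686953],
    4423311213: [12365092, 12005656, 12709544, 12484936],
    4424450542: [12005656, 12346839, 12365120, 12484936, 12086512],
}

# Flatten to (otherid, fax) rows, matching the table's column order.
ROWS = [(otherid, fax) for fax, ids in FAXES.items() for otherid in ids]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nr_pvo_120 (otherid INTEGER, fax INTEGER)")
conn.executemany("INSERT INTO nr_pvo_120 VALUES (?, ?)", ROWS)

# COUNT(otherid) counts every row; COUNT(DISTINCT otherid) counts each person once.
total_rows, unique_people = conn.execute(
    "SELECT COUNT(otherid), COUNT(DISTINCT otherid) FROM nr_pvo_120"
).fetchone()
print(total_rows, unique_people)
conn.close()
```

This reports 24 rows but 14 distinct `otherid` values, matching the numbers in the question.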

Solution

I made the table for testing:

create table nr_pvo_120 (
   otherid,
   fax
)
as
select 12365092    , 2762364204 from dual union all
select 12005656    , 2762364204 from dual union all
select 12484936    , 2762364204 from dual union all
select 39003042    , 2762364204 from dual union all
select 12365597    , 2762364204 from dual union all
select 12635922    , 2762364204 from dual union all
select 12332346    , 2762364204 from dual union all
select 12365092    , 4387267572 from dual union all
select 12005656    , 4387267572 from dual union all
select 12365092    , 4422911281 from dual union all
select 12005656    , 4422911281 from dual union all
select 12484936    , 4422911281 from dual union all
select 12651239    , 4422911281 from dual union all
select 12388710    , 4422911281 from dual union all
select 12686953    , 4422911281 from dual union all
select 12365092    , 4423311213 from dual union all
select 12005656    , 4423311213 from dual union all
select 12709544    , 4423311213 from dual union all
select 12484936    , 4423311213 from dual union all
select 12005656    , 4424450542 from dual union all
select 12346839    , 4424450542 from dual union all
select 12365120    , 4424450542 from dual union all
select 12484936    , 4424450542 from dual union all
select 12086512    , 4424450542 from dual
/

My first shot would be: for each person (otherid), get only their first fax number, then do a normal group by and count on that:

select first_fax, count(*) firstcount
  from (
   select otherid, min(fax) first_fax
     from nr_pvo_120
    group by otherid
       )
 group by first_fax
 order by first_fax
/

The output will become:

 FIRST_FAX FIRSTCOUNT
---------- ----------
2762364204          7
4422911281          3
4423311213          1
4424450542          3

Then I noticed that your desired output includes all five fax numbers, with a count of zero for the one that adds no new people (4387267572). That can, for example, be done like this:

select fax, count(*) normalcount, count(otherid_on_first_fax) countunused
  from (
   select fax, otherid,
          -- keep otherid only on the row holding the person's lowest fax number;
          -- on all other rows the case expression yields NULL
          case
             when fax = min(fax) over (partition by otherid order by fax)
             then otherid
          end otherid_on_first_fax
     from nr_pvo_120
       )
 group by fax
 order by fax
/

In this output, column NORMALCOUNT is the number of people having that fax. Column COUNTUNUSED is the number of people that haven't already been "used" in the previous counts:

       FAX NORMALCOUNT COUNTUNUSED
---------- ----------- -----------
2762364204           7           7
4387267572           2           0
4422911281           6           3
4423311213           4           1
4424450542           5           3

The trick is that otherid_on_first_fax holds the value of otherid only on the person's first fax number; on the rest of that person's fax numbers, otherid_on_first_fax is NULL. count(otherid_on_first_fax) then counts only the non-null values, of which there are none for fax 4387267572.
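
To see why counting the marker column works, the same idea can be mimicked in plain Python. This is only an illustration of the logic and not the Oracle query itself: `None` stands in for NULL, and the names `first_fax`, `marker` and `summary` are made up for the sketch.

```python
# Sample data from NR_PVO_120, regrouped as {fax: [OtherID, ...]}.
FAXES = {
    2762364204: [12365092, 12005656, 12484936, 39003042,
                 12365597, 12635922, 12332346],
    4387267572: [12365092, 12005656],
    4422911281: [12365092, 12005656, 12484936, 12651239,
                 12388710, 12686953],
    4423311213: [12365092, 12005656, 12709544, 12484936],
    4424450542: [12005656, 12346839, 12365120, 12484936, 12086512],
}

# Flatten to (fax, otherid) rows, like the table.
rows = [(fax, otherid) for fax, ids in FAXES.items() for otherid in ids]

# Lowest fax number per person, the role played by the windowed min(fax).
first_fax = {}
for fax, otherid in rows:
    first_fax[otherid] = min(fax, first_fax.get(otherid, fax))

# Per fax: (normalcount, countunused).
summary = {}
for fax, otherid in rows:
    # The marker is set only on the person's first fax, otherwise None (NULL).
    marker = otherid if fax == first_fax[otherid] else None
    normal, unused = summary.get(fax, (0, 0))
    # Every row adds to normalcount; only non-None markers add to countunused.
    summary[fax] = (normal + 1, unused + (marker is not None))

for fax in sorted(summary):
    print(fax, *summary[fax])
```

The printed pairs line up with the NORMALCOUNT and COUNTUNUSED columns above, including the (2, 0) row for fax 4387267572.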
