比较并保持相同的角色观察 [英] Compare and keep character observations that are the same
问题描述
我有以下数据集:
Date ID Company
Jan05 1 Coca-Cola
Jan05 2 Coca-Cola
Jan05 3 Coca-Cola
Jan05 4 Apple
Jan05 5 Apple
Jan05 6 Apple
Jan05 7 Microsoft
Feb05 1 McDonald
Feb05 2 McDonald
Feb05 3 McDonald
Feb05 4 McDonald
Feb05 5 McDonald
Feb05 6 Microsoft
.
.
.
Jan06 1 Apple
Jan06 2 Apple
Jan06 3 Apple
Jan06 4 Apple
Jan06 5 Apple
Jan06 6 Apple
Jan06 7 Apple
Feb06 1 McDonald
Feb06 2 McDonald
Feb06 3 McDonald
Feb06 4 McDonald
Feb06 5 McDonald
Feb06 6 Lenova
Feb06 7 Lenova
.
.
Jan07 1 Apple
Jan07 2 Apple
Jan07 3 Apple
Jan07 4 Microsoft
Jan07 5 Lenovo
Jan07 6 Apple
Jan07 7 Apple
Feb07 1 TJmax
Feb07 2 TJMax
Feb07 3 TJMax
Feb07 4 TJMax
Feb07 5 TJMax
Feb07 6 TJMax
Feb07 7 TJMax
.
.
.
.
until July15
我想要做的是以下几点: 1:比较1月05日与1月06日,然后比较1月06日与1月7日... 2月05日与2月06日,2月06日与2月07日...对于两个日期. 2:每次计算ID的中间值时,我都不需要新的数据集.我只想确保在Jan05和Jan06中都存在两家公司,然后计算ID的中位数.
What I want to do are the following: 1: Compare January 05 with January 06, then January 06 with January 07...February 05 with February 06, February 06 with February 07....so on for each month get compute a median for ID when the same companies are present for both dates. 2: I don't want a new dataset each time I compute a median for ID. I merely want to make sure that both companies are present for lets say in Jan05 and Jan06, then compute a median for ID.
在SAS中执行此操作的最佳方法是什么?
Whats the best way to do this in SAS?
我的最终结果将如下所示:
My end result will look like this:
Date Median_ID
Jan05 2
Jan06 4
Jan06 4
Jan07 3
Feb05 3
Feb06 3
Feb06 0
Feb07 0
从结果中可以看到:在1月5日和06日,唯一匹配的公司是Apple.在1月6日和1月7日,唯一匹配的公司是Apple. 因此,我们以公司匹配时间的ID中位数为准.
As you can see from the result: In Jan05 and 06, the only company that matches is Apple. In Jan06 and Jan07, the only company that matches again is Apple. So we take the median of ID for the time the companies match.
推荐答案
目前尚不清楚如何根据示例数据计算出最终结果-如果您包括所有中间步骤,则更容易按照您的解释进行操作一个月,例如Jan05
.但是,您似乎可以使用类似于以下内容的SQL进行处理:
It isn't clear how you've calculated the end results from your sample data - it would be easier to follow your explanation if you included all the intermediate steps for one month, e.g. Jan05
. However, this seems like something that you could approach with some SQL similar to the following:
data have;
input Date monyy5. ID Company $32.;
format Date monyy5.;
cards;
Jan05 1 Coca-Cola
Jan05 2 Coca-Cola
Jan05 3 Coca-Cola
Jan05 4 Apple
Jan05 5 Apple
Jan05 6 Apple
Jan05 7 Microsoft
Feb05 1 McDonald
Feb05 2 McDonald
Feb05 3 McDonald
Feb05 4 McDonald
Feb05 5 McDonald
Feb05 6 Microsoft
Jan06 1 Apple
Jan06 2 Apple
Jan06 3 Apple
Jan06 4 Apple
Jan06 5 Apple
Jan06 6 Apple
Jan06 7 Apple
Feb06 1 McDonald
Feb06 2 McDonald
Feb06 3 McDonald
Feb06 4 McDonald
Feb06 5 McDonald
Feb06 6 Lenova
Feb06 7 Lenova
Jan07 1 Apple
Jan07 2 Apple
Jan07 3 Apple
Jan07 4 Microsoft
Jan07 5 Lenovo
Jan07 6 Apple
Jan07 7 Apple
Feb07 1 TJmax
Feb07 2 TJMax
Feb07 3 TJMax
Feb07 4 TJMax
Feb07 5 TJMax
Feb07 6 TJMax
Feb07 7 TJMax
;
run;
proc sql;
create table want as
select a.date, median(a.ID) as Median_ID from have a inner join have b
on month(a.date)= month(b.date)
and year(a.date) = year(b.date) - 1
and a.ID = b.ID
and a.company = b.company
group by a.date
;
quit;
这篇关于比较并保持相同的角色观察的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!