How to check percentage overlap in SAS
Question
I'm new to SAS and need to find a way to do the following:
I have two datasets:
Users (user_id, friends)
(friends is a list of user_ids separated by a ",")
Reviews (user_id, review_id, business_id, text)
I've merged both on user_id. Now I need to know what percentage of the reviews written by a user's friends are about the same business(es) that the user has reviewed.
I guess I need a stored procedure for this (but I'm also new to SQL). Any tips on how to start?
Recommended answer
I would start by restructuring your users table so that each friend is stored as a separate record rather than as part of a single list value:
data users(drop=friends i);
  set users;
  /* compress() strips blanks so the ids split cleanly on commas */
  do i=1 to countw(compress(friends),',');
    friend=scan(compress(friends),i,',');
    output; /* one row per (user_id, friend) pair */
  end;
run;
You can then calculate that percentage by joining that table with reviews twice, once for the user and once for the friend:
proc sql;
  create table want as
  select t1.user_id
        /* share of (user review, friend review) pairs on the same business */
        ,sum(case when t3.business_id=t2.business_id then 1 else 0 end)/count(*) as percentage
  from users t1
  inner join reviews t2        /* the user's own reviews */
    on t1.user_id=t2.user_id
  inner join reviews t3        /* the friend's reviews */
    on t1.friend=t3.user_id
  group by t1.user_id
  ;
quit;
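One thing to be aware of: the double join pairs every review by the user with every review by each friend, so the denominator is the number of such pairs rather than the number of friend reviews. If you want each friend review counted exactly once, a possible variation is to left join the friends' reviews against the distinct businesses the user has reviewed. The following is only a sketch on made-up data; the table names users_long, reviews_toy, user_business and want_per_review are hypothetical, and it assumes user_id and business_id are character variables.

```sas
/* Hypothetical toy data: u1 has friends u2 and u3 (already one row per friend) */
data users_long;
  input user_id $ friend $;
  datalines;
u1 u2
u1 u3
;
run;

data reviews_toy;
  input user_id $ review_id $ business_id $;
  datalines;
u1 r1 b1
u1 r2 b2
u2 r3 b1
u2 r4 b9
u3 r5 b2
;
run;

proc sql;
  /* one row per (user, business), so a friend review can match at most once */
  create table user_business as
  select distinct user_id, business_id
  from reviews_toy;

  create table want_per_review as
  select f.user_id
        /* numerator: friend reviews on a business the user also reviewed */
        /* denominator: all friend reviews for that user */
        ,sum(case when ub.business_id is not null then 1 else 0 end) / count(*)
           as percentage format=percent8.1
  from users_long f
  inner join reviews_toy r
    on f.friend = r.user_id
  left join user_business ub
    on ub.user_id = f.user_id
   and ub.business_id = r.business_id
  group by f.user_id
  ;
quit;
```

On this toy data, two of the three friend reviews (r3 and r5) are about businesses u1 has reviewed, so this variation gives 2/3 for u1, whereas the pair-based query gives 2 matching pairs out of 6, or 1/3. Which of the two definitions you want depends on how you read "percentage of the friends' reviews".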