How to summarize different combinations of diseases for different age groups?
Problem description
We have a study in which people were enrolled at different time points and from different age groups. They have been followed up for two decades, and during this time they have developed 1-5 diseases, each at a different time point. Here is SAS code that generates example data:
proc format;
  value agegrp
    30-39 = '30-39'
    40-49 = '40-49'
    50-59 = '50-59'
    60-69 = '60-69'
    70-79 = '70-79'
  ;
  invalue agegrp
    '30-39' = 30
    '40-49' = 40
    '50-59' = 50
    '60-69' = 60
    '70-79' = 70
  ;
run;

* generate some sample data;
%macro RandBetween(min, max);
  (&min + floor((1+&max-&min)*rand("uniform")))
%mend;

data have;
  call streaminit(123);
  do id = 1 to 10000;
    enrolled = '01jan2000'd + (1 + floor((1+3650-1)*rand("uniform")));
    age = 30 + %RandBetween(0, 49);
    flag1 = rand('uniform') < 0.25;
    date1 = enrolled + %RandBetween(0, 2500);
    flag2 = rand('uniform') < 0.25;
    date2 = date1 + %RandBetween(0,2500);
    flag3 = rand('uniform') < 0.25;
    date3 = date2 + %RandBetween(0,2500);
    flag4 = rand('uniform') < 0.25;
    date4 = date3 + %RandBetween(0,2500);
    flag5 = rand('uniform') < 0.25;
    date5 = date4 + %RandBetween(0,2500);
    output;
  end;
  format enrolled date: yymmdd10. flag: 1.;
run;
I have already summarized the proportion of people with each combination of diseases by their age at baseline. Now I want to find the number of people who have each combination of diseases within each age group, e.g. to count the number of people who had disease1+disease2 while aged 40-49. The proportion would be their share of all individuals while at that age.
The output should look like this:
Disease combination 30-39 40-49 50-59 60-69 70-79
------------------------------------------------------------------
Combinations of length 2 xx% yy% ...
flag1+flag2
flag2+flag3
...
length 3
length 4
length 5
Do you have any thoughts on how one could do this?
Recommended answer
The data is somewhat unusual from a diagnosis standpoint. However, if the flags are for diagnoses of a disease or disease family that follows some temporal progression model, the data might make sense.
Notes
- The age at the disease flag date needs to be computed separately for each flag.
- Double pivoting creates a wide structure with flags segregated by at_age grouping.
- TABULATE has built-in percentage calculations.
- The mapping of a 'disease flag asserts condition is present' state to its corresponding disease name is done with a custom format.
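The first note can be illustrated on a single record. The following is a minimal sketch with hypothetical values (the dates and the variable names `at_age_discrete` and `at_age_continuous` are made up for illustration). It shows that `intck('year', ...)` with its default discrete method counts 1 January boundaries crossed, while the optional `'continuous'` method counts whole years elapsed since the start date. Either way the result is only an approximation of true age, because the birthday is not known.

```sas
* hypothetical single record, dates chosen to straddle a year boundary;
data _null_;
  age_at_enroll = 49;
  enrolled = '15dec2000'd;
  date1    = '05jan2001'd;   * only 21 days after enrollment;

  * discrete method (default) counts 1 January boundaries crossed;
  at_age_discrete = age_at_enroll + intck('year', enrolled, date1);

  * continuous method counts whole years elapsed since enrollment;
  at_age_continuous = age_at_enroll + intck('year', enrolled, date1, 'continuous');

  put at_age_discrete= at_age_continuous=;
run;
```

With these dates the log should show 50 for the discrete method and 49 for the continuous one, so a person enrolled late in the year can be pushed into the next age group by the default method. Which behavior is appropriate depends on how age at enrollment was recorded.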
Example:
Consider the flagging of 5 diagnoses related to the dreaded Werewolf progression.
proc format;
  value agegrp
    30-39 = '30-39'
    40-49 = '40-49'
    50-59 = '50-59'
    60-69 = '60-69'
    70-79 = '70-79'
    80-high = '80 +'
  ;
  invalue agegrp
    '30-39' = 30
    '40-49' = 40
    '50-59' = 50
    '60-69' = 60
    '70-79' = 70
  ;
  * flag1 to flag5 are progression of Werewolf!;
  value $flag_state_to_disease
    flag1_1='Animal Bite'
    flag2_1='Hallucination'
    flag3_1='Onychogryphosis'
    flag4_1='Hypertrichosis'
    flag5_1='Hyperdontia'
    other=' '
  ;
run;

* generate some sample data;
%macro RandBetween(min, max);
  (&min + floor((1+&max-&min)*rand("uniform")))
%mend;

data have;
  call streaminit(123);
  do id = 1 to 10000;
    enrolled = '01jan2000'd + (1 + floor((1+3650-1)*rand("uniform")));
    age_at_enroll = 30 + %RandBetween(0, 49);
    flag1 = rand('uniform') < 0.25;
    date1 = enrolled + %RandBetween(0, 2500);
    flag2 = rand('uniform') < 0.25;
    date2 = date1 + %RandBetween(0,2500);
    flag3 = rand('uniform') < 0.25;
    date3 = date2 + %RandBetween(0,2500);
    flag4 = rand('uniform') < 0.25;
    date4 = date3 + %RandBetween(0,2500);
    flag5 = rand('uniform') < 0.25;
    date5 = date4 + %RandBetween(0,2500);
    output;
  end;

  * force a 5 disease situation for each age group;
  enrolled = '01jan2000'd;
  do age_at_enroll = 30 to 70 by 10;
    flag1=1; flag2=1; flag3=1; flag4=1; flag5=1;
    date1=enrolled+10; date2=date1+10; date3=date2+10; date4=date3+10; date5=date4+10;
    output;
    id + 1;
  end;

  format enrolled date: yymmdd10. flag: 1.;
run;
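* pivot to tall structure;
data tall(keep=id at_age disease);
  set have;
  array dates date1-date5;
  array flags flag1-flag5;
  * row wise transposition of flags as disease names and computed at_age;
  do _n_ = 1 to dim(dates);
    * age when this flag was dated, counted in calendar-year boundaries since enrollment;
    at_age = age_at_enroll + intck('year', enrolled, dates(_n_));
    * build a key such as flag3_1 (flag set) or flag3_0 (flag not set);
    flag_state = catx('_', vname(flags(_n_)), flags(_n_));
    * character format maps set flags to disease names, everything else to blank;
    disease = put(flag_state, $flag_state_to_disease.);
    output;
  end;
run;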
* pivot back to wide structure, segregating within id the at_age groups;
proc transpose data=tall out=wide1 (label='diseases per id agegroup') prefix=disease;
  by id at_age;
  var disease;
  format at_age agegrp. ;
run;

* computed values for tabulation;
data wide2(keep=at_age disease_count disease_list);
  set wide1;
  disease_count = 5 - cmiss(of disease1-disease5);
  length disease_list $100;
  disease_list = coalescec (catx(', ', of disease1-disease5), '* NONE *');
run;

ods html file='tabulation.html' style=plateau;
title;
proc tabulate data=wide2;
  class disease_count disease_list at_age;
  table
    disease_count*disease_list
    ,
    at_age * (n*f=comma9. colpctn)
    /
    nocellmerge
  ;
run;
ods html close;
[Image: HTML output]
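The question asks only for combinations of two or more diseases, whereas the tabulation above lists every combination. One possible way to restrict the rows without shrinking the denominator is sketched below. It assumes the `wide2` data set and the `agegrp.` format from the example; the names `combos`, `combo_pct`, `age_group`, `total`, `n` and `pct` are hypothetical. The denominator here is the number of id-by-age-group rows in `wide2`, the same column total that `colpctn` uses. Adding a plain `where disease_count >= 2` to PROC TABULATE would instead compute percentages among the filtered rows only.

```sas
* store the age group as an explicit character value;
data combos;
  set wide2;
  length age_group $5;
  age_group = put(at_age, agegrp.);
run;

* count combinations of length 2 or more, divided by all rows in the age group;
proc sql;
  create table combo_pct as
  select c.age_group,
         c.disease_count,
         c.disease_list,
         count(*) as n,
         count(*) / d.total as pct format=percent8.1
  from combos as c,
       (select age_group, count(*) as total
        from combos
        group by age_group) as d
  where c.age_group = d.age_group
    and c.disease_count >= 2
  group by c.age_group, c.disease_count, c.disease_list, d.total
  order by c.disease_count, c.disease_list, c.age_group;
quit;
```

The resulting `combo_pct` table is in long form, one row per combination and age group. It could be passed to PROC TRANSPOSE or PROC REPORT to lay the age groups out as columns.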