How to summarize different combinations of diseases for different age groups?

Question

We have a study of people who were enrolled at different time points and from different age groups. They have been followed for two decades, and during that time they developed 1-5 diseases, each at a different time point. Here is SAS code that generates example data:

proc format;
  value agegrp
    30-39 = '30-39'
    40-49 = '40-49'
    50-59 = '50-59'
    60-69 = '60-69'
    70-79 = '70-79'
  ;
  invalue agegrp
    '30-39' = 30
    '40-49' = 40
    '50-59' = 50
    '60-69' = 60
    '70-79' = 70
  ;
run;

* generate some sample data;
%macro RandBetween(min, max);
   (&min + floor((1+&max-&min)*rand("uniform")))
%mend;


data have;
  call streaminit(123);
 
  do id = 1 to 10000;
    enrolled = '01jan2000'd + (1 + floor((1+3650-1)*rand("uniform")));
    age = 30 + %RandBetween(0, 49);

    flag1 = rand('uniform') < 0.25;
    date1 = enrolled + %RandBetween(0, 2500);

    flag2 = rand('uniform') < 0.25;
    date2 = date1 + %RandBetween(0,2500);

    flag3 = rand('uniform') < 0.25;
    date3 = date2 + %RandBetween(0,2500);

    flag4 = rand('uniform') < 0.25;
    date4 = date3 + %RandBetween(0,2500);

    flag5 = rand('uniform') < 0.25;
    date5 = date4 + %RandBetween(0,2500);
    output;
  end;
 format enrolled date: yymmdd10. flag: 1.;
run;

I have already summarized the proportion of people with each combination of diseases by their age at baseline. Now I want to find the number of people with each combination of diseases in each age group, for example the number of people who had disease1 + disease2 while aged 40-49. The proportion would be their share of all individuals who were in that age group.

The output should look like this:

Disease combination           30-39  40-49  50-59  60-69  70-79
------------------------------------------------------------------
Combinations of length 2       xx%    yy%  ...
flag1+flag2
flag2+flag3
...


length 3

length 4

length 5

Do you have any thoughts on how one could do this?

Recommended answer

The data is somewhat unusual from a diagnosis standpoint. However, if the flags mark diagnoses of a disease or disease family that follows some temporal progression model, the data might make sense.

Notes

  • The age at each disease flag date needs to be computed separately for each flag.
  • Double pivoting (wide to tall, then back to wide) creates a wide structure with the flags segregated by at_age group.
  • TABULATE has built-in percentage calculations.
  • A custom format maps each 'disease flag asserts condition is present' state to its corresponding disease name.
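
The first note can be illustrated with a single made-up record. This is only a sketch: the dates and the enrollment age below are hypothetical values, not taken from the sample data. It also shows one thing to be aware of: by default INTCK counts calendar-year boundaries crossed (the discrete method), whereas the optional fourth argument 'c' (continuous) counts full years elapsed since the start date. Which of the two suits a study depends on how age at enrollment was recorded, so treat the choice as an assumption to verify.

```sas
* Hypothetical single record, for illustration only;
data _null_;
  enrolled      = '15dec2000'd;   * assumed enrollment date;
  age_at_enroll = 39;             * assumed age at enrollment;
  date1         = '10jan2001'd;   * assumed date of the first flag;

  * discrete method (default): counts January 1 boundaries crossed;
  at_age_discrete   = age_at_enroll + intck('year', enrolled, date1);

  * continuous method: counts complete years elapsed since enrolled;
  at_age_continuous = age_at_enroll + intck('year', enrolled, date1, 'c');

  put at_age_discrete= at_age_continuous=;
run;
```

Here less than a month has passed, yet the default method crosses one January 1 boundary and moves the person from the 30-39 group into the 40-49 group, while the continuous method leaves the age unchanged.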

Example:

Consider the flagging of 5 diagnoses related to the dreaded Werewolf progression.

proc format;
  value agegrp
    30-39 = '30-39'
    40-49 = '40-49'
    50-59 = '50-59'
    60-69 = '60-69'
    70-79 = '70-79'
    80-high = '80 +'
  ;
  invalue agegrp
    '30-39' = 30
    '40-49' = 40
    '50-59' = 50
    '60-69' = 60
    '70-79' = 70
  ;

  * flag1 to flag5 are progression of Werewolf!;
  value $flag_state_to_disease
    flag1_1='Animal Bite'      
    flag2_1='Hallucination'    
    flag3_1='Onychogryphosis'  
    flag4_1='Hypertrichosis'   
    flag5_1='Hyperdontia'
    other=' '
  ;
run;

* generate some sample data;
%macro RandBetween(min, max);
   (&min + floor((1+&max-&min)*rand("uniform")))
%mend;


data have;
  call streaminit(123);
 
  do id = 1 to 10000;
    enrolled = '01jan2000'd + (1 + floor((1+3650-1)*rand("uniform")));
    age_at_enroll = 30 + %RandBetween(0, 49);

    flag1 = rand('uniform') < 0.25;              
    date1 = enrolled + %RandBetween(0, 2500);

    flag2 = rand('uniform') < 0.25;
    date2 = date1 + %RandBetween(0,2500);

    flag3 = rand('uniform') < 0.25;                
    date3 = date2 + %RandBetween(0,2500);

    flag4 = rand('uniform') < 0.25;
    date4 = date3 + %RandBetween(0,2500);

    flag5 = rand('uniform') < 0.25;
    date5 = date4 + %RandBetween(0,2500);
    output;
  end;

  * force a 5 disease situation for each age group;
  enrolled = '01jan2000'd;
  do age_at_enroll = 30 to 70 by 10;
    flag1=1; flag2=1; flag3=1; flag4=1; flag5=1; 
    date1=enrolled+10; date2=date1+10; date3=date2+10; date4=date3+10; date5=date4+10;
    output;
    id + 1;
  end;

  format enrolled date: yymmdd10. flag: 1.;
run;

* pivot to tall structure;
data tall(keep=id at_age disease);
  set have;
  array dates date1-date5;
  array flags flag1-flag5;

  * row wise transposition of flags as disease names and computed at_age;
  do _n_ = 1 to dim(dates);
    at_age = age_at_enroll + intck('year', enrolled, dates(_n_));
    flag_state = catx('_', vname(flags(_n_)), flags(_n_));
    disease = put(flag_state, $flag_state_to_disease.);
    output;
  end;
run;

* pivot back to wide structure, segregating within id the at_age groups;
proc transpose data=tall out=wide1 (label='diseases per id agegroup') prefix=disease;
  by id at_age;
  var disease;
  format at_age agegrp. ;
run;

* computed values for tabulation;
data wide2(keep=at_age disease_count disease_list);
  set wide1;
  disease_count = 5 - cmiss(of disease1-disease5);
  length disease_list $100;
  disease_list = coalescec (catx(', ', of disease1-disease5), '* NONE *');
run;

ods html file='tabulation.html' style=plateau;
title;

proc tabulate data=wide2;
  class disease_count disease_list at_age;
  table 
    disease_count*disease_list
    ,
    at_age * (n*f=comma9. colpctn)
    /
    nocellmerge
  ;
run;
ods html close;
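
One caveat about the steps above: PROC TRANSPOSE with BY id at_age expects the tall data set to already be ordered by id and at_age. In the simulated data this holds, because ids are generated in ascending order and each date is built on top of the previous one. With real data the diagnosis dates may not follow the flag order, so a sort between the two pivot steps may be needed. A minimal sketch, assuming the same data set and variable names as in the example:

```sas
* Order the tall data before transposing it back to wide.
* Sorting on the raw at_age values also keeps each formatted
* age group (30-39, 40-49, ...) contiguous within an id;
proc sort data=tall;
  by id at_age;
run;
```

After this sort, the diseases in disease_list appear in age order rather than in flag order whenever the two differ.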

[Image: HTML output of the tabulation]
