Sum consecutive observations by some variable

Question

I'm just learning to use SAS, so bear with me a bit. I have the following sample patient data on prescription usage and I'd like to try to combine observations to form more of a patient story, but keep the timeline intact:

data have;
 input dose $ id $ supply date $;
 datalines;
5 1234 30 01012015
10 1234 30 02012015
10 1234 30 03012015
5 1234 30 04012015
2 1234 30 05012015
5 4321 30 07012016
2 9876 30 05012016
2 9876 30 06012016
10 9876 30 07012016
;
run;

Where dose is the dosage in mg, id is patient ID, supply is the number of days' supply of the medication, and date is the date of the refill.

I'd like to consolidate some of the observations so that when we look at patient 1234 we can see they were taking 5mg for 30 days, then 10mg for 60 days, then 5 mg again for 30 days, etc. All of the summation and group by commands I've learned would combine observations 1 and 4 together, but the patient story was that the dosage was increased and then decreased, and I'd like to keep that intact but don't know how.

So it would look like this:

data want;
 input dose $ id $ supply date $;
 datalines;
5 1234 30 01012015
10 1234 60 02012015
5 1234 30 04012015
2 1234 30 05012015
5 4321 30 07012016
2 9876 60 05012016
10 9876 30 07012016
;
run;

See observation 3 rolled up into 2, 8 into 7, etc.

Any tips would be greatly appreciated!

Recommended answer

Here is one solution relying on retain variables. It is only one among many, and it uses rather advanced techniques that could scare the crap out of a beginner. You have been warned ;)

The use of goto and labels (names ending with a colon) is not very common and can be avoided in most cases. In a situation like this one, though, it seems warranted, mainly for concision.

data have;
  informat id 4. dose 3. supply 3. date mmddyy8.;
  format date mmddyy10.;
  input id dose supply date;
  datalines;
1234  5 30 01012015
1234 10 30 02012015
1234 10 30 03012015
1234  5 30 04012015
1234  2 30 05012015
4321  5 30 07012016
9876  2 30 05012016
9876  2 30 06012016
9876 10 30 07012016
;
run;

We first make sure our data is sorted by id and date, since the solution compares each row with the one before it.

proc sort data=have;
  by id date;
run;

The Solution

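The retain statement keeps the values of the listed variables in memory from one iteration of the data step to the next, as the data step iterates over the rows of the have data set. Without it, those variables would be reset to missing at the start of every iteration.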
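A minimal sketch of what RETAIN changes, using a small throwaway data set. The names supplies, retain_demo, total_kept and total_reset are hypothetical, chosen only for illustration. total_kept is retained, so it accumulates across rows; total_reset is not, so it starts each iteration as missing and stays missing after the addition.

```sas
/* Hypothetical demo data: three supply amounts. */
data supplies;
  input supply;
  datalines;
30
30
60
;
run;

data retain_demo;
  set supplies;
  /* Retained: keeps its value from the previous row (starts at 0). */
  retain total_kept 0;
  total_kept = total_kept + supply;
  /* Not retained: reset to missing at the top of each iteration,  */
  /* so missing + supply gives a missing value on every row.        */
  total_reset = total_reset + supply;
run;
```

Note that the sum statement form (`total + supply;`) retains its variable implicitly, which is why it is often used for running totals.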

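Note that the variables read from have are renamed with an _i suffix (i standing for input). This way the retained id, dose, supply and date hold the group being built, while the _i versions hold the row just read.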

data want(drop=id_i dose_i supply_i date_i);
  format id dose supply 8. date mmddyy10.;
  retain id dose supply date;
  set have(rename=(id=id_i dose=dose_i supply=supply_i date=date_i)) end=last;

  * First row: there is no group yet, so just start one. ;
  if _N_ = 1 then goto propagate;

  * Same patient and same dose as the current group: extend it. ;
  if id_i = id and dose_i = dose then do;
    supply = supply + supply_i;
    goto checklast;
  end;

  * When id or dose is different from the current group, ;
  * we write the group to the want table.                  ;
  output;

  * Start a new group from the row just read. ;
  propagate:
  id     = id_i;
  dose   = dose_i;
  supply = supply_i;
  date   = date_i;

  * After the last input row, write out the group still in progress. ;
  checklast:
  if last then output;
run;

A few things to note here:

  • _N_ is an automatic SAS variable indicating the current iteration number
  • end=last (used as a parameter to the set statement) creates a variable called last (this is an arbitrary name) that will take on value 1 when the last observation is read from have, and 0 otherwise. We use it as a boolean variable at the end of the data step.
  • Keep in mind, in trying to figure this out, that a data step functions just like a for loop, iterating over rows of its source table.
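To see _N_ and the END= flag in action, here is a small sketch on a hypothetical three-row data set (three_rows, flags, is_last and iteration are illustrative names). Automatic variables are not written to the output data set, so the sketch copies them into ordinary variables and also prints them to the log with PUT.

```sas
/* Hypothetical demo data with three rows. */
data three_rows;
  input x;
  datalines;
10
20
30
;
run;

data flags;
  set three_rows end=last;
  /* last is 0 on rows 1 and 2, and 1 on row 3 (the final row read). */
  is_last   = last;
  iteration = _N_;
  /* Write the iteration number, the value and the flag to the log. */
  put _N_= x= last=;
run;
```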

The resulting want data set:

id    dose  supply  date
1234     5      30  01/01/2015
1234    10      60  02/01/2015
1234     5      30  04/01/2015
1234     2      30  05/01/2015
4321     5      30  07/01/2016
9876     2      60  05/01/2016
9876    10      30  07/01/2016
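Since the answer notes this is only one solution among many, here is a sketch of a goto-free variant. It swaps the label logic for BY-group processing: `by id dose notsorted` starts a new group whenever id or dose changes from the previous row, so non-adjacent rows with the same dose (observations 1 and 4 for patient 1234) stay separate. The output name want_alt is hypothetical, and the have step is repeated so the block runs on its own; treat it as an illustration under those assumptions, not a replacement for the answer above.

```sas
/* Same sample data as in the answer. */
data have;
  informat id 4. dose 3. supply 3. date mmddyy8.;
  format date mmddyy10.;
  input id dose supply date;
  datalines;
1234  5 30 01012015
1234 10 30 02012015
1234 10 30 03012015
1234  5 30 04012015
1234  2 30 05012015
4321  5 30 07012016
9876  2 30 05012016
9876  2 30 06012016
9876 10 30 07012016
;
run;

proc sort data=have;
  by id date;
run;

data want_alt;
  set have(rename=(supply=supply_i date=date_i));
  /* NOTSORTED: groups are runs of consecutive rows with equal id and dose. */
  by id dose notsorted;
  retain supply date;
  format date mmddyy10.;
  /* first.dose is also 1 whenever id changes. */
  if first.dose then do;
    supply = 0;
    date   = date_i;   /* keep the start date of the run */
  end;
  supply + supply_i;   /* accumulate days of supply within the run */
  if last.dose then output;
  drop supply_i date_i;
run;
```

The trade-off is that this version relies on FIRST./LAST. flags instead of explicit labels, which many SAS users find easier to read.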
