对某个变量的连续观察求和 [英] Sum consecutive observations by some variable

查看:51
本文介绍了对某个变量的连续观察求和的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我刚刚在学习使用 SAS,所以请耐心等待.我有以下关于处方使用情况的示例患者数据,我想尝试结合观察以形成更多的患者故事,但保持时间线完整:

数据有;输入剂量 $2.身份证 4 美元.供货日期 $8.;数据线;5" 1234 30 01012015十" 1234 30 02012015十" 1234 30 030120155" 1234 30 040120152" 1234 30 050120155" 4321 30 070120162" 9876 30 050120162" 9876 30 06012016十" 9876 30 07012016;跑步;

其中,dose 是以毫克为单位的剂量,id 是患者 ID,supply 是药物的供应天数,date 是重新填充的日期.

我想合并一些观察结果,以便当我们查看患者 1234 时,我们可以看到他们服用 5 毫克 30 天,然后服用 10 毫克服用 60 天,然后又服用 5 毫克服用 30 天,等等.所有的我学到的按命令求和和分组会将观察 1 和 4 结合在一起,但患者的故事是剂量增加然后减少,我想保持不变,但不知道如何.

所以它看起来像这样:

需要的数据;输入剂量 $2.身份证 4 美元.供货日期 $8.;数据线;5" 1234 30 01012015十" 1234 60 020120155" 1234 30 040120152" 1234 30 050120155" 4321 30 070120162" 9876 60 05012016十" 9876 30 07012016;跑步;

将观察结果 3 卷成 2,将 8 卷成 7,以此类推

任何提示将不胜感激!

解决方案

这是一种依赖于 retain 变量的解决方案.它只是众多中的一个,它使用了相当先进的技术,可以吓跑初学者.你已被警告;)

goto 的使用 &标签(以 : 结尾)不是很常见,在大多数情况下可以避免.但在这种情况下,似乎是有道理的,主要是为了简洁.

数据有;信息 ID 4. 剂量 3. 供应 3. 日期 mmddyy8.;格式化日期 mmddyy10.;输入 id 剂量供应日期;数据线;1234 5 30 010120151234 10 30 020120151234 10 30 030120151234 5 30 040120151234 2 30 050120154321 5 30 070120169876 2 30 050120169876 2 30 060120169876 10 30 07012016;

我们首先确保我们的数据正确排序.

proc sort data=have;按身份证日期;跑步;

解决方案

retain 语句将使声明的变量的值保存在内存中,因为数据步骤遍历 have 数据集的行.

请注意,_i 后缀是添加到来自 have 的现有变量中,i 代表 input.

data want(drop=id_i dose_i supply_i date_i);格式 id 剂量供应 8. 日期 mmddyy10.;保留 id 剂量供应日期;设置有(重命名=(id=id_i 剂量=dose_i supply=supply_i date=date_i))end=last;如果 _N_ = 1 然后转到传播;如果 id_i = id 且剂量_i = 剂量,则执行;供应 = 供应 + supply_i;转到最后一个;结尾;* 当 id 或剂量与前一行不同时,;* 我们将观察结果写入到想要的表中.;输出;传播:id = id_i;剂量 = 剂量_i;供应 = supply_i;日期 = date_i;检查最后:如果最后然后输出;跑步;

这里有几点需要注意:

  • _N_ 是一个自动 SAS 变量,指示当前迭代次数
  • end=last(用作 set 语句的参数)创建一个名为 last(这是一个任意名称)的变量当从 have 读取最后一个观察值时,值为 1,否则为 0.我们在数据步骤结束时将其用作布尔变量.
  • 请记住,在试图弄清楚这一点时,数据步骤的功能就像 for 循环一样,迭代其源表的行.

结果

id 剂量供应日期1234 5 30 01/01/20151234 10 60 02/01/20151234 5 30 04/01/20151234 2 30 05/01/20154321 5 30 07/01/20169876 2 60 05/01/20169876 10 30 07/01/2016

I'm just learning to use SAS, so bear with me a bit. I have the following sample patient data on prescription usage and I'd like to try to combine observations to form more of a patient story, but keep the timeline intact:

data have;
 input dose $2. id $4. supply date $8.;
 datalines;
 "5" 1234 30 01012015
 "10" 1234 30 02012015
 "10" 1234 30 03012015
 "5" 1234 30 04012015
 "2" 1234 30 05012015
 "5" 4321 30 07012016
 "2" 9876 30 05012016
 "2" 9876 30 06012016
 "10" 9876 30 07012016
 ;
run;

Where dose is the dosage in mg, id is patient ID, supply is the number of days' supply of the medication, and date is the date of the refill.

I'd like to consolidate some of the observations so that when we look at patient 1234 we can see they were taking 5mg for 30 days, then 10mg for 60 days, then 5 mg again for 30 days, etc. All of the summation and group by commands I've learned would combine observations 1 and 4 together, but the patient story was that the dosage was increased and then decreased, and I'd like to keep that intact but don't know how.

So it would look like this:

data want;
 input dose $2. id $4. supply date $8.;
 datalines;
 "5" 1234 30 01012015
 "10" 1234 60 02012015
 "5" 1234 30 04012015
 "2" 1234 30 05012015
 "5" 4321 30 07012016
 "2" 9876 60 05012016
 "10" 9876 30 07012016
 ;
run;

See observation 3 rolled up into 2, 8 into 7, etc.

Any tips would be greatly appreciated!

解决方案

Here is one solution relying on retain variables. It is only one among many, and it uses rather advanced techniques that could scare the crap out of a beginner. You have been warned ;)

The use of goto & labels (ending with :) is not very common and in most cases can be avoided. But in a situation like this, it seems warranted, mainly for concision.

data have;
  informat id 4. dose 3. supply 3. date mmddyy8.;
  format date mmddyy10.;
  input id dose supply date;
  datalines;
1234  5 30 01012015
1234 10 30 02012015
1234 10 30 03012015
1234  5 30 04012015
1234  2 30 05012015
4321  5 30 07012016
9876  2 30 05012016
9876  2 30 06012016
9876 10 30 07012016
;

We first make sure our data is properly sorted.

proc sort data=have;
  by id date;
run;

The Solution

The retain statement will make it so that values for the declared variables are kept in memory as the data step iterates over rows of the have data set.

Note that the _i suffix is added to the existing variables from have, i standing for input.

data want(drop=id_i dose_i supply_i date_i);
  format id dose supply 8. date mmddyy10.;
  retain id dose supply date;
  set have(rename=(id=id_i dose=dose_i supply=supply_i date=date_i)) end=last;

  if _N_ = 1 then goto propagate;

  if id_i = id and dose_i = dose then do;
    supply = supply + supply_i;
    goto checklast;
  end;

  * When id or dose is different from previous row, ;
  * we write the observation to the want table.     ;
  output;

  propagate:
  id     = id_i;
  dose   = dose_i;
  supply = supply_i;
  date   = date_i;

  checklast:
  if last then output;
run;

A few things to note here:

  • _N_ is an automatic SAS variable indicating the current iteration number
  • end=last (used as a parameter to the set statement) creates a variable called last (this is an arbitrary name) that will take on value 1 when the last observation is read from have, and 0 otherwise. We use it as a boolean variable at the end of the data step.
  • Keep in mind, in trying to figure this out, that a data step functions just like a for loop, iterating over rows of its source table.

Results

id    dose   supply    date
1234    5       30    01/01/2015
1234    10      60    02/01/2015
1234    5       30    04/01/2015
1234    2       30    05/01/2015
4321    5       30    07/01/2016
9876    2       60    05/01/2016
9876    10      30    07/01/2016

这篇关于对某个变量的连续观察求和的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆