By group processing in SAS

Question

I have a very large table with an indexed datetime field. I want to do by group processing on the dataset by month and only output the last observation in each month.

The problem is that it doesn't contain a month field so I can't use something like this:

if last.month then do;
  output;
end;

Is there a way I can achieve this kind of behaviour without having to add a month field in a previous datastep? The table is 50 gig compressed so I want to avoid any unnecessary steps.

Thanks

Recommended answer

You can actually achieve this using 'by groupformat' against your original dataset, formatting the datetime field as 'dtmonyy5.' As the name implies, this groups by the formatted values instead of the original.

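data new1;
  set old;
  /* Group on the formatted month-year value (e.g. JAN12), not the raw datetime */
  format datetime dtmonyy5.;
  by groupformat datetime;
  /* Keep only the last observation in each formatted group */
  if last.datetime;
run;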
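
To see the grouping in action without touching the large table, here is a minimal sketch on a toy dataset. The five rows, the `value` variable, and the dates are made up for illustration; only the `by groupformat` step mirrors the answer above. Assuming the rows are already in datetime order, the step should keep the rows with `value` 3 and 5, the last observation of January and of February.

```sas
/* Hypothetical toy data: three datetimes in Jan 2012, two in Feb 2012 */
data old;
  format datetime datetime20.;
  input datetime :datetime20. value;
  datalines;
03JAN2012:08:00:00 1
17JAN2012:12:30:00 2
31JAN2012:23:59:59 3
01FEB2012:00:00:00 4
14FEB2012:09:15:00 5
;
run;

/* Same technique as above: group by the month-year formatted value */
data new1;
  set old;
  format datetime dtmonyy5.;
  by groupformat datetime;
  if last.datetime;
run;

/* Print with a full datetime format to check which rows survived */
proc print data=new1;
  format datetime datetime20.;
run;
```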

Another method is to use Proc Summary, although this can be memory intensive, particularly against large datasets. Here is the code.

proc summary data=old nway;
  /* The format makes each month-year a single class level */
  class datetime;
  format datetime dtmonyy5.;
  output out=new2 (drop=_:) maxid(datetime(_all_))=;
run;

Just a quick note on the previous answer, the 'month' function works against date fields, not datetime, so you would need to add the datepart function to the line.

month = month(datepart(datetime));
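
As a quick check of that point, the sketch below converts a datetime literal to a date before calling `month`. The variable names `dt`, `d`, and `m` and the literal value are illustrative only.

```sas
data _null_;
  dt = '15MAR2012:10:30:00'dt;  /* datetime value: seconds since 01JAN1960 */
  d  = datepart(dt);            /* date value: days since 01JAN1960 */
  m  = month(d);                /* month number taken from the date value */
  put d= date9. m=;             /* log shows d=15MAR2012 m=3 */
run;
```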
