Remove all instances of duplicates in SAS

Question

I am merging two SAS datasets by ID number and would like to remove all instances of duplicate IDs, i.e. if an ID number occurs twice in the merged dataset, then both observations with that ID should be deleted.

Web searches have suggested some SQL methods and nodupkey, but these do not work here because they are meant for typical duplicate cleansing, where one instance is kept and the extra copies are deleted.

Recommended answer

Assuming you are using a DATA step with a BY id; statement, adding:

if NOT (first.id and last.id) then delete;

should do it. If that doesn't work, please show your code.
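
For anyone who wants to try this end to end, here is a minimal sketch. The datasets `a` and `b`, the variables `x` and `y`, and the values are made up for illustration and are not from the question. One thing to keep in mind: with MERGE and a BY statement, an id that appears once in each input is combined into a single observation, so it is not flagged; a BY group only has more than one row when at least one input repeats the id. If the datasets are stacked with SET a b; by id; instead, the same test applies to the interleaved rows.

```sas
/* Hypothetical sample data; both datasets are already in id order */
data a;
  input id x;
  datalines;
1 10
2 20
3 30
;
run;

data b;
  input id y;
  datalines;
2 200
2 201
3 300
;
run;

data want;
  merge a b;
  by id;
  /* first.id and last.id are both 1 only when the BY group has exactly one row */
  if not (first.id and last.id) then delete;
run;

proc print data=want;
run;
```

With this sample data, id 2 forms a two-row BY group (because `b` repeats it), so both of its rows are deleted and `want` keeps only ids 1 and 3.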

I'm actually a fan of writing dropped records to a separate dataset so you can track how many records were dropped at each point. So I would code this something like:

data want        /* ids that occur exactly once */
     drop_dups   /* every row of any id that occurs more than once */
;
  merge a b ;
  by id ;
  if first.id and last.id then output want ;
  else output drop_dups ;
run ;
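
The SAS log already reports how many observations each output dataset received. To also see which ids were dropped, a small follow-up check could look like the sketch below. It assumes the step above has already run and created `drop_dups`; using PROC FREQ here is my own suggestion, not part of the original answer.

```sas
/* Assumes drop_dups was created by the DATA step above */
/* Prints one line per dropped id with the number of rows removed for it */
proc freq data=drop_dups;
  tables id / nocum nopercent;
run;
```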
