Merging two datasets conditional on 3 variables in SAS


Question


I am quite new to SAS and would like your help merging two datasets that do not share a single key identifier but need a combination of three conditions to be matched correctly.

I have two datasets:

  • Dataset 1 contains information on client visits, including StartDate, EndDate, EmployeeNumber and CustomerNumber.
  • Dataset 2 contains information on certain activities that may have been performed during the visits, together with the Date on which each activity was carried out, EmployeeNumber and CustomerNumber.
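To make the problem reproducible, the two datasets can be sketched as SAS DATA steps. This is a minimal sketch: the dataset names visits and activities, the extra Activity variable, and every value below are hypothetical; only StartDate, EndDate, Date, EmployeeNumber and CustomerNumber come from the question.

```sas
/* Hypothetical visits data (dataset 1): one row per visit, with a date range */
data visits;
  input EmployeeNumber CustomerNumber StartDate :date9. EndDate :date9.;
  format StartDate EndDate date9.;
  datalines;
101 5001 01JAN2020 03JAN2020
101 5002 05JAN2020 05JAN2020
102 5001 02JAN2020 04JAN2020
;
run;

/* Hypothetical activities data (dataset 2): one row per activity, with a single Date */
data activities;
  input EmployeeNumber CustomerNumber Date :date9. Activity :$20.;
  format Date date9.;
  datalines;
101 5001 02JAN2020 Demo
101 5001 03JAN2020 Survey
102 5001 10JAN2020 Training
;
run;
```

In this sample, Demo and Survey both fall inside the first visit, while Training (10JAN2020) lies outside the 02JAN2020 to 04JAN2020 visit of the same employee and customer, so under the rules in the question it should not be attached to any visit.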


The goal is to keep the first dataset and add information from the second dataset wherever it is available, based on a match of date, employee number and customer number.


It is important to note that the date variable in the activity dataset is a single day, while in the visits dataset it is a range between a start date and an end date. So, the datasets look like this: [image: Base datasets]


Logically, it should be a merge based on three conditions:

  1. IF EmployeeNumber (visits) = EmployeeNumber (activities)
  2. IF CustomerNumber (visits) = CustomerNumber (activities)
  3. IF Date is between StartDate and EndDate
    THEN: update the visits dataset with information from the activity dataset
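The three conditions can also be written almost literally in a DATA step by loading the activities into a hash object keyed on EmployeeNumber and CustomerNumber (conditions 1 and 2) and checking the date range for each candidate row (condition 3). This is a swapped-in technique, not the approach used in the recommended answer; it is a minimal sketch that assumes the hypothetical dataset names visits and activities and a hypothetical Activity variable. Neither input needs to be sorted.

```sas
/* Hypothetical sample data (names and values are assumptions) */
data visits;
  input EmployeeNumber CustomerNumber StartDate :date9. EndDate :date9.;
  format StartDate EndDate date9.;
  datalines;
101 5001 01JAN2020 03JAN2020
101 5002 05JAN2020 05JAN2020
102 5001 02JAN2020 04JAN2020
;
run;

data activities;
  input EmployeeNumber CustomerNumber Date :date9. Activity :$20.;
  format Date date9.;
  datalines;
101 5001 02JAN2020 Demo
101 5001 03JAN2020 Survey
102 5001 10JAN2020 Training
;
run;

data merged_hash;
  /* Load all activities once, keyed on conditions 1 and 2; multidata allows several per key */
  if _n_ = 1 then do;
    if 0 then set activities;   /* define Date and Activity in the PDV */
    declare hash act(dataset: "activities", multidata: "yes");
    act.defineKey("EmployeeNumber", "CustomerNumber");
    act.defineData("Date", "Activity");
    act.defineDone();
  end;

  set visits;
  matched = 0;

  /* Walk every activity for this employee/customer pair and apply condition 3 */
  rc = act.find();
  do while (rc = 0);
    if StartDate <= Date and Date <= EndDate then do;
      matched = 1;
      output;                   /* one row per matching activity */
    end;
    rc = act.find_next();
  end;

  /* Keep visits without a matching activity, with blank activity columns */
  if not matched then do;
    call missing(Date, Activity);
    output;
  end;

  drop rc matched;
run;
```

As with a left join, a visit without a matching activity is still written once with missing Date and Activity, and a visit with several matching activities is written once per activity.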


So, the ideal new dataset would look like this: [image: Intended result of merge]
If there is no matching entry in the activity dataset, the new dataset should simply contain blanks (missing values) for the respective variables. For all matches, it should add the respective variables.


When I try (rather simple) MERGE statements in SAS, I usually get an error message stating that "BY variables are not properly sorted on data set". Furthermore, I am not sure how to implement the conditions properly, in particular the match of a date to a range.
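The sorting error comes from the DATA step MERGE statement, which requires every input to be sorted by the BY variables; running PROC SORT first removes it. The minimal sketch below (hypothetical dataset names visits and activities, hypothetical Activity variable and values) shows that fix, and also why it is not enough on its own: BY only matches on equal values, so the date-range condition is never checked.

```sas
/* Hypothetical sample data (names and values are assumptions) */
data visits;
  input EmployeeNumber CustomerNumber StartDate :date9. EndDate :date9.;
  format StartDate EndDate date9.;
  datalines;
101 5001 01JAN2020 03JAN2020
101 5002 05JAN2020 05JAN2020
102 5001 02JAN2020 04JAN2020
;
run;

data activities;
  input EmployeeNumber CustomerNumber Date :date9. Activity :$20.;
  format Date date9.;
  datalines;
101 5001 02JAN2020 Demo
101 5001 03JAN2020 Survey
102 5001 10JAN2020 Training
;
run;

/* MERGE with BY needs both inputs sorted by the BY variables */
proc sort data=visits;     by EmployeeNumber CustomerNumber; run;
proc sort data=activities; by EmployeeNumber CustomerNumber; run;

data merged_by;
  merge visits(in=in_visit) activities;
  by EmployeeNumber CustomerNumber;
  if in_visit;   /* keep every visit, similar to a left join */
run;
```

The step now runs, but the Training activity from 10JAN2020 is attached to the visit that ran from 02JAN2020 to 04JAN2020, which is exactly the kind of false match the third condition is meant to prevent.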


Can this sort of merge be done in SAS or SQL?
Any help is greatly appreciated!!!

Recommended answer


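I tend to use PROC SQL for this kind of thing; it is a relatively simple join, which is where SQL excels. Unlike a DATA step MERGE, PROC SQL does not require the input datasets to be sorted, and the ON clause can include the date-range condition directly. Try something like this: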

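proc sql;

/* Keep every visit (left join); activity columns are filled only where all three conditions hold */
create table merged_data as
  select a.*, b.other_variable, b.other_variable2   /* placeholders: list the activity variables you need */
  from dataset1 as a                                /* visits */
  left join dataset2 as b                           /* activities */
    on a.EmployeeNumber = b.EmployeeNumber
    and a.CustomerNumber = b.CustomerNumber
    and b.Date between a.StartDate and a.EndDate    /* inclusive: activity date within the visit range */
;

quit;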
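A self-contained way to try the query is sketched below. The dataset names visits and activities and the Activity variable are hypothetical stand-ins for dataset1, dataset2 and other_variable. Because the join matches on the visit's date range, a visit that contains several activities appears once per activity; the second query is one possible way (an assumption about what is needed) to keep exactly one row per visit by counting the matching activities instead.

```sas
/* Hypothetical sample data (names and values are assumptions) */
data visits;
  input EmployeeNumber CustomerNumber StartDate :date9. EndDate :date9.;
  format StartDate EndDate date9.;
  datalines;
101 5001 01JAN2020 03JAN2020
101 5002 05JAN2020 05JAN2020
102 5001 02JAN2020 04JAN2020
;
run;

data activities;
  input EmployeeNumber CustomerNumber Date :date9. Activity :$20.;
  format Date date9.;
  datalines;
101 5001 02JAN2020 Demo
101 5001 03JAN2020 Survey
102 5001 10JAN2020 Training
;
run;

proc sql;
  /* Left join: every visit is kept; Date and Activity are filled only when all three conditions hold */
  create table merged_data as
    select a.*, b.Date, b.Activity
    from visits as a
    left join activities as b
      on  a.EmployeeNumber = b.EmployeeNumber
      and a.CustomerNumber = b.CustomerNumber
      and b.Date between a.StartDate and a.EndDate
    order by a.EmployeeNumber, a.CustomerNumber, a.StartDate;

  /* One row per visit: COUNT(b.Date) ignores missing values, so unmatched visits get 0 */
  create table visit_counts as
    select a.EmployeeNumber, a.CustomerNumber, a.StartDate, a.EndDate,
           count(b.Date) as n_activities
    from visits as a
    left join activities as b
      on  a.EmployeeNumber = b.EmployeeNumber
      and a.CustomerNumber = b.CustomerNumber
      and b.Date between a.StartDate and a.EndDate
    group by a.EmployeeNumber, a.CustomerNumber, a.StartDate, a.EndDate;
quit;
```

With this sample, Training falls outside every visit, so merged_data has four rows (two for the first visit, one each for the other two, with blank Date and Activity), and visit_counts has three.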
