SAS 中的合并工作(使用 IN=) [英] Working of Merge in SAS (with IN=)

查看：15 发布时间：2022/1/8 17:05:19 sql sas

本文介绍了SAS 中的合并工作(使用 IN=)的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

我有两个 dataset data1 和 data2

I have two dataset data1 and data2

data data1; 
input sn id $;
datalines;
1 a
2 a
3 a
;
run;

data data2; 
input id $ sales x $;
datalines;
a 10 x
a 20 y 
a 30 z
a 40 q
;
run;

我正在从下面的代码中合并它们:

I am merging them from below code:

data join;
merge data1(in=a) data2(in=b);
by id;
if a and b;
run;

结果:(我期待的是 Inner Join 结果，但事实并非如此)

Result: (I was expecting an Inner Join result which is not the case)

1   a   10  x
2   a   20  y
2   a   30  z
2   a   40  w

proc sql 内部连接的结果.

proc sql;
select data1.id,sn,sales,x from data2 inner join data1 on data1.hh_id;
quit;

结果:(正如内部连接所预期的那样)

Result: (As expected from an inner join)

a  1  10  x
a  1  20  y
a  1  30  z
a  1  40  w
a  2  10  x
a  2  20  y
a  2  30  z
a  2  40  w
b  3  10  x
b  3  20  y
b  3  30  z
b  3  40  w

我想知道merge语句在SAS中In=的概念和STEP BY STEP的工作原理并证明上述结果.

I want to know the concept and STEP BY STEP working of merge statement in SAS with In= and proving the above result.

PS:我已阅读 this，上面写着

PS: I have read this, and it says

这些变量的一个明显用途是控制什么样的合并"将发生，使用 if 语句.例如，如果ThisRecordIsFromYourData 和 ThisRecordIsFromOtherData;将使SAS仅包含与两个输入数据中的 by 变量匹配的行集合(如内部连接).

An obvious use for these variables is to control what kind of 'merge' will occur, using if statements. For example, if ThisRecordIsFromYourData and ThisRecordIsFromOtherData; will make SAS only include rows that match on the by variables from both input data sets (like an inner join).

我猜，(如内部连接)并非总是如此.

which I guess, (like an Inner Join) is not always the case.

推荐答案

基本上，这是由于 SAS 数据步骤和 SQL 处理各自连接/合并的方式不同造成的.

Basically, this is a result of the difference in how the SAS data step and SQL process their respective join/merges.

SQL 为每个可能的键组合创建一个单独的记录.这是笛卡尔积(在关键级别).

SQL creates a separate record for each possible combination of keys. This is a Cartesian Product (at the key level).

但是，SAS 数据步骤的过程合并非常不同.MERGE 实际上只不过是 SET 的一个特例.它仍然迭代地处理行，一次一个 - 它永远不会返回，并且从 PDV 中的任何数据集中一次永远不会超过一行.因此，它无法在其正常过程中创建笛卡尔积 - 这需要随机访问，而 SAS 数据步通常不会这样做.

SAS data step, however, process merges very differently. MERGE is really nothing more than a special case of SET. It still processes rows iteratively, one at a time - it never goes back, and never has more than one row from any dataset in the PDV at once. Thus, it cannot create a Cartesian product in its normal process - that would require random access, which the SAS datastep doesn't do normally.

它的作用:

For each unique BY value
  Take the next record from the left side dataset, if one exists with that BY value
  Take the next record from the right side dataset, if one exists with that BY value
  Output a row
Continue until both datasets are exhausted for that BY value

使用 BY 值在任一侧(或两侧)产生每个值的唯一记录，它实际上与 SQL 相同.但是，使用 BY 值会在两侧产生重复项，您将得到您所拥有的:并排合并，如果一个在另一个之前用完，则来自较短数据集的最后一行的值(为此值)或多或少地被复制下来.(它们实际上是保留的，因此如果您用更改覆盖它们，它们将不会在较长数据集中的新记录上重置).

With BY values that yield unique records per value on either side (or both), it is effectively identical to SQL. However, with BY values that yield duplicates on BOTH sides, you get what you have there: a side-by-side merge, and if one runs out before the other, the values from the last row of the shorter dataset (for that by value) are more-or-less copied down. (They're actually RETAINED, so if you overwrite them with changes, they will not reset on new records from the longer dataset).

所以，如果 left 有 3 条记录，而 right 有 4 条记录键值 a，就像在您的示例中一样，那么您将获得数据来自以下记录(假设您之后不更改数据):

So, if left has 3 records and right has 4 records for key value a, like in your example, then you get data from the following records (assuming you don't alter the data after):

left  right
1     1
2     2
3     3
3     4

这篇关于SAS 中的合并工作(使用 IN=)的文章就介绍到这了，希望我们推荐的答案对大家有所帮助，也希望大家多多支持IT屋！

查看全文

SAS 中的合并工作(使用 IN=) [英] Working of Merge in SAS (with IN=)

问题描述

推荐答案

相关文章

其他开发最新文章

热门教程

热门工具

登录关闭

SAS 中的合并工作(使用 IN=) [英] Working of Merge in SAS (with IN=)

问题描述

推荐答案

相关文章

其他开发最新文章

热门教程

热门工具

登录 关闭

登录关闭