How to join (merge) data frames (inner, outer, left, right)


Question

Given two data frames:

df1 = data.frame(CustomerId = c(1:6), Product = c(rep("Toaster", 3), rep("Radio", 3)))
df2 = data.frame(CustomerId = c(2, 4, 6), State = c(rep("Alabama", 2), rep("Ohio", 1)))

df1
#  CustomerId Product
#           1 Toaster
#           2 Toaster
#           3 Toaster
#           4   Radio
#           5   Radio
#           6   Radio

df2
#  CustomerId   State
#           2 Alabama
#           4 Alabama
#           6    Ohio

How can I do database-style (i.e., SQL-style) joins? That is, how do I get:

  • An inner join of df1 and df2:
    Return only the rows of the left table that have matching keys in the right table.
  • An outer join of df1 and df2:
    Return all rows from both tables, joining records from the left that have matching keys in the right table.
  • A left outer join (or simply left join) of df1 and df2:
    Return all rows from the left table, and any rows with matching keys from the right table.
  • A right outer join of df1 and df2:
    Return all rows from the right table, and any rows with matching keys from the left table.

Extra credit:

How can I do a SQL-style select statement?

Recommended answer

By using the merge function and its optional parameters:

Inner join: merge(df1, df2) will work for these examples because R automatically joins the frames on the variable names they have in common, but you will most likely want to specify merge(df1, df2, by = "CustomerId") to make sure that you match only on the fields you intend. You can also use the by.x and by.y parameters if the matching variables have different names in the two data frames.
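As a quick check of the inner join described above, the following sketch rebuilds df1 and df2 from the question and compares the default call with the explicit by = "CustomerId" call. The variable names inner_default and inner_explicit are chosen here for illustration only.

```r
df1 <- data.frame(CustomerId = c(1:6), Product = c(rep("Toaster", 3), rep("Radio", 3)))
df2 <- data.frame(CustomerId = c(2, 4, 6), State = c(rep("Alabama", 2), rep("Ohio", 1)))

# Default: merge() joins on every column name the two frames share
inner_default <- merge(df1, df2)

# Explicit: join only on CustomerId
inner_explicit <- merge(df1, df2, by = "CustomerId")

print(inner_explicit)
```

Only customers 2, 4 and 6 appear in both frames, so the result has three rows with the columns CustomerId, Product and State.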

Outer join: merge(x = df1, y = df2, by = "CustomerId", all = TRUE)

Left outer: merge(x = df1, y = df2, by = "CustomerId", all.x = TRUE)

Right outer: merge(x = df1, y = df2, by = "CustomerId", all.y = TRUE)

Cross join: merge(x = df1, y = df2, by = NULL)
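The join variants listed above can be tried side by side on the question's data. This is a minimal sketch; the result names (outer, left, right, cross) are illustrative and not part of the original answer.

```r
df1 <- data.frame(CustomerId = c(1:6), Product = c(rep("Toaster", 3), rep("Radio", 3)))
df2 <- data.frame(CustomerId = c(2, 4, 6), State = c(rep("Alabama", 2), rep("Ohio", 1)))

# Outer join: keep every CustomerId from either frame
outer <- merge(x = df1, y = df2, by = "CustomerId", all = TRUE)

# Left outer join: keep every row of df1; State is NA where df2 has no match
left <- merge(x = df1, y = df2, by = "CustomerId", all.x = TRUE)

# Right outer join: keep every row of df2
right <- merge(x = df1, y = df2, by = "CustomerId", all.y = TRUE)

# Cross join: every row of df1 paired with every row of df2
cross <- merge(x = df1, y = df2, by = NULL)

print(left)
print(c(outer = nrow(outer), left = nrow(left), right = nrow(right), cross = nrow(cross)))
```

Because every CustomerId in df2 also occurs in df1, the outer and left joins both return six rows here, with State set to NA for customers 1, 3 and 5. The right join returns three rows and the cross join returns 6 x 3 = 18 rows. In the cross join, the shared column name is disambiguated as CustomerId.x and CustomerId.y.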

Just as with the inner join, you will probably want to pass "CustomerId" to R explicitly as the matching variable. I think it is almost always best to state explicitly the identifiers on which you want to merge; it is safer if the input data.frames change unexpectedly, and easier to read later on.

You can merge on multiple columns by giving by a vector, e.g., by = c("CustomerId", "OrderId").
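To illustrate a multi-column key, here is a small sketch. The orders and shipments data frames, and their Product and Status columns, are hypothetical and made up for this example; only the by = c("CustomerId", "OrderId") argument comes from the answer.

```r
# Hypothetical data: an order is identified by CustomerId AND OrderId together
orders <- data.frame(
  CustomerId = c(1, 1, 2),
  OrderId    = c(10, 11, 10),
  Product    = c("Toaster", "Radio", "Radio")
)
shipments <- data.frame(
  CustomerId = c(1, 2, 2),
  OrderId    = c(11, 10, 12),
  Status     = c("Shipped", "Pending", "Shipped")
)

# A row matches only when both key columns agree
matched <- merge(orders, shipments, by = c("CustomerId", "OrderId"))

print(matched)
```

Only the pairs (1, 11) and (2, 10) occur in both frames, so the inner join keeps two rows. Matching on OrderId alone would also pair customer 1's order 10 with customer 2's shipment for order 10, which is why both columns belong in the key.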

If the column names to merge on are not the same, you can specify, e.g., by.x = "CustomerId_in_df1", by.y = "CustomerId_in_df2", where CustomerId_in_df1 is the name of the column in the first data frame and CustomerId_in_df2 is the name of the column in the second data frame. (These can also be vectors if you need to merge on multiple columns.)
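A short sketch of by.x and by.y follows. It uses two made-up data frames, customers and states (hypothetical names and values), whose key columns carry the names used in the paragraph above.

```r
# Hypothetical frames whose key columns are named differently
customers <- data.frame(
  CustomerId_in_df1 = c(1, 2, 3),
  Product           = c("Toaster", "Toaster", "Radio")
)
states <- data.frame(
  CustomerId_in_df2 = c(2, 3, 4),
  State             = c("Alabama", "Ohio", "Texas")
)

# by.x names the key in the first frame, by.y the key in the second
joined <- merge(customers, states,
                by.x = "CustomerId_in_df1",
                by.y = "CustomerId_in_df2")

print(joined)
```

The key column in the result takes its name from the first data frame (CustomerId_in_df1), and only customers 2 and 3 survive the inner join.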
