How to join (merge) data frames (inner, outer, left, right)

Problem description

Given two data frames:

df1 = data.frame(CustomerId = c(1:6), Product = c(rep("Toaster", 3), rep("Radio", 3)))
df2 = data.frame(CustomerId = c(2, 4, 6), State = c(rep("Alabama", 2), rep("Ohio", 1)))

df1
#  CustomerId Product
#           1 Toaster
#           2 Toaster
#           3 Toaster
#           4   Radio
#           5   Radio
#           6   Radio

df2
#  CustomerId   State
#           2 Alabama
#           4 Alabama
#           6    Ohio

How can I do database-style (i.e., SQL-style) joins? That is, how do I get:

  • An inner join of df1 and df2:
    Return only the rows in which the left table has matching keys in the right table.
  • An outer join of df1 and df2:
    Return all rows from both tables, joining records from the left that have matching keys in the right table.
  • A left outer join (or simply left join) of df1 and df2:
    Return all rows from the left table, and any rows with matching keys from the right table.
  • A right outer join of df1 and df2:
    Return all rows from the right table, and any rows with matching keys from the left table.

Extra credit:

How can I do a SQL-style select statement?

Recommended answer

By using the merge function and its optional parameters:

Inner join: merge(df1, df2) will work for these examples because R automatically joins the frames on their common variable names, but you will most likely want to specify merge(df1, df2, by = "CustomerId") to make sure that you are matching only on the fields you intend. You can also use the by.x and by.y parameters if the matching variables have different names in the two data frames.
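As a minimal sketch of the inner join described above, the following recreates df1 and df2 from the question and merges them on an explicit key. The variable name inner is only illustrative.

```r
# Recreate the two data frames from the question.
df1 <- data.frame(CustomerId = c(1:6),
                  Product = c(rep("Toaster", 3), rep("Radio", 3)))
df2 <- data.frame(CustomerId = c(2, 4, 6),
                  State = c(rep("Alabama", 2), rep("Ohio", 1)))

# Inner join: keep only the CustomerId values present in both frames.
inner <- merge(df1, df2, by = "CustomerId")
print(inner)  # three rows: customers 2, 4 and 6
```

Naming the key with by keeps the join stable if either data frame later gains another column with a shared name.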

Outer join: merge(x = df1, y = df2, by = "CustomerId", all = TRUE)

Left outer: merge(x = df1, y = df2, by = "CustomerId", all.x = TRUE)

Right outer: merge(x = df1, y = df2, by = "CustomerId", all.y = TRUE)

Cross join: merge(x = df1, y = df2, by = NULL)

Just as with the inner join, you will probably want to pass "CustomerId" to R explicitly as the matching variable. I think it's almost always best to state explicitly the identifiers on which you want to merge; it's safer if the input data.frames change unexpectedly, and easier to read later on.
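To make the difference between the join types concrete, here is a small sketch that runs each merge call on the question's df1 and df2 and compares row counts. The result names (outer, left, right, cross) are illustrative, not part of the original answer.

```r
df1 <- data.frame(CustomerId = c(1:6),
                  Product = c(rep("Toaster", 3), rep("Radio", 3)))
df2 <- data.frame(CustomerId = c(2, 4, 6),
                  State = c(rep("Alabama", 2), rep("Ohio", 1)))

# Outer join: every CustomerId from either frame; unmatched State is NA.
outer <- merge(x = df1, y = df2, by = "CustomerId", all = TRUE)

# Left outer join: every row of df1, with State filled in where it matches.
left <- merge(x = df1, y = df2, by = "CustomerId", all.x = TRUE)

# Right outer join: every row of df2, with Product filled in where it matches.
right <- merge(x = df1, y = df2, by = "CustomerId", all.y = TRUE)

# Cross join: every row of df1 paired with every row of df2.
cross <- merge(x = df1, y = df2, by = NULL)

# Row counts for outer, left, right, cross: 6, 6, 3, 18
print(c(nrow(outer), nrow(left), nrow(right), nrow(cross)))
```

With this data the outer and left joins happen to return the same rows, because every key in df2 also appears in df1.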

You can merge on multiple columns by giving by a vector, e.g., by = c("CustomerId", "OrderId").
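The question's data has no OrderId column, so the following sketch uses two hypothetical data frames (orders and shipments, with made-up values) purely to illustrate a two-column key.

```r
# Hypothetical example data; not part of the original question.
orders <- data.frame(CustomerId = c(1, 1, 2),
                     OrderId = c(10, 11, 10),
                     Product = c("Toaster", "Radio", "Toaster"))
shipments <- data.frame(CustomerId = c(1, 2, 2),
                        OrderId = c(11, 10, 12),
                        Status = c("shipped", "pending", "shipped"))

# A row matches only when BOTH CustomerId and OrderId agree.
both <- merge(orders, shipments, by = c("CustomerId", "OrderId"))
print(both)  # two rows: (1, 11) and (2, 10)
```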

If the column names to merge on are not the same, you can specify, e.g., by.x = "CustomerId_in_df1", by.y = "CustomerId_in_df2", where CustomerId_in_df1 is the name of the column in the first data frame and CustomerId_in_df2 is the name of the column in the second data frame. (These can also be vectors if you need to merge on multiple columns.)
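A short sketch of by.x and by.y, assuming two hypothetical data frames (customers and states, with invented values) whose key columns use the names from the text above:

```r
# Hypothetical example data; the key column is named differently in each frame.
customers <- data.frame(CustomerId_in_df1 = 1:3,
                        Product = c("Toaster", "Radio", "Kettle"))
states <- data.frame(CustomerId_in_df2 = c(2, 3, 4),
                     State = c("Alabama", "Ohio", "Texas"))

# by.x names the key in the first frame, by.y the key in the second.
joined <- merge(customers, states,
                by.x = "CustomerId_in_df1", by.y = "CustomerId_in_df2")
print(joined)  # two rows: customers 2 and 3
```

The key column in the result takes its name from the first data frame (here CustomerId_in_df1).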
