How to join (merge) data frames (inner, outer, left, right)?

Problem description

Given two data frames:

df1 = data.frame(CustomerId = c(1:6), Product = c(rep("Toaster", 3), rep("Radio", 3)))
df2 = data.frame(CustomerId = c(2, 4, 6), State = c(rep("Alabama", 2), rep("Ohio", 1)))

df1
#  CustomerId Product
#           1 Toaster
#           2 Toaster
#           3 Toaster
#           4   Radio
#           5   Radio
#           6   Radio

df2
#  CustomerId   State
#           2 Alabama
#           4 Alabama
#           6    Ohio

How can I do database-style (i.e., SQL-style) joins? That is, how do I get:

  • An inner join of df1 and df2:
    Returns only the rows in which the left table has matching keys in the right table.
  • An outer join of df1 and df2:
    Returns all rows from both tables, joining records from the left that have matching keys in the right table.
  • A left outer join (or simply left join) of df1 and df2:
    Returns all rows from the left table, and any rows with matching keys from the right table.
  • A right outer join of df1 and df2:
    Returns all rows from the right table, and any rows with matching keys from the left table.

Extra credit:

How can I do a SQL-style select statement?

Solution

By using the merge function and its optional parameters:

Inner join: merge(df1, df2) will work for these examples because R automatically joins the frames by common variable names, but you will most likely want to specify merge(df1, df2, by = "CustomerId") to make sure that you are matching only on the fields you intend. You can also use the by.x and by.y parameters if the matching variables have different names in the two data frames.
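
As a minimal sketch of the inner join, using the df1 and df2 from the question. The df3 frame and its CustId column are hypothetical, introduced here only to illustrate by.x and by.y:

```r
# Data frames from the question
df1 <- data.frame(CustomerId = c(1:6),
                  Product = c(rep("Toaster", 3), rep("Radio", 3)))
df2 <- data.frame(CustomerId = c(2, 4, 6),
                  State = c(rep("Alabama", 2), rep("Ohio", 1)))

# Inner join with the key named explicitly
inner <- merge(df1, df2, by = "CustomerId")
inner
#   CustomerId Product   State
# 1          2 Toaster Alabama
# 2          4   Radio Alabama
# 3          6   Radio    Ohio

# Hypothetical frame whose key column has a different name
df3 <- data.frame(CustId = c(2, 4, 6),
                  State = c(rep("Alabama", 2), rep("Ohio", 1)))

# by.x names the key in the first frame, by.y the key in the second;
# the result keeps the first frame's name for the key column
inner_xy <- merge(df1, df3, by.x = "CustomerId", by.y = "CustId")
```

Only customers 2, 4 and 6 appear in both frames, so both results have three rows.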

Outer join: merge(x = df1, y = df2, by = "CustomerId", all = TRUE)

Left outer: merge(x = df1, y = df2, by = "CustomerId", all.x = TRUE)

Right outer: merge(x = df1, y = df2, by = "CustomerId", all.y = TRUE)

Cross join: merge(x = df1, y = df2, by = NULL)
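
To see how these variants differ, the following sketch runs each call on the df1 and df2 from the question and compares row counts:

```r
# Data frames from the question
df1 <- data.frame(CustomerId = c(1:6),
                  Product = c(rep("Toaster", 3), rep("Radio", 3)))
df2 <- data.frame(CustomerId = c(2, 4, 6),
                  State = c(rep("Alabama", 2), rep("Ohio", 1)))

outer <- merge(x = df1, y = df2, by = "CustomerId", all = TRUE)
left  <- merge(x = df1, y = df2, by = "CustomerId", all.x = TRUE)
right <- merge(x = df1, y = df2, by = "CustomerId", all.y = TRUE)
cross <- merge(x = df1, y = df2, by = NULL)

nrow(outer)  # 6: every CustomerId found in either frame
nrow(left)   # 6: every row of df1
nrow(right)  # 3: every row of df2
nrow(cross)  # 18: each of the 6 rows of df1 paired with each of the 3 rows of df2

# Customers with no match in df2 get NA in the State column
left$State[left$CustomerId %in% c(1, 3, 5)]
```

Because every CustomerId in df2 also occurs in df1, the outer and left joins happen to return the same rows for this data; they would differ if df2 contained a customer that df1 lacks.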

Just as with the inner join, you would probably want to explicitly pass "CustomerId" to R as the matching variable. I think it's almost always best to explicitly state the identifiers on which you want to merge; it's safer if the input data.frames change unexpectedly and easier to read later on.
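
A small sketch of the kind of unexpected change this guards against. The Source column is a hypothetical addition, not part of the original data:

```r
# Data frames from the question
df1 <- data.frame(CustomerId = c(1:6),
                  Product = c(rep("Toaster", 3), rep("Radio", 3)))
df2 <- data.frame(CustomerId = c(2, 4, 6),
                  State = c(rep("Alabama", 2), rep("Ohio", 1)))

# Hypothetical change: both inputs later gain a column with the same name
df1$Source <- "orders"
df2$Source <- "addresses"

# With no `by`, merge() joins on every shared column name
# (here CustomerId and Source), so no rows match
implicit <- merge(df1, df2)
nrow(implicit)  # 0

# Naming the key keeps the join on CustomerId alone; the clashing
# column is kept from both frames with the suffixes .x and .y
explicit <- merge(df1, df2, by = "CustomerId")
nrow(explicit)  # 3
names(explicit)
# "CustomerId" "Product" "Source.x" "State" "Source.y"
```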
