How to join (merge) data frames (inner, outer, left, right)?
Problem description
Given two data frames:
df1 = data.frame(CustomerId = c(1:6), Product = c(rep("Toaster", 3), rep("Radio", 3)))
df2 = data.frame(CustomerId = c(2, 4, 6), State = c(rep("Alabama", 2), rep("Ohio", 1)))
df1
# CustomerId Product
# 1 Toaster
# 2 Toaster
# 3 Toaster
# 4 Radio
# 5 Radio
# 6 Radio
df2
# CustomerId State
# 2 Alabama
# 4 Alabama
# 6 Ohio
How can I do database-style (i.e., SQL-style) joins? That is, how do I get:
- An inner join of df1 and df2:
  Return only the rows in which the left table has matching keys in the right table.
- An outer join of df1 and df2:
  Return all rows from both tables, joining records from the left that have matching keys in the right table.
- A left outer join (or simply left join) of df1 and df2:
  Return all rows from the left table, and any rows with matching keys from the right table.
- A right outer join of df1 and df2:
  Return all rows from the right table, and any rows with matching keys from the left table.
Extra credit:
How can I do a SQL-style select statement?
By using the merge function and its optional parameters:
Inner join: merge(df1, df2) will work for these examples because R automatically joins the frames by common variable names, but you would most likely want to specify merge(df1, df2, by = "CustomerId") to make sure that you are matching on only the fields you intend. You can also use the by.x and by.y parameters if the matching variables have different names in the different data frames.
Outer join: merge(x = df1, y = df2, by = "CustomerId", all = TRUE)
Left outer: merge(x = df1, y = df2, by = "CustomerId", all.x = TRUE)
Right outer: merge(x = df1, y = df2, by = "CustomerId", all.y = TRUE)
Cross join: merge(x = df1, y = df2, by = NULL)
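To make the calls above concrete, here is a minimal, self-contained sketch that rebuilds the question's df1 and df2 and runs each join type once. Collecting the results in a named list (joins) is only a convenience for comparing row counts; it is not part of the original answer.

```r
# Rebuild the example data from the question.
df1 <- data.frame(CustomerId = 1:6,
                  Product = c(rep("Toaster", 3), rep("Radio", 3)))
df2 <- data.frame(CustomerId = c(2, 4, 6),
                  State = c(rep("Alabama", 2), rep("Ohio", 1)))

# One merge() call per join type, keyed explicitly on CustomerId
# (the cross join has no key by definition).
joins <- list(
  inner = merge(x = df1, y = df2, by = "CustomerId"),
  outer = merge(x = df1, y = df2, by = "CustomerId", all = TRUE),
  left  = merge(x = df1, y = df2, by = "CustomerId", all.x = TRUE),
  right = merge(x = df1, y = df2, by = "CustomerId", all.y = TRUE),
  cross = merge(x = df1, y = df2, by = NULL)
)

# Row counts: inner = 3, outer = 6, left = 6, right = 3, cross = 18 (6 x 3).
# Every df2 key also occurs in df1, so the outer and left joins return the
# same rows here, as do the inner and right joins.
sapply(joins, nrow)

# In the left join, customers with no match in df2 get NA for State.
joins$left[is.na(joins$left$State), ]
```

With different data (for example, a df2 key that is missing from df1) the outer and right joins would gain extra rows with NA in Product, which is where the four variants start to differ.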
Just as with the inner join, you would probably want to explicitly pass "CustomerId" to R as the matching variable. I think it's almost always best to explicitly state the identifiers on which you want to merge; it's safer if the input data.frames change unexpectedly and easier to read later on.
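A short sketch of the failure mode that an explicit key guards against. It assumes a hypothetical later change in which both frames gain an unrelated column that happens to be called Source (the column and its values are invented for illustration). It also shows by.x and by.y with a hypothetical key column renamed to Id.

```r
df1 <- data.frame(CustomerId = 1:6,
                  Product = c(rep("Toaster", 3), rep("Radio", 3)))
df2 <- data.frame(CustomerId = c(2, 4, 6),
                  State = c(rep("Alabama", 2), rep("Ohio", 1)))

# Hypothetical later change: both inputs gain an unrelated column
# that happens to share the name "Source".
df1$Source <- "sales"
df2$Source <- "crm"

# Implicit keys: merge() now matches on CustomerId AND Source, and since
# "sales" never equals "crm", every row is dropped.
implicit <- merge(df1, df2)
nrow(implicit)  # 0

# Explicit key: still the 3 expected rows; the clashing column is kept from
# both sides under merge()'s default suffixes, Source.x and Source.y.
explicit <- merge(df1, df2, by = "CustomerId")
names(explicit)  # "CustomerId" "Product" "Source.x" "State" "Source.y"

# If the key had a different name in df2 (the name "Id" is hypothetical),
# by.x and by.y state each side's key column explicitly.
df2_renamed <- setNames(df2, c("Id", "State", "Source"))
renamed <- merge(df1, df2_renamed, by.x = "CustomerId", by.y = "Id")
nrow(renamed)  # 3
```

The implicit call does not fail or warn; it silently returns an empty result, which is why stating the key is the safer default.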