合并两个包含重复元素的数据框 [英] Merge two dataframes containing duplicate elements

查看：48 发布时间：2021/6/2 20:19:34 r merge

本文介绍了合并两个包含重复元素的数据框的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

给定两个名称部分重叠的数据帧，foo 和 bar:

Given two dataframes whose names overlap partially, foo and bar:

foo <- iris[1:10,-c(4,5)]
#   Sepal.Length Sepal.Width Petal.Length
# 1           5.1         3.5          1.4
# 2           4.9         3.0          1.4
# 3           4.7         3.2          1.3
# 4           4.6         3.1          1.5
# 5           5.0         3.6          1.4
# 6           5.4         3.9          1.7
# 7           4.6         3.4          1.4
# 8           5.0         3.4          1.5
# 9           4.4         2.9          1.4
# 10          4.9         3.1          1.5

bar <- iris[3:13,-c(3,5)]
bar[1:8, ] <- bar[1:8, ] * 2
#    Sepal.Length Sepal.Width Petal.Width
# 3           9.4         6.4         0.4
# 4           9.2         6.2         0.4
# 5          10.0         7.2         0.4
# 6          10.8         7.8         0.8
# 7           9.2         6.8         0.6
# 8          10.0         6.8         0.4
# 9           8.8         5.8         0.4
# 10          9.8         6.2         0.2
# 11          5.4         3.7         0.2
# 12          4.8         3.4         0.2
# 13          4.8         3.0         0.1

如何合并数据帧，以便为缺失的情况填充行和列，同时为重叠元素优先处理一个数据帧的结果?在这个例子中，我希望优先处理 bar 中的重叠结果.

How can I merge the dataframes such that both rows and columns are padded for missing cases, while prioritising the results of one dataframe for overlapping elements? In this example, it is the overlapping results in bar that I wish to prioritise.

merge(..., by = "row.names", all = TRUE) 接近，因为它保留了所有 13 行，并将缺失值作为 NA 返回:

merge(..., by = "row.names", all = TRUE) is close, in that it retains all 13 rows, and returns missing values as NA:

foobar <- merge(foo, bar, by = "row.names", all = TRUE)
#    Row.names Sepal.Length.x Sepal.Width.x Petal.Length Sepal.Length.y Sepal.Width.y Petal.Width
# 1          1            5.1           3.5          1.4             NA            NA          NA
# 2         10            4.9           3.1          1.5            9.8           6.2         0.2
# 3         11             NA            NA           NA            5.4           3.7         0.2
# 4         12             NA            NA           NA            4.8           3.4         0.2
# 5         13             NA            NA           NA            4.8           3.0         0.1
# 6          2            4.9           3.0          1.4             NA            NA          NA
# 7          3            4.7           3.2          1.3            9.4           6.4         0.4
# 8          4            4.6           3.1          1.5            9.2           6.2         0.4
# 9          5            5.0           3.6          1.4           10.0           7.2         0.4
# 10         6            5.4           3.9          1.7           10.8           7.8         0.8
# 11         7            4.6           3.4          1.4            9.2           6.8         0.6
# 12         8            5.0           3.4          1.5           10.0           6.8         0.4
# 13         9            4.4           2.9          1.4            8.8           5.8         0.4

但是，它为组成数据框中的每一列创建了一个不同的列，无论它们共享名称的事实如何.

However, it creates a distinct column for each column in the constituent dataframes, regardless of the fact that they share names.

所需的输出如下:

#    Sepal.Length Sepal.Width Petal.Length Petal.Width
# 1           5.1         3.5          1.4          NA # unique to foo
# 2           4.9         3.0          1.4          NA # unique to foo
# 3           9.4         6.4          1.3          0.4 # overlap, retained from bar
# 4           9.2         6.2          1.5          0.4 # 
# 5          10.0         7.2          1.4          0.4 # .
# 6          10.8         7.8          1.7          0.8 # .
# 7           9.2         6.8          1.4          0.6 # .
# 8          10.0         6.8          1.5          0.4 # 
# 9           8.8         5.8          1.4          0.4 # 
# 10          9.8         6.2          1.5          0.2 # overlap, retained from bar
# 11          5.4         3.7           NA          0.2 # unique to bar
# 12          4.8         3.4           NA          0.2 # unique to bar
# 13          4.8         3.0           NA          0.1 # unique to bar

我的直觉是将数据分成两个不相交的集合，以及 bar 中相交元素的集合，然后合并这些，但我相信有一个更优雅的解决方案！

My intuition is to subset the data into two disjoint sets, and the set of intersecting elements in bar, then merge these, but I'm sure there is a more elegant solution!

合并两个包含重复元素的数据框 [英] Merge two dataframes containing duplicate elements

问题描述

推荐答案

相关文章

其他开发最新文章

热门教程

热门工具

登录关闭

合并两个包含重复元素的数据框 [英] Merge two dataframes containing duplicate elements

问题描述

推荐答案

相关文章

其他开发最新文章

热门教程

热门工具

登录 关闭

登录关闭