merge in R results in more rows than one of the data frames

Views: 89 | Published: 2020/5/9 0:34:58 | Tags: r, merge, dataframe, rstudio


Question

I have two data frames: the first has 9994 rows and the second has 60431 rows. I want to merge them so that the result contains the combined columns of both data frames but only 9994 rows.

However, I get more than 9994 rows after the merge. How can I make sure this does not happen?

df1 = readRDS('data1.RDS')
nrow(df1)
# [1] 9994

df2 = readRDS('data2.RDS')
nrow(df2)
# [1] 60431

df = merge(df1,df2,by=c("col1","col2"))
nrow(df)
# [1] 10057

df = merge(df1,df2,by=c("col1","col2"),all.x=TRUE)
nrow(df)
# [1] 10057
nrow(na.omit(df))
# [1] 10057

EDIT: Following akrun's comment: yes, there are duplicate {col1, col2} combinations in the second data frame.

nrow(unique(df2[,c("col1","col2")]))
# [1] 60263
nrow(df2)
# [1] 60431
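
As a small illustration of why duplicate keys inflate the row count, here is a sketch on toy data. The data frames and the `x` and `y` columns are invented for the example and are not the real `data1.RDS` / `data2.RDS`. Each row of `df1` is repeated once for every matching row in `df2`, so a key that occurs twice in `df2` contributes two rows to the result.

```r
# Toy data (hypothetical): df1 has unique keys, df2 repeats the key ("a", 1).
df1 <- data.frame(col1 = c("a", "b", "c"), col2 = c(1, 2, 3), x = c(10, 20, 30))
df2 <- data.frame(col1 = c("a", "a", "b"), col2 = c(1, 1, 2), y = c("p", "q", "r"))

# The key ("a", 1) matches two rows of df2, so its df1 row appears twice.
merged <- merge(df1, df2, by = c("col1", "col2"), all.x = TRUE)
n_merged <- nrow(merged)  # larger than nrow(df1)

# List the key combinations that occur more than once in df2.
keys <- df2[, c("col1", "col2")]
dup_keys <- unique(keys[duplicated(keys), ])
```

Inspecting `dup_keys` on the real data would show which combinations are responsible for the extra rows.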

How can I keep only one row from a data frame when there are several rows for the same {col1, col2} combination? After the merge, I would like to have only 9994 rows.

Recommended answer

This should work. duplicated() flags every occurrence of a key after the first, so be sure to sort df2 beforehand so that the row you want to keep for each {col1, col2} combination comes first.

df = merge(
  df1,
  # keep only the first row of df2 for each {col1, col2} combination
  df2[!duplicated(df2[, c("col1","col2")]), ],
  by=c("col1","col2"),
  all.x=TRUE
)
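
The "sort first" step can be sketched as follows. This is only an illustration on toy data: the `date`, `x`, and `y` columns and the rule "keep the most recent row" are assumptions made for the example, since the question does not say which of the duplicate rows is the right one.

```r
# Toy data (hypothetical): the key ("a", 1) occurs twice in df2.
df1 <- data.frame(col1 = c("a", "b", "c"), col2 = c(1, 2, 3), x = c(10, 20, 30))
df2 <- data.frame(
  col1 = c("a", "a", "b"),
  col2 = c(1, 1, 2),
  date = as.Date(c("2020-01-01", "2020-03-01", "2020-02-01")),
  y    = c("old", "new", "only")
)

# Sort so that the preferred row (assumed here: the latest date) comes first.
df2_sorted <- df2[order(df2$date, decreasing = TRUE), ]

# duplicated() marks every repeat of a key after its first occurrence,
# so negating it keeps exactly one row per {col1, col2} combination.
df2_unique <- df2_sorted[!duplicated(df2_sorted[, c("col1", "col2")]), ]

# With unique keys in df2_unique, the left join cannot add rows to df1.
df <- merge(df1, df2_unique, by = c("col1", "col2"), all.x = TRUE)
```

After this, nrow(df) equals nrow(df1) as long as the keys in df1 are themselves unique. Rows of df1 without a match in df2 are kept, with NA in the columns that come from df2.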

