Returning observations that only occur once in a group

Views: 60 | Published: 2020/11/21 1:14:39 | Tags: r dataframe grouping


Question

I am trying to group a data.frame by a factor variable and then return the rows that correspond to observations occurring exactly once within each group. For example, consider the following data:

x <- matrix(c(1,1,2,2,2,3,4,4,5,4), nrow = 5, ncol = 2, byrow = FALSE)
x <- data.frame(x)
x

#   X1 X2
# 1  1  3
# 2  1  4
# 3  2  4
# 4  2  5
# 5  2  4

I would like to group the data by the values in column 1, then return the rows for which the value in column 2 occurs only once within its group. Here, the function would return the first, second, and fourth rows.

Desired output:

#   X1 X2
# 1  1  3
# 2  1  4
# 4  2  5

I am looking to apply this to a dataset with more than 1 million rows.

Recommended answer

In base R, you can try ave:

x[with(x, ave(X2, X1, X2, FUN = length)) == 1, ]
#   X1 X2
# 1  1  3
# 2  1  4
# 4  2  5
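To make the one-liner above easier to follow, here is a minimal sketch that splits it into steps. The names `counts` and `keep` are only illustrative and do not appear in the original answer.

```r
# Same example data as in the question
x <- data.frame(matrix(c(1,1,2,2,2,3,4,4,5,4), nrow = 5, ncol = 2, byrow = FALSE))

# For every row, count how many rows share the same (X1, X2) combination.
# ave() applies FUN within each group and returns a vector as long as X2.
counts <- with(x, ave(X2, X1, X2, FUN = length))
counts

# Keep only the rows whose combination occurs exactly once
keep <- counts == 1
x[keep, ]
```

Rows 3 and 5 both hold the pair (2, 4), so their count is 2 and they are dropped.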

Because ave scales very poorly when there are many groups and several grouping variables, you may want to combine them into a single grouping key first:

x[with(x, ave(X2, sprintf("%s__%s", X1, X2), FUN = length)) == 1, ]
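A likely reason the single key helps: as far as I can tell from the base R source, ave() combines several grouping variables with interaction(), which creates a level for every possible combination of values, including combinations that never occur in the data. The sketch below compares the two counts on the example data and checks that both groupings select the same rows. The names `n_possible`, `n_observed`, `key`, `two_vars` and `one_key` are illustrative only.

```r
x <- data.frame(X1 = c(1, 1, 2, 2, 2), X2 = c(3, 4, 4, 5, 4))

# Two grouping variables: one level per possible (X1, X2) combination
n_possible <- nlevels(interaction(x$X1, x$X2))

# One pasted key: only the combinations that actually occur
key <- sprintf("%s__%s", x$X1, x$X2)
n_observed <- length(unique(key))

c(possible = n_possible, observed = n_observed)

# Both groupings select the same rows
two_vars <- with(x, ave(X2, X1, X2, FUN = length)) == 1
one_key  <- ave(x$X2, key, FUN = length) == 1
stopifnot(identical(two_vars, one_key))
```

On real data with many distinct values per column, the gap between possible and observed combinations can be much larger than in this toy example.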

Speed will vary widely depending on the nature of your data.
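Since timings depend on the data, one option is to measure them on simulated data before running on the real set. The sketch below is only an illustration: the data frame `big`, its size `n`, and the value ranges are assumptions, and the duplicated() variant is an alternative that is not part of the original answer. It assumes that uniqueness is judged on the X1 and X2 columns only.

```r
set.seed(42)   # arbitrary seed, for reproducibility only
n <- 1e5       # assumed size; raise it to approach the real data
big <- data.frame(X1 = sample(1000, n, replace = TRUE),
                  X2 = sample(1000, n, replace = TRUE))

# Approach from the answer: ave() on a single pasted key
t_ave <- system.time(
  res_ave <- big[with(big, ave(X2, sprintf("%s__%s", X1, X2), FUN = length)) == 1, ]
)

# Alternative (not in the original answer): drop every row whose (X1, X2)
# pair appears again, scanning from the top and from the bottom
t_dup <- system.time({
  cols <- big[c("X1", "X2")]
  res_dup <- big[!(duplicated(cols) | duplicated(cols, fromLast = TRUE)), ]
})

# Both approaches should return exactly the same rows
stopifnot(identical(res_ave, res_dup))

# Elapsed seconds for each approach
c(ave = t_ave[["elapsed"]], duplicated = t_dup[["elapsed"]])
```

The numbers themselves matter less than how they change as `n` and the number of distinct groups grow toward those of the real dataset.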

You can also try dplyr:

library(dplyr)
x %>%
  group_by(X1, X2) %>%
  filter(n() == 1)
# Source: local data frame [3 x 2]
# Groups: X1, X2 [3]
# 
#      X1    X2
#   (dbl) (dbl)
# 1     1     3
# 2     1     4
# 3     2     5
