Find most frequent combination of values in a data.frame

Question

I would like to find the most frequent combination of values in a data.frame.

Here is some example data:

dat <- data.frame(age=c(50,55,60,50,55),sex=c(1,1,1,0,1),bmi=c(20,25,30,20,25))

In this example the result I am looking for is the combination of age=55, sex=1 and bmi=25, since that is the most frequent combination of column values.

My real data has about 30000 rows and 20 columns. What would be an efficient way to find the most common combination of these 20 values among the 30000 observations?

Many thanks!

Recommended answer

Here's an approach with data.table:

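library(data.table)

dt <- data.table(dat)
setkeyv(dt, names(dt))               # key (and sort) on every column
dt[, .N, by = key(dt)]               # count rows per combination
dt[, .N, by = key(dt)][N == max(N)]  # keep the most frequent combination(s)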
#    age sex bmi N
# 1:  55   1  25 2

And an approach with base R:

x <- data.frame(table(dat))   # one row per combination of levels, count in Freq
x[x$Freq == max(x$Freq), ]    # rows with the highest count
#    age sex bmi Freq
# 11  55   1  25    2

I don't know how well either of these scales, though, particularly if the number of combinations is large. So test and report back!
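On the scaling concern: table(dat) tabulates every combination of the distinct values in each column, observed or not, so its size grows with the product of the distinct-value counts across columns. A minimal base R sketch that counts only the combinations that actually occur is shown below. It is one possible workaround rather than part of the original answer. The "\r" separator and the names key, counts, top_key and most_common are arbitrary choices made for illustration, and the approach assumes that converting each value to text keeps distinct values distinct.

```r
dat <- data.frame(age = c(50, 55, 60, 50, 55),
                  sex = c(1, 1, 1, 0, 1),
                  bmi = c(20, 25, 30, 20, 25))

# Full cross-tabulation: one cell per combination of levels (3 * 2 * 3 here).
n_cells <- nrow(data.frame(table(dat)))

# Build one key string per row. "\r" is an arbitrary separator that is
# assumed not to occur in the data.
key <- do.call(paste, c(dat, sep = "\r"))

# Count only the keys that actually occur, then pick the most frequent one.
counts <- table(key)
top_key <- names(counts)[which.max(counts)]

# Recover the original column values for that key and attach the count.
most_common <- unique(dat[key == top_key, , drop = FALSE])
most_common$n <- max(counts)

most_common
```

With the example data the cross-tabulation has 18 cells while only 4 distinct combinations occur, so the gap is small here. With 20 columns the full cross-tabulation can become far larger than the 30000 rows being counted.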

If you are really only interested in a single row of results, replace x$Freq == max(x$Freq) with which.max(x$Freq), and N == max(N) with which.max(N).
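The difference between the two filters only shows up when several combinations tie for the highest count. The sketch below illustrates this in base R. The extra row added to create a tie is hypothetical and not part of the question's data, and the names dat_tie, all_top and first_top are chosen here for illustration.

```r
dat <- data.frame(age = c(50, 55, 60, 50, 55),
                  sex = c(1, 1, 1, 0, 1),
                  bmi = c(20, 25, 30, 20, 25))

# Hypothetical extra observation so that two combinations each occur twice.
dat_tie <- rbind(dat, data.frame(age = 50, sex = 1, bmi = 20))

x <- data.frame(table(dat_tie))

# Comparing against max() keeps every combination tied for the top count.
all_top <- x[x$Freq == max(x$Freq), ]

# which.max() returns the index of the first maximum only, so exactly one
# row comes back even when there is a tie.
first_top <- x[which.max(x$Freq), ]

all_top
first_top
```

Note that data.frame(table(...)) returns the grouping columns as factors, so they may need converting back (for example with as.numeric(as.character(...))) before further numeric use.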
