Calculating most frequent level by category with plyr


Question

I would like to calculate the most frequent factor level by category with plyr, using the code below. The data frame b shows the desired result. Why does c$mlevels contain only the value "numeric"?

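library(plyr)
set.seed(0)  # make the random data reproducible
# 100 rows: a category in 1..3 and a factor whose levels are drawn from 1..10
a <- data.frame(cat=round(runif(100, 1, 3)),
                levels=factor(round(runif(100, 1, 10))))
# Most frequent value of x; note that this name is also used by base::mode
mode <- function(x) names(table(x))[which.max(table(x))]
# Desired result, computed by hand for each category
b <- data.frame(cat=1:3,
                mlevels=c(mode(a$levels[a$cat==1]),
                       mode(a$levels[a$cat==2]),
                       mode(a$levels[a$cat==3])))
# Attempt with plyr: every mlevels entry comes back as "numeric"
c <- ddply(a, .(cat), summarise,
           mlevels=mode(levels))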

Recommended answer

When you use summarise, plyr does not seem to "see" the function declared in the global environment, because it finds the function of the same name in base first.

We can check this using Hadley's handy pryr package. You can install it with these commands:

library(devtools)
# current versions of devtools expect the "user/repo" form
install_github("hadley/pryr")


require(pryr)
require(plyr)
c <- ddply(a, .(cat), summarise, print(where("mode")))
# <environment: namespace:base>
# <environment: namespace:base>
# <environment: namespace:base>
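
This also accounts for the value "numeric" in the question. base::mode reports the storage mode of an object, and a factor is stored as integer codes, so the answer is the same for every group. A minimal base-R sketch (the helper name most_frequent and the small factor f are made up for illustration):

```r
# A small factor standing in for one group's `levels` column.
f <- factor(c("4", "7", "7", "9"))

# base::mode describes how the object is stored, not its most common value.
base_result <- base::mode(f)

# The helper from the question, under a name that does not clash with base.
most_frequent <- function(x) names(table(x))[which.max(table(x))]
user_result <- most_frequent(f)

print(base_result)  # "numeric"
print(user_result)  # "7"
```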

Basically, it doesn't see your mode function. There are two alternatives. The first is what @AnandaMahto suggested; I'd do the same and would advise you to stick with it. The other is to not use summarise and instead pass an anonymous function(x), so that the mode function in your global environment is "seen".

c <- ddply(a, .(cat), function(x) mode(x$levels))
#   cat V1
# 1   1  6
# 2   2  5
# 3   3  9
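
The call above returns the result in a column named V1, whereas the question asked for mlevels. One possible variation, sketched here under the assumption that plyr is installed, has the anonymous function return a one-row data frame so the column gets the requested name. The helper name most_frequent is hypothetical and chosen only to avoid the clash with base::mode.

```r
library(plyr)

set.seed(0)
a <- data.frame(cat = round(runif(100, 1, 3)),
                levels = factor(round(runif(100, 1, 10))))

# Same logic as the question's helper, under a non-clashing (hypothetical) name.
most_frequent <- function(x) names(table(x))[which.max(table(x))]

# The anonymous function is a closure, so it finds most_frequent where it was
# defined; returning a data frame lets us name the result column.
res <- ddply(a, .(cat), function(d) data.frame(mlevels = most_frequent(d$levels)))

# Cross-check against a plain base-R computation of the same quantity.
expected <- vapply(split(a$levels, a$cat), most_frequent, character(1))
print(res)
```

Giving the helper a name that base does not use should also make the original summarise form pick it up, though that is an inference from the lookup behaviour shown above rather than something the answer states.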

Why does this work?

c <- ddply(a, .(cat), function(x) print(where("mode")))
# <environment: R_GlobalEnv>
# <environment: R_GlobalEnv>
# <environment: R_GlobalEnv>

Because, as you can see above, it reads your function, which sits in the global environment.

> mode # your function
# function(x)
#     names(table(x))[which.max(table(x))]
> environment(mode) # where it sits
# <environment: R_GlobalEnv>

As opposed to:

> base::mode # base's mode function
# function (x) 
# {
#     some lines of code to compute mode
# }
# <bytecode: 0x7fa2f2bff878>
# <environment: namespace:base>
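
The lookup order behind this can be inspected with base R alone. The sketch below assumes that summarise evaluates its arguments starting from inside plyr's namespace, which is what the where() output suggests; the stats namespace is used purely as a stand-in for a package namespace so that the example does not need plyr.

```r
# Start the search for "mode" inside a package namespace (stats is only a
# stand-in here). Lookup walks: namespace -> imports -> base namespace -> global.
ns <- asNamespace("stats")
found_from_ns <- get("mode", envir = ns, mode = "function")

# The function found this way is base's, whatever is defined globally.
print(identical(found_from_ns, base::mode))

# The base namespace is enclosed by the global environment, so objects in the
# global environment are reached only after base has been searched.
base_is_enclosed_by_global <- identical(parent.env(asNamespace("base")), globalenv())
print(base_is_enclosed_by_global)
```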

Here's an awesome wiki on environments from Hadley, if you're interested in reading or exploring further.
