Find number of occurrences of modal value for a group using data.table [R]


Question

I've been using the excellent answer here to find the mode for groups with data.table. However, I'd also like to find the number of occurrences of the modal value of x within each group of variable y. How can I do this?

Edit: there is a faster way to find the mode than in the answer linked above. I can't find the answer I got it from (please edit and link it if you do), but it uses this function, which returns multiple modes if they exist:

MultipleMode <- function(x) {
  ux <- unique(x)
  tab <- tabulate(match(x, ux))
  ux[tab == max(tab)]
}

Here is a version that arbitrarily takes only the first mode when there are two:

SingleMode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}
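To make the difference between the two functions concrete, here is a minimal sketch on a small made-up vector (the data are an assumption for illustration, not from the original post). The definitions are repeated so the snippet runs on its own.

```r
# Same definitions as in the question, repeated so this snippet is self-contained.
MultipleMode <- function(x) {
  ux <- unique(x)
  tab <- tabulate(match(x, ux))  # count of each unique value, in order of first appearance
  ux[tab == max(tab)]            # every value that reaches the maximum count
}

SingleMode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]  # which.max() keeps only the first maximum
}

# Hypothetical data: "a" and "b" both occur twice, "c" once.
x <- c("a", "b", "b", "a", "c")

multi <- MultipleMode(x)   # both tied modes, in order of first appearance
single <- SingleMode(x)    # only the first of the tied modes
print(multi)
print(single)
```

Because `unique()` preserves the order of first appearance, the "first" mode returned by `SingleMode` is the tied value that shows up earliest in `x`.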

I'm now using this as the base code from which I write a function to find the frequency of the mode, as seen below, instead of the answer I linked to above.

Recommended answer

You can create a frequency table for each group. The mode (or each of the modes, in the event of a tie) is the entry with the highest frequency in that table, so taking the table's maximum gives the number of times the mode occurs. Use the following function and code:

mod_count_fun <- function(x) max(table(x))
DT[, modal_count := mod_count_fun(x), by = y]
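As a rough end-to-end illustration, the sketch below applies the same function to a small made-up table. The data and the variable `counts_per_group` are assumptions for the example; only `mod_count_fun`, `x`, `y`, and `modal_count` come from the answer. It assumes the data.table package is installed.

```r
library(data.table)

# Hypothetical data: y is the grouping variable, x holds the values.
DT <- data.table(
  y = c("g1", "g1", "g1", "g2", "g2", "g2", "g2"),
  x = c(1, 1, 2, 5, 6, 6, 6)
)

mod_count_fun <- function(x) max(table(x))

# Add the modal count as a new column, repeated on every row of its group.
DT[, modal_count := mod_count_fun(x), by = y]
print(DT)

# Alternative: one summary row per group instead of a new column.
counts_per_group <- DT[, .(modal_count = mod_count_fun(x)), by = y]
print(counts_per_group)
```

In group g1 the value 1 occurs twice, and in group g2 the value 6 occurs three times, so those are the counts attached to each group.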

Hope that helps!

Edit: Actually, I found a faster way to do this. Instead, use:

SingleModeVal <- function(x) {
  ux <- unique(x)
  max(tabulate(match(x, ux)))
}
DT[, modal_count := SingleModeVal(x), by = y]

This runs approximately 10x faster than my previous answer because it uses tabulate on plain vectors instead of table, and it is based on a clever way of calculating modes that I will link to in the main post.
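Before swapping one function for the other, it may be worth checking that they agree on your data. The sketch below, on made-up vectors, shows one case where they match and one where they differ: `table()` drops `NA` by default, while `match()` pairs `NA` with `NA`, so the `tabulate` version counts missing values as a value of their own. The test vectors are assumptions for illustration.

```r
mod_count_fun <- function(x) max(table(x))

SingleModeVal <- function(x) {
  ux <- unique(x)
  max(tabulate(match(x, ux)))
}

# Hypothetical data without missing values: both versions count the three 7s.
x <- c(3, 3, 7, 7, 7, 9)
same_result <- mod_count_fun(x) == SingleModeVal(x)
print(same_result)

# Hypothetical data with missing values: the two versions disagree.
x_na <- c(1, NA, NA)
table_count <- mod_count_fun(x_na)     # table() ignores NA by default, so only the 1 is counted
tabulate_count <- SingleModeVal(x_na)  # the two NAs are counted as the most frequent value
print(c(table_count, tabulate_count))
```

If missing values should not count toward the mode, one option is to pass `x[!is.na(x)]` to `SingleModeVal`.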
