Use data.table to calculate the percentage of occurrence depending on the category in another column

Views: 79 Published: 2020/10/15 19:23:52 Tags: r data.table


Question

I have recently been working with data.table in R, which is popular and efficient. I have now come across a problem that I think data.table could solve.

I have a data set like this:

event | group_ind 
  1   | group1
  1   | group1
  1   | group1
  2   | group1
  2   | group1
  1   | group2
  1   | group2
  2   | group2
  2   | group3
  2   | group3
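
For anyone following along, the ten rows above can be rebuilt as a data.table. This is only a sketch: the name DT is assumed here because the answer's code uses it, and the event codes are stored as integers.

```r
library(data.table)

# Rebuild the ten example rows; DT is the name the answer's code refers to.
DT <- data.table(
  event     = c(1L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 2L),
  group_ind = c(rep("group1", 5), rep("group2", 3), rep("group3", 2))
)

# Rows per group, to check against the table above.
DT[, .N, by = group_ind]
```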

Now I want to know the percentage of rows in each group where event 1 occurs. The result for this data set is obvious: 60% for event 1 in group1, 67% in group2 and 0% in group3. In reality the data set has many more observations, more than two event types, and rows in no particular order. I can get what I want in a very naive way in R (counting the occurrences of each value in the event column and dividing by the total number of observations in each group), but I think there should be a more elegant way of doing this.
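
The count-and-divide approach mentioned above is not shown in the question. One possible base R reading of it, given purely as an illustration (the names df, counts and shares are made up for this sketch), is a cross-tabulation followed by row-wise proportions:

```r
# Example rows from the question as a plain data frame (df is an illustrative name).
df <- data.frame(
  event     = c(1, 1, 1, 2, 2, 1, 1, 2, 2, 2),
  group_ind = c(rep("group1", 5), rep("group2", 3), rep("group3", 2))
)

# Cross-tabulate group by event, then divide each row by its row total.
counts <- table(df$group_ind, df$event)
shares <- prop.table(counts, margin = 1)
shares
```

This gives a wide group-by-event table rather than the long format requested below, which is part of why a data.table solution is attractive.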

So the result I want would look like this:

 event | group_ind | percentage
   1   | group1    | 0.6
   2   | group1    | 0.4
   1   | group2    | 0.67
   2   | group2    | 0.33
   1   | group3    | 0
   2   | group3    | 1

I hope this can be done in data.table. Thanks very much for the help.

Recommended answer

A simple solution would be just

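# Convert DT to a data.table by reference, then, for each group, divide the
# count of each event code by the group size (.N). nbins = 2L keeps two bins
# even when a group contains no event 2.
setDT(DT)[, .(event = 1:2, percentage = tabulate(event, nbins = 2L)/.N), by = group_ind]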
#    group_ind event percentage
# 1:    group1     1  0.6000000
# 2:    group1     2  0.4000000
# 3:    group2     1  0.6666667
# 4:    group2     2  0.3333333
# 5:    group3     1  0.0000000
# 6:    group3     2  1.0000000
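
The one-liner relies on base R's tabulate(), which counts positive integer codes into bins 1, 2, ... up to the largest code present unless nbins is given. The sketch below, outside data.table, shows why the number of bins matters when a group lacks the highest event code; the input vectors are made up to mirror the groups in the question.

```r
# tabulate() counts how often each integer code 1, 2, ... occurs.
g1 <- tabulate(c(1, 1, 1, 2, 2))       # codes 1 and 2 both present: 3 2
g3 <- tabulate(c(2, 2))                # code 1 never occurs:        0 2
only1 <- tabulate(c(1, 1))             # largest code is 1, one bin: 2
fixed <- tabulate(c(1, 1), nbins = 2)  # force two bins:             2 0

g1 / sum(g1)  # shares within the first group: 0.6 0.4
```

With only one bin returned, pairing the counts with event = 1:2 would no longer line up, so fixing nbins to the number of event types is the safer choice.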

Though a more general solution would be to use unique on event (and also pre-order it, as suggested by @EdM).

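# Sort by event first so unique(event) comes out in the same order as the tabulate() bins.
# The two columns only have matching lengths when a group's event codes are exactly 1..k.
setDT(DT)[order(event), .(event = unique(event), percentage = tabulate(event)/.N), by = group_ind]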
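
When some groups are missing an event code (as group3 is here) or the codes are not consecutive integers, unique(event) and tabulate(event) can differ in length. An alternative sketch, not taken from the original answer, counts each group/event pair and then joins onto the full grid built with data.table's CJ() so that absent pairs get an explicit 0. The names counts, grid and res are illustrative.

```r
library(data.table)

# Example rows from the question.
DT <- data.table(
  event     = c(1L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 2L),
  group_ind = c(rep("group1", 5), rep("group2", 3), rep("group3", 2))
)

# Count rows for each (group, event) pair that actually occurs.
counts <- DT[, .N, by = .(group_ind, event)]

# Every group/event combination, so that absent pairs get a row too.
grid <- CJ(group_ind = unique(DT$group_ind), event = unique(DT$event))

# Join the counts onto the full grid; pairs with no rows come back as NA.
res <- counts[grid, on = .(group_ind, event)]
res[is.na(N), N := 0L]

# Share of each event within its group.
res[, percentage := N / sum(N), by = group_ind]
res[, N := NULL]
res[]
```

This is more verbose than the one-liners above, but it does not depend on the event codes being 1, 2, ... and works for character or factor event labels as well.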
