Calculate the percentages of a column in a data frame - "grouped" by column

Question

I am an R beginner and have reached the point where I often need to calculate percentages of the values in a data frame, "grouped" by the value of another column.

I have a data frame with around 1000 rows, containing Mediatype, Version, Collection (= year) and Count (for that year). I can filter it to get only a specific media type:

trSpdf <- trS[trS$Mediatype == 'application/pdf',]

and get the following example output:

> trSpdf 

        Mediatype Version Collection      Count
39 application/pdf      -1     co2008         2.0
40 application/pdf      -1     co2009         5.0
43 application/pdf       1     co2008         1.0
44 application/pdf       1     co2009         1.0
48 application/pdf     1.1     co2008        16.0
52 application/pdf     1.2     co2008        20.0
53 application/pdf     1.2     co2009        90.0
... (continuing) ...

What I want is to calculate the percentage of each version within each collection (= year), relative to all versions in that collection. For this example, the result should be:

5.13% of all versions in co2008 were version -1 (2.0 / total sum for co2008)
2.56% of all versions in co2008 were version 1 (1.0 / total sum for co2008)
...
93.75% of all versions in co2009 were version 1.2 (90.0 / total sum for co2009)
...
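
The expected figures can be checked by hand. This is only a sketch: it assumes the seven rows printed above are the complete data for application/pdf, which the "(continuing)" marker suggests is not the case in the real data frame. The vector names co2008 and co2009 are made up for this illustration.

```r
# Counts per version, taken from the seven rows shown in the question.
# Assumption: no other application/pdf rows exist for these two collections.
co2008 <- c("-1" = 2, "1" = 1, "1.1" = 16, "1.2" = 20)
co2009 <- c("-1" = 5, "1" = 1, "1.2" = 90)

# Share of each version within its own collection, in percent.
pct2008 <- round(100 * co2008 / sum(co2008), 2)
pct2009 <- round(100 * co2009 / sum(co2009), 2)

pct2008  # 5.13 2.56 41.03 51.28  (total for co2008 is 39)
pct2009  # 5.21 1.04 93.75        (total for co2009 is 96)
```

Note that 2 / 39 is about 5.128%, which rounds to 5.13%.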

Thanks in advance for any answers on how I could solve this.

Recommended answer

First, use ave to add a column giving the total count per Mediatype and Collection:

trS <- transform(trS, Tot.Count = ave(Count, Mediatype, Collection, FUN = sum))

Then it is straightforward to compute the percentage:

trS <- transform(trS, percentage = 100 * Count/Tot.Count)
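
To see the two steps end to end, here is a small reproducible sketch. The data frame is rebuilt from the seven rows printed in the question only, so the totals are illustrative. The variable pct_direct and the one-step variant with prop.table are additions for comparison, not part of the original answer.

```r
# Hypothetical sample: only the seven rows printed in the question.
trS <- data.frame(
  Mediatype  = "application/pdf",
  Version    = c("-1", "-1", "1", "1", "1.1", "1.2", "1.2"),
  Collection = c("co2008", "co2009", "co2008", "co2009",
                 "co2008", "co2008", "co2009"),
  Count      = c(2, 5, 1, 1, 16, 20, 90)
)

# ave() returns a vector as long as Count, holding each row's group total.
trS <- transform(trS, Tot.Count = ave(Count, Mediatype, Collection, FUN = sum))
trS <- transform(trS, percentage = 100 * Count / Tot.Count)

# One-step variant: prop.table(x) is x / sum(x), applied within each group.
pct_direct <- with(trS, 100 * ave(Count, Mediatype, Collection, FUN = prop.table))

print(trS)
```

Because ave returns one value per row instead of one per group, the totals line up with the original rows and no merge step is needed.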

Or, if you want it nicely formatted (e.g. "5.13%"), use sprintf:

trS <- transform(trS, percentage = paste0(sprintf("%.2f", 100 * Count/Tot.Count),
                                          "%"))
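
As a possible simplification, the percent sign can go directly into the format string, where "%%" produces a literal "%". The sketch below uses two hand-computed values (2 / 39 and 90 / 96, from the sample rows) in place of the real column. The names pct and formatted are only for illustration.

```r
# Two example percentages, computed from the sample rows in the question.
pct <- c(100 * 2 / 39, 100 * 90 / 96)

# "%.2f" prints two decimals; "%%" emits a literal percent sign.
formatted <- sprintf("%.2f%%", pct)
formatted  # "5.13%" "93.75%"
```

The formatted column is a character vector, so it may be worth keeping the numeric percentage in a separate column if further calculations are planned.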
