In R, how can I compute percentage statistics on a column in a data frame? (table function extended with percentages)


Question

This is a simple question, but I could not figure out how to use prop.table for it, and I need this functionality very often.

I have this data:

> library(ggplot2)
> #sample data; tips ships with reshape2, so with a current ggplot2 try data(tips, package = "reshape2")
> head(tips,3)
  total_bill tip    sex smoker day   time size
1         17 1.0 Female     No Sun Dinner    2
2         10 1.7   Male     No Sun Dinner    3
3         21 3.5   Male     No Sun Dinner    3
> #how often there is a non-smoker
> table(tips$smoker)

 No Yes 
151  93 
> #how many subjects
> nrow(tips)
[1] 244

I need to know the percentage of smokers vs. non-smokers. Something like this (ugly code):

> #percentage of smokers
> options(digits=2)
> transform(as.data.frame(table(tips$smoker)),percentage_column=Freq/nrow(tips)*100)
  Var1 Freq percentage_column
1   No  151                62
2  Yes   93                38
> 

Is there a better way to do this?

(Even better would be to do this on a set of columns that I enumerate, e.g., smoker, day, and time, and to have the output somewhat nicely formatted.)

Recommended answer

If it's conciseness you're after, you might like:

prop.table(table(tips$smoker))

and then scale by 100 and round if you like. Or, to get something closer to your exact output:

tbl <- table(tips$smoker)
cbind(tbl,prop.table(tbl))
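
The "scale by 100 and round" step can be folded into the same cbind call. The following is a minimal sketch, not part of the original answer: it uses a small made-up `smoker` vector instead of the tips data so that it runs on its own, and the column names `Count` and `Percentage` are only illustrative.

```r
# Hypothetical stand-in for tips$smoker, so the sketch needs no extra package
smoker <- c("No", "Yes", "No", "No", "Yes")

tbl <- table(smoker)                      # counts per level
pct <- round(prop.table(tbl) * 100, 1)    # proportions scaled to percentages

# Named arguments to cbind become the column names of the resulting matrix
out <- cbind(Count = tbl, Percentage = pct)
print(out)
```

Rounding after scaling keeps the requested number of decimals in the percentage itself rather than in the underlying proportion.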

If you want to do this for multiple columns, there are lots of different directions you could go, depending on what you consider clean-looking output, but here's one option:

tblFun <- function(x){
    tbl <- table(x)
    res <- cbind(tbl,round(prop.table(tbl)*100,2))
    colnames(res) <- c('Count','Percentage')
    res
}

do.call(rbind,lapply(tips[3:6],tblFun))
       Count Percentage
Female    87      35.66
Male     157      64.34
No       151      61.89
Yes       93      38.11
Fri       19       7.79
Sat       87      35.66
Sun       76      31.15
Thur      62      25.41
Dinner   176      72.13
Lunch     68      27.87

If you don't like stacking the different tables on top of each other, you can ditch the do.call and leave them in a list.
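
Leaving the results in a list might look like the sketch below. It assumes a small made-up data frame, `tips_demo`, in place of the real tips data, and selects the columns by name rather than by position; `tblFun` is the function from the answer.

```r
# Hypothetical miniature of the tips data, so the sketch runs on its own
tips_demo <- data.frame(
  smoker = c("No", "Yes", "No", "No"),
  time   = c("Dinner", "Dinner", "Lunch", "Dinner")
)

# Count/percentage table for one column (same as tblFun in the answer)
tblFun <- function(x) {
  tbl <- table(x)
  res <- cbind(tbl, round(prop.table(tbl) * 100, 2))
  colnames(res) <- c("Count", "Percentage")
  res
}

# Without do.call(rbind, ...), lapply returns one table per column,
# and each list element is named after the column it came from
res_list <- lapply(tips_demo[c("smoker", "time")], tblFun)
print(res_list$smoker)
```

One advantage of the list form is that each table keeps the name of its source column, which the stacked output does not show.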
