In R, how can I compute percentage statistics on a column in a data frame? (table function extended with percentages)
Question
This is a simple question, but I could not figure out how to use prop.table for it, and I need this functionality very often.
I have this data:
> library(ggplot2)
> #sample data
> head(tips,3)
  total_bill tip    sex smoker day   time size
1         17 1.0 Female     No Sun Dinner    2
2         10 1.7   Male     No Sun Dinner    3
3         21 3.5   Male     No Sun Dinner    3
> #how often there is a non-smoker
> table(tips$smoker)
 No Yes
151  93
> #how many subjects
> nrow(tips)
[1] 244
And I need to know the percentage of smokers vs. non-smokers. Something like this (ugly code):
> #percentage of smokers
> options(digits=2)
> transform(as.data.frame(table(tips$smoker)),percentage_column=Freq/nrow(tips)*100)
  Var1 Freq percentage_column
1   No  151                62
2  Yes   93                38
Is there a better way to do this?
(Even better would be to do this on a set of columns, which I enumerate, and have the output somewhat nicely formatted; e.g., smoker, day, and time.)
Recommended answer
If it's conciseness you're after, you might like:
prop.table(table(tips$smoker))
and then scale by 100 and round if you like. Or, to get something closer to your exact output (counts next to proportions):
tbl <- table(tips$smoker)
cbind(tbl,prop.table(tbl))
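The "scale by 100 and round" step mentioned above can be sketched as follows. This is only an illustration: to keep it runnable without any package, the smoker column is rebuilt from the counts shown in the question (151 "No", 93 "Yes") rather than read from tips, and the column names Count and Percentage are chosen here for readability.

```r
# Stand-in for tips$smoker, rebuilt from the counts in the question
# (assumption: 151 "No" and 93 "Yes", as printed by table(tips$smoker)).
smoker <- factor(rep(c("No", "Yes"), times = c(151, 93)))

tbl <- table(smoker)

# prop.table() returns proportions; multiply by 100 and round to get percentages.
pct <- round(100 * prop.table(tbl), 1)

# Counts and percentages side by side, one row per level.
res <- cbind(Count = tbl, Percentage = pct)
print(res)
```

With the real data, replacing smoker by tips$smoker should give the same table.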
If you want to do this for multiple columns, there are lots of different directions you could go, depending on what you consider clean-looking output, but here's one option:
tblFun <- function(x){
    tbl <- table(x)
    res <- cbind(tbl,round(prop.table(tbl)*100,2))
    colnames(res) <- c('Count','Percentage')
    res
}

do.call(rbind,lapply(tips[3:6],tblFun))
       Count Percentage
Female    87      35.66
Male     157      64.34
No       151      61.89
Yes       93      38.11
Fri       19       7.79
Sat       87      35.66
Sun       76      31.15
Thur      62      25.41
Dinner   176      72.13
Lunch     68      27.87
If you don't like stacking the different tables on top of each other, you can ditch the do.call and leave them in a list.
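A minimal sketch of the list variant, assuming a small made-up data frame (toy, with hypothetical values) in place of tips[3:6] so that it runs without extra packages. The helper is the same tblFun as in the answer.

```r
# Hypothetical stand-in for a few categorical columns of tips.
toy <- data.frame(
  smoker = c("No", "No", "Yes", "No"),
  time   = c("Dinner", "Lunch", "Dinner", "Dinner"),
  stringsAsFactors = TRUE
)

# Same helper as in the answer: counts plus rounded percentages.
tblFun <- function(x) {
  tbl <- table(x)
  res <- cbind(tbl, round(prop.table(tbl) * 100, 2))
  colnames(res) <- c("Count", "Percentage")
  res
}

# Without do.call(rbind, ...), lapply() returns a named list,
# one Count/Percentage matrix per column.
res_list <- lapply(toy, tblFun)
print(res_list$smoker)
```

Keeping the list has one practical upside: each table stays labelled by its column name, whereas the stacked version only shows the level names.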