R: convert summary result (statistics for all dataframe columns) into a dataframe
Problem description
[I'm new to R...] I have this dataframe:
df1 <- data.frame(c(2,1,2), c(1,2,3,4,5,6), seq(141,170)) #create data.frame
names(df1) <- c('gender', 'age', 'height') #column names
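A note on the construction above: data.frame() recycles the shorter vectors a whole number of times to match the longest one (seq(141,170), length 30), so gender repeats 2, 1, 2 ten times and age repeats 1 to 6 five times. A minimal sketch that spells the recycling out with rep() and named columns; df1_explicit is a hypothetical name used only for comparison:

```r
# Original construction: c(2,1,2) and 1:6 are recycled up to 30 rows
df1 <- data.frame(c(2,1,2), c(1,2,3,4,5,6), seq(141,170))
names(df1) <- c('gender', 'age', 'height')

# Same data with the recycling written out (hypothetical name df1_explicit)
df1_explicit <- data.frame(gender = rep(c(2, 1, 2), times = 10),
                           age    = rep(c(1, 2, 3, 4, 5, 6), times = 5),
                           height = seq(141, 170))

print(nrow(df1))                    # 30
print(all.equal(df1, df1_explicit))
```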
I want df1's summary as a dataframe object that looks like this:
         count     mean    std      min      25%      50%      75%      max
age    30.0000   3.5000 1.7370   1.0000   2.0000   3.5000   5.0000   6.0000
gender 30.0000   1.6667 0.4795   1.0000   1.0000   2.0000   2.0000   2.0000
height 30.0000 155.5000 8.8034 141.0000 148.2500 155.5000 162.7500 170.0000
I've generated this in Python with df1.describe().T. How can I do this in R?
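Base R's nearest built-in counterpart to describe() is summary(), although it has no count or std columns. A minimal sketch, assuming every column is numeric and NA-free (with NAs, summary() adds an "NA's" entry only for the affected columns, and sapply() may then return a list instead of a matrix): applying summary() to each column with sapply() gives one column of statistics per variable, and t() turns that into one row per variable. The variable name summ is illustrative.

```r
df1 <- data.frame(c(2,1,2), c(1,2,3,4,5,6), seq(141,170))
names(df1) <- c('gender', 'age', 'height')

# sapply() returns a 6 x 3 matrix (Min., 1st Qu., Median, Mean, 3rd Qu., Max.
# by gender/age/height); t() flips it to one row per variable.
summ <- as.data.frame(t(sapply(df1, summary)))
print(summ)
```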
It would be a bonus if my summary dataframe also contained the "dtype", "null" (number of null/missing values), (number of) "unique" and "range" values, to give comprehensive summary statistics:
         count     mean    std      min      25%      50%      75%      max null unique range dtype
age    30.0000   3.5000 1.7370   1.0000   2.0000   3.5000   5.0000   6.0000    0      6     5 int64
gender 30.0000   1.6667 0.4795   1.0000   1.0000   2.0000   2.0000   2.0000    0      2     1 int64
height 30.0000 155.5000 8.8034 141.0000 148.2500 155.5000 162.7500 170.0000    0     30    29 int64
The Python code for the above result is:
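import pandas as pd
# df1 as a pandas DataFrame (same values as the R data frame above)
df1 = pd.DataFrame({'gender': [2, 1, 2] * 10,
                    'age': list(range(1, 7)) * 5,
                    'height': list(range(141, 171))})
df1.describe().T.join(pd.DataFrame(df1.isnull().sum(), columns=['null']))\
.join(pd.DataFrame.from_dict({i:df1[i].nunique() for i in df1.columns}, orient='index')\
.rename(columns={0:'unique'}))\
.join(pd.DataFrame.from_dict({i:(df1[i].max() - df1[i].min()) for i in df1.columns}, orient='index')\
.rename(columns={0:'range'}))\
.join(pd.DataFrame(df1.dtypes, columns=['dtype']))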
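For reference, a sketch of base-R counterparts to the extra pandas calls in this snippet, on the same df1: isnull().sum() becomes counting NAs, nunique() (which drops missing values by default) becomes counting distinct non-NA values, max() - min() becomes diff(range()), and dtypes is approximated by class(). R has no int64 type here: the recycled columns are "numeric" (double) and seq(141, 170) is "integer", so the dtype labels differ from pandas. The object names are illustrative.

```r
df1 <- data.frame(c(2,1,2), c(1,2,3,4,5,6), seq(141,170))
names(df1) <- c('gender', 'age', 'height')

# pandas isnull().sum(): number of missing (NA) values per column
null_counts   <- colSums(is.na(df1))
# pandas nunique(): distinct values per column, NAs excluded
unique_counts <- sapply(df1, function(v) length(unique(v[!is.na(v)])))
# pandas max() - min(): spread of each column, ignoring NAs
ranges        <- sapply(df1, function(v) diff(range(v, na.rm = TRUE)))
# pandas dtypes: the closest R notion is the column class
dtypes        <- sapply(df1, class)

extra <- data.frame(null = null_counts, unique = unique_counts,
                    range = ranges, dtype = dtypes)
print(extra)
```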
Thanks!
Recommended answer
I commonly use a little function (adapted from a script found on the net) to do this kind of transformation:
sumstats <- function(x) {
  # Per-column helpers. The "N*N" placeholders only cover mean/sd/var/min/max:
  # quantile() and range.k() still fail on non-numeric columns, so in
  # practice every column of x should be numeric.
  null.k   <- function(x) sum(is.na(x))                  # number of NAs ("null")
  unique.k <- function(x) length(unique(x[!is.na(x)]))   # distinct non-NA values
  range.k  <- function(x) max(x, na.rm = TRUE) - min(x, na.rm = TRUE)
  mean.k   <- function(x) if (is.numeric(x)) round(mean(x, na.rm = TRUE), digits = 2) else "N*N"
  sd.k     <- function(x) if (is.numeric(x)) round(sd(x, na.rm = TRUE), digits = 2) else "N*N"
  var.k    <- function(x) if (is.numeric(x)) round(var(x, na.rm = TRUE), digits = 2) else "N*N"
  min.k    <- function(x) if (is.numeric(x)) round(min(x, na.rm = TRUE), digits = 2) else "N*N"
  max.k    <- function(x) if (is.numeric(x)) round(max(x, na.rm = TRUE), digits = 2) else "N*N"
  # Quantiles: one row per column of x, one column per probability
  quants <- t(sapply(x, quantile, probs = c(.05, .10, .25, .50, .75, .90, .95), na.rm = TRUE))
  sumtable <- cbind(as.matrix(colSums(!is.na(x))),       # count of non-NA values
                    sapply(x, null.k), sapply(x, unique.k), sapply(x, range.k),
                    sapply(x, mean.k), sapply(x, sd.k), sapply(x, var.k),
                    sapply(x, min.k), quants, sapply(x, max.k))
  sumtable <- as.data.frame(sumtable)
  names(sumtable) <- c('count', 'null', 'unique', 'range', 'mean', 'std', 'var',
                       'min', '5%', '10%', '25%', '50%', '75%', '90%', '95%', 'max')
  return(sumtable)
}
sumstats(df1)
       count null unique range   mean  std   var    min     5%    10%    25%    50%    75%    90%    95%    max
gender 30.00 0.00   2.00  1.00   1.67 0.48  0.23   1.00   1.00   1.00   1.00   2.00   2.00   2.00   2.00   2.00
age    30.00 0.00   6.00  5.00   3.50 1.74  3.02   1.00   1.00   1.00   2.00   3.50   5.00   6.00   6.00   6.00
height 30.00 0.00  30.00 29.00 155.50 8.80 77.50 141.00 142.45 143.90 148.25 155.50 162.75 167.10 168.55 170.00
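The quartiles printed above match the pandas table in the question because R's quantile() defaults to type = 7, which is the same linear interpolation between order statistics that pandas uses by default in describe(). A small check on the height column, comparing quantile() with the type-7 rule written out by hand (position 1 + (n - 1) * p in the sorted data); type7_quantile is a hypothetical helper for illustration:

```r
df1 <- data.frame(c(2,1,2), c(1,2,3,4,5,6), seq(141,170))
names(df1) <- c('gender', 'age', 'height')

# Hypothetical helper: the type-7 quantile written out by hand
type7_quantile <- function(v, p) {
  v  <- sort(v[!is.na(v)])
  h  <- 1 + (length(v) - 1) * p        # 1-based fractional position
  lo <- floor(h); hi <- ceiling(h)
  v[lo] + (h - lo) * (v[hi] - v[lo])   # linear interpolation between neighbours
}

builtin <- unname(quantile(df1$height, probs = 0.25))  # type = 7 is the default
by_hand <- type7_quantile(df1$height, 0.25)
print(c(builtin = builtin, by_hand = by_hand))         # both 148.25
```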
You can easily adapt it to add or drop descriptive columns (quantiles, nulls, range, etc.) by adding or removing helper functions. It returns a data.frame. You might also want to make the NA handling an argument of the function instead of hard-coding na.rm = TRUE.
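One way to follow that suggestion, as a minimal sketch: a variant that takes na.rm as an argument and returns the column set requested in the question (count, mean, std, min, 25%, 50%, 75%, max, null, unique, range, dtype). The name describe_df and its interface are hypothetical, it assumes numeric columns, and dtype is filled from class(), which only approximates pandas dtypes.

```r
# Hypothetical helper (not from the answer above): pandas-style summary
# with the NA behaviour exposed as an argument. Assumes numeric columns.
describe_df <- function(x, na.rm = TRUE) {
  stats <- sapply(x, function(v) {
    # With na.rm = FALSE, quantile() stops with an error if v contains NAs
    q <- quantile(v, probs = c(.25, .50, .75), na.rm = na.rm, names = FALSE)
    c(count  = sum(!is.na(v)),
      mean   = mean(v, na.rm = na.rm),
      std    = sd(v, na.rm = na.rm),
      min    = min(v, na.rm = na.rm),
      `25%`  = q[1], `50%` = q[2], `75%` = q[3],
      max    = max(v, na.rm = na.rm),
      null   = sum(is.na(v)),
      unique = length(unique(v[!is.na(v)])),
      range  = max(v, na.rm = na.rm) - min(v, na.rm = na.rm))
  })
  out <- as.data.frame(t(stats))         # one row per column of x
  out$dtype <- unname(sapply(x, class))  # class() as a stand-in for pandas dtype
  out
}

df1 <- data.frame(c(2,1,2), c(1,2,3,4,5,6), seq(141,170))
names(df1) <- c('gender', 'age', 'height')
print(describe_df(df1))
```

Building each column's statistics as one named vector and transposing the sapply() result keeps the statistics numeric; the character dtype column is added afterwards so it does not coerce the whole table to character, which is what cbind() does when a helper returns "N*N".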
Hope this helps.