R - describe() output to a data frame
Question
I want to create a data frame from the output of the describe() function. The dataset under consideration is iris. The data frame should look like this:
Variable       n    missing  unique  Info  Mean   0.05   0.1  0.25  0.5   0.75  0.9   0.95
Sepal.Length   150  0        35      1     5.843  4.6    4.8  5.1   5.8   6.4   6.9   7.255
Sepal.Width    150  0        23      0.99  3.057  2.345  2.5  2.8   3     3.3   3.61  3.8
Petal.Length   150  0        43      1     3.758  1.3    1.4  1.6   4.35  5.1   5.8   6.1
Petal.Width    150  0        22      0.99  1.199  0.2    0.2  0.3   1.3   1.8   2.2   2.3
Species        150  0        3
Is there a way to coerce the output of describe() to a data.frame? When I try to coerce it, I get the error shown below:
library(Hmisc)
statistics <- describe(iris)
statistics[1]
first_vec <- statistics[1]$Sepal.Length
as.data.frame(first_vec)
# Error in as.data.frame.default(first_vec) : cannot coerce class "describe" to a data.frame
Thanks.
Recommended answer
The way to figure this out is to examine the objects with str():
data(iris)
library(Hmisc)
di <- describe(iris)
di
# iris
#
# 5 Variables 150 Observations
# -------------------------------------------------------------
# Sepal.Length
# n missing unique Info Mean .05 .10 .25 .50 .75 .90 .95
# 150 0 35 1 5.843 4.600 4.800 5.100 5.800 6.400 6.900 7.255
#
# lowest : 4.3 4.4 4.5 4.6 4.7, highest: 7.3 7.4 7.6 7.7 7.9
# -------------------------------------------------------------
# ...
# -------------------------------------------------------------
# Species
# n missing unique
# 150 0 3
#
# setosa (50, 33%), versicolor (50, 33%)
# virginica (50, 33%)
# -------------------------------------------------------------
str(di)
# List of 5
# $ Sepal.Length:List of 6
# ..$ descript : chr "Sepal.Length"
# ..$ units : NULL
# ..$ format : NULL
# ..$ counts : Named chr [1:12] "150" "0" "35" "1" ...
# .. ..- attr(*, "names")= chr [1:12] "n" "missing" "unique" "Info" ...
# ..$ intervalFreq:List of 2
# .. ..$ range: atomic [1:2] 4.3 7.9
# .. .. ..- attr(*, "Csingle")= logi TRUE
# .. ..$ count: int [1:100] 1 0 3 0 0 1 0 0 4 0 ...
# ..$ values : Named chr [1:10] "4.3" "4.4" "4.5" "4.6" ...
# .. ..- attr(*, "names")= chr [1:10] "L1" "L2" "L3" "L4" ...
# ..- attr(*, "class")= chr "describe"
# $ Sepal.Width :List of 6
# ...
# $ Species :List of 5
# ..$ descript: chr "Species"
# ..$ units : NULL
# ..$ format : NULL
# ..$ counts : Named num [1:3] 150 0 3
# .. ..- attr(*, "names")= chr [1:3] "n" "missing" "unique"
# ..$ values : num [1:2, 1:3] 50 33 50 33 50 33
# .. ..- attr(*, "dimnames")=List of 2
# .. .. ..$ : chr [1:2] "Frequency" "%"
# .. .. ..$ : chr [1:3] "setosa" "versicolor" "virginica"
# ..- attr(*, "class")= chr "describe"
# - attr(*, "descript")= chr "iris"
# - attr(*, "dimensions")= int [1:2] 150 5
# - attr(*, "class")= chr "describe"
We see that di is a list of lists. We can take it apart by looking at just the first sublist, which can be flattened into a vector:
unlist(di[[1]])
# descript counts.n
# "Sepal.Length" "150"
# counts.missing counts.unique
# "0" "35"
# counts.Info counts.Mean
# "1" "5.843"
# counts..05 counts..10
# "4.600" "4.800"
# counts..25 counts..50
# "5.100" "5.800"
# counts..75 counts..90
# "6.400" "6.900"
# counts..95 intervalFreq.range1
# "7.255" "4.3"
# intervalFreq.range2 intervalFreq.count1
# "7.9" "1"
# ...
# values.H3 values.H2
# "7.6" "7.7"
# values.H1
# "7.9"
str(unlist(di[[1]]))
# Named chr [1:125] "Sepal.Length" "150" "0" "35" ...
# - attr(*, "names")= chr [1:125] "descript" "counts.n" "counts.missing" "counts.unique" ...
It is very long (125 elements). The elements have all been coerced to the same (and most inclusive) type, namely character. It seems you want the 2nd through 13th elements, i.e. the twelve counts entries:
unlist(di[[1]])[2:13]
# counts.n counts.missing counts.unique counts.Info
# "150" "0" "35" "1"
# counts.Mean counts..05 counts..10 counts..25
# "5.843" "4.600" "4.800" "5.100"
# counts..50 counts..75 counts..90 counts..95
# "5.800" "6.400" "6.900" "7.255"
Now you have something you can start to work with. But notice that this only seems to hold for numeric variables; the factor variable Species is different:
unlist(di[[5]])
# descript counts.n counts.missing counts.unique
# "Species" "150" "0" "3"
# values1 values2 values3 values4
# "50" "33" "50" "33"
# values5 values6
# "50" "33"
In that case, it seems you only want elements two through four.
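Rather than indexing into unlist() by position, the counts element seen in the str() output can be read by name and the per-variable vectors stacked into one table. The following is only a sketch: it assumes every sublist has a named counts vector as shown above, and the object names (vars, stat_names, stats_df) are made up for illustration. Other Hmisc versions may name or format the statistics differently.

```r
library(Hmisc)

di <- describe(iris)

# Drop the "describe" class so the object behaves like a plain list of sublists
vars <- unclass(di)

# Every statistic name that appears in any variable's counts vector
stat_names <- unique(unlist(lapply(vars, function(x) names(x$counts))))

# One character vector per variable; statistics a variable lacks stay NA
rows <- lapply(vars, function(x) {
  out <- setNames(rep(NA_character_, length(stat_names)), stat_names)
  out[names(x$counts)] <- as.character(x$counts)
  out
})

# Stack the vectors into a matrix, then turn it into a data frame
counts_mat <- do.call(rbind, rows)
stats_df <- data.frame(Variable = rownames(counts_mat), counts_mat,
                       check.names = FALSE, stringsAsFactors = FALSE)
rownames(stats_df) <- NULL

# Assumption: all count strings look like numbers; anything else becomes NA
stats_df[-1] <- lapply(stats_df[-1], as.numeric)

stats_df
```

Matching on names instead of positions means Species, which has fewer statistics, simply gets NA in the columns it lacks.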
Using this process of discovery and problem solving, you can see how you would take the output of describe apart and put the information you want into a data frame. However, this will take a fair amount of work. You will presumably need loops and several if(){ ... } else{ ... } blocks to handle the different variable types. You might just want to code your own dataset description function from scratch.
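As a rough illustration of the from-scratch route, here is a minimal base R sketch. The function names describe_column and describe_df are hypothetical (they are not part of Hmisc), and the Info column is left out because it is a statistic specific to Hmisc. Non-numeric columns get NA for the mean and the quantiles.

```r
# Hypothetical helper: summarise one column as a one-row data frame
describe_column <- function(x, name,
                            probs = c(.05, .10, .25, .50, .75, .90, .95)) {
  out <- data.frame(
    Variable = name,
    n        = sum(!is.na(x)),               # non-missing observations
    missing  = sum(is.na(x)),
    unique   = length(unique(x[!is.na(x)])), # distinct non-missing values
    Mean     = NA_real_,
    stringsAsFactors = FALSE
  )
  q <- rep(NA_real_, length(probs))
  if (is.numeric(x)) {
    out$Mean <- mean(x, na.rm = TRUE)
    q <- quantile(x, probs = probs, na.rm = TRUE, names = FALSE)
  }
  # One column per quantile, named "0.05", "0.1", ...
  out[as.character(probs)] <- as.list(q)
  out
}

# Hypothetical wrapper: apply the helper to every column and stack the rows
describe_df <- function(data) {
  rows <- Map(describe_column, data, names(data))
  out <- do.call(rbind, rows)
  rownames(out) <- NULL
  out
}

stats <- describe_df(iris)
stats
```

Because each column produces a one-row data frame with the same columns, rbind() can combine them without any per-type branching beyond the single is.numeric() check.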