Using shapiro.test on multiple columns in a data frame

Views: 190 | Published: 2017/3/26 1:44:07 | Tags: r function statistics dataframe

Problem description

It seems like a pretty simple question, but I can't find the answer.

I have a data frame (let's call it df) containing n=100 columns (C1, C2, ..., C100) and 50 rows (R1, R2, ..., R50). I tested all the columns in the data frame to be sure they are numeric. I want to know whether the data in each column has a normal distribution, using the shapiro.test() function.

I am able to do it column by column using the code:

> shapiro.test(df$Cn)

or

> shapiro.test(df[,c(Cn)])

However, when I try to do it on several columns at the same time, it doesn't work:

> shapiro.test(df[,c(C1:C100)])

returns the error:

Error in `[.data.frame`(x, complete.cases(x)) : undefined columns selected

I would appreciate it if anyone could suggest a way to do all the tests at the same time and eventually store the results in a new data frame/matrix/list/vector.

Thanks!

Seb

Recommended answer

Not that I think this is a sensible approach to data analysis, but the underlying issue of applying a function to the columns of a data frame is a general task that can easily be achieved using one of sapply() or lapply() (or even apply(), but for data frames, one of the two earlier-mentioned functions would be best).

Here is an example, using some dummy data:

set.seed(42)
df <- data.frame(Gaussian = rnorm(50), Poisson = rpois(50, 2), 
                 Uniform = runif(50))

Now apply the shapiro.test() function. We capture the output in a list (given the object returned by this function) so we will use lapply().

lshap <- lapply(df, shapiro.test)
lshap[[1]] ## look at the first column results

R> lshap[[1]]

    Shapiro-Wilk normality test

data:  X[[1L]]
W = 0.9802, p-value = 0.5611
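The same pattern works on a subset of columns selected by name, which is closer to the C1, C2, ... layout in the question. This is only a sketch: the data frame `dat`, its columns and the extra `group` column are made up for illustration. shapiro.test() expects a single numeric vector, so it is called once per selected column.

```r
# Hypothetical data: five numeric columns C1..C5 plus one non-numeric column
set.seed(1)
dat <- data.frame(matrix(rnorm(50 * 5), nrow = 50))
names(dat) <- paste0("C", 1:5)
dat$group <- rep(c("a", "b"), 25)

# Build the column names as strings; C1:C3 would be evaluated as objects
# named C1 and C3, which normally do not exist outside the data frame
cols <- paste0("C", 1:3)

# dat[cols] is still a data frame, so lapply() visits one column at a time
shap_subset <- lapply(dat[cols], shapiro.test)
names(shap_subset)
```

Selecting by a character vector also keeps non-numeric columns such as `group` out of the test.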

You will need to extract the things you want from these objects, which all have the structure:

R> str(lshap[[1]])
List of 4
 $ statistic: Named num 0.98
  ..- attr(*, "names")= chr "W"
 $ p.value  : num 0.561
 $ method   : chr "Shapiro-Wilk normality test"
 $ data.name: chr "X[[1L]]"
 - attr(*, "class")= chr "htest"

If you want the statistic and p.value components of this object for all elements of lshap, we will use sapply() this time, to nicely arrange the results for us:

lres <- sapply(lshap, `[`, c("statistic","p.value"))

R> lres
          Gaussian Poisson Uniform 
statistic 0.9802   0.9371  0.918   
p.value   0.5611   0.01034 0.001998

Given that you have 100 of these, I'd transpose lres:

R> t(lres)
         statistic p.value 
Gaussian 0.9802    0.5611  
Poisson  0.9371    0.01034 
Uniform  0.918     0.001998
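The matrix returned by sapply() above holds list elements rather than plain numbers, which can be awkward for sorting or filtering. If an ordinary data frame with numeric columns is preferred, one possible sketch is to pull each component out with vapply(). The name `res` and the column layout are my own choices here, not part of the original answer.

```r
# Recreate the dummy data and the list of test results from above
set.seed(42)
df <- data.frame(Gaussian = rnorm(50), Poisson = rpois(50, 2),
                 Uniform = runif(50))
lshap <- lapply(df, shapiro.test)

# vapply() insists that every extracted value is a single number
res <- data.frame(
  column    = names(lshap),
  statistic = vapply(lshap, function(x) unname(x$statistic), numeric(1)),
  p.value   = vapply(lshap, function(x) x$p.value, numeric(1)),
  row.names = NULL,
  stringsAsFactors = FALSE
)
res
```

With this layout, something like `res[res$p.value < 0.05, ]` picks out the columns for which the test rejects normality at the 5% level.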

If you plan on doing anything with the p-values from this exercise, I suggest you start thinking about how to correct for multiple comparisons before you shoot yourself in the foot with a 30-cal.
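As a rough illustration of such a correction, the p-values can be passed through p.adjust() from the stats package. Holm's method is used here purely as an example. Which adjustment is appropriate depends on the analysis, and the names `pvals` and `p_holm` are hypothetical.

```r
# Recreate the dummy data and collect one p-value per column
set.seed(42)
df <- data.frame(Gaussian = rnorm(50), Poisson = rpois(50, 2),
                 Uniform = runif(50))
pvals <- vapply(df, function(x) shapiro.test(x)$p.value, numeric(1))

# Holm's step-down adjustment; the choice of method is an assumption
p_holm <- p.adjust(pvals, method = "holm")

# Raw and adjusted p-values side by side
cbind(raw = pvals, holm = p_holm)
```

Adjusted p-values are never smaller than the raw ones, so with 100 columns some tests that look significant on their own may no longer be significant after adjustment.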
