How to perform single factor ANOVA in R with samples organized by column?

Question

I have a data set where the samples are grouped by column. The following sample dataset is similar to my data's format:

a = c(1,3,4,6,8)
b = c(3,6,8,3,6)
c = c(2,1,4,3,6)
d = c(2,2,3,3,4)

mydata = data.frame(cbind(a,b,c,d))

When I perform a single factor ANOVA in Excel using the above dataset, I get the following results:

I know a typical format in R is as follows:

group  measurement
a      1
a      3
a      4
.      .
.      .
.      .
d      4

And the command to perform ANOVA in R would be aov(measurement ~ group, data = mydata). How do I perform a single factor ANOVA in R with samples organized by column rather than by row? In other words, how do I duplicate the Excel results using R? Many thanks for the help.

Recommended Answer

Stack them into long format:

> mdat <- stack(mydata)
> mdat
   values ind
1       1   a
2       3   a
3       4   a
4       6   a
5       8   a
6       3   b
7       6   b
snipped output
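stack() does nothing more exotic than concatenating the columns and recording which column each value came from. For reference, here is a minimal base-R sketch of the same reshaping. It assumes, as in the question, that all columns have the same length; the names long, values and ind are chosen here only to mirror stack()'s output.

```r
# Sample data from the question: one group per column
mydata <- data.frame(a = c(1, 3, 4, 6, 8),
                     b = c(3, 6, 8, 3, 6),
                     c = c(2, 1, 4, 3, 6),
                     d = c(2, 2, 3, 3, 4))

# Long format by hand: concatenate the columns, then repeat each
# column name once per row so every value keeps its group label
long <- data.frame(
  values = unlist(mydata, use.names = FALSE),
  ind    = factor(rep(names(mydata), each = nrow(mydata)))
)
head(long)

# Cross-check against stack(): same values, same group labels
s <- stack(mydata)
all.equal(unname(long$values), unname(s$values))      # both should be TRUE
all(as.character(long$ind) == as.character(s$ind))
```

In practice stack() is the shorter route; the manual version is only useful when you want different column names or need to control the factor levels yourself.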

> aov( values ~ ind, mdat)
Call:
   aov(formula = values ~ ind, data = mdat)

Terms:
                 ind Residuals
Sum of Squares  18.2      65.6
Deg. of Freedom    3        16

Residual standard error: 2.024846 
Estimated effects may be unbalanced

Given the "Estimated effects may be unbalanced" message, it might be safer to use lm:

> anova(lm(values ~ ind, mdat))
Analysis of Variance Table

Response: values
          Df Sum Sq Mean Sq F value Pr(>F)
ind        3   18.2  6.0667  1.4797 0.2578
Residuals 16   65.6  4.1000               
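Excel's single factor ANOVA output typically lays the result out as Between Groups / Within Groups / Total rows with SS, df, MS, F, P-value and F crit columns. The sketch below computes those quantities directly from the column layout, without reshaping, so each number can be compared with both the R table above and Excel. It assumes no missing values, and alpha = 0.05 for F crit is an assumption; change it to match the alpha used in Excel. The object names (excel_style and so on) are illustrative only.

```r
# Sample data from the question: one group per column, no missing values
mydata <- data.frame(a = c(1, 3, 4, 6, 8),
                     b = c(3, 6, 8, 3, 6),
                     c = c(2, 1, 4, 3, 6),
                     d = c(2, 2, 3, 3, 4))

alpha <- 0.05                  # assumed significance level for F crit
m <- as.matrix(mydata)
k <- ncol(m)                   # number of groups
n <- length(m)                 # total number of observations

grand_mean  <- mean(m)
group_means <- colMeans(m)

# Between groups: squared deviations of the group means from the grand
# mean, weighted by group size (every column has nrow(m) values here)
ss_between <- sum(nrow(m) * (group_means - grand_mean)^2)
# Within groups: squared deviations of each value from its own column mean
ss_within <- sum(sweep(m, 2, group_means)^2)

df_between <- k - 1
df_within  <- n - k
ms_between <- ss_between / df_between
ms_within  <- ss_within / df_within

f_value <- ms_between / ms_within
p_value <- pf(f_value, df_between, df_within, lower.tail = FALSE)
f_crit  <- qf(1 - alpha, df_between, df_within)

# Arranged like a Between Groups / Within Groups / Total ANOVA table
excel_style <- data.frame(
  SS      = c(ss_between, ss_within, ss_between + ss_within),
  df      = c(df_between, df_within, n - 1),
  MS      = c(ms_between, ms_within, NA),
  F       = c(f_value, NA, NA),
  P.value = c(p_value, NA, NA),
  F.crit  = c(f_crit, NA, NA),
  row.names = c("Between Groups", "Within Groups", "Total")
)
print(excel_style)

# Cross-check against R's own one-way ANOVA on the stacked data
summary(aov(values ~ ind, data = stack(mydata)))
```

With the sample data this reproduces the 18.2 and 65.6 sums of squares, F = 1.4797 and p = 0.2578 shown above. The final summary(aov(...)) call also shows that summary() on an aov fit gives the same table as anova(lm(...)).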
> summary(lm(values~ind, mdat))

Call:
lm(formula = values ~ ind, data = mdat)

Residuals:
   Min     1Q Median     3Q    Max 
 -3.40  -1.25   0.00   0.90   3.60 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   4.4000     0.9055   4.859 0.000174 ***
indb          0.8000     1.2806   0.625 0.540978    
indc         -1.2000     1.2806  -0.937 0.362666    
indd         -1.6000     1.2806  -1.249 0.229491    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

Residual standard error: 2.025 on 16 degrees of freedom
Multiple R-squared: 0.2172, Adjusted R-squared: 0.07041 
F-statistic:  1.48 on 3 and 16 DF,  p-value: 0.2578 
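In the summary() output above, R's default treatment contrasts make (Intercept) the mean of the first level, a, and each of indb, indc and indd the difference between that group's mean and the mean of a. A short sketch confirming this from the column means of the question's data (the name from_means is illustrative only):

```r
# Sample data from the question: one group per column
mydata <- data.frame(a = c(1, 3, 4, 6, 8),
                     b = c(3, 6, 8, 3, 6),
                     c = c(2, 1, 4, 3, 6),
                     d = c(2, 2, 3, 3, 4))
fit <- lm(values ~ ind, data = stack(mydata))

# Group means taken straight from the columns
means <- colMeans(mydata)

# Treatment contrasts: intercept = mean of the first group (a),
# every other coefficient = that group's mean minus the mean of a
from_means <- c(means[["a"]], means[-1] - means[["a"]])

# Side by side: the two columns should agree
cbind(coef = coef(fit), from_means = from_means)
```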

And please don't ask me why Excel gives a different answer. Excel has generally been shown to be highly unreliable when it comes to statistics. The onus is on Excel to explain why it doesn't give answers comparable to R.

Edit in response to comments: The Excel Data Analysis Pack ANOVA procedure creates its output without using Excel functions, so if you change the data in the cells it was derived from and then press F9 (or use the equivalent menu recalculation command), the output section does not change. This and other sources of user and numerical problems are documented on the various pages of David Heiser's project assessing Excel's problems with statistical calculations: http://www.daheiser.info/excel/frontpage.html

Heiser began this effort, now at least a decade old, expecting Microsoft to take responsibility for these errors, but Microsoft has consistently ignored his and others' work identifying the errors and suggesting better procedures. There was also a six-section Special Report in the June 2008 issue of "Computational Statistics & Data Analysis", edited by BD McCullough, covering various statistical concerns with Excel.