chisq.test Error Message
Question
Here's a problem I'm encountering:
Sample data:
df <- data.frame(1,2,3,4,5,6,7,8)
df <- rbind(df,df,df,df)
What I would like to do is find the p.value for the chisq.test of 1,2,3 vs. 4,5,6, taken from the first row of the data.frame defined above.
Let's try it directly:
chisq.test(c(1,2,3),c(4,5,6))$p.value ## this works.
But when I try to do it by calling the columns/rows...
chisq.test(df[1,1:3],df[1,4:6])$p.value
Gives: Error in complete.cases(x, y) : not all arguments have the same length
Interesting, because that doesn't seem to be true:
length(df[1,1:3])
length(df[1,4:6])
Any thoughts on how to change the notation to get the desired result?
Recommended answer
?chisq.test
tells us:
Arguments:
x: a numeric vector or matrix. ‘x’ and ‘y’ can also both be
factors.
y: a numeric vector; ignored if ‘x’ is a matrix. If ‘x’ is a
factor, ‘y’ should be a factor of the same length.
If we look at df as per your question, the subsets you define are:
> is.numeric(df[1,1:3])
[1] FALSE
> is.vector(df[1,1:3])
[1] FALSE
> is.matrix(df[1,1:3])
[1] FALSE
and the same holds for your other subset. What happens then is in the lap of the gods. Internally, because df[1,1:3] is a data frame, it is converted first to a single-row matrix, and thence to a vector:
Browse[2]> x ## here x is df[1,1:3]
[1] 1 2 3
whilst df[1,4:6] (y in the chisq.test function) is left untouched:
Browse[2]> y
X4 X5 X6
1 4 5 6
and when the code calls complete.cases(x, y), we get the error you report:
Browse[2]> complete.cases(x, y)
Error in complete.cases(x, y) : not all arguments have the same length
complete.cases calls internal code so we can't see what is going on, but essentially R thinks x and y are not of the same length, and this is because they are of different types: x is now a plain vector while y is still a data frame.
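As a rough illustration of that mismatch, the sketch below mimics the conversion by hand. The as.vector(as.matrix(...)) step is my approximation of what chisq.test does internally, not a copy of its source, and msg is just a name chosen here. My understanding is that complete.cases counts the rows of a data frame as its cases, so it ends up comparing 3 elements against 1 row:

```r
df <- data.frame(1, 2, 3, 4, 5, 6, 7, 8)
df <- rbind(df, df, df, df)

# Assumption: approximates the internal data frame -> matrix -> vector step
x <- as.vector(as.matrix(df[1, 1:3]))
y <- df[1, 4:6]            # still a one-row data frame

length(x)                  # number of elements in the vector
nrow(y)                    # number of rows (cases) in the data frame

# Capture the error message instead of stopping the script
msg <- tryCatch(complete.cases(x, y),
                error = function(e) conditionMessage(e))
print(msg)
```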
@Prasad provides a workaround, namely unlisting the two data frames you supply to chisq.test into vectors.
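A minimal sketch of that workaround, assuming the df from the question; px, py and p are names made up here for illustration:

```r
df <- data.frame(1, 2, 3, 4, 5, 6, 7, 8)
df <- rbind(df, df, df, df)

# unlist() flattens each one-row data frame into a plain (named) numeric vector
px <- unlist(df[1, 1:3])
py <- unlist(df[1, 4:6])

# suppressWarnings() only hides the "approximation may be incorrect" warning
p <- suppressWarnings(chisq.test(px, py))$p.value
print(p)
```

This should give the same p-value as calling chisq.test(c(1,2,3), c(4,5,6)) directly.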
However, the way you are using the function doesn't make much sense, to me at least. One would normally store the data in columns, rather than rows, of a data frame. It might not appear that there is a difference, but the columns of a data frame are its components, like the components of a list. Each individual component (column) is a discrete entity, a vector of data on the n observations in the data frame. If we transpose your df (and cast back to a data frame) to reflect a more natural data set-up:
> df2 <- data.frame(t(df))
then we can use the approach you tried, but index separate rows of the first column of df2 (rather than separate columns of the first row of df) in the chisq.test call:
> chisq.test(df2[1:3,1], df2[4:6,1])
Pearson's Chi-squared test
data: df2[1:3, 1] and df2[4:6, 1]
X-squared = 6, df = 4, p-value = 0.1991
Warning message:
In chisq.test(df2[1:3, 1], df2[4:6, 1]) :
Chi-squared approximation may be incorrect
This works because R is able to drop the empty dimension in both subsets, so both inputs are vectors of the appropriate length:
> df2[1:3,1] ## drops the empty dimension!
[1] 1 2 3
> is.vector(df2[1:3,1])
[1] TRUE
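For contrast, a short sketch of when the dimension is and is not dropped, assuming the df and df2 defined above (col_vec, col_df and row_df are illustrative names). With drop = FALSE the single-column subset stays a data frame, and a single-row subset spanning several columns stays a data frame by default, which is what tripped up the original call:

```r
df <- data.frame(1, 2, 3, 4, 5, 6, 7, 8)
df <- rbind(df, df, df, df)
df2 <- data.frame(t(df))

col_vec <- df2[1:3, 1]                # dimension dropped: plain numeric vector
col_df  <- df2[1:3, 1, drop = FALSE]  # kept as a 3 x 1 data frame
row_df  <- df[1, 1:3]                 # 1 x 3 data frame, nothing dropped

print(is.vector(col_vec))
print(is.data.frame(col_df))
print(is.data.frame(row_df))
```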