Stats on every n rows for each column
I would like to calculate the mean and standard deviation of every n rows (in my case, every 6 rows, or samples). The following call gives me the mean of every 6 rows for a single column (96 rows give me 16 mean values):
colMeans(matrix(data.trim$X0, nrow=6))
I would like to do this for ALL columns (a total of 1280 mean values). I tried running this function:
colMeans(matrix(data.trim, nrow=6))
but this does not work at all and I get the following error message:
Error in colMeans(matrix(data.trim, nrow = 6)) : 'x' must be numeric
In addition: Warning message:
In matrix(data.trim, nrow = 6) : data length [80] is not a sub-multiple or multiple of the number of rows [6]
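The error is consistent with data.trim being a data frame: a data frame is a list of columns, so matrix() sees one element per column (80 here) rather than the underlying numbers. The sketch below reproduces the idea on the built-in iris data, which is an assumption standing in for data.trim, and shows that converting to a numeric matrix first makes the reshape work when the row count is a multiple of 6.

```r
x <- iris[1:4]   # stand-in for data.trim (assumption: all columns numeric)

# A data frame is a list of columns, which is why matrix() reports a
# "data length" equal to the number of columns.
is.list(x)

# Convert to a numeric matrix first. matrix(vals, nrow = 6) then stacks the
# columns end to end, which only lines up with the 6-row blocks because
# nrow(x) (150) is a multiple of 6.
vals  <- as.matrix(x)
means <- matrix(colMeans(matrix(vals, nrow = 6)),
                ncol = ncol(vals),
                dimnames = list(NULL, colnames(vals)))
head(means, 3)
```

If the row count is not a multiple of the block size, blocks would straddle column boundaries, so this shortcut should not be used in that case.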
You can apply the function to each column with sapply:
sapply(iris[1:4], function(x) colMeans(matrix(x, nrow=6)))
      Sepal.Length Sepal.Width Petal.Length Petal.Width
 [1,]     4.950000    3.383333     1.450000   0.2333333
 [2,]     4.850000    3.316667     1.483333   0.2000000
 [3,]     5.183333    3.633333     1.316667   0.2500000
...
[23,]     6.533333    2.950000     5.583333   1.9333333
[24,]     6.516667    3.033333     5.316667   2.1333333
[25,]     6.383333    3.033333     5.266667   2.1333333
Compare with creating the means of the first six rows manually:
colMeans(iris[1:6, 1:4])
Sepal.Length  Sepal.Width Petal.Length  Petal.Width
   4.9500000    3.3833333    1.4500000    0.2333333
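The question also asks for the standard deviation, which colMeans does not cover. A minimal sketch of the same reshaping idea, assuming apply(..., 2, sd) as the per-block function (base R has no colSds):

```r
n <- 6   # block size, as in the question

# matrix(x, nrow = n) puts each run of n consecutive values in one column,
# so apply(..., 2, sd) returns one standard deviation per block.
block_sd <- sapply(iris[1:4], function(x) apply(matrix(x, nrow = n), 2, sd))

dim(block_sd)        # 25 blocks by 4 columns
head(block_sd, 3)
```

As with the mean version, this relies on the number of rows being a multiple of n.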
You can also do this with aggregate, given the proper by argument:
aggregate(iris[1:4], by=list((seq(nrow(iris))-1) %/% 6), FUN=mean)
  Group.1 Sepal.Length Sepal.Width Petal.Length Petal.Width
1       0     4.950000    3.383333     1.450000   0.2333333
2       1     4.850000    3.316667     1.483333   0.2000000
3       2     5.183333    3.633333     1.316667   0.2500000
...
This works by creating a vector which identifies the groups to be averaged:
(seq(nrow(iris))-1) %/% 6
[1] 0 0 0 0 0 0 1 1 1 1 1 1 2 2 2 2 2 2 3 3 3 3 3 3 4 4 4 4 4 4 5 5 5 5 5 5 6 6 6 6 6 6 7 7 7 7 7 7 8 8 8 8
[53] 8 8 9 9 9 9 9 9 10 10 10 10 10 10 11 11 11 11 11 11 12 12 12 12 12 12 13 13 13 13 13 13 14 14 14 14 14 14 15 15 15 15 15 15 16 16 16 16 16 16 17 17
[105] 17 17 17 17 18 18 18 18 18 18 19 19 19 19 19 19 20 20 20 20 20 20 21 21 21 21 21 21 22 22 22 22 22 22 23 23 23 23 23 23 24 24 24 24 24 24
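The grouping vector can be wrapped in a small helper and reused for other block sizes or other statistics. In the sketch below, block_id is a hypothetical name introduced for illustration, not a base R function. Unlike the matrix reshape, this approach also copes with a row count that is not a multiple of the block size: the last group simply holds the leftover rows.

```r
# Hypothetical helper: label each row with its zero-based block index.
block_id <- function(n_rows, size) (seq_len(n_rows) - 1) %/% size

# 10 rows in blocks of 4: the last block contains only the 2 leftover rows.
block_id(10, 4)

# Mean and standard deviation per block in a single call.
# FUN returns a named vector, so each column of the result is a small
# matrix with "mean" and "sd" columns.
stats <- aggregate(iris[1:4],
                   by  = list(block = block_id(nrow(iris), 6)),
                   FUN = function(x) c(mean = mean(x), sd = sd(x)))
stats$Sepal.Length[1:3, ]
```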
The sapply solution returns a matrix, whereas the aggregate solution returns a data frame, in case one form is preferable.
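As a quick check that the two approaches agree, and to show how to move between the two return types, here is a short sketch on iris:

```r
m  <- sapply(iris[1:4], function(x) colMeans(matrix(x, nrow = 6)))
df <- aggregate(iris[1:4], by = list((seq(nrow(iris)) - 1) %/% 6), FUN = mean)

class(m)    # matrix
class(df)   # data frame

# Drop the Group.1 column and compare the numbers, ignoring row names.
same <- all.equal(m, as.matrix(df[-1]), check.attributes = FALSE)
same

# Convert the sapply result to a data frame if that form is needed.
m_df <- as.data.frame(m)
```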