Making matrix numeric and name orders
Problem description
I have the following data:
    yvar <- c(1:150)
    replication <- c(rep(c(rep(1, 10), rep(2, 10), rep(3, 10)), 5))
    genotypes <- c(rep(paste("G", 1:10, sep = ""), 15))
    environments <- c(rep(paste("E", 5:1, sep = ""), each = 30))
    mydf1 <- data.frame(yvar, replication, genotypes, environments)
    mydf1$replication <- as.factor(mydf1$replication)
I want to summarize the data:
    mydf = data.frame(aggregate(yvar ~ genotypes + environments, data = mydf1, mean))
Now I create a matrix, hopefully numeric, but matm is not!
    matm = as.matrix(aggregate(yvar ~ genotypes, mydf, 'c'))
    colnames(matm) <- c("genotypes", levels(mydf$environments))

          genotypes E1    E2    E3    E4    E5
     [1,] "G1"      "131" "101" " 71" " 41" " 11"
     [2,] "G10"     "140" "110" " 80" " 50" " 20"
     [3,] "G2"      "132" "102" " 72" " 42" " 12"
     [4,] "G3"      "133" "103" " 73" " 43" " 13"
     [5,] "G4"      "134" "104" " 74" " 44" " 14"
     [6,] "G5"      "135" "105" " 75" " 45" " 15"
     [7,] "G6"      "136" "106" " 76" " 46" " 16"
     [8,] "G7"      "137" "107" " 77" " 47" " 17"
     [9,] "G8"      "138" "108" " 78" " 48" " 18"
    [10,] "G9"      "139" "109" " 79" " 49" " 19"
I converted it to a data.frame:
    matd <- data.frame(matm)

       genotypes       E1       E2       E3       E4       E5
    1         G1 31.70000 26.76667 23.60000 30.73333 43.13333
    2        G10 32.40000 17.86667 28.83333 32.43333 30.23333
    3         G2 29.50000 24.60000 24.16667 33.43333 38.66667
    4         G3 27.00000 28.83333 33.63333 43.83333 29.60000
    5         G4 29.53333 29.90000 26.60000 26.13333 40.33333
    6         G5 27.40000 32.43333 27.96667 40.43333 41.46667
    7         G6 36.76667 32.26667 28.26667 38.73333 33.43333
    8         G7 29.63333 27.00000 26.96667 34.90000 40.70000
    9         G8 24.50000 23.26667 22.50000 27.60000 32.26667
    10        G9 31.60000 24.96667 24.46667 27.56667 36.26667
I want to get rid of the genotypes column and then convert the result to a matrix:
    matx = data.frame(matd[,-1])
    matdm <- as.matrix(matx)
    matdm

          E1         E2         E3         E4         E5
     [1,] "31.70000" "26.76667" "23.60000" "30.73333" "43.13333"
     [2,] "32.40000" "17.86667" "28.83333" "32.43333" "30.23333"
     [3,] "29.50000" "24.60000" "24.16667" "33.43333" "38.66667"
     [4,] "27.00000" "28.83333" "33.63333" "43.83333" "29.60000"
     [5,] "29.53333" "29.90000" "26.60000" "26.13333" "40.33333"
     [6,] "27.40000" "32.43333" "27.96667" "40.43333" "41.46667"
     [7,] "36.76667" "32.26667" "28.26667" "38.73333" "33.43333"
     [8,] "29.63333" "27.00000" "26.96667" "34.90000" "40.70000"
     [9,] "24.50000" "23.26667" "22.50000" "27.60000" "32.26667"
    [10,] "31.60000" "24.96667" "24.46667" "27.56667" "36.26667"
I have two questions:
(1) Is there a consistent way to make / assign a matrix numeric?
(2) I can see that the genotypes are sorted alphabetically. My file has a different order in that column. I am fine with this order if it is consistent, but I am worried about the following portion:
    colnames(matm) <- c("genotypes", levels(mydf$environments))
Could the order produced by the aggregate function differ from the order of levels(mydf$environments)? Do they both sort alphabetically, or do they follow the order in the file? I appreciate your suggestions.
Solution
I think I see where the confusion is coming from. Backing up slightly to the aggregation that you want to turn into a matrix: try capturing its result and looking at it:
    myAgg <- aggregate(yvar ~ genotypes, mydf, 'c')
    str(myAgg)
yields:
    > str(myAgg)
    'data.frame':   10 obs. of  2 variables:
     $ genotypes: Factor w/ 10 levels "G1","G10","G2",..: 1 2 3 4 5 6 7 8 9 10
     $ yvar     : num [1:10, 1:5] 131 140 132 133 134 135 136 137 138 139 ...
So the aggregate produces a somewhat atypical data.frame. The column yvar is actually the matrix you are interested in:
    > myAgg$yvar
          [,1] [,2] [,3] [,4] [,5]
     [1,]  131  101   71   41   11
     [2,]  140  110   80   50   20
     [3,]  132  102   72   42   12
     [4,]  133  103   73   43   13
     [5,]  134  104   74   44   14
     [6,]  135  105   75   45   15
     [7,]  136  106   76   46   16
     [8,]  137  107   77   47   17
     [9,]  138  108   78   48   18
    [10,]  139  109   79   49   19
So you can grab that directly:
    matdm <- myAgg$yvar
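The matrix pulled out this way has no row or column names. As a rough sketch, the example below rebuilds the question's data, extracts the numeric matrix as described, and then shows tapply as an alternative route that labels rows and columns from the grouping values themselves. The tapply route and the name mat2 are additions for illustration, not part of the original code.

```r
# Rebuild the example data from the question
yvar <- 1:150
genotypes <- rep(paste("G", 1:10, sep = ""), 15)
environments <- rep(paste("E", 5:1, sep = ""), each = 30)
mydf1 <- data.frame(yvar, genotypes, environments)

# Route from the answer: aggregate, then take the matrix-valued column
mydf <- aggregate(yvar ~ genotypes + environments, data = mydf1, FUN = mean)
myAgg <- aggregate(yvar ~ genotypes, data = mydf, FUN = "c")
matdm <- myAgg$yvar
is.numeric(matdm)   # numeric, because no character column was mixed in

# Alternative (an assumption of this sketch): tapply builds the same
# genotype-by-environment table of means and names both dimensions
mat2 <- tapply(mydf1$yvar, list(mydf1$genotypes, mydf1$environments), mean)
dimnames(mat2)
```

Because tapply takes the labels from the data, there is no separate colnames assignment that could drift out of step with the values.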
Now to answer your specific questions...
1) The consistent way to make a matrix numeric is to ensure that the data going into the matrix() or as.matrix() functions are numeric. When you called
    matm = as.matrix(aggregate(yvar ~ genotypes, mydf, 'c'))
you created a character matrix because you had a character column. Then you converted that matrix into a data.frame. This converted the columns into factors (the default behaviour of data.frame() before R 4.0.0, where stringsAsFactors was TRUE). Then you selected a few columns which were, not surprisingly, still factors. So when you called
    matdm <- as.matrix(matx)
the factors got converted to characters.
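The chain of conversions described above can be reproduced on a tiny data frame. This is a minimal sketch with made-up names (df, m_chr, m_num, m_fix, f); it shows that dropping the character column before as.matrix() keeps the matrix numeric, and how to recover numbers from a character matrix or a factor after the fact.

```r
# Hypothetical toy data: one character column, two numeric columns
df <- data.frame(id = c("a", "b"), x = c(1.5, 2.5), y = c(10, 20))

# A single character column makes the whole matrix character
m_chr <- as.matrix(df)
is.character(m_chr)

# Dropping the non-numeric column first keeps the matrix numeric
m_num <- as.matrix(df[, -1])
is.numeric(m_num)

# Repairing a character matrix afterwards: change the storage mode
m_fix <- m_chr[, -1]
storage.mode(m_fix) <- "double"
is.numeric(m_fix)

# Factor pitfall: as.numeric() on a factor returns the level codes,
# so go through as.character() to get the printed values back
f <- factor(c("31.7", "32.4"))
as.numeric(f)
as.numeric(as.character(f))
```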
2) The order of the groups created by
    aggregate(yvar ~ genotypes, mydf, 'c')
is a function of the order of the factor levels in the variable genotypes. Those are generally created alphabetically, but you can always look at the levels in order to be totally sure. If the factor levels were set manually, they would not necessarily be in alphabetical order.
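To make the ordering point concrete, here is a small sketch with invented values (g, f_alpha, f_file, df, agg are illustrative names). It compares the default, sorted levels with levels set explicitly to the order of first appearance, and shows that aggregate follows whichever levels the factor carries.

```r
# Hypothetical genotype labels in "file order"
g <- c("G2", "G10", "G1", "G2")

# Default: levels are the sorted unique values
f_alpha <- factor(g)
levels(f_alpha)

# Explicit levels: keep the order of first appearance in the data
f_file <- factor(g, levels = unique(g))
levels(f_file)

# aggregate returns its groups in level order
df <- data.frame(g = f_file, y = c(1, 2, 3, 5))
agg <- aggregate(y ~ g, data = df, FUN = mean)
as.character(agg$g)
```

If the file order matters, setting the levels explicitly before aggregating and reading the labels back from the result is safer than assuming a particular sort order.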