Why are values changing when converting from data.frame to a numeric matrix?


Question

I need to convert my data frame into a numeric matrix. However, when I use the data.matrix function, the decimal values in some columns are replaced by completely different numbers, and I have no idea why. Can someone explain what is happening?

> head(x[,1:5])
         TCGA-AA-3520-01A-01R-0821-07 TCGA-AA-3532-01A-01R-0821-07 TCGA-AA-3553-01A-01R-0821-07 TCGA-A6-2674-01A-02R-0821-07 TCGA-AA-3521-01A-01R-0821-07
ELMO2              -0.840833333333333                        0.018            0.354916666666667                    -0.203750                    0.6890000
CREB3L1                         1.333                       0.7625                      0.13475                     2.498750                    1.1572500
RPS11                          1.4755                       0.3245                        0.634                     0.483125                    0.9526250
PNMA1                        -1.39075                     -1.48725                      -0.8305                    -0.463250                   -2.2230000
MMP2               0.0278333333333333                      -0.2065           0.0666666666666666                     2.156000                    0.1501667
C10orf90                      -2.5495                     -2.76575                     -2.76375                    -2.482250                   -2.1107500
> head(data.matrix(x[,1:5]))
         TCGA-AA-3520-01A-01R-0821-07 TCGA-AA-3532-01A-01R-0821-07 TCGA-AA-3553-01A-01R-0821-07 TCGA-A6-2674-01A-02R-0821-07 TCGA-AA-3521-01A-01R-0821-07
ELMO2                            3323                           94                         1701                    -0.203750                    0.6890000
CREB3L1                          4307                         3022                          654                     2.498750                    1.1572500
RPS11                            4485                         1458                         2786                     0.483125                    0.9526250
PNMA1                            4379                         4438                         3397                    -0.463250                   -2.2230000
MMP2                              155                          932                          328                     2.156000                    0.1501667
C10orf90                         5139                         5193                         5230                    -2.482250                   -2.1107500
> class(x)
[1] "data.frame"

> str(x)
'data.frame':   6150 obs. of  174 variables:
 $ TCGA-AA-3520-01A-01R-0821-07: Factor w/ 5538 levels "","0","0.000166666666666662",..: 3323 4307 4485 4379 155 5139 4177 1400 4735 3363 ...
 $ TCGA-AA-3532-01A-01R-0821-07: Factor w/ 5597 levels "","0.000499999999999968",..: 94 3022 1458 4438 932 5193 1374 2757 4671 2503 ...
 $ TCGA-AA-3553-01A-01R-0821-07: Factor w/ 5550 levels "","0.000249999999999995",..: 1701 654 2786 3397 328 5230 65 194 4900 3966 ...
 $ TCGA-A6-2674-01A-02R-0821-07: num  -0.204 2.499 0.483 -0.463 2.156 ...
 $ TCGA-AA-3521-01A-01R-0821-07: num  0.689 1.157 0.953 -2.223 0.15 ...
 $ TCGA-AA-3534-01A-01R-0821-07: num  -0.6789 -0.0877 1.5736 -1.6678 -0.7148 ...
 $ TCGA-AA-3555-01A-01R-0821-07: Factor w/ 5580 levels "","-0.00012499999999999",..: 373 4970 2076 519 1344 5084 3882 1285 4760 2778 ...
 $ TCGA-A6-2670-01A-02R-0821-07: num  0.588 0.569 0.808 -1.661 1.073 ...
 $ TCGA-A6-2683-01A-01R-0821-07: num  -0.77 0.741 1.564 -2.984 -1.569 ...
 $ TCGA-AA-3526-01A-02R-0821-07: num  -0.824 2.215 0.819 -1.846 -0.862 ...
 $ TCGA-A6-2677-01A-01R-0821-07: num  -0.733 0.526 0.892 -1.598 -1.69 ...
 $ TCGA-AA-3522-01A-01R-0821-07: num  -0.981 2.094 0.818 -1.048 -1.452 ...
 $ TCGA-AA-3538-01A-01R-0821-07: num  -0.144 0.631 0.794 -1.523 -0.198 ...
 $ TCGA-AA-3556-01A-01R-0821-07: Factor w/ 5556 levels "","-0.000125000000000014",..: 2256 4772 3446 4253 4040 4927 3026 316 3766 3221 ...
 $ TCGA-A6-2678-01A-01R-0821-07: num  -1.38 1.706 1.103 -2.725 -0.918 ...
 $ TCGA-AA-3524-01A-02R-0821-07: Factor w/ 5611 levels "","-0.0005","0.000500000000000006",..: 4062 3671 4749 4751 4051 5226 2623 1227 4252 1489 ...
 $ TCGA-AA-3542-01A-02R-0821-07: num  -1.195 0.641 1.952 -1.63 -1.264 ...
 $ TCGA-AA-3558-01A-01R-0821-07: Factor w/ 5580 levels "","0.000375000000000007",..: 4245 3920 4277 4910 4766 5126 1450 3350 4898 1915 ...
 $ TCGA-AA-3544-01A-01R-0821-07: num  -0.157 0.649 0.937 -1.941 -1.417 ...
 $ TCGA-AA-3560-01A-01R-0821-07: num  -0.146 0.554 0.581 -2.503 -0.438 ...
 $ TCGA-AA-3514-01A-02R-0821-07: Factor w/ 5678 levels "","0","0.000375000000000028",..: 3800 2056 2422 1158 1507 4620 3564 1877 5480 4076 ...
 $ TCGA-AA-3527-01A-01R-0821-07: num  -0.3973 -0.0915 1.4019 -2.5513 -0.395 ...
 $ TCGA-AA-3548-01A-01R-0821-07: Factor w/ 5470 levels "","0.000100000000000011",..: 2590 3817 3388 4531 2770 4922 2715 406 4473 2711 ...
 $ TCGA-AA-3561-01A-01R-0821-07: num  -1.115 1.01 1.266 -1.419 -0.537 ...
 $ TCGA-AA-3517-01A-01R-0821-07: Factor w/ 5604 levels "","-0.000333333333333335",..: 479 1182 4514 5003 4005 4799 1499 4796 849 3079 ...
 $ TCGA-AA-3529-01A-02R-0821-07: Factor w/ 5583 levels "","-0.000124999999999978",..: 2912 3970 4073 4555 4257 5238 3242 2668 899 3508 ...
 $ TCGA-AA-3549-01A-02R-0821-07: Factor w/ 5538 levels "","0.000166666666666671",..: 1378 4762 4356 4857 519 4739 1254 4777 350 444 ...
 $ TCGA-AA-3562-01A-02R-0821-07: Factor w/ 5628 levels "","0","0.000249999999999993",..: 2453 3556 3523 4987 2236 5148 1681 1854 2249 4096 ...

Recommended answer

The data.matrix() function converts factors to numbers by using their internal integer codes. The str() output shows that several of your columns are stored as factors rather than as numeric vectors, which is why those columns hold different values after data.matrix() while the numeric columns are unchanged. To create a numeric matrix in this situation, try this:

y <- apply(as.matrix(x[, 1:5]), 2, as.numeric)

With as.matrix(), the factors become character strings. apply() with as.numeric then converts every column to numeric without losing the matrix structure.
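To see the effect on a small scale, here is a minimal sketch on a toy data frame. The values and the helper name to_num are made up for illustration and are not from the question. It also shows a column-by-column alternative: as far as I know, apply() with as.numeric returns a matrix without the row names, and as.matrix() on mixed columns formats the numeric ones as text first, so converting only the factor columns may be the safer route when the gene names and full precision matter.

```r
# Toy data frame (hypothetical values) with one factor and one numeric column.
x <- data.frame(
  s1 = factor(c("-0.84", "1.333", "")),   # numbers stored as text, one blank entry
  s2 = c(-0.20375, 2.49875, 0.483125),
  row.names = c("ELMO2", "CREB3L1", "RPS11")
)

# data.matrix() returns the internal integer codes of the factor, not its values.
codes <- data.matrix(x)
print(codes)

# Hypothetical helper: convert a factor through character, leave other columns alone.
to_num <- function(col) if (is.factor(col)) as.numeric(as.character(col)) else col

x_num <- x
x_num[] <- lapply(x, to_num)   # x_num[] keeps the data frame shape and row names
m <- as.matrix(x_num)          # all columns numeric now, so this is a numeric matrix
print(m)
```

The blank entry becomes NA in m, which is usually what you want for a missing measurement.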

As Stephen Henderson mentioned in his comment, it is a good idea to find out why the numeric values stored in your data frame are being treated as factors in the first place.
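One way to investigate is to list the entries that fail to parse as numbers. This is only a sketch on a made-up column (the values, including the "null" label, are hypothetical); on the real data you would run it on each factor column.

```r
# Hypothetical column: numbers read as text because of a blank and a stray label.
col <- factor(c("0.018", "", "0.7625", "null", "-1.48725"))

txt <- as.character(col)
parsed <- suppressWarnings(as.numeric(txt))   # unparseable entries become NA

# Entries that exist as text but could not be parsed as numbers.
bad <- unique(txt[is.na(parsed) & !is.na(txt)])
print(bad)
```

In the str() output above, every factor column has "" among its levels, so blank entries are one thing to check, although the post does not show whether they are the actual cause. If the data were read with read.table() or read.csv() (an assumption, since the question does not say), the na.strings and colClasses arguments are the usual places to fix this at import time.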
