How to convert a factor to an integer\numeric without a loss of information?

Question

When I convert a factor to a numeric or integer, I get the underlying level codes, not the values as numbers.

f <- factor(sample(runif(5), 20, replace = TRUE))
##  [1] 0.0248644019011408 0.0248644019011408 0.179684827337041 
##  [4] 0.0284090070053935 0.363644931698218  0.363644931698218 
##  [7] 0.179684827337041  0.249704354675487  0.249704354675487 
## [10] 0.0248644019011408 0.249704354675487  0.0284090070053935
## [13] 0.179684827337041  0.0248644019011408 0.179684827337041 
## [16] 0.363644931698218  0.249704354675487  0.363644931698218 
## [19] 0.179684827337041  0.0284090070053935
## 5 Levels: 0.0248644019011408 0.0284090070053935 ... 0.363644931698218

as.numeric(f)
##  [1] 1 1 3 2 5 5 3 4 4 1 4 2 3 1 3 5 4 5 3 2

as.integer(f)
##  [1] 1 1 3 2 5 5 3 4 4 1 4 2 3 1 3 5 4 5 3 2

I have to resort to paste to get the real values.

as.numeric(paste(f))
##  [1] 0.02486440 0.02486440 0.17968483 0.02840901 0.36364493 0.36364493
##  [7] 0.17968483 0.24970435 0.24970435 0.02486440 0.24970435 0.02840901
## [13] 0.17968483 0.02486440 0.17968483 0.36364493 0.24970435 0.36364493
## [19] 0.17968483 0.02840901

Is there a better way to convert a factor to numeric?

Recommended answer

See ?factor:


In particular, as.numeric applied to a factor is meaningless, and may happen by implicit coercion. To transform a factor f to approximately its original numeric values, as.numeric(levels(f))[f] is recommended and slightly more efficient than as.numeric(as.character(f)).
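As a minimal sketch of the idiom quoted above, the following uses a small hand-made factor. The values and the names codes, values, g and ints are illustrative assumptions, not part of the original post.

```r
# A small factor whose levels look like numbers (illustrative values).
f <- factor(c("10", "2.5", "10", "7", "2.5"))

# as.numeric() on the factor itself returns the internal integer codes,
# that is, the position of each element's level in levels(f).
codes <- as.numeric(f)

# Convert the levels once, then index the result by the factor
# (a factor used as an index is treated as its integer codes).
values <- as.numeric(levels(f))[f]

print(levels(f))
print(codes)
print(values)

# The same pattern works with as.integer() when the levels are whole numbers.
g <- factor(c("3", "1", "3"))
ints <- as.integer(levels(g))[g]
print(ints)
```

The point of the comparison is that codes depends on how the levels happen to be ordered, while values and ints recover the numbers the labels spell out.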

The FAQ on R has similar advice.

Why is as.numeric(levels(f))[f] more efficient than as.numeric(as.character(f))?

as.numeric(as.character(f)) is effectively as.numeric(levels(f)[f]), so you are performing the conversion to numeric on length(f) values, rather than on nlevels(f) values. The speed difference will be most apparent for long vectors with few levels. If the values are mostly unique, there won't be much difference in speed. However you do the conversion, this operation is unlikely to be the bottleneck in your code, so don't worry too much about it.
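The argument above can be checked on a long factor with few levels. This is a sketch under assumptions: the seed, the vector length and the names via_levels and via_character are chosen here for illustration only.

```r
set.seed(1)  # arbitrary seed, only so the sketch is reproducible

# A long factor with few levels: 1e5 elements drawn from 5 distinct values.
f <- factor(sample(runif(5), 1e5, replace = TRUE))

# Converts nlevels(f) strings to numbers, then indexes by the factor.
via_levels <- as.numeric(levels(f))[f]

# Expands to length(f) strings first, then converts every one of them.
via_character <- as.numeric(as.character(f))

print(nlevels(f))  # number of string-to-number conversions in the first form
print(length(f))   # number of string-to-number conversions in the second form

# Both routes produce the same numeric vector.
stopifnot(identical(via_levels, via_character))
```

Because the two results are identical, the choice between them is only about how many strings get parsed, which is why the gap shrinks when most values are unique.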

Some timings

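library(microbenchmark)
# The original benchmark refers to x without defining it; here it is
# assumed to be the same factor f so that the snippet runs as written.
x <- f
microbenchmark(
  as.numeric(levels(f))[f],    # convert the levels once, then index
  as.numeric(levels(f)[f]),    # index first, then convert every element
  as.numeric(as.character(f)), # effectively the same as the previous line
  paste0(x),                   # string conversion only, for comparison
  paste(x),
  times = 1e5
)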
## Unit: microseconds
##                         expr   min    lq      mean median     uq      max neval
##     as.numeric(levels(f))[f] 3.982 5.120  6.088624  5.405  5.974 1981.418 1e+05
##     as.numeric(levels(f)[f]) 5.973 7.111  8.352032  7.396  8.250 4256.380 1e+05
##  as.numeric(as.character(f)) 6.827 8.249  9.628264  8.534  9.671 1983.694 1e+05
##                    paste0(x) 7.964 9.387 11.026351  9.956 10.810 2911.257 1e+05
##                     paste(x) 7.965 9.387 11.127308  9.956 11.093 2419.458 1e+05
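If the microbenchmark package is not available, a rough comparison can be made with base R's system.time. This is only a sketch: the vector length, the repetition count and the names t_levels, t_char and elapsed are assumptions made here, and the timings will vary by machine, so no output is shown.

```r
# A long factor with few levels, where the difference should be most visible.
f <- factor(sample(runif(5), 1e5, replace = TRUE))

reps <- 20  # illustrative repetition count

# Time the "convert levels, then index" form.
t_levels <- system.time(for (i in seq_len(reps)) as.numeric(levels(f))[f])

# Time the "convert every element" form.
t_char <- system.time(for (i in seq_len(reps)) as.numeric(as.character(f)))

# Elapsed wall-clock seconds for each approach.
elapsed <- c(levels_first = t_levels[["elapsed"]],
             as_character = t_char[["elapsed"]])
print(elapsed)
```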
