How do I test for numeric values in a dataframe of characters, and convert those to numeric?

Question

I have a dataframe somewhat like the following:

> theDF
   ID        Ticker INDUSTRY_SECTOR              VAR             CVAR
1   1      USD CASH                                0                0
12  2      ZAR CASH                 -181412.82055904 -301731.22832191
23  3 BAT SJ EQUITY       Financial  61711.951234826 102641.162795691
34  4 HCI SJ EQUITY       Financial 1095.16002541256 1821.50290513369
45  5 PSG SJ EQUITY       Financial 16498.2192382422  27440.331617902

We can see these are all character columns:

> apply(theDF, 2, mode)
             ID          Ticker INDUSTRY_SECTOR             VAR            CVAR 
    "character"     "character"     "character"     "character"     "character" 

I would like something that will change ONLY the numeric-type vectors to numeric. Basically, if a column "looks like" a numeric, make it numeric; otherwise leave it be. I cannot find anything on StackOverflow that does not require knowing beforehand the names or positions of the columns you want to convert. This DF will not always be in the same order, or have the same columns, so I need some dynamic way to check whether the columns "look like" numerics and convert only those columns.

This (obviously) gives me a bunch of NAs for the character columns:

> apply(theDF, 2, as.numeric)
     ID Ticker INDUSTRY_SECTOR        VAR        CVAR
[1,]  1     NA              NA       0.00       0.000
[2,]  2     NA              NA -181412.82 -301731.228
[3,]  3     NA              NA   61711.95  102641.163
[4,]  4     NA              NA    1095.16    1821.503
[5,]  5     NA              NA   16498.22   27440.332

I tried something like this, but not only does it not work, it seems horribly inefficient:

> apply(theDF, 2, function(x) tryCatch(as.numeric(x),error=function(e) e, warning=function(w) x))
     ID  Ticker          INDUSTRY_SECTOR VAR                CVAR              
[1,] "1" "USD CASH"      ""              "0"                "0"               
[2,] "2" "ZAR CASH"      ""              "-181412.82055904" "-301731.22832191"
[3,] "3" "BAT SJ EQUITY" "Financial"     "61711.951234826"  "102641.162795691"
[4,] "4" "HCI SJ EQUITY" "Financial"     "1095.16002541256" "1821.50290513369"
[5,] "5" "PSG SJ EQUITY" "Financial"     "16498.2192382422" "27440.331617902" 

Is there a better way to do this?

People keep asking for this, so here goes...

> apply(theDF, 2, mode)
             ID          Ticker INDUSTRY_SECTOR             VAR            CVAR 
    "character"     "character"     "character"     "character"     "character" 
> sapply(theDF, mode)
             ID          Ticker INDUSTRY_SECTOR             VAR            CVAR 
    "character"     "character"     "character"     "character"     "character" 
> apply(theDF, 2, class)
             ID          Ticker INDUSTRY_SECTOR             VAR            CVAR 
    "character"     "character"     "character"     "character"     "character" 
> sapply(theDF, class)
             ID          Ticker INDUSTRY_SECTOR             VAR            CVAR 
    "character"     "character"     "character"     "character"     "character" 

Recommended answer

Looks like a job for type.convert().

theDF[] <- lapply(theDF, type.convert, as.is = TRUE)
## check the result
sapply(theDF, class)
#          ID          Ticker INDUSTRY_SECTOR             VAR            CVAR 
#   "integer"     "character"     "character"       "numeric"       "numeric" 
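
To try the call without the original data, here is a minimal sketch on a small stand-in for the question's data frame. `demoDF` is a hypothetical name, and its rows are an abbreviated copy of the values shown in the question. The sketch checks classes with sapply() rather than apply(), because apply() first converts a data frame to a matrix, so a data frame with any character column reports "character" for every column.

```r
# Stand-in for the question's data: every column starts out as character.
demoDF <- data.frame(
  ID = c("1", "2", "3"),
  Ticker = c("USD CASH", "ZAR CASH", "BAT SJ EQUITY"),
  INDUSTRY_SECTOR = c("", "", "Financial"),
  VAR = c("0", "-181412.82055904", "61711.951234826"),
  stringsAsFactors = FALSE
)

# Convert each column on its own; assigning into demoDF[] keeps the
# data frame structure instead of replacing it with a plain list.
demoDF[] <- lapply(demoDF, type.convert, as.is = TRUE)

# Per-column classes after conversion.
classes <- sapply(demoDF, class)
print(classes)
```

Columns whose values cannot all be read as numbers, such as Ticker, are left as character.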

type.convert() coerces a vector to its "most appropriate" type. Setting as.is = TRUE allows us to keep characters as such, where they otherwise would be coerced to factors.
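
If you only want to test which columns "look like" numbers, without converting everything in one pass, one possible sketch follows. `looks_numeric` is a hypothetical helper written for this illustration, not a base R function, and its rule (every non-empty value must parse with as.numeric()) is an assumption about what "looks numeric" should mean.

```r
# Hypothetical helper: TRUE when every non-empty, non-NA value in x
# can be parsed by as.numeric(). Columns with no usable values give FALSE.
looks_numeric <- function(x) {
  x <- as.character(x)
  has_value <- !is.na(x) & nzchar(trimws(x))
  parsed <- suppressWarnings(as.numeric(x[has_value]))
  any(has_value) && !anyNA(parsed)
}

df <- data.frame(
  a = c("1", "2.5", ""),
  b = c("x", "3", "4"),
  stringsAsFactors = FALSE
)

# Test each column, then convert only the ones that passed.
numeric_cols <- vapply(df, looks_numeric, logical(1))
df[numeric_cols] <- lapply(df[numeric_cols], as.numeric)

print(numeric_cols)
print(sapply(df, class))
```

Here empty strings in a numeric-looking column become NA, while a column with even one non-numeric value is left untouched.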

Update: Columns that are not already character must first be coerced to character.

theDF[] <- lapply(theDF, function(x) type.convert(as.character(x), as.is = TRUE))
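
As a rough illustration of that update, the sketch below applies the same call to a made-up data frame (`mixedDF` and its columns are hypothetical) that holds factor columns. as.character() on a factor returns its labels rather than its internal integer codes, so the labels are what type.convert() sees.

```r
# Made-up data: two factor columns and one column that is already numeric.
mixedDF <- data.frame(
  id = factor(c("10", "20", "30")),
  label = factor(c("a", "b", "c")),
  score = c(1.5, 2.5, 3.5)
)

# Coerce every column to character first, then let type.convert() decide.
mixedDF[] <- lapply(mixedDF, function(x) {
  type.convert(as.character(x), as.is = TRUE)
})

classes <- sapply(mixedDF, class)
print(classes)
```

The id factor becomes an integer column holding 10, 20, 30 (its labels), and label becomes a plain character column.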
