How do I test for numeric values in a dataframe of characters, and convert those to numeric?


Problem description

I have a data frame as follows:

> theDF
   ID        Ticker INDUSTRY_SECTOR              VAR             CVAR
1   1      USD CASH                                0                0
12  2      ZAR CASH                 -181412.82055904 -301731.22832191
23  3 BAT SJ EQUITY       Financial  61711.951234826 102641.162795691
34  4 HCI SJ EQUITY       Financial 1095.16002541256 1821.50290513369
45  5 PSG SJ EQUITY       Financial 16498.2192382422  27440.331617902

We can see these are all character columns:

> apply(theDF, 2, mode)
             ID          Ticker INDUSTRY_SECTOR             VAR            CVAR 
    "character"     "character"     "character"     "character"     "character" 

I would like something that changes ONLY the numeric-looking vectors to numeric. Basically, if a column "looks like" a numeric, make it numeric; otherwise leave it alone. I cannot find anything on StackOverflow that does not require knowing beforehand the names or positions of the columns to convert. This data frame will not always have the same columns, or have them in the same order, so I need a dynamic way to check whether each column "looks like" a numeric and convert only those columns.

This (obviously) gives me a bunch of NAs for the character columns:

> apply(theDF, 2, as.numeric)
     ID Ticker INDUSTRY_SECTOR        VAR        CVAR
[1,]  1     NA              NA       0.00       0.000
[2,]  2     NA              NA -181412.82 -301731.228
[3,]  3     NA              NA   61711.95  102641.163
[4,]  4     NA              NA    1095.16    1821.503
[5,]  5     NA              NA   16498.22   27440.332

I tried something like this, but not only does it not work, it also seems horribly inefficient:

> apply(theDF, 2, function(x) tryCatch(as.numeric(x),error=function(e) e, warning=function(w) x))
     ID  Ticker          INDUSTRY_SECTOR VAR                CVAR              
[1,] "1" "USD CASH"      ""              "0"                "0"               
[2,] "2" "ZAR CASH"      ""              "-181412.82055904" "-301731.22832191"
[3,] "3" "BAT SJ EQUITY" "Financial"     "61711.951234826"  "102641.162795691"
[4,] "4" "HCI SJ EQUITY" "Financial"     "1095.16002541256" "1821.50290513369"
[5,] "5" "PSG SJ EQUITY" "Financial"     "16498.2192382422" "27440.331617902" 
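
One likely reason the attempt above returns all strings: apply() combines the per-column results back into a single matrix, and a matrix can hold only one type, so any converted numbers are turned back into character. A minimal sketch of the same idea using lapply() instead is shown here. The helper name to_numeric_if_possible and the small data frame df are hypothetical, and the sketch assumes every column is stored as character (not factor).

```r
# Hypothetical helper: convert a character vector to numeric only if
# every non-missing value parses as a number; otherwise return it unchanged.
to_numeric_if_possible <- function(x) {
  converted <- suppressWarnings(as.numeric(x))
  # A value that was not NA before but is NA afterwards failed to parse.
  if (any(is.na(converted) & !is.na(x))) x else converted
}

# Hypothetical all-character data frame for illustration.
df <- data.frame(
  id   = c("1", "2"),
  name = c("a", "b"),
  stringsAsFactors = FALSE
)

# lapply() returns a list, so each column keeps its own type;
# assigning into df[] preserves the data frame structure.
df[] <- lapply(df, to_numeric_if_possible)
sapply(df, class)
```

Because the columns come back as a list rather than a matrix, the numeric and character columns can coexist in the result.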

Is there a better way to do this?

Edit: People keep asking for this, so here goes...

> apply(theDF, 2, mode)
             ID          Ticker INDUSTRY_SECTOR             VAR            CVAR 
    "character"     "character"     "character"     "character"     "character" 
> sapply(theDF, mode)
             ID          Ticker INDUSTRY_SECTOR             VAR            CVAR 
    "character"     "character"     "character"     "character"     "character" 
> apply(theDF, 2, class)
             ID          Ticker INDUSTRY_SECTOR             VAR            CVAR 
    "character"     "character"     "character"     "character"     "character" 
> sapply(theDF, class)
             ID          Ticker INDUSTRY_SECTOR             VAR            CVAR 
    "character"     "character"     "character"     "character"     "character" 

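A note on the four checks above: the sapply() results are the ones that confirm the columns are stored as character. apply() first converts the data frame to a matrix, and a matrix with any character column becomes character throughout, so apply(..., 2, class) reports "character" even for columns that are already numeric. A small sketch with a hypothetical data frame df illustrates the difference:

```r
# Hypothetical data frame with one numeric and one character column.
df <- data.frame(
  n = c(1.5, 2.5),
  s = c("a", "b"),
  stringsAsFactors = FALSE
)

# apply() coerces df to a character matrix before calling class().
via_apply <- apply(df, 2, class)

# sapply() visits each column as it is actually stored.
via_sapply <- sapply(df, class)

via_apply
via_sapply
```
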

Recommended answer

This looks like a job for type.convert().

theDF[] <- lapply(theDF, type.convert, as.is = TRUE)
## check the result
sapply(theDF, class)
#          ID          Ticker INDUSTRY_SECTOR             VAR            CVAR 
#   "integer"     "character"     "character"       "numeric"       "numeric" 

type.convert() coerces a vector to its "most appropriate" type. Setting as.is = TRUE keeps character vectors as character; otherwise they would be coerced to factors.
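
For anyone who wants to try this without the original data, here is a small reproducible sketch. It rebuilds only the first three rows of the data frame from the question, as an assumption about how the data is stored: every column is a character vector.

```r
# Rebuild part of the question's data frame with all columns as character.
theDF <- data.frame(
  ID              = c("1", "2", "3"),
  Ticker          = c("USD CASH", "ZAR CASH", "BAT SJ EQUITY"),
  INDUSTRY_SECTOR = c("", "", "Financial"),
  VAR             = c("0", "-181412.82055904", "61711.951234826"),
  CVAR            = c("0", "-301731.22832191", "102641.162795691"),
  stringsAsFactors = FALSE
)

# Assigning into theDF[] replaces each column but keeps the data frame.
theDF[] <- lapply(theDF, type.convert, as.is = TRUE)

# Inspect the resulting column types.
str(theDF)
```

The empty square brackets in theDF[] matter: a plain theDF <- lapply(...) would leave a list rather than a data frame.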

Update: Columns that are not character need to be coerced to character first.

theDF[] <- lapply(theDF, function(x) type.convert(as.character(x), as.is = TRUE))
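
Factor columns are a common case where the as.character() step matters. The sketch below uses a hypothetical factor f to contrast a direct as.numeric() call, which returns the internal level codes, with the as.character() plus type.convert() route, which recovers the values.

```r
# Hypothetical numeric-looking data that ended up stored as a factor.
f <- factor(c("30", "10.5", "7.25"))

# as.numeric() on a factor returns the integer level codes, not the values.
codes <- as.numeric(f)

# Going through as.character() first recovers the values themselves.
values <- type.convert(as.character(f), as.is = TRUE)

codes
values
```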
