Extract all numbers from a single string in R


Question

Say you have a string:

strLine <- "The transactions (on your account) were as follows: 0 3,000 (500) 0 2.25 (1,200)"

Is there a function that strips out the numbers into an array/vector, producing the following required result:

result <- c(0, 3000, -500, 0, 2.25, -1200)

so that, for example, result[3] is -500?

Notice that the numbers are presented in accounting form, so negative numbers appear between (). Also, you can assume that only numbers appear to the right of the first occurrence of a number. I am not that good with regexp, so I would appreciate help if one is required. I also don't want to assume the string is always the same, so I am looking to strip out all words (and any special characters) before the location of the first number.

Recommended answer

library(stringr)
x <- str_extract_all(strLine,"\\(?[0-9,.]+\\)?")[[1]]
x
[1] "0"       "3,000"   "(500)"   "0"       "2.25"    "(1,200)"
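The pattern above accepts any run of digits, commas, and periods, so stray punctuation in the surrounding text can also be matched. A minimal sketch of a stricter variant follows, assuming every number starts with a digit and has at most one decimal part; the `messy` string and the `strict` pattern are illustrative and not part of the original answer.

```r
library(stringr)

# Hypothetical input with stray punctuation before and after the numbers
messy <- "Totals, as of today: 0 3,000 (500) 2.25."

# Original pattern: also captures the lone "," and the trailing "."
loose <- str_extract_all(messy, "\\(?[0-9,.]+\\)?")[[1]]
# loose is: ","  "0"  "3,000"  "(500)"  "2.25."

# Stricter variant: a match must start with a digit, and the
# decimal part (a period followed by digits) is optional
strict <- str_extract_all(messy, "\\(?[0-9][0-9,]*(\\.[0-9]+)?\\)?")[[1]]
# strict is: "0"  "3,000"  "(500)"  "2.25"
```

With the looser pattern, the "," and "2.25." tokens would turn into NA in the later as.numeric() step, which is why tightening the pattern may be worthwhile for less predictable input.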

Change the parentheses to negative signs:

x <- gsub("\\((.+)\\)","-\\1",x)
x
[1] "0"      "3,000"  "-500"   "0"      "2.25"   "-1,200"

Then finish with as.numeric() or taRifx::destring (the next version of destring will support negatives by default, so the keep option won't be necessary):

library(taRifx)
destring( x, keep="0-9.-")
[1]     0.00  3000.00  -500.00     0.00     2.25 -1200.00

Or:

as.numeric(gsub(",","",x))
[1]     0.00  3000.00  -500.00     0.00     2.25 -1200.00
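For reference, the steps above can be combined into a single function that needs no extra packages. This is only a sketch under the question's assumptions (negatives are wrapped in parentheses, commas are thousands separators); the helper name `extract_accounting_numbers` is hypothetical, and it uses base R's regmatches() and gregexpr() in place of stringr.

```r
# Hypothetical helper: the name and structure are illustrative,
# not taken from the original answer
extract_accounting_numbers <- function(line) {
  # Pull out digit runs, keeping optional surrounding parentheses
  tokens <- regmatches(line, gregexpr("\\(?[0-9][0-9,.]*\\)?", line))[[1]]
  # A token fully wrapped in parentheses is negative in accounting notation
  negative <- grepl("^\\(.*\\)$", tokens)
  # Drop parentheses and thousands separators, then convert to numeric
  values <- as.numeric(gsub("[(),]", "", tokens))
  ifelse(negative, -values, values)
}

strLine <- "The transactions (on your account) were as follows: 0 3,000 (500) 0 2.25 (1,200)"
result <- extract_accounting_numbers(strLine)
result[3]  # -500
```

Tracking the sign in a separate logical vector, rather than pasting a "-" into the string, keeps the numeric conversion to a single as.numeric() call.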
