Extract all numbers from a single string in R
Question
Suppose you have a string:
strLine <- "The transactions (on your account) were as follows: 0 3,000 (500) 0 2.25 (1,200)"
Is there a function that strips the numbers out into an array/vector, producing the following required result:
result <- c(0, 3000, -500, 0, 2.25, -1200)?
That is:
result[3] = -500
Notice that the numbers are presented in accounting form, so negative numbers appear between (). Also, you can assume that only numbers appear to the right of the first occurrence of a number. I am not very good with regexp, so I would appreciate help if one is required. Also, I don't want to assume the string is always the same, so I am looking to strip out all words (and any special characters) before the location of the first number.
Recommended answer
library(stringr)
x <- str_extract_all(strLine,"\\(?[0-9,.]+\\)?")[[1]]
x
[1] "0" "3,000" "(500)" "0" "2.25" "(1,200)"
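One caveat worth checking before relying on this pattern: the character class [0-9,.] also matches a comma or period standing on its own, so punctuation in the surrounding prose can end up among the tokens. The sketch below uses base R's regmatches/gregexpr with the same pattern on a made-up input (the string s is purely an illustration, not from the question):

```r
# Same pattern as in the answer, applied with base R instead of stringr.
pattern <- "\\(?[0-9,.]+\\)?"

# Hypothetical input containing a comma and a period outside the numbers.
s <- "Balance, as of today: 1,500 (200)."

tokens <- regmatches(s, gregexpr(pattern, s))[[1]]
# tokens now also holds the lone "," and "." from the prose,
# which as.numeric() would later turn into NA.
```

For the string in the question this does not come up, because the text before the numbers contains no commas or periods.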
Change the parentheses to negative signs:
x <- gsub("\\((.+)\\)","-\\1",x)
x
[1] "0" "3,000" "-500" "0" "2.25" "-1,200"
Then finish up with taRifx::destring, or with as.numeric() after removing the commas (the next version of destring will support negatives by default, so the keep option won't be necessary):
library(taRifx)
destring( x, keep="0-9.-")
[1] 0 3000 -500 0 2.25 -1200
Or:
as.numeric(gsub(",","",x))
[1] 0 3000 -500 0 2.25 -1200
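For reference, the same steps can probably be combined into one base R helper with no extra packages. This is a minimal sketch, not the answerer's code: the function name extract_accounting_numbers is hypothetical, and the pattern assumes every number starts with a digit (so a value written as ".5" would not be handled correctly).

```r
# Hypothetical helper (not part of the original answer); base R only.
extract_accounting_numbers <- function(s) {
  # Assumption: each number starts with a digit. Parentheses are optional,
  # commas may appear as thousands separators, and a decimal part is optional.
  pattern <- "\\(?[0-9][0-9,]*(\\.[0-9]+)?\\)?"
  tokens <- regmatches(s, gregexpr(pattern, s))[[1]]

  # A token wrapped in parentheses is a negative amount in accounting form.
  negative <- grepl("^\\(.*\\)$", tokens)

  # Keep only digits and the decimal point, then convert to numeric.
  values <- as.numeric(gsub("[^0-9.]", "", tokens))

  ifelse(negative, -values, values)
}

strLine <- "The transactions (on your account) were as follows: 0 3,000 (500) 0 2.25 (1,200)"
result <- extract_accounting_numbers(strLine)
```

On the string from the question, result should hold the same values as the requested c(0, 3000, -500, 0, 2.25, -1200), and "(on your account)" is skipped because the parenthesis is not followed by a digit.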