Extract numbers between brackets within a string
Question
Possible duplicate:
Extract information inside all parentheses in R (regular expression)
I imported data from Excel, and one cell consists of long strings that contain numbers and letters. Is there a way to extract only the numbers from such a string and store them in a new variable? Unfortunately, some of the entries have two sets of brackets, and I only want the second one. Could I use grep for that?
The strings look more or less like this, although their lengths vary:
"East Kootenay C (5901035) RDA 01011"
or like this:
"Thompson-Nicola J (Copper Desert Country) (5933039) RDA 02020"
What I want is 5901035 and 5933039.
Any hints and help would be greatly appreciated.
Recommended answer
There are many possible regular expressions to do this. Here is one:
> x <- c("East Kootenay C (5901035) RDA 01011", "Thompson-Nicola J (Copper Desert Country) (5933039) RDA 02020")
> gsub('.+\\(([0-9]+)\\).+?$', '\\1', x)
[1] "5901035" "5933039"
Let's break down the syntax of that first expression, '.+\\(([0-9]+)\\).+':
'.+' matches one or more of anything.
'\\(' matches a literal opening parenthesis. Parentheses are special characters in a regular expression, so if I want to represent the actual character ( I need to escape it with a \. I have to escape the backslash again for R (hence the two \s).
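The double escaping described above can be checked directly in R. This is only a minimal sketch; the variable names and the second sample string are made up for illustration.

```r
# The regex \( is written "\\(" inside an R string literal.
pattern <- "\\("
n_chars <- nchar(pattern)   # counts the backslash and the parenthesis

# The second element is a made-up string without brackets
samples <- c("East Kootenay C (5901035) RDA 01011", "no brackets here")

# Escaped pattern: matches a literal opening parenthesis
has_paren <- grepl(pattern, samples)

# Alternative: fixed = TRUE treats the pattern as plain text, so no escaping is needed
has_paren_fixed <- grepl("(", samples, fixed = TRUE)

print(n_chars)
print(has_paren)
print(has_paren_fixed)
```

The string holds two characters even though three are typed, because R consumes one backslash when it parses the literal.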
'([0-9]+)' uses two more special characters. The first is the parentheses, which mark a group I want to keep. The second is the pair [ and ], which surround a set of characters to match (here the digits 0 to 9); the + after it means one or more of them. See ?regex for more information.
'?$' is the final piece. It assures that I am grabbing the LAST set of numbers in parentheses, as noted in the comments: the ? asks the preceding .+ to match as little as possible, and $ anchors the match at the end of the string.
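To see which bracketed number the expression picks when an entry contains more than one, here is a small sketch. Both input strings are invented for illustration and do not come from the question's data.

```r
# Hypothetical entry with two bracketed numbers
y <- "Region A (1111111) (2222222) RDA 03030"
last_code <- gsub('.+\\(([0-9]+)\\).+?$', '\\1', y)

# Hypothetical entry without any bracketed number:
# nothing matches, so gsub returns the input unchanged
z <- "No brackets RDA 04040"
unchanged <- gsub('.+\\(([0-9]+)\\).+?$', '\\1', z)

print(last_code)
print(unchanged)
```

The second case is worth keeping in mind: an entry that does not match is passed through as-is, so it may be sensible to check the result (for example with grepl) before treating it as a number.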
I could also use * instead of +, which would mean zero or more rather than one or more, in case your parenthesized string comes at the beginning or end of a string.
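The difference only shows up when the bracketed number sits at the very start or very end of the string. A minimal sketch, using two made-up edge-case strings:

```r
# Hypothetical edge cases: brackets at the start, and brackets at the end
edge <- c("(5901035) RDA 01011", "East Kootenay C (5901035)")

# With .+ at least one character is required on each side, so neither string matches
strict <- gsub('.+\\(([0-9]+)\\).+?$', '\\1', edge)

# With .* the surrounding text may be empty, so both strings match
relaxed <- gsub('.*\\(([0-9]+)\\).*$', '\\1', edge)

print(strict)
print(relaxed)
```

Here strict comes back identical to the input, while relaxed contains only the digits.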
The second piece of the gsub call is what I am replacing the first portion with. I used '\\1'. This says to use group 1 (the stuff inside the ( ) from above). I need to escape it twice again, once for the regex and once for R.
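Since the question asks about storing the result in a new variable, here is one way that might look. The data frame df and its column names region, code, and code_num are assumptions made up for this sketch; the imported Excel data will have its own names.

```r
# Hypothetical data frame standing in for the imported Excel data
df <- data.frame(
  region = c("East Kootenay C (5901035) RDA 01011",
             "Thompson-Nicola J (Copper Desert Country) (5933039) RDA 02020"),
  stringsAsFactors = FALSE
)

# Replace each whole string with the contents of group 1 and keep it as a new column
df$code <- gsub('.+\\(([0-9]+)\\).+?$', '\\1', df$region)

# gsub returns character strings; convert if a numeric column is wanted
df$code_num <- as.numeric(df$code)

print(df[, c("code", "code_num")])
```

Keeping the character version alongside the numeric one avoids losing leading zeros if the codes ever have them.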
Clear as mud, to be sure! Enjoy your data munging project!