Extract numbers between brackets within a string


Question

Possible duplicate:
Extract information inside all parentheses in R (regular expression)

I imported data from Excel, and one cell consists of long strings that contain numbers and letters. Is there a way to extract only the numbers from such a string and store them in a new variable? Unfortunately, some of the entries have two sets of brackets, and I only want the second one. Could I use grep for that?

The strings look more or less like this, although their lengths vary:

"East Kootenay C (5901035) RDA 01011"

or like this:

"Thompson-Nicola J (Copper Desert Country) (5933039) RDA 02020"

What I want is 5901035 and 5933039.

Any hints and help would be greatly appreciated.

Recommended answer

There are many possible regular expressions to do this. Here is one:

> x <- c("East Kootenay C (5901035) RDA 01011", "Thompson-Nicola J (Copper Desert Country) (5933039) RDA 02020")
> gsub('.+\\(([0-9]+)\\).+?$', '\\1', x)
[1] "5901035" "5933039"

Let's break down the syntax of that first expression '.+\\(([0-9]+)\\).+'

  • .+ one or more of anything
  • \\( parentheses are special characters in a regular expression, so to represent a literal ( I need to escape it with a \. I have to escape the backslash again for R (hence the two \s).

  • ([0-9]+) I mentioned special characters; here I use two. The first is the parentheses, which indicate a group I want to keep. The second, [ and ], surround a set of characters to match (here the digits 0 to 9). See ?regex for more information.

  • ?$ The ? makes the preceding .+ lazy and $ anchors the match at the end of the string. Together with the greedy leading .+, this means I am grabbing the LAST set of numbers in parens, as noted in the comments.
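To make the "last set of numbers" point concrete, here is a minimal sketch. The input string `y` is hypothetical (it is not from the question), and `perl = TRUE` is an assumption added here so that the greedy and lazy quantifiers follow the well-known Perl-style backtracking rules.

```r
# Hypothetical input with TWO numeric groups in parentheses
y <- "Region A (1234567) (7654321) RDA 03030"

# Same pattern as in the answer. The greedy leading .+ consumes as much as
# it can, so the capture group lands on the last "(digits)" in the string.
# perl = TRUE is an assumption made for this sketch.
last_id <- gsub('.+\\(([0-9]+)\\).+?$', '\\1', y, perl = TRUE)
print(last_id)
# [1] "7654321"
```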

I could also use * instead of +, which would mean zero or more rather than one or more, in case your paren string comes at the beginning or end of a string.
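As a rough illustration of that edge case, the sketch below compares the two quantifiers on made-up strings where the parenthesised number sits at the very start or the very end. The vector `z` and the names `strict` and `loose` are hypothetical.

```r
# Hypothetical inputs: the parenthesised number is first or last in the string
z <- c("(5901035) RDA 01011", "East Kootenay C (5901035)")

# With + on both sides, at least one character is required before and after
# the parentheses, so nothing matches and the strings come back unchanged.
strict <- gsub('.+\\(([0-9]+)\\).+?$', '\\1', z)

# With * on both sides, zero characters are allowed, so both strings match.
loose <- gsub('.*\\(([0-9]+)\\).*$', '\\1', z)

print(strict)
print(loose)
# [1] "5901035" "5901035"   (output of print(loose))
```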

The second piece of the gsub is what I am replacing the first portion with. I used \\1, which says to use group 1 (the stuff inside the ( ) from above). The backslash is doubled again: one for the backreference and one to escape it in the R string.
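Since the question asks about storing the result in a new variable, here is one possible sketch. The data frame `df` and its columns `region` and `id` are hypothetical names chosen for illustration, and converting with as.numeric assumes every row contains a parenthesised number.

```r
# Hypothetical data frame standing in for the imported Excel column
df <- data.frame(
  region = c("East Kootenay C (5901035) RDA 01011",
             "Thompson-Nicola J (Copper Desert Country) (5933039) RDA 02020"),
  stringsAsFactors = FALSE
)

# Replace each whole string with capture group 1, then convert to numeric.
# Rows that do not match the pattern would come back unchanged and turn
# into NA (with a warning) in as.numeric.
df$id <- as.numeric(gsub('.+\\(([0-9]+)\\).+?$', '\\1', df$region))
print(df$id)
# [1] 5901035 5933039
```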

Clear as mud, to be sure! Enjoy your data munging project!
