Extract text in parentheses in R


Problem description

Two related questions. I have character vectors of text data such as

"a(b)jk(p)"  "ipq"  "e(ijkl)"

and want to easily separate each one into a vector containing the text OUTSIDE the parentheses:

"ajk"  "ipq"  "e"

and a vector containing the text INSIDE the parentheses:

"bp"   ""  "ijkl"

Is there an easy way to do this? An added difficulty is that the strings can get quite long and contain a large (unbounded) number of parenthesized groups. So I can't simply grab the text before and after a single pair of parentheses, and I need a smarter solution.

Recommended answer

Text outside the parentheses

> x <- c("a(b)jk(p)", "ipq", "e(ijkl)")
> gsub("\\([^()]*\\)", "", x)
[1] "ajk" "ipq" "e"  
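
The pattern above removes each "(", any run of non-parenthesis characters, and the closing ")". Because [^()]* cannot cross another parenthesis, a single gsub pass strips only the innermost groups. The question does not mention nesting, so the following is only a sketch for that hypothetical case: strip_parens is an illustrative helper name, not part of the original answer, and it simply repeats the same substitution until nothing changes.

```r
# Hypothetical helper: repeat the answer's substitution until the
# strings stop changing, so nested groups are peeled off layer by layer.
strip_parens <- function(x) {
  repeat {
    y <- gsub("\\([^()]*\\)", "", x)
    if (identical(y, x)) return(y)
    x <- y
  }
}

out <- strip_parens(c("a(b)jk(p)", "ipq", "e(ijkl)", "a(b(c)d)e"))
print(out)
# [1] "ajk" "ipq" "e"   "ae"
```

For input without nesting, the loop stops after one extra pass and returns the same result as the single gsub call.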

Text inside the parentheses

> x <- c("a(b)jk(p)", "ipq", "e(ijkl)")
> gsub("(?<=\\()[^()]*(?=\\))(*SKIP)(*F)|.", "", x, perl=TRUE)
[1] "bp"   ""     "ijkl"

The (?<=\\()[^()]*(?=\\)) part matches all the characters inside a pair of parentheses, and the (*SKIP)(*F) that follows makes that match fail while stopping the regex engine from retrying inside the text it just consumed. The engine then tries the alternative after the | symbol against the rest of the string, so the dot . matches every character that was not skipped, including the parentheses themselves. Replacing all of those matched characters with an empty string leaves only the text inside the parentheses. The perl argument must be enabled because lookarounds and the (*SKIP)(*F) verbs are PCRE features.
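
If the (*SKIP)(*F) idiom feels opaque, the same result can be reached more explicitly with base R's gregexpr and regmatches. This is a swapped-in alternative, not the method from the answer above, and the variable names m and inside are illustrative: it first collects every "(...)" group per string, then strips the enclosing parentheses and pastes the pieces together.

```r
x <- c("a(b)jk(p)", "ipq", "e(ijkl)")

# One list element per input string, holding its "(...)" groups
# (character(0) when a string has no parentheses).
m <- regmatches(x, gregexpr("\\([^()]*\\)", x))

# Drop the leading "(" and trailing ")" of each group, then join
# the groups that belong to the same string.
inside <- vapply(
  m,
  function(g) paste(gsub("^\\(|\\)$", "", g), collapse = ""),
  character(1),
  USE.NAMES = FALSE
)
print(inside)
# [1] "bp"   ""     "ijkl"
```

Keeping the intermediate list m is handy when the groups are needed separately, for example "b" and "p" rather than "bp".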

> gsub("\\(([^()]*)\\)|.", "\\1", x, perl=TRUE)
[1] "bp"   ""     "ijkl"

This regex captures the characters inside each pair of parentheses in group 1 and, through the |. alternative, matches every remaining character one at a time. Replacing each match with the contents of group 1 (\\1) keeps the captured text and drops everything else, because group 1 is empty whenever the . alternative is the one that matched.
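
Since the question asks for both vectors, the two substitutions from this answer can be bundled into one small function. A minimal sketch, assuming non-nested parentheses as in the example data; split_parens and the list element names outside and inside are hypothetical and not part of the original answer.

```r
# Hypothetical wrapper around the two gsub calls shown in the answer.
split_parens <- function(x) {
  list(
    # delete every "(...)" group
    outside = gsub("\\([^()]*\\)", "", x),
    # keep only what group 1 captured inside each "(...)"
    inside  = gsub("\\(([^()]*)\\)|.", "\\1", x, perl = TRUE)
  )
}

res <- split_parens(c("a(b)jk(p)", "ipq", "e(ijkl)"))
print(res$outside)
# [1] "ajk" "ipq" "e"
print(res$inside)
# [1] "bp"   ""     "ijkl"
```

Both calls are vectorized over x, so the function handles a vector of any length without an explicit loop.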
