Extract text in parentheses in R
Question
Two related questions. I have vectors of text data such as
"a(b)jk(p)" "ipq" "e(ijkl)"
and want to easily separate it into a vector containing the text OUTSIDE the parentheses:
"ajk" "ipq" "e"
and a vector containing the text INSIDE the parentheses:
"bp" "" "ijkl"
Is there an easy way to do this? An added difficulty is that these strings can get quite large and contain a large (unlimited) number of parentheses. Thus, I can't simply grab the text before and after a single pair of parentheses, and I need a smarter solution.
Recommended answer
Text outside the parentheses
> x <- c("a(b)jk(p)" ,"ipq" , "e(ijkl)")
> gsub("\\([^()]*\\)", "", x)
[1] "ajk" "ipq" "e"
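Since the question worries about strings with a very large number of parentheses, here is a small sanity check of the same pattern on a long input. The test string is made up for illustration; it is a sketch, not a benchmark.

```r
# Hypothetical stress input: 1000 repetitions of "x(y)" in a single string
long <- paste(rep("x(y)", 1000), collapse = "")

# Same pattern as above: drop every "(...)" group
outside <- gsub("\\([^()]*\\)", "", long)

# Only the 1000 "x" characters should remain
expected <- paste(rep("x", 1000), collapse = "")
print(nchar(outside))
print(identical(outside, expected))
```

gsub replaces every non-overlapping match in one pass, so the number of parenthesized groups does not need to be known in advance.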
Text inside the parentheses
> x <- c("a(b)jk(p)" ,"ipq" , "e(ijkl)")
> gsub("(?<=\\()[^()]*(?=\\))(*SKIP)(*F)|.", "", x, perl=T)
[1] "bp" "" "ijkl"
The (?<=\\()[^()]*(?=\\)) part matches all the characters inside a pair of parentheses, and the (*SKIP)(*F) that follows forces that match to fail while also preventing the regex engine from retrying at any of the characters it just consumed. The engine then tries the pattern after the | symbol against the rest of the string, so the dot . matches every character that was not skipped, including the parentheses themselves. Replacing all the matched characters with an empty string leaves only the text inside the parentheses. The perl=T argument is required because lookarounds and (*SKIP)(*F) are PCRE features.
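If the (*SKIP)(*F) idiom feels opaque, a possible alternative is to pull out each parenthesized group explicitly with regmatches and gregexpr and then glue the pieces together. This is a sketch of a different technique from the one above, not a drop-in equivalent in every edge case; the variable names are only illustrative.

```r
x <- c("a(b)jk(p)", "ipq", "e(ijkl)")

# List with one character vector per element of x, holding every "(...)" group
groups <- regmatches(x, gregexpr("\\([^()]*\\)", x))

# Strip the surrounding parentheses and concatenate the pieces per element;
# elements with no match yield character(0), which paste() collapses to ""
inside <- vapply(
  groups,
  function(g) paste(gsub("[()]", "", g), collapse = ""),
  character(1)
)

print(inside)
```

The intermediate groups list also keeps the individual pieces separate, which can be handy if they should not be concatenated.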
> gsub("\\(([^()]*)\\)|.", "\\1", x, perl=T)
[1] "bp" "" "ijkl"
This regex captures the characters inside each pair of parentheses in group 1, and the |. alternative matches every remaining character one at a time. Because group 1 is empty whenever the . alternative matched, replacing each match with \\1 keeps the captured text and drops everything else, which gives the desired output.
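The two gsub calls can be wrapped into one small helper so both vectors come back together. The name split_parens is hypothetical, and the sketch assumes the parentheses are not nested, since [^()]* only matches the innermost level.

```r
# Hypothetical helper: split each string into text outside and inside "(...)"
# Assumes parentheses are balanced and not nested.
split_parens <- function(x) {
  list(
    # Remove every "(...)" group, keeping what is outside
    outside = gsub("\\([^()]*\\)", "", x),
    # Keep only capture group 1 (the text inside), drop every other character
    inside  = gsub("\\(([^()]*)\\)|.", "\\1", x, perl = TRUE)
  )
}

res <- split_parens(c("a(b)jk(p)", "ipq", "e(ijkl)"))
print(res$outside)
print(res$inside)
```

Returning a list keeps the two results aligned element by element with the input vector.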