R RegEx: Match all double-quote (") characters inside square brackets


Question

I'm struggling to get a regex that matches all double-quote characters (") that occur within square brackets.

I have different pieces that do parts of what I want. For example,

gsub('"', "", '"""xyz"""')
[1] "xyz"

This removes every double quote, regardless of anything else.

gsub('\\[(.*?)\\]', "", '[xyz][][][]abc')
[1] "abc"

This removes everything inside each pair of square brackets, including the brackets themselves, which I don't want. How do I avoid that?
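One possible way to keep the brackets, shown here as a sketch rather than as part of the original thread, is to assert the brackets with lookarounds instead of consuming them, so only the text between them is removed. Lookbehind requires perl = TRUE; using + instead of * is an assumption that simply skips empty pairs such as [].

```r
x <- '[xyz][][][]abc'

# (?<=\[) and (?=\]) check for the brackets without consuming them,
# so only the non-empty text between a pair of brackets is removed.
res <- gsub('(?<=\\[)[^]]+(?=\\])', "", x, perl = TRUE)
cat(res, "\n")
# [][][][]abc
```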

I'm also not sure how to combine the two once I have them each working. Here's an example of the desired behavior. Given an input string ["cats", "dogs"]"x", I want an expression that will replace the four " characters inside of the square brackets, but not the ones outside.

To be more specific:

gsub('THE_REGEX', "", '["cats", "dogs"]"x"')

should return

[cats, dogs]"x"

I want to remove double-quotes when they occur inside of square brackets, but not when they occur outside of square brackets.

Solution

A \G-based pattern ensures that matches are contiguous and that they always stay between the square brackets:

gsub('(?:\\G(?!\\A)|\\[)[^]"]*\\K"', "", '["cats", "dogs"]"x"', perl=TRUE)
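A minimal sketch of the first pattern in use, wrapped in a hypothetical helper named strip_bracket_quotes() (the name is an assumption, not from the answer). The second input, with several bracket groups, is also an assumed example; gsub() is vectorised, so each element is processed independently.

```r
# Hypothetical helper; the pattern is the one given in the answer.
strip_bracket_quotes <- function(x) {
  gsub('(?:\\G(?!\\A)|\\[)[^]"]*\\K"', "", x, perl = TRUE)
}

inputs <- c('["cats", "dogs"]"x"',  # example from the question
            '[a"b"] "c" ["d"]')     # assumed input with several bracket groups
out <- strip_bracket_quotes(inputs)
cat(out, sep = "\n")
# [cats, dogs]"x"
# [ab] "c" [d]
```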

Or if you want to check that the closing square bracket exists:

gsub('(?:\\G(?!\\A)|\\[(?=[^][]*]))[^]"]*\\K"', "", '["cats", "dogs"]"x"', perl=TRUE)
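The two patterns only differ when an opening bracket is never closed. The lookahead (?=[^][]*]) lets the \[ branch start only if a ] appears before any other bracket. The sketch below compares both on an assumed unclosed input; the variable names p_plain and p_closed are illustrative.

```r
p_plain  <- '(?:\\G(?!\\A)|\\[)[^]"]*\\K"'
p_closed <- '(?:\\G(?!\\A)|\\[(?=[^][]*]))[^]"]*\\K"'

x <- '["a" "b"'  # opening bracket with no closing bracket

a <- gsub(p_plain,  "", x, perl = TRUE)  # quotes removed anyway: [a b
b <- gsub(p_closed, "", x, perl = TRUE)  # no closing bracket, so unchanged

# On properly closed input both patterns give the same result.
same <- gsub(p_closed, "", '["cats", "dogs"]"x"', perl = TRUE)
cat(a, b, same, sep = "\n")
```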

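The \G anchor matches at the position where the previous match ended (before the first match, that position is the start of the string, which is why (?!\A) excludes it). This is why it can be used to ensure contiguity between matches.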
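The (?!\A) guard in both patterns matters because \G also succeeds at the very start of the string, before any match has been made. This sketch, on an assumed input that begins with a quote outside the brackets, removes the guard to show what goes wrong.

```r
x <- '"a"[b"c"]'

# Without (?!\A), \G succeeds at position 0, so the leading quote is
# removed and the chain of contiguous matches runs on into the brackets.
no_guard   <- gsub('(?:\\G|\\[)[^]"]*\\K"', "", x, perl = TRUE)
with_guard <- gsub('(?:\\G(?!\\A)|\\[)[^]"]*\\K"', "", x, perl = TRUE)
cat(no_guard, with_guard, sep = "\n")
# a[bc]
# "a"[bc]
```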

Both patterns start with an alternation. The second branch (the one starting with \[) is used for the first match: it finds the opening square bracket, then [^]"]* consumes the run of characters up to the next quote or closing square bracket. \K resets the start of the reported match, so everything matched before it is excluded from the match result (that's why it isn't erased). The first branch, \G(?!\A), is used for the following matches and can only succeed immediately after the previous match. Since [^]"]* excludes the closing square bracket, a chain of matches can never leave the square brackets. When there are no more quotes to replace, the pattern fails; the regex engine then moves on to the next character, and so on, until the second branch succeeds again (that is, when another opening square bracket is found).
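To see what \K contributes, the sketch below runs the first pattern with and without it on the question's input. Without \K, the reported match begins at the [ or at the end of the previous match, so the text before each quote is deleted as well.

```r
x <- '["cats", "dogs"]"x"'

# Same pattern minus \K: the whole matched span is replaced by "".
without_k <- gsub('(?:\\G(?!\\A)|\\[)[^]"]*"', "", x, perl = TRUE)
# With \K: only the quote after \K is part of the reported match.
with_k    <- gsub('(?:\\G(?!\\A)|\\[)[^]"]*\\K"', "", x, perl = TRUE)
cat(without_k, with_k, sep = "\n")
# ]"x"
# [cats, dogs]"x"
```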

Note: even though this approach needs no extra dependency, keep in mind that it is far harder to understand than applying a callback function to each match of the complete content between brackets, as Grothendieck does.
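For comparison, here is a base-R sketch of the callback idea: match each complete [...] group, then strip the quotes inside each match with an ordinary function. It uses gregexpr() with the regmatches<- replacement function and assumes brackets are not nested; it illustrates the general approach and is not a reproduction of Grothendieck's answer. The helper name drop_quotes is hypothetical.

```r
x <- c('["cats", "dogs"]"x"', '[a"b"] "c" ["d"]')

# Locate every complete [...] group (assumes no nested brackets).
m <- gregexpr('\\[[^][]*\\]', x, perl = TRUE)

# Hypothetical callback applied to the matched groups of each string.
drop_quotes <- function(groups) gsub('"', "", groups, fixed = TRUE)

regmatches(x, m) <- lapply(regmatches(x, m), drop_quotes)
cat(x, sep = "\n")
# [cats, dogs]"x"
# [ab] "c" [d]
```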


Regarding the two edge cases mentioned in my comment, I think the best solution is to keep the quotes around any quoted string that contains a closing square bracket, when that string is inside square brackets: https://regex101.com/r/SOMpqN/1
