How to find and replace a particular character but only if it is in quotes?


Problem description

Problem: I have thousands of documents that contain a specific character I don't want, e.g. the character a. These documents contain a variety of characters, but the a's I want to replace are inside double quotes or single quotes.

I would like to find and replace them, and I thought using Regex would be needed. I am using VSCode, but I'm open to any suggestions.

My attempt: I found the following regex, which matches a quoted string containing the value inside the ().

".*?(r).*?"

However, this highlights the entire quoted string. I want to highlight only the character.

Any solution, perhaps outside of regex, is welcome.

Example outcomes, given that the character is a and the replacement is b:

Somebody once told me "apples" are good for you => Somebody once told me "bpples" are good for you

"Aardvarks" make good kebabs => "Abrdvbrks" make good kebabs

The boy said "aaah!" when his mom told him he was eating aardvark => The boy said "bbbh!" when his mom told him he was eating aardvark

Solution

Visual Studio Code

VS Code uses the JavaScript RegEx engine for its find/replace functionality. This means you are more limited in what you can do with regex than in other flavors such as .NET or PCRE.

Luckily, this flavor supports lookaheads, which let you look ahead at characters without consuming them. So one way to make sure we are inside a quoted string is to require that, after matching an a, the number of quotes remaining up to the end of the file/subject string is odd:

a(?=[^"]*"[^"]*(?:"[^"]*"[^"]*)*$)


This looks for a's in a double-quoted string. To use it for single-quoted strings, substitute every " with '. You can't handle both at the same time.
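As a quick check outside the editor, the same lookahead can be tried in Python, whose `re` module also supports lookaheads. This is only a sketch: the function name `replace_quoted_a` is made up for illustration, and the inputs are the three examples from the question.

```python
import re

# Pattern from the answer: an `a` that is followed by an odd number of
# double quotes between it and the end of the subject string.
IN_DOUBLE_QUOTES = re.compile(r'a(?=[^"]*"[^"]*(?:"[^"]*"[^"]*)*$)')


def replace_quoted_a(text: str, replacement: str = "b") -> str:
    """Replace every `a` inside a double-quoted string (hypothetical helper)."""
    return IN_DOUBLE_QUOTES.sub(replacement, text)


if __name__ == "__main__":
    samples = [
        'Somebody once told me "apples" are good for you',
        '"Aardvarks" make good kebabs',
        'The boy said "aaah!" when his mom told him he was eating aardvark',
    ]
    for sample in samples:
        print(replace_quoted_a(sample))
```

Because every match re-scans the rest of the text, the work grows roughly quadratically with input size, so this is best suited to small files.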

There is a problem with the regex above, however: it breaks on escaped double quotes inside double-quoted strings. If that matters to you, handling them takes a much longer pattern:

a(?=[^"\\]*(?:\\.[^"\\]*)*"[^"\\]*(?:\\.[^"\\]*)*(?:"[^"\\]*(?:\\.[^"\\]*)*"[^"\\]*(?:\\.[^"\\]*)*)*$)
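To see the difference between the two patterns, here is a small Python comparison on a string containing an escaped quote. It is a sketch only: the names `SEG`, `NAIVE`, `ESCAPE_AWARE` and the sample string are invented for illustration, and `re.DOTALL` is an added assumption so that a backslash followed by a newline is also treated as an escape.

```python
import re

# One "segment": non-quote, non-backslash characters, optionally
# interleaved with backslash escapes such as \" or \\.
SEG = r'[^"\\]*(?:\\.[^"\\]*)*'

# The short pattern, which counts every double quote, escaped or not.
NAIVE = re.compile(r'a(?=[^"]*"[^"]*(?:"[^"]*"[^"]*)*$)')

# The long pattern from the answer, assembled from SEG.
ESCAPE_AWARE = re.compile(rf'a(?={SEG}"{SEG}(?:"{SEG}"{SEG})*$)', re.DOTALL)

# A quoted string that contains an escaped double quote.
sample = r'x "a \" a" a'

naive_result = NAIVE.sub("b", sample)
aware_result = ESCAPE_AWARE.sub("b", sample)

if __name__ == "__main__":
    print(naive_result)  # the first quoted `a` is missed
    print(aware_result)  # both quoted `a`s are replaced
```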

Applying these approaches to large files will probably result in a stack overflow, so let's look at a better approach.

You wrote: "I am using VSCode, but I'm open to any suggestions."

That's great. In that case I'd suggest using awk, sed, or something more programmatic to achieve what you are after. Alternatively, if you are able to use Sublime Text, there is a more elegant way to work around this problem.
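As one example of "something more programmatic", a single left-to-right scan that tracks whether it is currently inside quotes avoids lookaheads entirely and runs in linear time. The sketch below is not from the answer. The function name and parameters are hypothetical, and it assumes that backslash escapes only matter inside quotes and that an escaped character is copied through unchanged.

```python
def replace_in_quotes(text: str, target: str, replacement: str, quote: str = '"') -> str:
    """Replace `target` with `replacement` only between `quote` characters."""
    out = []
    inside = False  # True while the scanner is between an opening and closing quote
    i = 0
    n = len(text)
    while i < n:
        ch = text[i]
        if inside and ch == "\\" and i + 1 < n:
            # Copy the backslash and the escaped character verbatim,
            # so \" does not close the string.
            out.append(text[i:i + 2])
            i += 2
            continue
        if ch == quote:
            inside = not inside
            out.append(ch)
        elif inside and ch == target:
            out.append(replacement)
        else:
            out.append(ch)
        i += 1
    return "".join(out)


if __name__ == "__main__":
    print(replace_in_quotes('"Aardvarks" make good kebabs', "a", "b"))
```

Passing `quote="'"` handles single-quoted strings, but apostrophes in ordinary words (such as "don't") would then toggle the state, so that case needs extra care. An unterminated quote is treated as open until the end of the text, which differs from the lookahead patterns above.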

Sublime Text

This is supposed to work on large files with hundreds of thousands of lines. Note that it works for a single character (here a, marked with carets below); with some modifications it may work for a word or substring too:

Search for:

(?:"|\G(?<!")(?!\A))(?<r>[^a"\\]*+(?>\\.[^a"\\]*)*+)\K(a|"(*SKIP)(*F))(?(?=((?&r)"))\3)
                           ^              ^            ^

Replace it with: WHATEVER\3


RegEx Breakdown:

(?: # Beginning of non-capturing group #1
    "   # Match a `"`
    |   # Or
    \G(?<!")(?!\A)  # Continue matching from last successful match
                    # It shouldn't start right after a `"`
)   # End of NCG #1
(?<r>   # Start of capturing group `r`
    [^a"\\]*+   # Match anything except `a`, `"` or a backslash (possessively)
    (?>\\.[^a"\\]*)*+   # Match an escaped character or 
                        # repeat last pattern as much as possible
)\K     # End of CG `r`, reset all consumed characters
(   # Start of CG #2 
    a   # Match literal `a`
    |   # Or
    "(*SKIP)(*F)    # Match a `"` and skip over current match
)
(?(?=   # Start a conditional cluster, assuming a positive lookahead
    ((?&r)")    # Start of CG #3, recurse into CG `r` and match `"`
  )     # End of condition
  \3    # If conditional passed match CG #3
 )  # End of conditional

Three-step approach

Last but not least...

Matching a character inside quotation marks is tricky because the delimiters are exactly the same, so opening and closing marks cannot be told apart without looking at the adjacent text. What you can do is change a delimiter to something else so that you can look for it later.

Step 1:

Search for: "[^"\\]*(?:\\.[^"\\]*)*"

Replace with: $0Я

Step 2:

Search for: a(?=[^"\\]*(?:\\.[^"\\]*)*"Я)

Replace with whatever you expect.

Step 3:

Search for: Я

Replace with nothing to revert everything.
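The three steps can also be scripted. The following Python sketch mirrors them with `re.sub`. The function name `three_step_replace` is hypothetical, and the sketch assumes the marker character Я does not already occur in the text (it raises an error if it does).

```python
import re

MARK = "\u042f"  # the marker character "Я" used in the steps above

# Step 1 pattern: a complete double-quoted string, allowing backslash escapes.
QUOTED = re.compile(r'"[^"\\]*(?:\\.[^"\\]*)*"')


def three_step_replace(text: str, target: str = "a", replacement: str = "b") -> str:
    """Tag closing quotes, replace inside tagged strings, then remove the tags."""
    if MARK in text:
        raise ValueError("marker already present in text; pick another marker")

    # Step 1: append the marker after every quoted string.
    marked = QUOTED.sub(lambda m: m.group(0) + MARK, text)

    # Step 2: replace the target only when a marked closing quote follows
    # without any other unescaped quote in between.
    inside = re.compile(
        re.escape(target) + r'(?=[^"\\]*(?:\\.[^"\\]*)*"' + MARK + ")"
    )
    replaced = inside.sub(lambda m: replacement, marked)

    # Step 3: remove the markers again.
    return replaced.replace(MARK, "")


if __name__ == "__main__":
    print(three_step_replace('The boy said "aaah!" when his mom told him he was eating aardvark'))
```

Since each lookahead stops at the next unescaped quote, it only scans to the end of the current quoted string instead of to the end of the file.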

