Regex replace, but only between two patterns

Question

Ok, I have a multi-line string I'm trying to do some clean-up on.

Each line may or may not be part of a big block of quoted text. Example:

This line is not quoted.
This part of the line is not quoted “but this is.”
This one is not quoted either.
“This entire line is quoted”
Not quoted.
“This line is quoted
and so is this one
and so is this one.”
This is not quoted “but this is
and so is this.”

I need a RegEx replacement that will un-wrap the hard-wrapped quoted lines, i.e., replace "\r\n" with a space, but only between the curly quotes.

Here is how it should look after the replacement:

This line is not quoted.
This part of the line is not quoted “but this is.”
This one is not quoted either.
“This entire line is quoted”
Not quoted.
“This line is quoted and so is this one and so is this one.”
This is not quoted “but this is and so is this.”

(Note how the last two lines were multiple lines in the input text.)

Constraints

  • Ideally need a single Regex replace call
  • Using .NET RegEx library
  • The quotes are always start/end curly quotes, not plain ol' double-ticks ("), which should make this a little easier.

Important constraint

This is not direct .NET code, I'm populating a table of "searchfor/replacewith" strings that are then called via RegEx.Replace. I don't have the ability to add custom code like Match Evaluators, looping through captured groups, etc.

Current answer so far, something along the lines of:

r.Replace("(?<=“)\r\n(?=”)", " ")

Obviously, I'm not even close yet.

The same logic could be applied to, say, color-coding of block comments in programming code--anything inside the block comment is not treated the same way as the stuff outside the comments. (Code is a little trickier since start/end block comment delimiters can also legitimately exist within a literal string, an issue I don't have to deal with here.)

Recommended answer

Assuming all curly quotes are properly balanced, this regex should do what you want:

@"[\r\n]+(?=[^“]*”)"

The [\r\n]+ will match one or more line separators of any type--Unix (\n), DOS (\r\n) or older Mac (\r). Then the lookahead asserts that there's a close-quote ahead and that there's no open-quote between here and there. Then your replacement text can be a simple space character.
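
To see the pattern in action, here is a minimal sketch of a single Regex.Replace call on a shortened version of the sample text. The Program class, the variable names, and the trimmed input are illustrative assumptions, not part of the original answer. The source file must be saved in an encoding that preserves the curly quotes (for example UTF-8).

```csharp
using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main()
    {
        // Hypothetical sample: hard-wrapped curly-quoted text with DOS line endings.
        string input =
            "Not quoted.\r\n" +
            "“This line is quoted\r\n" +
            "and so is this one.”\r\n" +
            "This is not quoted “but this is\r\n" +
            "and so is this.”";

        // One or more line-break characters, but only where a close quote
        // follows with no open quote in between (i.e. we are inside a quote).
        string pattern = @"[\r\n]+(?=[^“]*”)";

        // Each matched run of line breaks is replaced by a single space.
        string result = Regex.Replace(input, pattern, " ");

        Console.WriteLine(result);
    }
}
```

With this input, the line breaks inside the two quoted passages should become spaces, while the breaks after "Not quoted." and after the first closing quote stay in place. Because of the + quantifier, a blank line inside a quoted passage would also collapse into a single space, and, as noted above, the result is only reliable when the curly quotes are balanced.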
