Finding quoted strings with escaped quotes in C# using a regular expression

Question

I'm trying to find all of the quoted text on a single line.

Example:

"Some Text"
"Some more Text"
"Even more text about \"this text\""

I need to get:


  • "Some Text"
  • "Some more Text"
  • "Even more text about \"this text\""

\"[^\"\r]*\" gives me everything except for the last one, because of the escaped quotes.

I have read about \"[^\"\\]*(?:\\.[^\"\\]*)*\" working, but I get an error at run time:

parsing ""[^"\]*(?:\.[^"\]*)*"" - Unterminated [] set.

How do I fix this?

Answer

What you've got there is an example of Friedl's "unrolled loop" technique, but you seem to have some confusion about how to express it as a string literal. Here's how it should look to the regex compiler:

"[^"\\]*(?:\\.[^"\\]*)*"

The initial "[^"\\]* matches a quotation mark followed by zero or more of any characters other than quotation marks or backslashes. That part alone, along with the final ", will match a simple quoted string with no embedded escape sequences, like "this" or "".

If it does encounter a backslash, \\. consumes the backslash and whatever follows it, and [^"\\]* (again) consumes everything up to the next backslash or quotation mark. That part gets repeated as many times as necessary until an unescaped quotation mark turns up (or it reaches the end of the string and the match attempt fails).
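
The walk-through above can be sketched in code by spreading the same pattern over several lines with RegexOptions.IgnorePatternWhitespace, so that each piece carries its own comment. This is only an illustration: the variable names and the three sample inputs are made up for the sketch, and the inputs are written as verbatim strings, where "" stands for a single quotation mark.

```csharp
using System;
using System.Text.RegularExpressions;

// Same pattern as above, laid out piece by piece.
// IgnorePatternWhitespace lets the pattern contain layout whitespace and # comments.
var quoted = new Regex(@"
    ""            # opening quotation mark
    [^""\\]*      # zero or more characters that are neither quotes nor backslashes
    (?:           # then, zero or more times:
      \\.         #   a backslash plus whatever character follows it
      [^""\\]*    #   another run of ordinary characters
    )*
    ""            # closing (unescaped) quotation mark
", RegexOptions.IgnorePatternWhitespace);

// Hypothetical sample inputs (not from the original post, except the last one).
string simple  = @"say ""this"" now";                              // say "this" now
string empty   = @"an empty one: """"";                            // an empty one: ""
string escaped = @"""Even more text about \""this text\""""";      // the question's third string

Console.WriteLine(quoted.Match(simple).Value);   // "this"
Console.WriteLine(quoted.Match(empty).Value);    // ""
Console.WriteLine(quoted.Match(escaped).Value);  // the whole escaped string, quotes included
```

The first two inputs are matched by the opening quote, the first character class, and the closing quote alone; only the third one exercises the repeated group.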

Note that this will match "foo\"-" in \"foo\"-"bar". That may seem to expose a flaw in the regex, but it doesn't; it's the input that's invalid. The goal was to match quoted strings, optionally containing backslash-escaped quotes, embedded in other text--why would there be escaped quotes outside of quoted strings? If you really need to support that, you have a much more complex problem, requiring a very different approach.
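
For anyone who wants to see that behavior, here is a small sketch that runs the pattern over the malformed input from the paragraph above. The variable names are placeholders chosen for the example.

```csharp
using System;
using System.Text.RegularExpressions;

var quoted = new Regex(@"""[^""\\]*(?:\\.[^""\\]*)*""");

// The malformed input discussed above: \"foo\"-"bar"
string input = @"\""foo\""-""bar""";

// The regex cannot know that the first quote was preceded by a backslash,
// so it starts a match there and treats the second \" as an escape inside it.
MatchCollection all = quoted.Matches(input);
foreach (Match m in all)
    Console.WriteLine(m.Value);   // "foo\"-"
```

Only one match comes back. The trailing bar" contains a single quotation mark with nothing to close it, so no second match is found.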

As I said, the above is how the regex should look to the regex compiler. But you're writing it in the form of a string literal, and those tend to treat certain characters specially--i.e., backslashes and quotation marks. Fortunately, C#'s verbatim strings save you the hassle of having to double-escape backslashes; you just have to escape each quotation mark with another quotation mark:

Regex r = new Regex(@"""[^""\\]*(?:\\.[^""\\]*)*""");
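
As a quick check, the following sketch applies that regex to the three sample strings from the question. It assumes they sit on one line separated by single spaces (the question does not show the exact spacing), and the names line and found are made up for the example.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

Regex r = new Regex(@"""[^""\\]*(?:\\.[^""\\]*)*""");

// The question's three strings on a single line (separator assumed to be one space).
string line = @"""Some Text"" ""Some more Text"" ""Even more text about \""this text\""""";

// Collect every quoted string, including its surrounding quotation marks.
string[] found = r.Matches(line).Cast<Match>().Select(m => m.Value).ToArray();

foreach (string s in found)
    Console.WriteLine(s);
// "Some Text"
// "Some more Text"
// "Even more text about \"this text\""
```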

So the rule is double quotation marks for the C# compiler and double backslashes for the regex compiler--nice and easy. This particular regex may look a little awkward, with the three quotation marks at either end, but consider the alternative:

Regex r = new Regex("\"[^\"\\\\]*(?:\\\\.[^\"\\\\]*)*\"");
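
To make the comparison concrete, this sketch checks that the verbatim and regular literals produce the same pattern text. It also shows what presumably happened in the question: the regex was pasted into a regular string literal with its backslashes written only once per level, so the C# compiler collapsed each \\ into a single \ before the regex compiler saw it. The variable names are illustrative, and the broken literal is a reconstruction based on the error message quoted in the question.

```csharp
using System;
using System.Text.RegularExpressions;

// Both literals yield the same characters for the regex compiler.
string verbatim = @"""[^""\\]*(?:\\.[^""\\]*)*""";
string regular  = "\"[^\"\\\\]*(?:\\\\.[^\"\\\\]*)*\"";
Console.WriteLine(verbatim == regular);   // True

// Reconstruction of the failing case: each \\ here becomes ONE backslash,
// so the regex compiler sees \] (an escaped bracket) and the set never closes.
string broken = "\"[^\"\\]*(?:\\.[^\"\\]*)*\"";
Console.WriteLine(broken);                // "[^"\]*(?:\.[^"\]*)*"

bool threw = false;
try
{
    new Regex(broken);
}
catch (ArgumentException)                 // regex parse errors derive from ArgumentException
{
    threw = true;
}
Console.WriteLine(threw);                 // True
```

Printing the string before handing it to Regex, as done for broken here, is a cheap way to see exactly what the regex compiler will receive.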

In Java, you always have to write them that way. :-(
