Regex split string preserving quotes


Problem description

I need to split a string like the one below, using the space character as the delimiter. Any space inside a quoted section should be preserved.

research library "not available" author:"Bernard Shaw"

should be split into:

research
library
"not available"
author:"Bernard Shaw"

I am trying to do this in C#. I have this regex, taken from another post on Stack Overflow: @"(?<="")|\w[\w\s]*(?="")|\w+|""[\w\s]*""" It splits the string into:

research
library
"not available"
author
"Bernard Shaw"

Unfortunately, this does not meet my exact requirements.

I'm looking for any regex that would do the trick.

Any help is appreciated.

Recommended answer

As long as there can be no escaped quotes inside quoted strings, the following should work:

splitArray = Regex.Split(subjectString, "(?<=^[^\"]*(?:\"[^\"]*\"[^\"]*)*) (?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)");

This regex splits on a space character only if that space is both preceded and followed by an even number of quotes, which means it lies outside any quoted section.
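
To make the call above easy to try, here is a minimal, self-contained sketch that applies it to the sample input from the question. The class name QuoteAwareSplit and the method name Split are illustrative and not part of the original answer. The pattern is written as a verbatim string, so each quote is doubled instead of backslash-escaped, and the same assumption holds: no escaped quotes inside quoted sections.

```csharp
using System;
using System.Text.RegularExpressions;

public static class QuoteAwareSplit
{
    // Matches a space that has an even number of quotes on both sides,
    // i.e. a space outside any quoted section.
    // In a verbatim string, "" stands for one literal quote character.
    private const string Pattern =
        @"(?<=^[^""]*(?:""[^""]*""[^""]*)*) (?=(?:[^""]*""[^""]*"")*[^""]*$)";

    // Hypothetical helper: splits the input on unquoted spaces only.
    public static string[] Split(string input)
    {
        return Regex.Split(input, Pattern);
    }

    public static void Main()
    {
        string input = "research library \"not available\" author:\"Bernard Shaw\"";

        // Prints the four tokens, one per line:
        //   research
        //   library
        //   "not available"
        //   author:"Bernard Shaw"
        foreach (string part in Split(input))
        {
            Console.WriteLine(part);
        }
    }
}
```

Because the pattern contains only non-capturing groups and lookarounds, Regex.Split returns just the pieces between the matched spaces, with no extra entries for captured text.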

The regex without all those escaped quotes, explained:

(?<=      # Assert that it's possible to match this before the current position (positive lookbehind):
 ^        # The start of the string
 [^"]*    # Any number of non-quote characters
 (?:      # Match the following group...
  "[^"]*  # a quote, followed by any number of non-quote characters
  "[^"]*  # the same
 )*       # ...zero or more times (so 0, 2, 4, ... quotes will match)
)         # End of lookbehind assertion.
[ ]       # Match a space
(?=       # Assert that it's possible to match this after the current position (positive lookahead):
 (?:      # Match the following group...
  [^"]*"  # see above
  [^"]*"  # see above
 )*       # ...zero or more times.
 [^"]*    # Match any number of non-quote characters
 $        # Match the end of the string
)         # End of lookahead assertion
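
The commented layout above can also be used directly in code. The sketch below assumes RegexOptions.IgnorePatternWhitespace, under which unescaped whitespace in the pattern is ignored and # starts a comment; this is why the space to match is written as [ ]. The class name VerboseQuoteAwareSplit and its Split method are illustrative only, and the comments are shortened versions of the explanation above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class VerboseQuoteAwareSplit
{
    // Verbatim string: "" stands for one literal quote character.
    // Whitespace inside a character class such as [ ] is still matched literally.
    private const string Pattern = @"
        (?<=                          # lookbehind: between string start and here
          ^ [^""]*                    #   non-quote characters
          (?: ""[^""]* ""[^""]* )*    #   then zero or more pairs of quotes
        )
        [ ]                           # the space to split on
        (?=                           # lookahead: between here and string end
          (?: [^""]*"" [^""]*"" )*    #   zero or more pairs of quotes
          [^""]* $                    #   then only non-quote characters
        )";

    // Hypothetical helper: same behavior as the one-line pattern, but readable.
    public static string[] Split(string input)
    {
        return Regex.Split(input, Pattern, RegexOptions.IgnorePatternWhitespace);
    }

    public static void Main()
    {
        string input = "research library \"not available\" author:\"Bernard Shaw\"";

        // Prints: research|library|"not available"|author:"Bernard Shaw"
        Console.WriteLine(string.Join("|", Split(input)));
    }
}
```

Keeping the comments inside the pattern costs nothing at match time and makes the even-number-of-quotes logic easier to review later.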
