Regex split string preserving quotes


Question

I need to split a string like the one below, based on space as the delimiter. But any space within a quote should be preserved.

research library "not available" author:"Bernard Shaw"

to

research
library
"not available"
author:"Bernard Shaw"

I am trying to do this in C#, using this regex from another post on SO: @"(?<="")|\w[\w\s]*(?="")|\w+|""[\w\s]*""", which splits the string into

research
library
"not available"
author
"Bernard Shaw"

which unfortunately does not meet my exact requirements.

I'm looking for any Regex, that would do the trick.

Any help appreciated.

Solution

As long as there can be no escaped quotes inside quoted strings, the following should work:

splitArray = Regex.Split(subjectString, "(?<=^[^\"]*(?:\"[^\"]*\"[^\"]*)*) (?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)");

This regex splits on space characters only if they are preceded and followed by an even number of quotes, which means the space lies outside any quoted section.
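For reference, here is a minimal sketch of that call wrapped in a complete console program, run against the sample input from the question. The class name `SplitDemo` and the helper `SplitPreservingQuotes` are illustrative names, not part of the original answer.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SplitDemo
{
    // Hypothetical helper: splits on spaces that sit outside double-quoted runs.
    // Assumes quotes are balanced and never escaped inside quoted strings.
    public static string[] SplitPreservingQuotes(string subjectString)
    {
        return Regex.Split(
            subjectString,
            "(?<=^[^\"]*(?:\"[^\"]*\"[^\"]*)*) (?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)");
    }

    public static void Main()
    {
        // Sample input taken from the question.
        string input = "research library \"not available\" author:\"Bernard Shaw\"";

        foreach (string part in SplitPreservingQuotes(input))
        {
            Console.WriteLine(part);
        }
        // Prints:
        // research
        // library
        // "not available"
        // author:"Bernard Shaw"
    }
}
```

The pattern contains only non-capturing groups, so `Regex.Split` does not add any captured text to the result array.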

The regex without all those escaped quotes, explained:

(?<=      # Assert that it's possible to match this before the current position (positive lookbehind):
 ^        # The start of the string
 [^"]*    # Any number of non-quote characters
 (?:      # Match the following group...
  "[^"]*  # a quote, followed by any number of non-quote characters
  "[^"]*  # the same
 )*       # ...zero or more times (so 0, 2, 4, ... quotes will match)
)         # End of lookbehind assertion.
[ ]       # Match a space
(?=       # Assert that it's possible to match this after the current position (positive lookahead):
 (?:      # Match the following group...
  [^"]*"  # see above
  [^"]*"  # see above
 )*       # ...zero or more times.
 [^"]*    # Match any number of non-quote characters
 $        # Match the end of the string
)         # End of lookahead assertion
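The commented layout above can also be kept in the source code itself. The following is a sketch, assuming `RegexOptions.IgnorePatternWhitespace` and a verbatim string (where `""` stands for one quote character); the class name `CommentedSplitDemo` and the method name `Split` are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CommentedSplitDemo
{
    // With IgnorePatternWhitespace, unescaped whitespace in the pattern is ignored
    // and # starts a comment. The space inside [ ] is still matched literally.
    private const string Pattern = @"
        (?<=                      # lookbehind: from the start of the string to here
          ^[^""]*
          (?:""[^""]*""[^""]*)*   # quotes occur only in pairs
        )
        [ ]                       # the space to split on
        (?=                       # lookahead: from here to the end of the string
          (?:[^""]*""[^""]*"")*   # quotes occur only in pairs
          [^""]*$
        )";

    public static string[] Split(string input)
    {
        return Regex.Split(input, Pattern, RegexOptions.IgnorePatternWhitespace);
    }

    public static void Main()
    {
        string input = "research library \"not available\" author:\"Bernard Shaw\"";
        foreach (string part in Split(input))
        {
            Console.WriteLine(part);
        }
    }
}
```

This is the same pattern as the one-line version, only laid out across lines, so it should produce the same four parts for the sample input.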
