Regex to split a string but keep delimiters, but not as separate elements

Question

I need to split the following string

the quick brown fox jumps over the lazy dog

into the following tokens:

  1. the
  2. quick brown fox jumps over the
  3. lazy dog

So to explain, I want to split on "the" but include the "the" delimiter in the preceding array element (not as its own, separate element).

Can anyone shed any light on this or perhaps give me the correct regex?

I am using C#.

Recommended answer

You need to use a look-behind, (?<=...). The name says it all: it looks at the preceding characters to see whether they match a given pattern.

This should work:

"(?<=\\bthe) "

So, at any space, check whether the preceding characters are "the"; if so, the space matches.
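
As a minimal sketch of how the pattern above could be used from C#, the snippet below passes it to Regex.Split. The class name SplitDemo and the method name SplitAfterThe are hypothetical and chosen only for illustration; they are not part of the original answer.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SplitDemo
{
    // Hypothetical helper: splits on a space only when that space is
    // immediately preceded by the whole word "the". The look-behind is
    // zero-width, so "the" stays in the preceding element and only the
    // space itself is consumed as the separator.
    public static string[] SplitAfterThe(string input)
    {
        return Regex.Split(input, "(?<=\\bthe) ");
    }

    public static void Main()
    {
        string[] parts = SplitAfterThe("the quick brown fox jumps over the lazy dog");
        foreach (string part in parts)
        {
            Console.WriteLine("[" + part + "]");
        }
        // Prints:
        // [the]
        // [quick brown fox jumps over the]
        // [lazy dog]
    }
}
```

Because the pattern contains no capturing group, Regex.Split does not return the separator as an element of its own, which is what the question asks for.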

Note: we also need to include the word boundary \\b (an escaped \b), otherwise something like "bathe" will also match.
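
To illustrate the effect of the word boundary, here is a small sketch that splits the same input with and without \b. The sample sentence "bathe daily the end" and the names BoundaryDemo, SplitWithoutBoundary and SplitWithBoundary are made up for this example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BoundaryDemo
{
    // Without \b: any space preceded by the letters "the" matches,
    // including the one after "bathe".
    public static string[] SplitWithoutBoundary(string input)
    {
        return Regex.Split(input, "(?<=the) ");
    }

    // With \b: "the" must start at a word boundary, so "bathe" is ignored.
    public static string[] SplitWithBoundary(string input)
    {
        return Regex.Split(input, "(?<=\\bthe) ");
    }

    public static void Main()
    {
        string input = "bathe daily the end";

        Console.WriteLine(string.Join(" | ", SplitWithoutBoundary(input)));
        // Prints: bathe | daily the | end

        Console.WriteLine(string.Join(" | ", SplitWithBoundary(input)));
        // Prints: bathe daily the | end
    }
}
```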

Without the look-behind, every space would match:

   v     v     v   v     v    v   v    v
the quick brown fox jumps over the lazy dog

With the look-behind, we only match those spaces that have "the" before them (ignoring the \\b for now):

"the " - just found a space, and last characters are "the", so match.
"quick " - just found another space, but last characters are "...k", so no match.
etc.
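
The walk-through can also be checked by listing the positions at which each pattern matches. This is only a sketch; MatchPositionsDemo and MatchIndexes are hypothetical names introduced for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class MatchPositionsDemo
{
    // Hypothetical helper: returns the zero-based start index of every match.
    public static List<int> MatchIndexes(string input, string pattern)
    {
        var indexes = new List<int>();
        foreach (Match m in Regex.Matches(input, pattern))
        {
            indexes.Add(m.Index);
        }
        return indexes;
    }

    public static void Main()
    {
        string input = "the quick brown fox jumps over the lazy dog";

        // A plain space matches at all eight spaces.
        Console.WriteLine(string.Join(",", MatchIndexes(input, " ")));
        // Prints: 3,9,15,19,25,30,34,39

        // With the look-behind, only the two spaces that follow "the" match.
        Console.WriteLine(string.Join(",", MatchIndexes(input, "(?<=\\bthe) ")));
        // Prints: 3,34
    }
}
```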

Test.
