Regex word boundary expressions


Problem description

Say for example I have the following string "one two(three) (three) four five" and I want to replace "(three)" with "(four)" but not within words. How would I do it?

Basically I want to do a regex replace and end up with the following string:

"one two(three) (four) four five"

I have tried the following regex but it doesn't work:

@"\b\(three\)\b"

Basically I am writing some search and replace code and am giving the user the usual options to match case, match whole word etc. In this instance the user has chosen to match whole words but I don't know what the text being searched for will be.

Recommended answer

Your problem stems from a misunderstanding of what \b actually means. Admittedly, it is not obvious.

The reason \b\(three\)\b doesn’t match the threes in your input string is the following:


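  • \b means: the boundary between a word character and a non-word character. The start and end of the string count as non-word characters for this purpose.
  • Letters (e.g. a-z), digits and the underscore are considered word characters.
  • Punctuation marks such as ( and whitespace are considered non-word characters.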

Here is your input string again, stretched out a bit, and I’ve marked the places where \b matches:

 o n e   t w o ( t h r e e )   ( t h r e e )   f o u r   f i v e
↑     ↑ ↑     ↑ ↑         ↑     ↑         ↑   ↑       ↑ ↑       ↑

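As you can see here, there is a \b between "two" and "(three)", but not before the second "(three)", because a space followed by ( is two non-word characters in a row. For the same reason there is no \b after either closing parenthesis, so the trailing \b in your pattern can never match.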
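To check the arrows in the diagram yourself, here is a minimal sketch that lists every index where \b matches and then tries the original pattern. The class and method names (BoundaryDemo, BoundaryPositions) are made up for this illustration; only Regex.Matches and Regex.IsMatch come from .NET.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BoundaryDemo
{
    // Hypothetical helper: returns every index in input where the
    // zero-width assertion \b succeeds.
    public static int[] BoundaryPositions(string input)
    {
        MatchCollection matches = Regex.Matches(input, @"\b");
        var positions = new int[matches.Count];
        for (int i = 0; i < matches.Count; i++)
            positions[i] = matches[i].Index;
        return positions;
    }

    public static void Main()
    {
        const string input = "one two(three) (three) four five";

        // Prints: 0,3,4,7,8,13,16,21,23,27,28,32
        Console.WriteLine(string.Join(",", BoundaryPositions(input)));

        // Prints: False
        // Index 7 (between "two" and "(") is a boundary, but index 14
        // (between ")" and the space) is not, and neither is index 15
        // (between the space and the second "(").
        Console.WriteLine(Regex.IsMatch(input, @"\b\(three\)\b"));
    }
}
```

Indices 14, 15 and 22 are missing from the list, which is exactly where the pattern would need a boundary.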

The moral of the story? "Whole-word search" doesn’t really make much sense if what you’re searching for is not just a word (a string of letters). Since you have punctuation characters (parentheses) in your search string, it is not as such a "word". If you searched for a word consisting only of word characters, then \b would do what you expect.

You can, of course, use a different regex that matches the string only if it is surrounded by whitespace or occurs at the beginning or end of the input:

(^|\s)\(three\)(\s|$)

However, the problem with this is, of course, that if you search for "three" (without the parentheses), it won’t find the one in "(three)" because it doesn’t have spaces around it, even though it is actually a whole word.
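As a rough sketch of that whitespace-delimited approach, assuming you want to keep the surrounding whitespace intact, something like the following could work. WhitespaceDelimitedDemo and its Replace method are names invented for this example; the two capture groups exist only so that the matched whitespace can be put back.

```csharp
using System;
using System.Text.RegularExpressions;

public static class WhitespaceDelimitedDemo
{
    // Hypothetical helper: replaces search only where it is delimited by
    // whitespace or by the start/end of the input.
    public static string Replace(string input, string search, string replacement)
    {
        string pattern = @"(^|\s)" + Regex.Escape(search) + @"(\s|$)";
        // A MatchEvaluator is used so the replacement text is inserted
        // literally and the captured delimiters are preserved.
        return Regex.Replace(input, pattern,
            m => m.Groups[1].Value + replacement + m.Groups[2].Value);
    }

    public static void Main()
    {
        const string input = "one two(three) (three) four five";

        // Prints: one two(three) (four) four five
        Console.WriteLine(Replace(input, "(three)", "(four)"));

        // Prints the input unchanged: neither "three" has whitespace
        // on both sides, which is the drawback described above.
        Console.WriteLine(Replace(input, "three", "3"));
    }
}
```

One more limitation of this sketch: each match consumes its trailing whitespace, so of two occurrences separated by a single space (e.g. "x x") only the first is replaced.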

I think most text editors (including Visual Studio) will use \b only if your search string actually starts and/or ends with a word character:

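// Requires: using System.Text.RegularExpressions;
// Escape the user's text so that ( and ) are matched literally.
var pattern = Regex.Escape(searchString);
// Add \b only on a side where the search string has a word character.
if (Regex.IsMatch(searchString, @"^\w"))
    pattern = @"\b" + pattern;
if (Regex.IsMatch(searchString, @"\w$"))
    pattern = pattern + @"\b";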



That way they will find "(three)" even if you select "whole words only".
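To see how this behaves on your input, here is a small sketch that wraps the snippet above in a method and counts matches for a few search strings. WholeWordDemo, BuildPattern and Count are hypothetical names for this illustration, and whether any particular editor does exactly this is a guess on my part.

```csharp
using System;
using System.Text.RegularExpressions;

public static class WholeWordDemo
{
    // Hypothetical helper: builds a "whole word" pattern that adds \b only
    // on the sides where the search string starts or ends with a word character.
    public static string BuildPattern(string searchString)
    {
        string pattern = Regex.Escape(searchString);
        if (Regex.IsMatch(searchString, @"^\w"))
            pattern = @"\b" + pattern;
        if (Regex.IsMatch(searchString, @"\w$"))
            pattern = pattern + @"\b";
        return pattern;
    }

    // Hypothetical helper: number of matches of searchString in input.
    public static int Count(string input, string searchString)
    {
        return Regex.Matches(input, BuildPattern(searchString)).Count;
    }

    public static void Main()
    {
        const string input = "one two(three) (three) four five";

        Console.WriteLine(BuildPattern("(three)"));   // \(three\)
        Console.WriteLine(BuildPattern("three"));     // \bthree\b
        Console.WriteLine(Count(input, "(three)"));   // 2
        Console.WriteLine(Count(input, "three"));     // 2
        Console.WriteLine(Count(input, "our"));       // 0 ("our" only occurs inside "four")
        Console.WriteLine(Count(input, "four"));      // 1
    }
}
```

Note that a search for "(three)" matches both occurrences here, including the one attached to "two", because no \b is added around the parentheses. If you need to skip that one, the whitespace-based pattern is the stricter choice.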
