Regex expression not behaving as expected

Problem description

I want to test whether a string contains a word. So I have this regex: /\bde\b/gi

And, if my string is "Comida de cão", it works.

But, if I have a string like "Necessidade de adeus depois " it also matches the "de" in "necessidade", "adeus" and "depois".

Besides, when I try to match words with accents in a string like "é a vida", using the regex like this: /\bé\b/gi nothing is found. But if I search for a word with an accent in the middle it is found! So in the string "O nível" if I use the following regex expression /\bnível\b/gi it matches the right word.

I've been searching for similar issues, but I still haven't managed to solve my problem.

Btw, here the first issue doesn't happen and it works as expected.

Thanks!

Edit: Added my code

var myRe = new RegExp("\\b" + query + "\\b","iu");
var match = myRe.test("Necessidade de adeus depois");

Solution

The closest thing to a working solution that I have found is this. As stated in my comment, there seems to be a problem with word boundaries and Unicode characters.
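To make the word-boundary problem concrete, here is a small sketch (the variable names are only illustrative). In JavaScript, \w covers only ASCII letters, digits and underscore, even with the u flag, so \b treats "é" as a non-word character:

```javascript
// \w in JavaScript is ASCII-only ([A-Za-z0-9_]), even with the u flag,
// so "é" counts as a non-word character when \b is evaluated.
const isWordChar = (ch) => /\w/u.test(ch);

const eIsWord = isWordChar("e");       // plain ASCII letter
const eAcuteIsWord = isWordChar("é");  // accented letter

// There is no boundary between the start of the string and "é":
// both sides are "non-word" from the engine's point of view.
const standaloneAccent = /\bé\b/iu.test("é a vida");

// "nível" starts with "n" and ends with "l" (both ASCII), so \b works here.
const accentInside = /\bnível\b/iu.test("O nível");

console.log({ eIsWord, eAcuteIsWord, standaloneAccent, accentInside });
// { eIsWord: true, eAcuteIsWord: false, standaloneAccent: false, accentInside: true }
```

This is why the accent only causes trouble when it sits at the very start or end of the searched word.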

I think this solution can be improved, but it uses a positive lookahead (which doesn't consume characters) to test for either the start ^ or the end $ of the string, or for a non-word character:

//accent as a word end or start
/(?=^|\W)é(?=$|\W)/giu

//no accent as a word end or start
/\bnível\b/giu

EDIT: Yes, that's true, it does not work with multiple characters. If you can test the length of what you are searching for, you can still handle the cases separately depending on whether you search for one or several characters.

EDIT2: Actually, the last edit is wrong. It doesn't depend on the length, but on whether the accented character is next to the boundary. So it would be /(?=^|\W)éternel\b/giu for "éternel" and /\bné(?=$|\W)/giu for "né".
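One caveat with the patterns above: a leading (?=^|\W) is a lookahead, so it inspects the character about to be matched rather than the one before it, and since "é" is itself \W the check also passes in the middle of a word. As a possible alternative (not part of the original answer), here is a sketch that defines "word character" with Unicode property escapes and uses a lookbehind plus a lookahead. It assumes an engine with ES2018 support and precomposed (NFC) text; containsWord and escapeRegExp are hypothetical helper names.

```javascript
// The leading lookahead from the answer also accepts "é" inside a word,
// because \W matches the "é" that is about to be consumed.
const falsePositive = /(?=^|\W)é(?=$|\W)/iu.test("comeré de");

// Hypothetical helper: escape regex metacharacters so the query is literal.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: any Unicode letter, digit or underscore counts as a
// word character, and none may sit directly before or after the query.
function containsWord(text, query) {
  const wordChar = "[\\p{L}\\p{N}_]";
  const re = new RegExp(
    "(?<!" + wordChar + ")" + escapeRegExp(query) + "(?!" + wordChar + ")",
    "iu" // no g flag, so test() keeps no state between calls
  );
  return re.test(text);
}

console.log(falsePositive);                                   // true
console.log(containsWord("é a vida", "é"));                   // true
console.log(containsWord("Necessidade de adeus", "de"));      // true
console.log(containsWord("Necessidade adeus depois", "de"));  // false
console.log(containsWord("comeré", "é"));                     // false
console.log(containsWord("il est né", "né"));                 // true
```

With this approach the same pattern works whether or not the accented character is next to the boundary, so there is no need to build different expressions per query.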

updated regex example: https://regex101.com/r/6v2gId/3

EDIT3: A little example of what I tried, to answer your last comment:

var query = 'de';
var myRe = new RegExp("\\b" + query + "\\b","giu");
var match = myRe.test("determinado de necessidade de comer é de");
document.getElementById('res1').innerHTML = match;
var match = myRe.test("determinado necessidade comer é e");
document.getElementById('res2').innerHTML = match;
var query = 'dé';
var myRe = new RegExp("\\b" + query + "(?=$|\\W)","giu");
var match = myRe.test("déterminado dé necessidadé de comer é de");
document.getElementById('res3').innerHTML = match;
var match = myRe.test("déterminado necessidadé comer é de");
document.getElementById('res4').innerHTML = match;

<span>test with "\\bde\\b":</span><br/>
<span>for "determinado de necessidade de comer é de":</span><span id="res1"></span><br/>
<span>for "determinado necessidade comer é e":</span><span id="res2"></span><br/><br/>
<span>test with "\\bdé(?=$|\\W)":</span><br/>
<span>for "déterminado dé necessidadé de comer é de":</span><span id="res3"></span><br/>
<span>for "déterminado necessidadé comer é de":</span><span id="res4"></span>
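A side note on the snippet above, offered as a sketch rather than a correction of its results: a RegExp created with the g flag remembers where its last match ended (lastIndex), so calling test() repeatedly on the same object can give different answers for the same input. When only a yes/no answer is needed, dropping g avoids this.

```javascript
// With the g flag, test() resumes from lastIndex on the next call.
const globalRe = /\bde\b/giu;
const first = globalRe.test("de");   // match found, lastIndex becomes 2
const second = globalRe.test("de");  // search starts at index 2, finds nothing
const third = globalRe.test("de");   // lastIndex was reset to 0, match again

// Without the g flag, every call starts from the beginning.
const plainRe = /\bde\b/iu;
const stable = [plainRe.test("de"), plainRe.test("de"), plainRe.test("de")];

console.log(first, second, third); // true false true
console.log(stable);               // [ true, true, true ]
```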
