Why can't I use accented characters next to a word boundary?

Question

I'm trying to build a dynamic regex that matches a person's name. It works without problems on most names, but fails when the name ends in an accented character.

Example: Some Fancy Namé

The regex I've used so far is:

/\b(Fancy Namé|Namé)\b/i

Used like this:

"Goal: Some Fancy Namé. Awesome.".replace(/\b(Fancy Namé|Namé)\b/i, '<a href="#">$1</a>');

This simply won't match. If I replace the é with an e, it matches just fine. If I try to match a name such as "Some Fancy Naméa", it works just fine. If I remove the last word boundary anchor, it works just fine.

Why doesn't the word boundary anchor work here? Any suggestions on how to get around this problem?

I have considered using something like this, but I'm not sure what the performance penalty would be:

"Some fancy namé. Allow me to elaborate.".replace(/([\s.,!?])(fancy namé|namé)([\s.,!?]|$)/g, '$1<a href="#">$2</a>$3')

Suggestions? Thoughts?

Recommended answer

JavaScript's regex word boundary is not Unicode-aware. \b and \w only know the ASCII 'word characters' (A-Z, a-z, 0-9 and underscore), which do not include é or any other accented or non-English letters.

Because é is not a word character to JS, é followed by a space or by punctuation can never be considered a word boundary: \b only matches where a word character sits on exactly one side. (\b would match after é in the middle of a word, like Namés, because the following s is a word character.)
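The behaviour described above can be checked directly. This is a minimal sketch using the strings from the question; the helper name isWordChar and the variable names are made up for illustration, and é is written as the precomposed character U+00E9.

```javascript
// Hypothetical helper: true if ch contains a character that \w accepts.
// In JavaScript \w is ASCII-only: [A-Za-z0-9_].
const isWordChar = (ch) => /\w/.test(ch);

const text = "Goal: Some Fancy Namé. Awesome.";

// "é" and "." are both non-word characters, so there is no \b between them
// and the pattern from the question fails.
const withTrailingBoundary = /\b(Fancy Namé|Namé)\b/i.test(text);

// Dropping the trailing \b lets the same pattern match.
const withoutTrailingBoundary = /\b(Fancy Namé|Namé)/i.test(text);

// In "Namés" the "s" is a word character, so \b does match right after "é".
const midWord = /Namé\b/.test("Namés");

console.log(isWordChar("e"), isWordChar("\u00e9")); // true false
console.log(withTrailingBoundary, withoutTrailingBoundary, midWord); // false true true
```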


/([\s.,!?])(fancy namé|namé)([\s.,!?]|$)/

Yeah, that would be the usual workaround for JS (though probably with more punctuation characters). In other languages you'd generally use lookahead/lookbehind to avoid matching the characters before and after the boundary, but these have historically been poorly supported/buggy in JS, so they are best avoided when older engines matter.
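A rough sketch of that workaround as a reusable function follows. The helper names escapeRegExp and linkNames are hypothetical, the delimiter set is just the one from the question, and the ^ alternative in the leading group is an added assumption so that a name at the very start of the string still matches.

```javascript
// Hypothetical helper: escape regex metacharacters so names can be
// interpolated into a pattern safely.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: wrap each listed name in a link, using explicit
// delimiter groups instead of \b.
function linkNames(text, names) {
  // Longest names first, so "Fancy Namé" is tried before "Namé".
  const alternatives = names
    .slice()
    .sort((a, b) => b.length - a.length)
    .map(escapeRegExp)
    .join("|");
  // Group 1: start of string or a delimiter. Group 3: a delimiter or end
  // of string. Both are captured and written back by the replacement.
  const pattern = new RegExp(
    "(^|[\\s.,!?])(" + alternatives + ")([\\s.,!?]|$)",
    "gi"
  );
  return text.replace(pattern, '$1<a href="#">$2</a>$3');
}

console.log(linkNames("Goal: Some Fancy Namé. Awesome.", ["Namé", "Fancy Namé"]));
// Goal: Some <a href="#">Fancy Namé</a>. Awesome.
```

Because the trailing delimiter is consumed by the match, two names separated by a single space will not both be linked in one pass; that is the trade-off of avoiding lookahead.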
