Concrete JavaScript Regex for Accented Characters (Diacritics)
Question
I've looked on Stack Overflow (replacing characters.. eh, how JavaScript doesn't follow the Unicode standard concerning RegExp, etc.) and haven't really found a concrete answer to the question:
How can JavaScript match accented characters (those with diacritics)?
I'm forcing a field in a UI to match the format: last_name, first_name (last [comma space] first), and I want to provide support for diacritics, but evidently in JavaScript it's a bit more difficult than in other languages/platforms.
This was my original version, until I wanted to add diacritic support:
/^[a-zA-Z]+,\s[a-zA-Z]+$/
Currently I'm debating among three methods to add support, all of which I have tested and which work (at least to some extent; I don't really know what the "extent" is for the second approach). Here they are:
var accentedCharacters = "àèìòùÀÈÌÒÙáéíóúýÁÉÍÓÚÝâêîôûÂÊÎÔÛãñõÃÑÕäëïöüÿÄËÏÖÜŸçÇßØøÅåÆæœ";
// Build the full regex (the backslash is doubled so the string really contains \s)
var regex = "^[a-zA-Z" + accentedCharacters + "]+,\\s[a-zA-Z" + accentedCharacters + "]+$";
// Create a RegExp from the string version
var regexCompiled = new RegExp(regex);
// regexCompiled = /^[a-zA-ZàèìòùÀÈÌÒÙáéíóúýÁÉÍÓÚÝâêîôûÂÊÎÔÛãñõÃÑÕäëïöüÿÄËÏÖÜŸçÇßØøÅåÆæœ]+,\s[a-zA-ZàèìòùÀÈÌÒÙáéíóúýÁÉÍÓÚÝâêîôûÂÊÎÔÛãñõÃÑÕäëïöüÿÄËÏÖÜŸçÇßØøÅåÆæœ]+$/
- This correctly matches a last/first name with any of the supported accented characters in accentedCharacters.
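One detail that is easy to trip over with this first approach: when the pattern is built from a string, the backslash in \s has to be escaped itself, otherwise the string only contains a plain s. A minimal sketch (the variable names and the simplified [a-z] class are only for illustration, not part of the original code):

```javascript
// In a string literal, "\s" is an unrecognized escape and collapses to "s".
console.log("\s" === "s"); // true

// Single backslash: the compiled pattern is /^[a-z]+,s[a-z]+$/
var wrong = new RegExp("^[a-z]+,\s[a-z]+$");
// Double backslash: the compiled pattern is /^[a-z]+,\s[a-z]+$/
var right = new RegExp("^[a-z]+,\\s[a-z]+$");

console.log(wrong.test("doe, john")); // false (expects a literal "s" after the comma)
console.log(wrong.test("doe,sjohn")); // true
console.log(right.test("doe, john")); // true
```

A regex literal such as /,\s/ does not have this problem, since no string parsing happens before the pattern is compiled.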
var regex = /^.+,\s.+$/;
- This would match just about anything, at least in the form of: something, something. That's alright I suppose...
/^[a-zA-Z\u00C0-\u017F]+,\s[a-zA-Z\u00C0-\u017F]+$/
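- This matches a range of Unicode characters. It is tested and working, although I didn't try anything crazy, just the normal stuff I see in our language department for faculty member names.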
Here are my concerns:
- The first solution is far too limiting, and sloppy and convoluted at that. It would need to be changed if I forgot a character or two, and that's just not very practical.
- The second solution is better and more concise, but it probably matches far more than it actually should. I couldn't find any real documentation on exactly what . matches, just the generalization of "any character except the newline character" (from a table on the MDN).
- The third solution seems to be the most precise, but are there any gotchas? I'm not very familiar with Unicode, at least in practice, but looking at a code table/continuation of that table, \u00C0-\u017F seems to be pretty solid, at least for my expected input.
- Faculty won't be submitting forms with names in their native script (e.g. Arabic, Chinese, Japanese, etc.), so I don't have to worry about characters outside the Latin character set.
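To make the concerns about the second and third approaches concrete, here is a small sketch. The variable names and sample inputs are made up for illustration; the accented letters are written as \u escapes so the result does not depend on how the file is encoded.

```javascript
// Approach 2: "." matches any character except line terminators.
var dotRegex = /^.+,\s.+$/;
// Approach 3: ASCII letters plus the range U+00C0 to U+017F.
var rangeRegex = /^[a-zA-Z\u00C0-\u017F]+,\s[a-zA-Z\u00C0-\u017F]+$/;

// The dot version accepts digits and punctuation too.
console.log(dotRegex.test("123!?, @@@")); // true

// The range version accepts typical accented names ("Müller, Zoë").
console.log(rangeRegex.test("M\u00FCller, Zo\u00EB")); // true

// Gotcha: U+00D7 (multiplication sign) and U+00F7 (division sign)
// sit inside the range, so they are accepted as "letters".
console.log(rangeRegex.test("a\u00D7b, c\u00F7d")); // true

// Gotcha: apostrophes, hyphens and spaces inside a name are rejected.
console.log(rangeRegex.test("O'Brien, Se\u00E1n")); // false
```

Whether the last two cases matter depends on the names you actually expect.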
So the real question(s): Which of these three approaches is most suited for the task? Or are there better solutions?
Recommended answer
The easier way to accept all accents is this:
[A-zÀ-ú] // accepts lowercase and uppercase characters (also matches [ \ ] ^ _ ` × ÷)
[A-zÀ-ÿ] // as above, extended through ÿ to add û ü ý þ ÿ (still matches [ \ ] ^ _ ` × ÷)
[A-Za-zÀ-ÿ] // as above but not including [ \ ] ^ _ `
[A-Za-zÀ-ÖØ-öø-ÿ] // as above but not including × ÷
See https://unicode-table.com/en/ for characters listed in numeric order.
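As a rough sketch of how the last character class could be plugged into the last_name, first_name check from the question: the names LETTERS, NAME_PATTERN and isValidName below are hypothetical, and the normalize("NFC") step is an extra assumption of mine rather than part of the answer. It is there because an accented letter can also arrive as a base letter plus a combining mark, which the class alone would not match.

```javascript
// Same class as [A-Za-zÀ-ÖØ-öø-ÿ], written with \u escapes.
var LETTERS = "[A-Za-z\\u00C0-\\u00D6\\u00D8-\\u00F6\\u00F8-\\u00FF]";
var NAME_PATTERN = new RegExp("^" + LETTERS + "+,\\s" + LETTERS + "+$");

// Hypothetical helper: compose the input to NFC first, then test it.
function isValidName(input) {
  return NAME_PATTERN.test(input.normalize("NFC"));
}

console.log(isValidName("Garc\u00EDa, Jos\u00E9"));     // true (precomposed í, é)
console.log(isValidName("Garci\u0301a, Jose\u0301"));   // true (combining accents, composed by NFC)
console.log(isValidName("a\u00D7b, c"));                // false (× is excluded)
// This class stops at U+00FF, so Latin Extended-A letters such as ł (U+0142) fail.
console.log(isValidName("Wa\u0142\u0119sa, Lech"));     // false
```

If names with letters beyond U+00FF are expected, the range from the question (\u00C0-\u017F) covers more of them, at the cost of letting × and ÷ back in unless they are carved out the same way.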