Checking for diacritics with a regular expression
Question
Simple problem: an existing project lets me add extra fields to support custom input forms, with additional checks on those fields expressed as regular expressions. I need to add a new form but cannot change how this project works. The form lets a visitor enter a first name, a last name, and initials, so the RegEx ^[a-zA-Z.]*$ worked just fine until now.
Then someone noticed that it wouldn't accept characters with diacritics as input. A Turkish name like Ömür was rejected as invalid. It needs to be accepted, though.
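So I have two options:
- Remove the check altogether, which would allow users to enter garbage.
- Write a regular expression that also accepts letters with diacritics but still excludes digits, spaces, and other non-letters.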
Since I cannot change the code of the project, I only have these two options. I would prefer option 2 but now wonder what the proper RegEx should be. (The project is written in C# 4.0.)
Recommended answer
You can use the Unicode category escape for letters, \p{L} (this includes the A-Za-z ranges):
^[.\p{L}]*$
\p{L} or \p{Letter}
Matches a single Unicode code point that has the property "letter". See Unicode Character Properties in the tutorial for a complete list of properties. Each Unicode code point belongs to exactly one general category. Can be used inside character classes.
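To make the difference concrete, here is a minimal C# sketch comparing the original pattern with the \p{L} version. The class and method names (NameCheck, IsAsciiName, IsUnicodeName) are made up for illustration and are not part of the project in the question. The sample name is written with \u escapes so that it is definitely in precomposed form.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not the project's own code.
public static class NameCheck
{
    // Original pattern: ASCII letters and dots only.
    private static readonly Regex AsciiOnly = new Regex(@"^[a-zA-Z.]*$");

    // Revised pattern: any Unicode letter (\p{L}) or a dot.
    private static readonly Regex AnyLetter = new Regex(@"^[.\p{L}]*$");

    public static bool IsAsciiName(string input)
    {
        return AsciiOnly.IsMatch(input);
    }

    public static bool IsUnicodeName(string input)
    {
        return AnyLetter.IsMatch(input);
    }

    public static void Main()
    {
        // "\u00D6m\u00FCr" is "Ömür" with precomposed Ö and ü.
        string[] samples = { "\u00D6m\u00FCr", "J.R.", "John3", "Anne Marie" };
        foreach (string s in samples)
        {
            Console.WriteLine("{0}: ascii={1}, unicode={2}",
                s, IsAsciiName(s), IsUnicodeName(s));
        }
    }
}
```

With these inputs, only the \p{L} pattern accepts the Turkish name, both patterns accept "J.R.", and both reject the entries containing a digit or a space. One caveat: if the input arrives in decomposed form (a base letter followed by a combining mark such as U+0308), the mark is in category M rather than L, so this pattern would reject it. Adding \p{M} to the character class, or normalizing the input first, may be worth considering if that can happen.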