Checking for diacritics with a regular expression

Question

Simple problem: an existing project lets me add extra fields (with extra checks on those fields, expressed as regular expressions) to support custom input forms. I need to add a new form but cannot change how the project works. The form lets a visitor enter a first name, last name and initials. So far, the RegEx ^[a-zA-Z.]*$ worked just fine.
Then someone noticed that it does not accept letters with diacritics. A Turkish name like Ömür is rejected as invalid, but it needs to be accepted.

So I have two options:

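  1. Remove the check entirely, which would let users enter garbage.
  2. Write a regular expression that also accepts letters with diacritics but still rejects digits, spaces and other non-letters.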

Since I cannot change the code of the project, I only have these two options. I would prefer option 2 but now wonder what the proper RegEx should be. (The project is written in C# 4.0.)

Recommended answer

You can use the Unicode property escape for letters, \p{L} (this includes the A-Za-z ranges):

^[.\p{L}]*$

See regular-expressions.info:

\p{L} or \p{Letter}

Matches a single Unicode code point that has the property "letter". See Unicode Character Properties in the tutorial for a complete list of properties. Each Unicode code point has exactly one property. Can be used inside character classes.
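A minimal sketch of how .NET's Regex class evaluates the original and the suggested patterns. The names NameCheck, Matches and the sample strings are hypothetical and only for illustration; they are not part of the project in the question. The sketch also tries a variant that adds \p{M} (combining marks), which goes beyond the original answer: if a name such as Ömür arrives in decomposed form (a base letter followed by U+0308 COMBINING DIAERESIS), the diaeresis is in category Mn rather than L, so ^[.\p{L}]*$ on its own would reject it.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NameCheck
{
    // The original ASCII-only pattern from the question.
    public const string AsciiPattern = @"^[a-zA-Z.]*$";

    // The suggested pattern: a literal dot or any code point in Unicode category L (letter).
    public const string LetterPattern = @"^[.\p{L}]*$";

    // Hypothetical variant (not part of the original answer): also accepts
    // combining marks (category M), so decomposed input passes as well.
    public const string LetterOrMarkPattern = @"^[.\p{L}\p{M}]*$";

    public static bool Matches(string input, string pattern)
    {
        return Regex.IsMatch(input, pattern);
    }
}

public static class Program
{
    public static void Main()
    {
        // "Ömür" with precomposed Ö (U+00D6) and ü (U+00FC).
        string precomposed = "\u00D6m\u00FCr";
        // The same name written as base letters plus combining diaeresis (U+0308).
        string decomposed = "O\u0308mu\u0308r";

        Console.WriteLine(NameCheck.Matches(precomposed, NameCheck.AsciiPattern));       // False
        Console.WriteLine(NameCheck.Matches(precomposed, NameCheck.LetterPattern));      // True
        Console.WriteLine(NameCheck.Matches("J.R.", NameCheck.LetterPattern));           // True
        Console.WriteLine(NameCheck.Matches("John Smith", NameCheck.LetterPattern));     // False (space)
        Console.WriteLine(NameCheck.Matches("R2D2", NameCheck.LetterPattern));           // False (digits)
        Console.WriteLine(NameCheck.Matches(decomposed, NameCheck.LetterPattern));       // False
        Console.WriteLine(NameCheck.Matches(decomposed, NameCheck.LetterOrMarkPattern)); // True
    }
}
```

Whether decomposed input can actually reach this field depends on the browser and the project, which the question does not say; for typical precomposed input, ^[.\p{L}]*$ is enough.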
