Include special characters like ö, ä, ü in regular expressions


Problem description

我有以下正则表达式:

pattern='^.*(?=.{8,})(?=.*[a-zA-Z])(?=.*\d).*$'




It checks for:

  • at least 8 characters

  • at least 1 digit

  • at least 1 letter (upper or lower case)

Unfortunately, special characters like the German ä, ö, ü are not included, so an input like 1234567ä fails. Does anyone know how to get them into this expression? I guess it probably belongs in the (?=.*[a-zA-Z]) section. Thank you in advance for your effort.
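A minimal sketch to reproduce the failure, assuming the pattern is evaluated as an ordinary JavaScript regular expression. The `pattern='...'` syntax suggests an HTML `pattern` attribute, which browsers anchor and flag slightly differently; that context is an assumption here, and the sample strings are illustrative only.

```javascript
// The question's pattern, written as a JavaScript regex literal.
const original = /^.*(?=.{8,})(?=.*[a-zA-Z])(?=.*\d).*$/;

// Illustrative inputs, chosen to exercise each requirement.
const samples = ["abcd1234", "1234567a", "1234567ä", "abcdefgh"];

for (const s of samples) {
  // test() is true only if some position satisfies all three lookaheads.
  console.log(`${s} -> ${original.test(s)}`);
}
// abcd1234 -> true
// 1234567a -> true
// 1234567ä -> false  (ä is not in [a-zA-Z])
// abcdefgh -> false  (no digit)
```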

Recommended answer

The answer depends on exactly what you want to do.

As you have noticed, [a-zA-Z] only matches Latin letters without diacritics.

If you only care about German diacritics and the ß ligature, you can just replace that part with [a-zA-ZäöüÄÖÜß], e.g.:

pattern='^.*(?=.{8,})(?=.*[a-zA-ZäöüÄÖÜß])(?=.*\d).*$'
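A quick check of the German-only variant, as a sketch with illustrative inputs. It accepts umlauts and ß but still rejects accented letters from other languages, which is the limitation discussed next:

```javascript
// The answer's German-only variant: ä, ö, ü, Ä, Ö, Ü and ß now count as letters.
const german = /^.*(?=.{8,})(?=.*[a-zA-ZäöüÄÖÜß])(?=.*\d).*$/;

console.log(german.test("1234567ä")); // true
console.log(german.test("1234567ß")); // true
console.log(german.test("1234567Ü")); // true
// Accented letters from other languages are still rejected.
console.log(german.test("1234567é")); // false (French)
console.log(german.test("1234567ñ")); // false (Spanish)
```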
    

But that probably isn’t what you want. You probably want to match Latin letters with any diacritics, not just those used in German. Or perhaps you want to match any letter from any alphabet, not just Latin.

Other regular expression dialects have character classes to help with problems like this, but unfortunately JavaScript’s regular expression dialect has very few character classes, and none of them help you here.

(In case you don’t know, a "character class" is an expression that matches any character that is a member of a predefined group of characters. For example, \w is a character class that matches any ASCII letter, digit, or underscore, and . is a character class that matches any character except line terminators.)
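Both classes mentioned above can be checked directly. A short sketch, with arbitrary sample characters:

```javascript
// \w is equivalent to [A-Za-z0-9_]: ASCII only, so German letters are excluded.
console.log(/^\w$/.test("a")); // true
console.log(/^\w$/.test("_")); // true
console.log(/^\w$/.test("ä")); // false

// . matches any single character except line terminators such as \n...
console.log(/^.$/.test("ä"));  // true
console.log(/^.$/.test("\n")); // false
// ...unless the s (dotAll) flag, added in ES2018, is set.
console.log(/^.$/s.test("\n")); // true
```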

This means that you have to list out every range of UTF-16 code units that corresponds to a character you want to match.
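To see which code units a range has to cover, a small helper can print them. `codeUnits` is a hypothetical name used only for this sketch; it relies on the standard `String.prototype.charCodeAt`, which returns UTF-16 code units rather than code points:

```javascript
// Returns the UTF-16 code units of a string as space-separated hex values.
// These are the units a JavaScript character class (without the u flag) compares.
function codeUnits(str) {
  const units = [];
  for (let i = 0; i < str.length; i++) {
    units.push(str.charCodeAt(i).toString(16).toUpperCase().padStart(4, "0"));
  }
  return units.join(" ");
}

console.log(codeUnits("ä"));       // 00E4
console.log(codeUnits("ß"));       // 00DF
console.log(codeUnits("a\u0308")); // 0061 0308  (a + combining diaeresis, renders like ä)
console.log(codeUnits("😀"));      // D83D DE00  (one character, two code units)
```

The last two lines show why code-unit ranges only approximate "characters": a decomposed ä is two units, and characters outside the Basic Multilingual Plane are surrogate pairs. If input may arrive decomposed, calling `str.normalize("NFC")` before matching is one way to get the single-unit form.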

A quick and dirty solution might be to use [a-zA-Z\u0080-\uFFFF], or in full:

pattern='^.*(?=.{8,})(?=.*[a-zA-Z\u0080-\uFFFF])(?=.*\d).*$'
    

This will match any letter in the ASCII range, but it will also match any character at all outside the ASCII range. That includes every alphabetic character, with or without diacritics, in any script. However, it also includes many characters that are not letters: non-letters in the ASCII range are excluded, but non-letters outside the ASCII range are included.
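The trade-off described above is easy to observe. A sketch with illustrative inputs:

```javascript
// The answer's quick-and-dirty variant: ASCII letters or any non-ASCII code unit.
const quick = /^.*(?=.{8,})(?=.*[a-zA-Z\u0080-\uFFFF])(?=.*\d).*$/;

// Accepted: letters from many scripts.
console.log(quick.test("1234567ä")); // true
console.log(quick.test("1234567ж")); // true (Cyrillic)
// Also accepted, although these are not letters:
console.log(quick.test("1234567€")); // true (U+20AC euro sign)
console.log(quick.test("1234567→")); // true (U+2192 arrow)
// ASCII non-letters are still rejected:
console.log(quick.test("1234567!")); // false
```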

The above might be good enough for your purposes, but if it isn’t, you will have to figure out which character ranges you need and specify them explicitly.
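One way to specify ranges explicitly, sketched under the assumption that "Latin letters with diacritics" means the letters of the Latin-1 Supplement (U+00C0 to U+00FF, minus × U+00D7 and ÷ U+00F7) plus Latin Extended-A (U+0100 to U+017F); other Latin blocks would need further ranges. For comparison, the sketch also shows the Unicode property escape \p{L}, which was added to JavaScript in ES2018 and is only recognized with the u flag (or the newer v flag), so check engine support before relying on it.

```javascript
// Explicit ranges: ASCII letters, Latin-1 Supplement letters
// (U+00C0-U+00D6, U+00D8-U+00F6, U+00F8-U+00FF) and Latin Extended-A (U+0100-U+017F).
// U+00F8-U+00FF and U+0100-U+017F are adjacent, so they merge into \u00F8-\u017F.
const latin = /^.*(?=.{8,})(?=.*[a-zA-Z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u017F])(?=.*\d).*$/;

console.log(latin.test("1234567ä")); // true
console.log(latin.test("1234567ő")); // true (U+0151, Latin Extended-A)
console.log(latin.test("1234567×")); // false (U+00D7 multiplication sign is excluded)
console.log(latin.test("1234567€")); // false

// Alternative on ES2018+ engines: \p{L} matches a letter in any script.
const anyLetter = /^.*(?=.{8,})(?=.*\p{L})(?=.*\d).*$/u;

console.log(anyLetter.test("1234567ж")); // true (Cyrillic)
console.log(anyLetter.test("1234567€")); // false (currency symbol, not a letter)
```

The explicit-range version works in any engine but only covers the blocks you list; \p{L} covers every script without hand-maintained ranges, at the cost of requiring a newer engine.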
