International Email validation in C++ using Regex
Problem description
I am facing some issues in validating international email addresses like john.doe@神谕.com, sara.smith@神谕.com, babu.ratnakar+आଆఉఊګ神谕@gmail.com and testæœö.神谕#$&*éùôß@äßæçëêùé+आଆ神谕.com using a regex in C++.

The following regex worked fine for me in Java:

^[\\p{L}0-9!#$%&'*+/=?^_`{|}~-]+(?:\\.[\\p{L}0-9!#$%&'*+/=?^_`{|}~-]+)*@(?:[\\p{L}0-9](?:[\\p{L}0-9-]*[\\p{L}0-9])?\\.)+[\\p{L}0-9](?:[\\p{L}0-9-]*[\\p{L}0-9])?$
I tried using the same with a slight modification in C++:

std::string str("[\\\\p{L}0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[\\\\p{L}0-9!#$%&'*+/=?^_`{|}~-]+)*@(?:[\\\\p{L}0-9](?:[\\\\p{L}0-9-]*[\\\\p{L}0-9])?\.)+[\\\\p{L}0-9](?:[\\\\p{L}0-9-]*[\\\\p{L}0-9])?");
std::regex rx4(str);

But C++ regex_match fails on all cases. I think the issue is with \p{L}: when I replaced it with a-z, it accepted email addresses with English letters, i.e. this one works:

std::regex rx3("[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?", std::regex::ECMAScript);
Does \p{L} for matching Unicode letters not work in C++?

Answer

C++ std::regex supports 6 regex flavors:
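Six different regular expression flavors or grammars are defined in std::regex_constants:

ECMAScript: Similar to JavaScript.
basic: Similar to POSIX BRE.
extended: Similar to POSIX ERE.
grep: Same as basic, with the addition of treating line feeds as alternation operators.
egrep: Same as extended, with the addition of treating line feeds as alternation operators.
awk: Same as extended, with the addition of supporting common escapes for non-printable characters.

None of these support Unicode properties (or Unicode category classes) like \p{L}, so you cannot use \p{L} in your patterns.

Use your workaround if it works for you. Note that "\." in an ordinary C++ string literal is not a valid escape sequence (GCC, Clang and MSVC warn and treat it as a plain "."), so the regex engine would receive an unescaped dot that matches any character. The dots below are therefore written as "\\.":

std::regex rx3("[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?", std::regex::ECMAScript);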
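Two quick checks make the points above concrete. This is a sketch that assumes an ordinary UTF-8 std::string and the default ECMAScript grammar. The helper names in_rx4_letter_class and is_ascii_email are hypothetical. The first helper shows that the "\\\\p{L}" in the question's rx4 reaches the regex engine as an escaped backslash followed by ordinary characters, so that class never meant "letter" and real addresses fail at their first letter. The second wraps the ASCII-only workaround with the dot escapes doubled.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: tests a string against the bracket expression that
// the question's rx4 uses where a Unicode letter class was intended.
// The C++ literal "[\\\\p{L}0-9]" hands the engine the text [\\p{L}0-9]:
// an escaped backslash plus the literal characters p, {, L, } and 0-9.
bool in_rx4_letter_class(const std::string& s) {
    static const std::regex cls("[\\\\p{L}0-9]");
    return std::regex_match(s, cls);
}

// Hypothetical wrapper around the ASCII-only workaround. "\\." reaches the
// engine as \. (a literal dot). regex_match must consume the whole string,
// so no ^...$ anchors are needed.
bool is_ascii_email(const std::string& address) {
    static const std::regex rx3(
        "[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*"
        "@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\\.)+"
        "[a-z0-9](?:[a-z0-9-]*[a-z0-9])?",
        std::regex::ECMAScript);
    return std::regex_match(address, rx3);
}
```

For example, in_rx4_letter_class("p") is true but in_rx4_letter_class("a") is false. is_ascii_email("john.doe@example.com") is true, while is_ascii_email("john.doe@神谕.com") is false because the bytes of 神 are not in [a-z0-9]. With the dot properly escaped, "john.doe@examplexcom" is also rejected. The pattern is lowercase-only, so adding std::regex::icase to the flags would be needed to accept an address such as John.Doe@example.com.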
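Or use a version from the well-known "Validate email address in JavaScript?" Stack Overflow post. The anchors are removed because you are using regex_match, the pattern is re-escaped for use in a non-raw string literal, and std::regex::ECMAScript is omitted because that grammar is the default:

std::regex rx3("(?:(?:[^<>()\\[\\].,;:\\s@\"]+(?:\\.[^<>()\\[\\].,;:\\s@\"]+)*)|\".+\")@(?:(?:[^<>()\\[\\].,;:\\s@\"]+\\.)+[^<>()\\[\\].,;:\\s@\"]{2,})");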
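This second pattern accepts the international samples even though std::regex has no \p{L}. The reason is that every character class in it is negated, and std::regex over a std::string compares individual char values (bytes). Each byte of a UTF-8 sequence such as 神 therefore falls outside the excluded ASCII set and is allowed. The sketch below assumes UTF-8 source and execution character sets (the default for GCC and Clang) and uses a hypothetical is_lenient_email wrapper. It writes the same pattern as a raw string literal with a custom delimiter, which avoids the double escaping.

```cpp
#include <regex>
#include <string>

// Hypothetical wrapper around the answer's pattern. The raw string literal
// R"re( ... )re" contains the pattern exactly as the regex engine sees it.
// All classes are negated: any char outside <>()[].,;:@" and whitespace is
// allowed, which includes every byte of a UTF-8 encoded non-ASCII character.
bool is_lenient_email(const std::string& address) {
    static const std::regex rx(
        R"re((?:(?:[^<>()\[\].,;:\s@"]+(?:\.[^<>()\[\].,;:\s@"]+)*)|".+")@(?:(?:[^<>()\[\].,;:\s@"]+\.)+[^<>()\[\].,;:\s@"]{2,}))re");
    // The grammar defaults to std::regex::ECMAScript; regex_match anchors
    // the pattern to the whole string.
    return std::regex_match(address, rx);
}
```

All four addresses from the question pass. "john doe@example.com" fails because of the space in the local part, and "john.doe@localhost" fails because the domain has no dot. The trade-off is that the check is purely byte-based. It does not verify that the non-ASCII bytes form letters, or even valid UTF-8, and the {2,} length limit on the last label counts bytes, not characters.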