International Email validation in C++ using Regex


Question

I am having trouble validating international email addresses such as john.doe@神谕.com, sara.smith@神谕.com, babu.ratnakar+आଆఉఊګ神谕@gmail.com and testæœö.神谕#$&*éùôß@äßæçëêùé+आଆ神谕.com using a regex in C++.

The following Regex worked fine for me in Java:

^[\\p{L}0-9!#$%&'*+/=?^_`{|}~-]+(?:\\.[\\p{L}0-9!#$%&'*+/=?^_`{|}~-]+)*@(?:[\\p{L}0-9](?:[\\p{L}0-9-]*[\\p{L}0-9])?\\.)+[\\p{L}0-9](?:[\\p{L}0-9-]*[\\p{L}0-9])?$

I tried using the same pattern, slightly modified, in C++:

std::string str("[\\\\p{L}0-9!#$%&'*+/=?^_`{|}~-]+(?:\\.[\\\\p{L}0-9!#$%&'*+/=?^_`{|}~-]+)*@(?:[\\\\p{L}0-9](?:[\\\\p{L}0-9-]*[\\\\p{L}0-9])?\\.)+[\\\\p{L}0-9](?:[\\\\p{L}0-9-]*[\\\\p{L}0-9])?");

std::regex rx4(str);

But regex_match fails in all cases. I think the issue is with \p{L}: when I replace it with a-z, the pattern accepts email addresses written with English letters, i.e. this one works:

std::regex rx3("[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?", std::regex::ECMAScript);

Does \p{L}, which matches Unicode letters, not work in C++?

Solution

C++ std::regex supports 6 regex flavors:

Six different regular expression flavors or grammars are defined in std::regex_constants:

ECMAScript: Similar to JavaScript.
basic: Similar to POSIX BRE.
extended: Similar to POSIX ERE.
grep: Same as basic, with the addition of treating line feeds as alternation operators.
egrep: Same as extended, with the addition of treating line feeds as alternation operators.
awk: Same as extended, with the addition of supporting common escapes for non-printable characters.

None of these supports Unicode properties (Unicode category classes) such as \p{L}, so you cannot use \p{L} in your patterns.

Use your workaround if it works for you:

std::regex rx3("[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?", std::regex::ECMAScript);

Or use a version of the pattern from the well-known Stack Overflow post "Validate email address in JavaScript?". The anchors are removed, since regex_match already requires the whole string to match. The pattern is re-escaped for a non-raw string literal, and std::regex::ECMAScript is omitted, since that grammar is used by default:

std::regex rx3("(?:(?:[^<>()\\[\\].,;:\\s@\"]+(?:\\.[^<>()\\[\\].,;:\\s@\"]+)*)|\".+\")@(?:(?:[^<>()\\[\\].,;:\\s@\"]+\\.)+[^<>()\\[\\].,;:\\s@\"]{2,})");
