Regex validation of email addresses according to RFC5321/RFC5322

Question

Does anyone know of a regex that validates email addresses according to RFC5321/RFC5322? Since (nestable) comments make the grammar irregular, only addresses without comments need to be considered.

Of course, if you want to confirm that an address is actually owned by someone, the only real validation is to send an email to the address and check whether the owner received it. I am, however, purely interested in the RFC standards. For a practical approach this question is more relevant.

On top of comments I am willing to sacrifice folding white space, but apart from that I'm not interested in expressions that reject any addresses that are RFC5321/2-valid. (Arguably it would even make sense in some circumstances to disregard folding white space.)

Ideally the regex would reject anything that's not RFC-valid, but that's less important. It's not so interesting to include an exhaustive list of top-level domains in the regex, for example. Simply accepting any top-level domain will suffice.

I'm not sure if address tags (e.g. address+tag@domain.org) are part of the RFCs I mentioned, but I would like the regex to validate these.

IPv6 should definitely be handled correctly (RFC5952).

As I understand it, internationalized email (RFC6530, RFC6531, RFC6532, RFC6533) is still in the experimental phase, but an expression validating these addresses would also be interesting.

To make the answers universally interesting it would be nice if any regular expressions were in POSIX format.

Recommended answer

Nestable comments make the grammar for email-addresses irregular (context-free). If you preclude comments however, the resulting grammar is regular. The primary definition allows for (folding) whitespace between lexical tokens (e.g. a @ b.com). Removing all folding whitespace results in a canonical form.

This is the regex for canonical email addresses according to RFC 5322 (precluding comments):

([!#-'*+/-9=?A-Z^-~-]+(\.[!#-'*+/-9=?A-Z^-~-]+)*|"([]!#-[^-~ \t]|(\\[\t -~]))+")@([!#-'*+/-9=?A-Z^-~-]+(\.[!#-'*+/-9=?A-Z^-~-]+)*|\[[\t -Z^-~]*])
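
As a rough illustration, the canonical expression above can be assembled from its grammar pieces and tried out in Python. This is only a sketch: the names ATEXT, DOT_ATOM, QUOTED_STRING, DOMAIN_LITERAL and is_canonical_address are labels chosen here, not part of the answer, and Python's re module (rather than a POSIX tool) is assumed as the engine.

```python
import re

# Building blocks copied from the canonical regex above.
ATEXT = r"[!#-'*+/-9=?A-Z^-~-]"                    # letters, digits and the non-special symbols
DOT_ATOM = rf"{ATEXT}+(\.{ATEXT}+)*"                # atoms separated by single dots
QUOTED_STRING = r'"([]!#-[^-~ \t]|(\\[\t -~]))+"'   # quoted text or backslash-escaped pairs
DOMAIN_LITERAL = r"\[[\t -Z^-~]*]"                  # tab/printable ASCII except [, \ and ]

CANONICAL = re.compile(
    rf"({DOT_ATOM}|{QUOTED_STRING})@({DOT_ATOM}|{DOMAIN_LITERAL})"
)


def is_canonical_address(text: str) -> bool:
    """Return True if the whole string matches the canonical RFC 5322 form."""
    return CANONICAL.fullmatch(text) is not None


if __name__ == "__main__":
    samples = [
        "address+tag@domain.org",
        '"john doe"@example.com',
        "user@[192.168.0.1]",
        "a..b@example.com",
        "a @ b.com",
    ]
    for sample in samples:
        print(repr(sample), is_canonical_address(sample))
```

Using fullmatch rather than search matters here, because the expression itself carries no ^ or $ anchors.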

If you need to accept folding whitespace, then this is the regular expression for email addresses according to RFC 5322 (precluding comments):

((([\t ]*\r\n)?[\t ]+)?[-!#-'*+/-9=?A-Z^-~]+(\.[-!#-'*+/-9=?A-Z^-~]+)*(([\t ]*\r\n)?[\t ]+)?|(([\t ]*\r\n)?[\t ]+)?"(((([\t ]*\r\n)?[\t ]+)?([]!#-[^-~]|(\\[\t -~])))+(([\t ]*\r\n)?[\t ]+)?|(([\t ]*\r\n)?[\t ]+)?)"(([\t ]*\r\n)?[\t ]+)?)@((([\t ]*\r\n)?[\t ]+)?[-!#-'*+/-9=?A-Z^-~]+(\.[-!#-'*+/-9=?A-Z^-~]+)*(([\t ]*\r\n)?[\t ]+)?|(([\t ]*\r\n)?[\t ]+)?\[((([\t ]*\r\n)?[\t ]+)?[!-Z^-~])*(([\t ]*\r\n)?[\t ]+)?](([\t ]*\r\n)?[\t ]+)?)
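
The folding-whitespace variant is easier to read when the repeated optional-whitespace group is factored out. The sketch below rebuilds the expression above from named parts in Python; the names FWS, ATEXT, DOT_ATOM, QUOTED_STRING, DOMAIN_LITERAL and accepts_with_fws are illustrative assumptions, and Python's re module is assumed as the engine.

```python
import re

# The optional folding-whitespace group that recurs throughout the regex above.
FWS = r"(([\t ]*\r\n)?[\t ]+)?"
ATEXT = r"[-!#-'*+/-9=?A-Z^-~]"

# Each token may be surrounded by optional folding whitespace.
DOT_ATOM = rf"{FWS}{ATEXT}+(\.{ATEXT}+)*{FWS}"
QUOTED_STRING = rf'{FWS}"(({FWS}([]!#-[^-~]|(\\[\t -~])))+{FWS}|{FWS})"{FWS}'
DOMAIN_LITERAL = rf"{FWS}\[({FWS}[!-Z^-~])*{FWS}]{FWS}"

WITH_FWS = re.compile(
    rf"({DOT_ATOM}|{QUOTED_STRING})@({DOT_ATOM}|{DOMAIN_LITERAL})"
)


def accepts_with_fws(text: str) -> bool:
    """Return True if the whole string matches the folding-whitespace form."""
    return WITH_FWS.fullmatch(text) is not None


if __name__ == "__main__":
    for sample in ["a @ b.com", "user@\r\n example.org", "user@\r\nexample.org"]:
        print(repr(sample), accepts_with_fws(sample))
```

A line break only counts as folding whitespace when it is followed by at least one space or tab, which is why the last sample is expected to be rejected.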

Valid email addresses are further restricted in RFC 5321 (SMTP). It basically leaves alone the part before the @-sign, but accepts only host names or address literals after the @-sign. ("---.---" is a valid dot-atom, but not a valid host name and "[...]" is a valid domain literal, but not a valid address literal.)

The grammar presented in RFC 5321 is too lenient when it comes to both host names and IP addresses. I took the liberty of "correcting" the rules in question, using this draft and RFC 1034 (section 3.5) as guidelines. Here's the resulting regex.

([!#-'*+/-9=?A-Z^-~-]+(\.[!#-'*+/-9=?A-Z^-~-]+)*|"([]!#-[^-~ \t]|(\\[\t -~]))+")@([0-9A-Za-z]([0-9A-Za-z-]{0,61}[0-9A-Za-z])?(\.[0-9A-Za-z]([0-9A-Za-z-]{0,61}[0-9A-Za-z])?)*|\[((25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])(\.(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])){3}|IPv6:((((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){6}|::((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){5}|[0-9A-Fa-f]{0,4}::((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){4}|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):)?(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){3}|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){0,2}(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){2}|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){0,3}(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){0,4}(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::)((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3})|(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])(\.(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])){3})|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){0,5}(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3})|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){0,6}(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::)|(?!IPv6:)[0-9A-Za-z-]*[0-9A-Za-z]:[!-Z^-~]+)])
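
To make the tightened domain rules concrete, the following Python sketch isolates just two alternatives of the expression above: the host name and the IPv4 address literal. It deliberately leaves out the IPv6 and general address-literal branches. The names LABEL, HOST_NAME, OCTET, IPV4_LITERAL and the two helper functions are labels chosen here for illustration.

```python
import re

# One DNS label: starts and ends with a letter or digit, at most 63 characters.
LABEL = r"[0-9A-Za-z]([0-9A-Za-z-]{0,61}[0-9A-Za-z])?"
HOST_NAME = re.compile(LABEL + r"(\." + LABEL + r")*")

# One decimal octet in the range 0-255 without leading zeros.
OCTET = r"(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])"
IPV4_LITERAL = re.compile(r"\[" + OCTET + r"(\." + OCTET + r"){3}]")


def is_host_name(text: str) -> bool:
    """Return True if the whole string matches the host-name alternative."""
    return HOST_NAME.fullmatch(text) is not None


def is_ipv4_literal(text: str) -> bool:
    """Return True if the whole string is a bracketed IPv4 address literal."""
    return IPV4_LITERAL.fullmatch(text) is not None


if __name__ == "__main__":
    for sample in ["example.org", "---.---", "-a.com"]:
        print(repr(sample), is_host_name(sample))
    for sample in ["[192.168.0.1]", "[256.1.1.1]"]:
        print(repr(sample), is_ipv4_literal(sample))
```

The "---.---" case mentioned earlier is a valid dot-atom under RFC 5322 but fails the host-name check here, because every label must begin and end with a letter or digit.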

All regexes are written as POSIX EREs, with one exception: the last one uses a negative lookahead, (?!IPv6:), which POSIX ERE does not support, so it needs an engine with Perl-style lookahead. See here for the derivations of the regular expressions.
