使用正则表达式验证电子邮件地址会造成伤害吗? [英] Can it cause harm to validate email addresses with a regex?

查看:40
本文介绍了使用正则表达式验证电子邮件地址会造成伤害吗?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我听说用正则表达式验证电子邮件地址是一件坏事,而且它实际上会造成伤害.这是为什么?我认为验证数据永远不会是一件坏事.也许不必要,但只要您正确执行验证,就绝对不是坏事.你能解释一下为什么这是对的还是错的?如果可能造成伤害,请举例说明.

I've heard that it is a bad thing to validate email addresses with a regex, and that it actually can cause harm. Why is that? I thought it never could be a bad thing to validate data. Maybe unnecessary, but never a bad thing provided that you perform the validation correctly. Could you explain to me why this is right or wrong? If it can cause harm, please give an example.

推荐答案

总的来说,是的 - 使用正则表达式来验证电子邮件地址是有害的.这是因为正则表达式作者的错误(不正确)假设.

In general, yes - using regular expressions to validate email addresses is harmful. This is because of bad (incorrect) assumptions by the author of the regular expression.

正如@klutt 指出的,电子邮件地址有两部分,local-partdomain.值得注意的是,关于这些部分的一些不是很明显的事情:

As @klutt indicated, an email address has two parts, the local-part and the domain. It's worth noting some things about these parts that aren't immediately obvious:

  • local-part 可以包含转义字符,甚至可以包含额外的 @ 字符.
  • local-part 可以区分大小写,但是取决于该特定域的邮件服务器如何区分大小写.
  • domain 部分可以包含零个或多个由句点 (.) 分隔的标签,但实际上没有对应于根的 MX 记录(零标签)或在 gTLD(一个标签)上.
  • The local-part can contain escaped characters and even additional @ characters.
  • The local-part can be case sensitive, however it is up to the mail server at that specific domain how it wants to distinguish case.
  • The domain part can contain zero or more labels separated by a period (.), though in practice there are no MX records corresponding to the root (zero labels) or on the gTLDs (one label) themselves.

因此,您可以在不拒绝与上述对应的有效电子邮件地址的情况下进行一些检查:

So, there are some checks that you can do without rejecting valid email addresses that correspond with the above:

  • 地址至少包含一个@
  • local-part(最右边@左侧的所有内容)非空
  • domain 部分(最右边的 @ 右侧的所有内容)至少包含一个句点(同样,这不是严格正确的,而是实用的)
  • Address contains at least one @
  • The local-part (everything to the left of the rightmost @) is non-empty
  • The domain part (everything to the right of the rightmost @) contains at least one period (again, this isn't strictly true but pragmatic)

就是这样.正如其他人指出的那样,最好的做法是测试该地址的可传递性.这将确定两件重要的事情:

That's it. As others have pointed out, it's best practice to test deliverability to that address. This will establish two important things:

  1. 电子邮件当前是否存在;和
  2. 用户有权访问电子邮件地址(是合法用户或所有者)

如果您将电子邮件激活流程构建到您的业务流程中,您就无需担心复杂的正则表达式会出现问题.

If you build email activation processes into your business process, you don't need to worry about complicated regular expressions that have issues.

一些进一步阅读以供参考:

Some further reading for reference:

RFC 5321:简单邮件传输协议

OWASP:输入验证备忘单

这篇关于使用正则表达式验证电子邮件地址会造成伤害吗?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆