Best Regular Expression for Email Format Validation with ASP.NET 3.5 Validation

Question

I've used both of the following regular expressions to test for a valid email address with ASP.NET validation controls. I was wondering which is the better expression from a performance standpoint, or whether someone has a better one.


 - \w+([-+.']\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*
 - ^([0-9a-zA-Z]([-\.\w]*[0-9a-zA-Z])*@([0-9a-zA-Z][-\w]*[0-9a-zA-Z]\.)+[a-zA-Z]{2,9})$

I'm trying to avoid the performance problem described on the BCL Team Blog.

Update

Based on feedback I ended up creating a function to test if an email is valid:

Public Function IsValidEmail(ByVal emailString As String, Optional ByVal isRequired As Boolean = False) As Boolean
    Dim emailSplit As String()
    Dim isValid As Boolean = True
    Dim localPart As String = String.Empty
    Dim domainPart As String = String.Empty
    Dim domainSplit As String()
    Dim tld As String

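    'Treat a null reference the same as an empty string
    If emailString Is Nothing Then emailString = String.Empty

    If emailString.Length >= 80 Then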
        isValid = False
    ElseIf emailString.Length > 0 AndAlso emailString.Length < 6 Then
        'Email is too short
        isValid = False
    ElseIf emailString.Length > 0 Then
        'Email is optional, only test value if provided
        emailSplit = emailString.Split("@"c)

        If emailSplit.Length <> 2 Then
            'Only 1 @ should exist
            isValid = False
        Else
            localPart = emailSplit(0)
            domainPart = emailSplit(1)
        End If

        If isValid = False OrElse domainPart.Contains(".") = False Then
            'Needs at least 1 period after @
            isValid = False
        Else
            'Test Local-Part Length and Characters
            '(ValidateString and ValidateTests are helpers that are not shown in this post)
            If localPart.Length > 64 OrElse ValidateString(localPart, ValidateTests.EmailLocalPartSafeChars) = False OrElse _
               localPart.StartsWith(".") OrElse localPart.EndsWith(".") OrElse localPart.Contains("..") Then
                isValid = False
            End If

            'Validate Domain Name Portion of email address
            If isValid = False OrElse _
               ValidateString(domainPart, ValidateTests.HostNameChars) = False OrElse _
               domainPart.StartsWith("-") OrElse domainPart.StartsWith(".") OrElse domainPart.Contains("..") Then
                isValid = False
            Else
                domainSplit = domainPart.Split(CChar("."))
                tld = domainSplit(UBound(domainSplit))

                ' Top Level Domains must be at least two characters
                If tld.Length < 2 Then
                    isValid = False
                End If
            End If
        End If
    Else
        'If no value is passed review if required
        If isRequired = True Then
            isValid = False
        Else
            isValid = True
        End If
    End If

    Return isValid
End Function

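Notes:

  • IsValidEmail is stricter than the RFC about which characters it allows, but it does not check for every possible invalid use of those characters.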

Recommended answer

If you're wondering why this question is generating so little activity, it's because there are so many other issues that should be dealt with before you start thinking about performance. Foremost among those is whether you should be using regexes to validate email addresses at all--and the consensus is that you should not. It's much trickier than most people expect, and probably pointless anyway.

Another problem is that your two regexes vary hugely in the kinds of strings they can match. For example, the second one is anchored at both ends, but the first isn't; it would match ">>>>foo@bar.com<<<<" because there's something that looks like an email address embedded in it. Maybe the framework forces the regex to match the whole string, but if that's the case, why is the second one anchored?
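
A quick way to see the anchoring difference is to run the first regex with and without anchors. This is only a sketch: the module and function names are made up for illustration, and it calls Regex.IsMatch directly instead of going through a validation control, so it says nothing about what the control itself does with the pattern.

```vb
Imports System
Imports System.Text.RegularExpressions

Module AnchoringDemo
    ' First regex from the question, exactly as posted (no anchors).
    Public Const Unanchored As String = "\w+([-+.']\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*"

    ' The same regex wrapped so that it has to cover the whole input.
    Public Const Anchored As String = "^(?:" & Unanchored & ")$"

    ' Hypothetical helper: True when the pattern matches anywhere in the input.
    Public Function Matches(ByVal pattern As String, ByVal input As String) As Boolean
        Return Regex.IsMatch(input, pattern)
    End Function

    Sub Main()
        Dim input As String = ">>>>foo@bar.com<<<<"
        Console.WriteLine(Matches(Unanchored, input))       ' True: finds the embedded foo@bar.com
        Console.WriteLine(Matches(Anchored, input))         ' False: the surrounding characters are not allowed
        Console.WriteLine(Matches(Anchored, "foo@bar.com")) ' True
    End Sub
End Module
```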

Another difference is that the first regex uses \w throughout, while the second uses [0-9a-zA-Z] in many places. In most regex flavors, \w matches the underscore in addition to letters and digits, but in some (including .NET) it also matches letters and digits from every writing system known to Unicode.
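
The difference between \w and [0-9a-zA-Z] in .NET can be checked with a few lines. The sketch below uses a hypothetical helper, AllMatch, and compares the default behaviour with RegexOptions.ECMAScript, which restricts \w to ASCII letters, digits and the underscore.

```vb
Imports System
Imports System.Text.RegularExpressions

Module WordCharDemo
    ' Hypothetical helper: True when every character of the input is accepted by the given class.
    Public Function AllMatch(ByVal charClass As String, ByVal input As String, _
                             Optional ByVal options As RegexOptions = RegexOptions.None) As Boolean
        Return Regex.IsMatch(input, "^" & charClass & "+$", options)
    End Function

    Sub Main()
        ' "cafe" with an acute accent on the e, built with ChrW so the file encoding does not matter.
        Dim word As String = "caf" & ChrW(&HE9)

        Console.WriteLine(AllMatch("\w", word))                          ' True: the accented e is a Unicode letter
        Console.WriteLine(AllMatch("[0-9a-zA-Z]", word))                 ' False: it is outside the ASCII ranges
        Console.WriteLine(AllMatch("\w", word, RegexOptions.ECMAScript)) ' False: \w is limited to [a-zA-Z_0-9]
        Console.WriteLine(AllMatch("\w", "foo_bar"))                     ' True: the underscore is a word character
    End Sub
End Module
```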

There are many other differences, but that's academic; neither of those regexes is very good. See here for a good discussion of the topic, and a much better regex.

Getting back to the original question, I don't see a performance problem with either of those regexes. Aside from the nested-quantifiers anti-pattern cited in that BCL blog entry, you should also watch out for situations where two or more adjacent parts of the regex can match the same set of characters--for example,

([A-Za-z]+|\w+)@

There's nothing like that in either of the regexes you posted. Parts that are controlled by quantifiers are always broken up by other parts that aren't quantified. Both regexes will experience some avoidable backtracking, but there are many better reasons than performance to reject them.

Edit: So the second regex is subject to catastrophic backtracking; I should have tested it thoroughly before shooting my mouth off. Taking a closer look at that regex, I don't see why you need the outer asterisk in the first part:

[0-9a-zA-Z]([-.\w]*[0-9a-zA-Z])*

All that bit does is make sure the first and last characters are alphanumeric while allowing some additional characters in between. This version does almost the same thing (it needs at least two characters, whereas the original also accepts a single alphanumeric character), but it fails much more quickly when no match is possible:

[0-9a-zA-Z][-.\w]*[0-9a-zA-Z]
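
One rough way to compare the two local-part fragments is to time them on an input that cannot match. The sketch below makes several assumptions: it appends "@" to each fragment so that a failed match forces backtracking, it uses the Regex constructor overload with a match timeout (available from .NET Framework 4.5, so not on 3.5 itself), and the names NestedPattern, FlatPattern and TryMatch are invented for the example. Actual timings depend on the runtime and the machine, so none are quoted here.

```vb
Imports System
Imports System.Diagnostics
Imports System.Text.RegularExpressions

Module BacktrackingDemo
    ' Local-part fragments from the answer, each followed by "@".
    Public Const NestedPattern As String = "^[0-9a-zA-Z]([-.\w]*[0-9a-zA-Z])*@"
    Public Const FlatPattern As String = "^[0-9a-zA-Z][-.\w]*[0-9a-zA-Z]@"

    ' Hypothetical helper: reports how the match attempt ended and how long it took.
    Public Function TryMatch(ByVal pattern As String, ByVal input As String) As String
        ' Give up after two seconds instead of letting the engine run indefinitely.
        Dim rx As New Regex(pattern, RegexOptions.None, TimeSpan.FromSeconds(2))
        Dim watch As Stopwatch = Stopwatch.StartNew()
        Try
            Dim matched As Boolean = rx.IsMatch(input)
            Return String.Format("matched={0} after {1} ms", matched, watch.ElapsedMilliseconds)
        Catch ex As RegexMatchTimeoutException
            Return String.Format("timed out after {0} ms", watch.ElapsedMilliseconds)
        End Try
    End Function

    Sub Main()
        ' Forty letters and no "@": neither pattern can match.
        Dim input As String = New String("a"c, 40) & "!"
        Console.WriteLine("nested: " & TryMatch(NestedPattern, input))
        Console.WriteLine("flat:   " & TryMatch(FlatPattern, input))
    End Sub
End Module
```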

That would probably suffice to eliminate the backtracking problem, but you could also make the part after the "@" more efficient by using an atomic group:

(?>(?:[0-9a-zA-Z][-\w]*[0-9a-zA-Z]\.)+)[a-zA-Z]{2,9}

In other words, if you've matched all you can of substrings that look like domain components with trailing dots, and the next part doesn't look like a TLD, don't bother backtracking. The first character you would have to give up is the final dot, and you know [a-zA-Z]{2,9} won't match that.
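
To check that the atomic group does not change which domains are accepted, the two variants can be run side by side. This is a minimal sketch: both patterns are anchored here so they can be tested on their own, the capturing group from the question is written as a non-capturing one, and the module, constant and function names are made up for the example.

```vb
Imports System
Imports System.Text.RegularExpressions

Module AtomicGroupDemo
    ' Domain part as in the question's second regex.
    Public Const PlainDomain As String = "^(?:[0-9a-zA-Z][-\w]*[0-9a-zA-Z]\.)+[a-zA-Z]{2,9}$"

    ' The same part with the repeated labels wrapped in an atomic group.
    Public Const AtomicDomain As String = "^(?>(?:[0-9a-zA-Z][-\w]*[0-9a-zA-Z]\.)+)[a-zA-Z]{2,9}$"

    ' Hypothetical helper: True when the input matches the given domain pattern.
    Public Function IsDomain(ByVal pattern As String, ByVal input As String) As Boolean
        Return Regex.IsMatch(input, pattern)
    End Function

    Sub Main()
        For Each candidate As String In New String() {"mail.example.com", "example.c0m", "example"}
            Console.WriteLine("{0}: plain={1}, atomic={2}", candidate, _
                              IsDomain(PlainDomain, candidate), IsDomain(AtomicDomain, candidate))
        Next
    End Sub
End Module
```

Both patterns should accept mail.example.com and reject the other two candidates; the atomic version simply gives up sooner once the text after the last dot fails to look like a TLD.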
