Parse email with regex in C#


Question

I need to parse email files with regex in C#. That is, I need to take a file that contains several emails and split each email into its constituents, e.g. From, To, Bcc, etc.

The regex I am using for email addresses is:

"\w+([-+.]\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*"

The problem I am having is that the To, Cc and Bcc fields sometimes contain more than one email address and span more than one line:

To: Me meagain <me@me.com>,
    Me1 meagain <me1@me.com>,Me3 meagain <me1@me.com>



Also, which regex will match the message?

Recommended answer

Parsing an email message with regular expressions is a terrible idea. You might be able to parse the constituent parts with regular expressions once you have them, but finding those parts with regular expressions is going to give you fits.

The normal case, of course, is pretty easy. But then you run across something like a message that has another message embedded within it. That is, the content includes a full email message with From:, To:, Bcc:, etc. And your naive regex parser thinks, "Oh, boy! I found a new message!"
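To make the embedded-message problem concrete, here is a minimal sketch. The class name NaiveSplitter, its method, and the sample text are all hypothetical and invented for illustration. It assumes a naive parser that treats every line starting with "From:" as the start of a new message.

```csharp
using System.Text.RegularExpressions;

public static class NaiveSplitter
{
    // Hypothetical sample: ONE real message whose body quotes
    // the headers of another message.
    public const string Sample =
        "From: alice@example.com\r\n" +
        "To: bob@example.com\r\n" +
        "Subject: Fwd: lunch\r\n" +
        "\r\n" +
        "Here is the message I mentioned:\r\n" +
        "\r\n" +
        "From: carol@example.com\r\n" +
        "To: alice@example.com\r\n" +
        "\r\n" +
        "Lunch at noon?\r\n";

    // Counts the lines that begin with "From:". A naive regex-based
    // splitter would treat each such line as the start of a new message.
    public static int CountApparentMessages(string mailFile)
    {
        return Regex.Matches(mailFile, @"^From:", RegexOptions.Multiline).Count;
    }
}
```

Calling NaiveSplitter.CountApparentMessages(NaiveSplitter.Sample) reports two apparent messages even though the sample holds only one. The regex cannot tell a header line from a body line that merely looks like one. A parser that tracks where the header section ends (the first empty line) does not have that problem.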

You're better off reading and understanding the Internet Message Format and writing a real parser, or using something already written, like OpenPop.NET.
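As a rough sketch of what one small step of a real parser might look like, the code below handles the multi-line To: field from the question. It does not use OpenPop.NET. It swaps in the .NET class System.Net.Mail.MailAddressCollection, whose Add(string) method accepts a comma-separated address list. The class name HeaderSketch and its methods are hypothetical, and the sketch assumes the header value has already been isolated from the rest of the message.

```csharp
using System.Collections.Generic;
using System.Net.Mail;
using System.Text.RegularExpressions;

public static class HeaderSketch
{
    // Unfold a header value: remove each line break that is immediately
    // followed by a space or tab (a folded continuation line).
    public static string Unfold(string rawHeaderValue)
    {
        return Regex.Replace(rawHeaderValue, @"\r?\n(?=[ \t])", "");
    }

    // Unfold the value, then let the framework's address parser split the
    // comma-separated list instead of a hand-written address regex.
    public static List<string> ParseAddresses(string rawHeaderValue)
    {
        var collection = new MailAddressCollection();
        collection.Add(Unfold(rawHeaderValue));

        var result = new List<string>();
        foreach (MailAddress address in collection)
        {
            result.Add(address.Address);
        }
        return result;
    }
}
```

Passing the folded value from the question (everything after "To:") to HeaderSketch.ParseAddresses should yield the three addresses in order. The regex here only removes the line folding. The address syntax itself is left to a parser that already exists.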

Also, check out the suggestions in "Reading Email using Pop3 in C#" and "Free POP3 .NET library?", among others.

A good example of the difficulty you'll face is that your regular expression for matching email addresses is inadequate. According to section 3.2.4 of RFC 2822 (the Internet Message Format specification mentioned above), the following characters are allowed in the atoms that make up the "local-part" of an email address:

atext = ALPHA / DIGIT / ; Any character except controls,
        "!" / "#" /     ;  SP, and specials.
        "$" / "%" /     ;  Used for atoms
        "&" / "'" /
        "*" / "+" /
        "-" / "/" /
        "=" / "?" /
        "^" / "_" /
        "`" / "{" /
        "|" / "}" /
        "~"



The domain can be a dot-atom built from the same characters, or a bracketed domain literal whose content may be any ASCII character except whitespace, "[", "]", and "\", and it has to meet some format requirements. Then there's the "obsolete" syntax that, although deprecated, is still in use. And that's just parsing email addresses. If you look at what can be included in the other fields, I think you'll agree that trying to parse it with regular expressions is going to be frustrating at best.
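The gap between the atext list and the regex from the question can be checked directly. This is a small sketch: the class name QuestionRegexCheck and the test addresses are made up for illustration, and the pattern is copied unchanged from the question.

```csharp
using System.Text.RegularExpressions;

public static class QuestionRegexCheck
{
    // The pattern from the question, unchanged.
    public const string Pattern = @"\w+([-+.]\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*";

    // Returns the text of the first match, or an empty string if none.
    public static string FirstMatch(string input)
    {
        return Regex.Match(input, Pattern).Value;
    }
}
```

For a simple address such as "me@me.com", FirstMatch returns the whole input. For "o'brien@example.com", whose local part uses the apostrophe that atext permits, it returns only "brien@example.com". The same happens with "user!tag@example.com", which comes back as "tag@example.com". The pattern silently truncates addresses that are valid under the specification, which is worse than rejecting them outright.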
