Parse email header with Regex in C#


Question

I've got a webhook posting to a form on my web application, and I need to parse out the email header addresses.

Here is the source text:

Thread-Topic: test subject
Thread-Index: AcwE4mK6Jj19Hgi0SV6yYKvj2/HJbw==
From: "Lastname, Firstname" <firstname_lastname@domain.com>
To: <testto@domain.com>, testto1@domain.com, testto2@domain.com
Cc: <testcc@domain.com>, test3@domain.com
X-OriginalArrivalTime: 27 Apr 2011 13:52:46.0235 (UTC) FILETIME=[635226B0:01CC04E2]

I'm looking to pull out the following:

<testto@domain.com>, testto1@domain.com, testto2@domain.com

I've been struggling with Regex all day without any luck.

Recommended answer

Contrary to some of the posts here, I have to agree with mmutz: you cannot parse emails with a regex. See this article:

http://tools.ietf.org/html/rfc2822#section-3.4.1

3.4.1. Addr-spec specification

An addr-spec is a specific Internet identifier that contains a locally interpreted string followed by the at-sign character ("@", ASCII value 64) followed by an Internet domain.

The idea of "locally interpreted" means that only the receiving server is expected to be able to parse it.
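To see why a plain comma split (or a regex that treats every comma as a separator) is not enough, consider the From header from the question, whose quoted display name itself contains a comma. The snippet below is only an illustrative sketch; `NaiveSplitDemo` and `NaiveSplit` are hypothetical names that do not appear in the original post.

```csharp
using System;

class NaiveSplitDemo
{
    // Hypothetical helper: splits a header value on every comma,
    // ignoring quoting entirely.
    public static string[] NaiveSplit(string headerValue)
    {
        return headerValue.Split(',');
    }

    static void Main()
    {
        // The From header value from the question.
        string from = "\"Lastname, Firstname\" <firstname_lastname@domain.com>";

        // Prints two fragments, neither of which is a usable address by itself:
        //   ["Lastname]
        //   [Firstname" <firstname_lastname@domain.com>]
        foreach (string part in NaiveSplit(from))
        {
            Console.WriteLine("[" + part.Trim() + "]");
        }
    }
}
```

The single mailbox is cut in two at the comma inside the quotes, which is the kind of case that pushes the parsing work onto a real address parser.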

If I were going to try to solve this, I would find the "To" line contents, break them apart, and attempt to parse each segment with System.Net.Mail.MailAddress.

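    using System;
    using System.Net.Mail;
    using System.Text.RegularExpressions;

    class Program
    {
        static void Main()
        {
            string input = @"Thread-Topic: test subject
Thread-Index: AcwE4mK6Jj19Hgi0SV6yYKvj2/HJbw==
From: ""Lastname, Firstname"" <firstname_lastname@domain.com>
To: <testto@domain.com>, ""Yes, this is valid""@[emails are hard to parse!], testto1@domain.com, testto2@domain.com
Cc: <testcc@domain.com>, test3@domain.com
X-OriginalArrivalTime: 27 Apr 2011 13:52:46.0235 (UTC) FILETIME=[635226B0:01CC04E2]";

            // Capture everything after "To:" on its line (case-insensitive, multiline).
            Regex toline = new Regex(@"(?im-:^To\s*:\s*(?<to>.*)$)");
            string to = toline.Match(input).Groups["to"].Value;

            int from = 0;   // where to look for the next comma
            int pos = 0;    // start of the segment that has not parsed yet
            int found;
            string test;

            while (from < to.Length)
            {
                // Next comma at or after 'from', or the end of the line if none is left.
                found = to.IndexOf(',', from);
                if (found < 0) found = to.Length;
                from = found + 1;
                test = to.Substring(pos, found - pos).Trim();

                // Skip empty segments; MailAddress rejects an empty string.
                if (test.Length == 0) { pos = found + 1; continue; }

                try
                {
                    MailAddress addy = new MailAddress(test);
                    Console.WriteLine(addy.Address);
                    pos = found + 1;
                }
                catch (FormatException)
                {
                    // Not a valid address yet: the comma was probably inside a quoted
                    // string, so keep 'pos' and extend the segment to the next comma.
                }
            }
        }
    }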

Output from the above program:

testto@domain.com
"Yes, this is valid"@[emails are hard to parse!]
testto1@domain.com
testto2@domain.com
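As a possible alternative to the manual comma loop, `System.Net.Mail` also provides `MailAddressCollection`, whose `Add(string)` overload accepts a comma-separated list of addresses. The sketch below is not part of the original answer; `ToHeaderSketch` and `ExtractTo` are hypothetical names. It assumes the To header sits on a single line (no folded continuation lines), and it uses the simpler To line from the question because how the framework treats unusual forms such as a quoted local part may vary between versions.

```csharp
using System;
using System.Collections.Generic;
using System.Net.Mail;
using System.Text.RegularExpressions;

static class ToHeaderSketch   // hypothetical name, not from the answer
{
    // Returns the bare addresses found on the "To:" line, or an empty list
    // when there is no such line. Assumes the header is not folded.
    public static List<string> ExtractTo(string headers)
    {
        var result = new List<string>();

        Match m = Regex.Match(headers, @"^To\s*:\s*(?<to>.*)$",
                              RegexOptions.IgnoreCase | RegexOptions.Multiline);
        if (!m.Success) return result;

        string value = m.Groups["to"].Value.Trim();
        if (value.Length == 0) return result;

        // Add(string) parses a comma-separated address list and throws
        // FormatException if the list is malformed.
        var addresses = new MailAddressCollection();
        addresses.Add(value);

        foreach (MailAddress a in addresses)
        {
            result.Add(a.Address);
        }
        return result;
    }

    static void Main()
    {
        string headers =
            "From: \"Lastname, Firstname\" <firstname_lastname@domain.com>\n" +
            "To: <testto@domain.com>, testto1@domain.com, testto2@domain.com\n" +
            "Cc: <testcc@domain.com>, test3@domain.com";

        foreach (string address in ExtractTo(headers))
        {
            Console.WriteLine(address);
        }
    }
}
```

Unlike the loop in the answer, this version fails as a whole if any part of the list is rejected, so the try-and-extend approach may still be preferable when the input is untrusted or messy.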
