Is it possible to programmatically 'clean' emails?

Question

Does anyone have any suggestions as to how I can clean the body of incoming emails? I want to strip out disclaimers, images, and maybe any previous email text that may also be present, so that I am left with just the body text content. My guess is that it isn't going to be possible in any reliable way, but has anyone tried it? Are there any libraries geared towards this sort of thing?

Answer

Email has a few agreed-upon markers that flag content you may wish to strip, and you can look for these lines using regular expressions. I doubt you can "sanitize" your emails really well, but here are some things you can look for:

  1. A line starting with "> " (a greater-than sign followed by whitespace) marks a quote.
  2. A line consisting of "-- " (two hyphens, then a space, then a line break) marks the beginning of a signature; see Signature block on Wikipedia.
  3. In multipart messages, boundary lines start with "--". Beyond that, you need to do some searching to separate the message body parts from unwanted parts (like base64 images).
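
The first two markers can be sketched with a couple of regular expressions. This is only a rough illustration, written in Python rather than C#; the function name `strip_quotes_and_signature` is hypothetical, and the sketch assumes the mail client follows these conventions (many do not, and disclaimers have no standard marker at all).

```python
import re

# Assumption: a quoted line starts with ">", optionally followed by whitespace.
QUOTE_LINE = re.compile(r"^>\s?")
# Assumption: the signature delimiter is exactly two hyphens plus one space.
SIGNATURE_DELIMITER = re.compile(r"^-- $")


def strip_quotes_and_signature(body: str) -> str:
    """Drop quoted lines and everything from the signature delimiter onward."""
    kept = []
    for line in body.splitlines():
        if SIGNATURE_DELIMITER.match(line):
            break  # the rest of the message is the signature block
        if QUOTE_LINE.match(line):
            continue  # skip text quoted from a previous email
        kept.append(line)
    return "\n".join(kept).strip()


if __name__ == "__main__":
    sample = (
        "Hello Bob,\n\nSounds good.\n\n"
        "> On Monday you wrote:\n> Can we meet?\n"
        "-- \nAlice\nACME Corp"
    )
    print(strip_quotes_and_signature(sample))
```

Note that a line holding only "--" without the trailing space is left alone here, because some clients strip that space and others use bare hyphens as ordinary text; loosening the pattern is a judgment call.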

As for an actual C# implementation, I'll leave that to you or other SOers.
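
For the multipart case, a MIME parser is usually less fragile than splitting on boundary lines by hand. As a language-neutral illustration (Python, not C#), here is a minimal sketch using Python's standard `email` package; the helper name `extract_plain_text` is hypothetical, and it assumes the message actually carries a `text/plain` part.

```python
from email import message_from_string, policy


def extract_plain_text(raw: str) -> str:
    """Return the text/plain body of a raw message, ignoring other parts."""
    # policy.default makes the parser return an EmailMessage, which has get_body().
    msg = message_from_string(raw, policy=policy.default)
    part = msg.get_body(preferencelist=("plain",))
    if part is None:
        return ""  # assumption: no plain-text part means nothing to keep
    return part.get_content()


if __name__ == "__main__":
    raw = (
        "MIME-Version: 1.0\n"
        'Content-Type: multipart/mixed; boundary="XYZ"\n'
        "\n"
        "--XYZ\n"
        "Content-Type: text/plain; charset=utf-8\n"
        "\n"
        "Hello there\n"
        "--XYZ\n"
        "Content-Type: image/png\n"
        "Content-Transfer-Encoding: base64\n"
        "\n"
        "iVBORw0KGgo=\n"
        "--XYZ--\n"
    )
    print(extract_plain_text(raw).strip())
```

The extracted text can then be passed through line-based filtering for quotes and signatures. HTML-only messages would need a separate HTML-to-text step, which this sketch does not attempt.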
