Algorithm to match natural text in mail

Views: 153 | Published: 2015/11/30 21:06:58 | Tags: python, regex, algorithm, nlp

Problem description

I need to separate natural, coherent text/sentences in emails from lists, signatures, greetings and so on before further processing.

For example:

Tom

last monday we did bla bla, lore Lorem ipsum dolor sit amet, consectetur adipisici elit, sed eiusmod tempor incidunt ut labore et dolore magna aliqua.

list item 2
list item 3
list item 3

Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquid x ea commodi consequat. Quis aute iure reprehenderit in voluptate velit

Regards, K.

---line-of-funny-characters-#######

Example Inc.

33 Evil Street, London

mobile: 00 234534/234345

Ideally the algorithm would match only the bold parts, that is, the two paragraphs of running prose and not the greeting, list, or signature.

Is there a recommended approach, or are there even existing algorithms for this problem? Should I try approximate regular expressions, or something more statistical based on the number of punctuation marks, line length and so on?
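To make the "statistical" option a bit more concrete, here is a minimal sketch of a per-line heuristic. It is not an established algorithm: the function names `prose_score` and `extract_prose`, the 10-word cutoff and the 0.5 threshold are all assumptions picked for illustration and would need tuning on real mail. It scores each line by how many words it contains and by what share of its characters are letters or spaces, then keeps the lines that score high.

```python
import re


def prose_score(line: str) -> float:
    """Heuristic score in [0, 1]: how much a line looks like running prose.

    Assumption: prose lines are long (many words) and consist mostly of
    letters and spaces, while greetings, list items, separators and
    signature lines are short or full of digits and symbols.
    """
    stripped = line.strip()
    if not stripped:
        return 0.0
    # Runs of Unicode letters (no digits, underscores or punctuation).
    words = re.findall(r"[^\W\d_]+", stripped)
    if not words:
        return 0.0
    # Share of characters that are letters or whitespace.
    alpha_ratio = sum(c.isalpha() or c.isspace() for c in stripped) / len(stripped)
    # Assumed cutoff: 10 or more words counts as fully "sentence-like".
    length_score = min(len(words) / 10, 1.0)
    return alpha_ratio * length_score


def extract_prose(text: str, threshold: float = 0.5) -> list[str]:
    """Return the lines of `text` whose prose score reaches `threshold`."""
    return [
        line.strip()
        for line in text.splitlines()
        if prose_score(line) >= threshold
    ]


if __name__ == "__main__":
    sample = (
        "Tom\n\n"
        "last monday we did bla bla, lore Lorem ipsum dolor sit amet, "
        "consectetur adipisici elit, sed eiusmod tempor incidunt ut labore "
        "et dolore magna aliqua.\n\n"
        "list item 2\nlist item 3\n\n"
        "Regards, K.\n"
        "---line-of-funny-characters-#######\n"
        "mobile: 00 234534/234345\n"
    )
    for kept in extract_prose(sample):
        print(kept)
```

A score like this only looks at one line at a time, so hard-wrapped paragraphs or short one-sentence replies would likely need extra handling (for example, joining consecutive lines before scoring). The same features could also be fed to a trained classifier instead of a hand-set threshold.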
