How does Apple find dates, times and addresses in emails?

Question

In the iOS email client, when an email contains a date, time or location, the text becomes a hyperlink and it is possible to create an appointment or look at a map simply by tapping the link. It not only works for emails in English, but in other languages also. I love this feature and would like to understand how they do it.

The naive way to do this would be to have many regular expressions and run them all. However, this is not going to scale very well and will work only for a specific language or date format. I think that Apple must be using some form of machine learning to extract entities (8:00PM, 8PM, 8:00, 0800, 20:00, 20h, 20h00, 2000, etc.).
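For illustration only, the naive regex approach described above might look like the following sketch. The pattern list and the helper name `looks_like_time` are assumptions made for this example, not anything Apple is known to use. It covers the time formats listed in the question and also shows the ambiguity problem: a bare four-digit pattern accepts a year just as readily as a time.

```python
import re

# Hypothetical hand-written patterns, one per time format family.
TIME_PATTERNS = [
    r"\d{1,2}:\d{2}(?:\s?[AP]M)?",  # 8:00, 20:00, 8:00PM, 8:00 PM
    r"\d{1,2}\s?[AP]M",             # 8PM
    r"\d{1,2}h(?:\d{2})?",          # 20h, 20h00
    r"\d{4}",                       # 0800, 2000 (but also any year)
]
TIME_RE = re.compile("|".join(f"(?:{p})" for p in TIME_PATTERNS), re.IGNORECASE)


def looks_like_time(token: str) -> bool:
    """Return True if the whole token matches one of the patterns."""
    return TIME_RE.fullmatch(token) is not None


examples = ["8:00PM", "8PM", "8:00", "0800", "20:00", "20h", "20h00", "2000"]
print([t for t in examples if looks_like_time(t)])
print(looks_like_time("1999"))  # a year is accepted too, which is the weakness
```

Each new language or format needs another pattern, and patterns cannot use the surrounding words to resolve ambiguous tokens.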

Any idea how Apple is able to extract entities so quickly in its email client? What machine learning algorithm would you apply to accomplish such a task?

Recommended answer

They likely use information extraction techniques for this.

Here is a demo of Stanford's SUTime tool:

http://nlp.stanford.edu:8080/sutime/process

You would extract attributes about n-grams (consecutive words) in a document:

  • numberOfLetters
  • numberOfSymbols
  • length
  • previousWord
  • nextWord
  • nextWordNumberOfSymbols
    ...
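A minimal sketch of how such attributes could be computed for a single token is shown below. The function name `extract_features`, the whitespace tokenization, and the use of `string.punctuation` to define a "symbol" are assumptions made for this example.

```python
import string


def extract_features(tokens, i):
    """Return a feature dict for tokens[i], using its neighbours as context."""
    word = tokens[i]
    prev_word = tokens[i - 1] if i > 0 else ""
    next_word = tokens[i + 1] if i + 1 < len(tokens) else ""

    def count_symbols(text):
        # Assumption: a "symbol" is any ASCII punctuation character.
        return sum(ch in string.punctuation for ch in text)

    return {
        "numberOfLetters": sum(ch.isalpha() for ch in word),
        "numberOfSymbols": count_symbols(word),
        "length": len(word),
        "previousWord": prev_word,
        "nextWord": next_word,
        "nextWordNumberOfSymbols": count_symbols(next_word),
    }


tokens = "Wed Feb. 29th".split()
features = extract_features(tokens, 1)  # features for "Feb."
print(features)
```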

And then use a classification algorithm, and feed it positive and negative examples:

Observation  nLetters  nSymbols  length  prevWord   nextWord  isPartOfDate
"Feb."       3         1         4       "Wed"      "29th"    TRUE
"DEC"        3         0         3       "company"  "went"    FALSE
...

You might get away with 50 examples of each, but the more the merrier. The algorithm then learns from those examples and can apply what it has learned to future examples that it hasn't seen before.
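The answer does not name a specific classification algorithm. As one possible illustration, the sketch below uses a small categorical naive Bayes classifier written in plain Python. The class `NaiveBayes` is hypothetical, and only the first two training rows come from the table above; the other four are invented purely to make the example runnable.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Tiny categorical naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, rows, labels):
        self.label_counts = Counter(labels)
        self.value_counts = defaultdict(Counter)  # (label, feature) -> value counts
        self.vocab = defaultdict(set)             # feature -> values seen in training
        for row, label in zip(rows, labels):
            for feature, value in row.items():
                self.value_counts[(label, feature)][value] += 1
                self.vocab[feature].add(value)
        return self

    def predict(self, row):
        total = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.label_counts.items():
            score = math.log(count / total)  # log prior
            for feature, value in row.items():
                seen = self.value_counts[(label, feature)][value]
                # Smoothing keeps unseen values from zeroing out the score.
                score += math.log((seen + 1) / (count + len(self.vocab[feature]) + 1))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def row(n_letters, n_symbols, length, prev_word, next_word):
    return {"nLetters": n_letters, "nSymbols": n_symbols, "length": length,
            "prevWord": prev_word, "nextWord": next_word}


training = [
    (row(3, 1, 4, "Wed", "29th"), True),       # "Feb." (from the table)
    (row(3, 0, 3, "company", "went"), False),  # "DEC"  (from the table)
    (row(3, 1, 4, "Fri", "3rd"), True),        # invented example
    (row(3, 0, 3, "on", "5th"), True),         # invented example
    (row(3, 0, 3, "you", "want"), False),      # invented example
    (row(5, 0, 5, "the", "began"), False),     # invented example
]
model = NaiveBayes().fit([r for r, _ in training], [y for _, y in training])

# An unseen token such as "Oct." in "Tue Oct. 29th"
print(model.predict(row(3, 1, 4, "Tue", "29th")))
```

With only six rows this is a toy. A real system would need far more labeled examples and would likely use a sequence model, so that neighbouring tokens are labeled jointly.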

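It might learn rules such as:

  • if the previous word consists only of letters and maybe periods ...
  • and the current word is in "february", "mar.", "the" ...
  • and the next word is in "twelfth", any_number ...
  • then it is a date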
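Written out by hand, a learned rule of this shape might look like the sketch below. The word sets contain only the values named in the answer (its lists are elided there and are left incomplete here), and the function name `looks_like_date` and the reading of "any_number" as a string of digits are assumptions made for this example.

```python
import re

# Only the values named in the answer; the full lists are elided in the source.
CURRENT_WORDS = {"february", "mar.", "the"}
NEXT_WORDS = {"twelfth"}


def looks_like_date(prev_word, word, next_word):
    """Hand-coded version of the kind of rule a classifier might learn."""
    # Previous word: only letters and maybe periods.
    prev_ok = re.fullmatch(r"[A-Za-z.]+", prev_word) is not None
    # Current word: one of the known date-related words.
    word_ok = word.lower() in CURRENT_WORDS
    # Next word: a known ordinal, or any number (assumed here to mean digits only).
    next_ok = next_word.lower() in NEXT_WORDS or next_word.isdigit()
    return prev_ok and word_ok and next_ok


print(looks_like_date("Wed", "February", "12"))
print(looks_like_date("on", "the", "twelfth"))
print(looks_like_date("Wed", "company", "12"))
```

The difference from the regex approach is that a trained classifier derives such conditions and their weights from labeled examples, so nobody has to write them by hand for each language.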

Here is a decent video by a Google engineer on the subject
