How does Apple find dates, times and addresses in emails?

Question

In the iOS email client, when an email contains a date, time or location, the text becomes a hyperlink, and tapping the link lets you create an appointment or view a map. This works not only for emails in English but also for other languages. I love this feature and would like to understand how they do it.

The naive way to do this would be to write many regular expressions and run them all. However, I think this is not going to scale very well and will only work for a specific language or date format, etc. I suspect that Apple must be using some kind of machine learning to extract entities (8:00PM, 8PM, 8:00, 0800, 20:00, 20h, 20h00, 2000, etc.).
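
To make the naive approach concrete, here is a minimal Python sketch of a single regular expression covering the time formats listed above. The pattern and the `find_times` helper are illustrative assumptions, not how Apple does it; note that each format needs its own alternative and that the whole pattern is tied to one set of conventions.

```python
import re

# One alternative per clock-time format from the question. These rules are an
# illustrative assumption; every extra format or language needs more of them.
TIME_PATTERN = re.compile(
    r"""
    \b(?:
        (?:[01]?\d|2[0-3]):[0-5]\d(?:\s?[AaPp][Mm])?   # 8:00PM, 8:00, 20:00
      | (?:1[0-2]|0?[1-9])\s?[AaPp][Mm]               # 8PM
      | (?:[01]?\d|2[0-3])h(?:[0-5]\d)?               # 20h, 20h00
      | (?:[01]\d|2[0-3])[0-5]\d                      # 0800, 2000
    )\b
    """,
    re.VERBOSE,
)


def find_times(text: str) -> list[str]:
    """Return every substring that matches one of the hard-coded time formats."""
    return TIME_PATTERN.findall(text)
```

For example, `find_times("Dinner at 8:00PM or 8PM")` returns `['8:00PM', '8PM']`, but `find_times("Founded in 2024")` also returns `['2024']`: a year is mistaken for a time, and only more rules or surrounding context can tell them apart.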

Any idea how Apple is able to extract entities so quickly in its email client? What machine learning algorithm would you apply to accomplish such a task?

Recommended answer

They likely use Information Extraction techniques for this.

Here is a demo of Stanford's SUTime tool:

http://nlp.stanford.edu:8080/sutime/process

You would extract attributes of n-grams (runs of consecutive words) in a document:

  • numberOfLetters
  • numberOfSymbols
  • length
  • previousWord
  • nextWord
  • nextWordNumberOfSymbols
    ...
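
A minimal Python sketch of computing the listed attributes for a single token (an n-gram with n = 1). Splitting on whitespace, using an empty string when there is no neighbouring word, and counting any character that is neither alphanumeric nor whitespace as a "symbol" are all assumptions made for illustration.

```python
def count_letters(word: str) -> int:
    """Number of alphabetic characters in the word."""
    return sum(ch.isalpha() for ch in word)


def count_symbols(word: str) -> int:
    """Number of characters that are neither letters, digits nor whitespace (assumed definition)."""
    return sum(not ch.isalnum() and not ch.isspace() for ch in word)


def token_features(tokens: list[str], i: int) -> dict:
    """Attributes of tokens[i] and its neighbours; "" stands in for a missing neighbour."""
    word = tokens[i]
    prev_word = tokens[i - 1] if i > 0 else ""
    next_word = tokens[i + 1] if i + 1 < len(tokens) else ""
    return {
        "numberOfLetters": count_letters(word),
        "numberOfSymbols": count_symbols(word),
        "length": len(word),
        "previousWord": prev_word,
        "nextWord": next_word,
        "nextWordNumberOfSymbols": count_symbols(next_word),
    }


# Whitespace tokenisation is a simplifying assumption
tokens = "Wed Feb. 29th".split()
features = token_features(tokens, 1)
```

For "Feb." this yields 3 letters, 1 symbol and length 4, with "Wed" and "29th" as neighbours, which matches the first row of the table that follows.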

Then use a classification algorithm and feed it positive and negative examples:

Observation  nLetters  nSymbols  length  prevWord   nextWord  isPartOfDate
"Feb."       3         1         4       "Wed"      "29th"    TRUE
"DEC"        3         0         3       "company"  "went"    FALSE
...

You might get away with 50 examples of each (positive and negative), but the more the merrier. The algorithm then learns from those examples and can be applied to new examples it has not seen before.
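
The answer does not name a specific classifier. As one possible choice, here is a small categorical naive Bayes classifier in plain Python, trained on rows shaped like the table above. The `NaiveBayes` class and the toy training rows are hypothetical; they only illustrate learning from labelled examples and then labelling an unseen one.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Categorical naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, rows, labels):
        self.class_counts = Counter(labels)
        # value_counts[label][feature][value]: how often a value occurred with a label
        self.value_counts = defaultdict(lambda: defaultdict(Counter))
        # Distinct values seen for each feature across all training rows
        self.vocab = defaultdict(set)
        for row, label in zip(rows, labels):
            for feature, value in row.items():
                self.value_counts[label][feature][value] += 1
                self.vocab[feature].add(value)
        return self

    def predict(self, row):
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            score = math.log(n / total)  # log prior of the class
            for feature, value in row.items():
                count = self.value_counts[label][feature][value]
                # The extra +1 leaves probability mass for values never seen in training
                score += math.log((count + 1) / (n + len(self.vocab[feature]) + 1))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy labelled rows (hypothetical), one per token, in the spirit of the table above
ROWS = [
    {"nLetters": 3, "nSymbols": 1, "prevWord": "wed", "nextWord": "29th"},      # "Feb."
    {"nLetters": 3, "nSymbols": 1, "prevWord": "mon", "nextWord": "3rd"},       # "Mar."
    {"nLetters": 3, "nSymbols": 1, "prevWord": "fri", "nextWord": "12th"},      # "Jan."
    {"nLetters": 3, "nSymbols": 0, "prevWord": "company", "nextWord": "went"},  # "DEC"
    {"nLetters": 4, "nSymbols": 0, "prevWord": "the", "nextWord": "is"},
    {"nLetters": 2, "nSymbols": 1, "prevWord": "said", "nextWord": "that"},
]
LABELS = [True, True, True, False, False, False]

model = NaiveBayes().fit(ROWS, LABELS)
```

Here `model.predict({"nLetters": 3, "nSymbols": 1, "prevWord": "tue", "nextWord": "1st"})` returns `True` even though "tue" and "1st" never appeared in training, because the letter and symbol counts match the positive rows.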

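It might learn rules such as:

  • if the previous word consists only of letters and a period...
  • and the current word is in "february", "march", "the"...
  • and the next word is in "twelfth", any_number...
  • then it is a date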
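
Written out by hand, a learned rule like the one above is just a conjunction of tests on the features. The Python sketch below hard-codes it; the word sets are hypothetical placeholders for the longer lists a model would learn (the "..." above), and treating any_number as "all digits" is an assumption.

```python
import re

# Hypothetical word lists standing in for what a model might learn
CURRENT_WORDS = {"february", "march", "the"}
NEXT_WORDS = {"twelfth"}


def looks_like_date(prev_word: str, word: str, next_word: str) -> bool:
    """Hand-coded version of the example rule: all three tests must pass."""
    # Previous word is only letters followed by a period, e.g. "Wed."
    prev_ok = re.fullmatch(r"[A-Za-z]+\.", prev_word) is not None
    current_ok = word.lower() in CURRENT_WORDS
    # any_number is assumed to mean a token made only of digits
    next_ok = next_word.lower() in NEXT_WORDS or next_word.isdigit()
    return prev_ok and current_ok and next_ok
```

A learned model differs mainly in that the tests, word lists and thresholds come from the training examples rather than being typed in by a person.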

Here is a decent video by a Google engineer on the subject
