Algorithm to detect time, date and place from invitation text

Question

I am researching some Natural Language Processing algorithms to read a piece of text and, if the text seems to be suggesting a meeting, set up that meeting for you automatically.

For example, if an email text reads:

"Let's meet tomorrow someplace in Downtown at 7pm."

The algorithm should be able to detect the time, date, and place of the event.

Does anyone know of existing NLP algorithms that I could use for this purpose? I have been looking into some NLP resources (like NLTK and some tools in R), but have not had much success.

Thanks

Answer

This is an application of information extraction, and it can be tackled more specifically with sequence segmentation algorithms such as hidden Markov models (HMMs) or conditional random fields (CRFs).
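
To make the HMM option concrete, here is a minimal sketch of Viterbi decoding for a toy HMM tagger over the labels x, DATE, TIME and LOCATION. Every probability below is a made-up illustrative number, not an estimate from real data; a practical system would learn these parameters from labeled text, and a CRF would replace the per-label word probabilities with weighted features.

```python
import math

# Toy HMM over the labels x, DATE, TIME, LOCATION. All numbers are hypothetical.
STATES = ["x", "DATE", "TIME", "LOCATION"]
START = {"x": 0.7, "DATE": 0.1, "TIME": 0.1, "LOCATION": 0.1}
# P(next label | current label): in this toy model every label is followed
# by "x" with probability 0.7 and by each field label with probability 0.1.
TRANS = {s: {t: (0.7 if t == "x" else 0.1) for t in STATES} for s in STATES}
# P(word | label) for the field labels; unlisted words get UNSEEN.
EMIT = {
    "DATE": {"tomorrow": 0.5, "today": 0.3, "friday": 0.2},
    "TIME": {"7pm": 0.5, "noon": 0.3, "8am": 0.2},
    "LOCATION": {"downtown": 0.6, "office": 0.4},
}
UNSEEN = 1e-4
X_EMIT = 0.01  # the catch-all label "x" emits every word with this probability


def emit_logp(state, word):
    """Log-probability that `state` emits `word` (case-insensitive)."""
    if state == "x":
        return math.log(X_EMIT)
    return math.log(EMIT[state].get(word.lower(), UNSEEN))


def viterbi(tokens):
    """Return the most probable label sequence for tokens under the toy HMM."""
    # Each column maps state -> (best log-prob of a path ending here, previous state).
    cols = [{s: (math.log(START[s]) + emit_logp(s, tokens[0]), None) for s in STATES}]
    for tok in tokens[1:]:
        prev = cols[-1]
        col = {}
        for s in STATES:
            score, back = max((prev[p][0] + math.log(TRANS[p][s]), p) for p in STATES)
            col[s] = (score + emit_logp(s, tok), back)
        cols.append(col)
    # Follow back-pointers from the best final state.
    state = max(STATES, key=lambda s: cols[-1][s][0])
    path = [state]
    for i in range(len(cols) - 1, 0, -1):
        state = cols[i][state][1]
        path.append(state)
    return path[::-1]


tokens = ["Let", "'s", "meet", "tomorrow", "someplace", "in",
          "Downtown", "at", "7pm", "."]
print(list(zip(tokens, viterbi(tokens))))
```

Because these toy transitions ignore the previous label, the decoder effectively scores each token on its own; learned label-to-label transitions are what let an HMM or CRF exploit context.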

For a software implementation, you might want to start with the MALLET toolkit from UMass-Amherst; it is a popular library that implements CRFs for information extraction.

You would treat each token in a sentence as something to be labeled with one of the fields you are interested in (or 'x' for none of the above), as a function of word features such as part of speech, capitalization, and dictionary membership. Something like this:

token       label       features
-----------------------------------
Let         x           POS=VB, capitalized
's          x           POS=PRP
meet        x           POS=VBP
tomorrow    DATE        POS=NN, inDateDictionary
someplace   x           POS=NN
in          x           POS=IN
Downtown    LOCATION    POS=NN, capitalized
at          x           POS=IN
7pm         TIME        POS=CD, matchesTimeRegex
.           x           POS=.
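
A minimal sketch of the per-token feature extraction described in the table. The date word list and the time regex are hypothetical stand-ins, and POS tags are left out because they would come from a separate tagger. Previous/next word features are added because CRFs typically condition on the surrounding tokens as well as the token itself.

```python
import re

# Hypothetical date gazetteer; a real system would use a much larger list.
DATE_WORDS = {"today", "tonight", "tomorrow", "monday", "tuesday", "wednesday",
              "thursday", "friday", "saturday", "sunday"}
# Hypothetical pattern for clock-time tokens such as "7pm", "7:30pm" or "11am".
TIME_RE = re.compile(r"^(1[0-2]|0?[1-9])(:[0-5]\d)?\s*(am|pm)$", re.IGNORECASE)


def token_features(token):
    """Features of a single token, mirroring the feature column of the table."""
    feats = ["word=" + token.lower()]
    if token[:1].isupper():
        feats.append("capitalized")
    if token.lower() in DATE_WORDS:
        feats.append("inDateDictionary")
    if TIME_RE.match(token):
        feats.append("matchesTimeRegex")
    return feats


def sentence_features(tokens):
    """Add previous/next word features so a label can depend on context."""
    out = []
    for i, tok in enumerate(tokens):
        feats = token_features(tok)
        feats.append("prev=" + (tokens[i - 1].lower() if i > 0 else "<s>"))
        feats.append("next=" + (tokens[i + 1].lower() if i + 1 < len(tokens) else "</s>"))
        out.append(feats)
    return out


tokens = ["Let", "'s", "meet", "tomorrow", "someplace", "in",
          "Downtown", "at", "7pm", "."]
for tok, feats in zip(tokens, sentence_features(tokens)):
    print(f"{tok:<10} {', '.join(feats)}")
```

Each printed row corresponds to a row of the table, minus the POS column.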

You will need to provide some hand-labeled training data first, though.
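
The hand-labeled data can be stored one token per line. The sketch below assumes the plain-text layout described for MALLET's SimpleTagger (whitespace-separated features on each line, the label last, and a blank line between sentences); check the MALLET documentation for the exact format your version expects. The feature names `w=...` and `CAPITALIZED` are hypothetical.

```python
def to_tagger_lines(sentences):
    """Serialize hand-labeled sentences, each a list of (token, label) pairs.

    Assumed layout: one token per line, features then label, separated by
    spaces; a blank line ends each sentence.
    """
    lines = []
    for sentence in sentences:
        for token, label in sentence:
            # Fields are whitespace-separated, so a token must not contain spaces.
            if any(ch.isspace() for ch in token):
                raise ValueError(f"token contains whitespace: {token!r}")
            feats = ["w=" + token.lower()]
            if token[:1].isupper():
                feats.append("CAPITALIZED")
            lines.append(" ".join(feats + [label]))
        lines.append("")  # a blank line ends the sentence
    return "\n".join(lines)


labeled = [
    [("Let", "x"), ("'s", "x"), ("meet", "x"), ("tomorrow", "DATE"),
     ("someplace", "x"), ("in", "x"), ("Downtown", "LOCATION"),
     ("at", "x"), ("7pm", "TIME"), (".", "x")],
]
print(to_tagger_lines(labeled))
```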