Algorithm to detect time, date and place from invitation text

Question

I am researching natural language processing algorithms that read a piece of text and, if the text appears to propose a meeting, set up that meeting for you automatically.

For example, if an email text reads:

Let's meet tomorrow someplace in Downtown at 7pm.

The algorithm should be able to detect the time, date and place of the event.

Does someone know of existing NLP algorithms that I could use for this purpose? I have been looking into some NLP resources (such as NLTK and some tools in R), but without much success.

Thanks

Recommended answer

This is an application of information extraction, and can be solved more specifically with sequence segmentation algorithms such as hidden Markov models (HMMs) or conditional random fields (CRFs).

For a software implementation, you might want to start with the MALLET toolkit from UMass-Amherst. It is a popular library that implements CRFs for information extraction.

You would treat each token in a sentence as something to be labeled with one of the fields you are interested in (or 'x' for none of the above), as a function of word features (such as part of speech, capitalization, dictionary membership, etc.). Something like this:

token       label       features
-----------------------------------
Let         x           POS=NNP, capitalized
's          x           POS=POS
meet        x           POS=VBP
tomorrow    DATE        POS=NN, inDateDictionary
someplace   x           POS=NN
in          x           POS=IN
Downtown    LOCATION    POS=NN, capitalized
at          x           POS=IN
7pm         TIME        POS=CD, matchesTimeRegex
.           x           POS=.

You will need to provide some hand-labeled training data first, though.
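As a rough illustration of the token/label/features table above, here is a minimal Python sketch that computes a few of those features and writes one hand-labeled sentence as training lines. Everything named here is an assumption for illustration: the `DATE_WORDS` lexicon, the `TIME_RE` pattern, and the helper functions are hypothetical, and the part-of-speech feature is left out because it would come from a separate tagger (NLTK, for instance). The output layout (space-separated features followed by the label, one token per line) is meant to resemble what MALLET's SimpleTagger reads, but check the MALLET documentation before relying on it.

```python
import re

# Hypothetical toy lexicon; a real system would use a much larger list.
DATE_WORDS = {
    "today", "tomorrow", "tonight", "monday", "tuesday", "wednesday",
    "thursday", "friday", "saturday", "sunday",
}

# Matches simple clock times such as "7pm", "7:30pm" or "19:00".
# A bare number like "7" is deliberately not treated as a time.
TIME_RE = re.compile(r"^\d{1,2}(:\d{2}(am|pm)?|(am|pm))$", re.IGNORECASE)


def token_features(token):
    """Return surface features for one token (no POS tag here)."""
    feats = []
    if token[:1].isupper():
        feats.append("capitalized")
    if token.lower() in DATE_WORDS:
        feats.append("inDateDictionary")
    if TIME_RE.match(token):
        feats.append("matchesTimeRegex")
    return feats


def to_training_lines(labeled_tokens):
    """Turn (token, label) pairs into 'feature ... label' lines."""
    lines = []
    for token, label in labeled_tokens:
        feats = ["word=" + token.lower()] + token_features(token)
        lines.append(" ".join(feats + [label]))
    return lines


# The example sentence from the question, labeled by hand.
EXAMPLE = [
    ("Let", "x"), ("'s", "x"), ("meet", "x"), ("tomorrow", "DATE"),
    ("someplace", "x"), ("in", "x"), ("Downtown", "LOCATION"),
    ("at", "x"), ("7pm", "TIME"), (".", "x"),
]

if __name__ == "__main__":
    print("\n".join(to_training_lines(EXAMPLE)))
```

Many sentences labeled this way, separated by blank lines, would make up the training file; the CRF then learns how much weight each feature deserves for each label instead of you hard-coding rules.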
