Methods for extracting locations from text?

Question

What are the recommended methods for extracting locations from free text?

One idea is to use regex rules such as "words ... in location". Are there better approaches than this?
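
A minimal sketch of what such a regex rule could look like, to make the idea concrete. The pattern, the preposition list, and the name `regex_locations` are assumptions for illustration only; the rule simply captures capitalized words that follow a location-like preposition.

```python
import re

# Hypothetical rule: a preposition followed by one or more capitalized words.
# The preposition list is an illustrative assumption, not a complete set.
LOCATION_RULE = re.compile(
    r"\b(?:in|at|from|near)\s+((?:[A-Z][a-z]+)(?:\s+[A-Z][a-z]+)*)"
)

def regex_locations(text):
    """Return the capitalized phrases that follow a location-like preposition."""
    return LOCATION_RULE.findall(text)

if __name__ == "__main__":
    examples = [
        "Stuck in New York again",   # the rule fires as intended
        "Believe in Santa",          # the rule fires, but this is not a place
        "nyc is freezing today",     # no preposition, no capitals: missed
    ]
    for tweet in examples:
        print(tweet, "->", regex_locations(tweet))
```

The second and third examples show where a rule like this tends to break on tweets: it fires on any capitalized word after the preposition, and it misses lowercase or abbreviated place names.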

I could also build a lookup hash table of country and city names and compare every token extracted from the text against it.
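
A rough sketch of the lookup idea, assuming a tiny hand-written set of names in place of a real gazetteer. `GAZETTEER` and `lookup_locations` are hypothetical names. Multi-word names such as "new york" are handled by trying the longest word window first.

```python
import re

# Tiny illustrative gazetteer (an assumption); a real one would hold
# many thousands of country and city names loaded from a data file.
GAZETTEER = {"paris", "london", "turkey", "new york"}
MAX_WORDS = max(len(name.split()) for name in GAZETTEER)

def lookup_locations(text):
    """Return gazetteer entries found in text, preferring the longest match."""
    tokens = re.findall(r"[a-z']+", text.lower())
    found = []
    i = 0
    while i < len(tokens):
        # Try the widest window first so "new york" wins over a shorter entry.
        for n in range(min(MAX_WORDS, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + n])
            if candidate in GAZETTEER:
                found.append(candidate)
                i += n
                break
        else:
            i += 1  # no entry starts at this token
    return found

if __name__ == "__main__":
    print(lookup_locations("Flying from London to New York tonight"))
    print(lookup_locations("Paris Hilton had turkey for dinner"))
```

Each set membership test is constant time on average, so the cost per tweet grows with its token count, which matters at high tweet volume. The second call shows the weakness of exact matching: it cannot tell a place from a person's name or a food.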

Does anybody know of better approaches?

I'm trying to extract locations from the text of tweets, so the large number of tweets might also affect which method I choose.

Recommended answer

All rule-based approaches will fail (if your text is really "free"). That includes regex, context-free grammars, any kind of lookup... Believe me, I've been there before :-)

This problem is called Named Entity Recognition. Location is one of the 3 most studied classes (with Person and Organization). Stanford NLP has an open source Java implementation that is extremely powerful: http://nlp.stanford.edu/software/CRF-NER.shtml
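
As a hedged illustration of how the tagger's output might be consumed, the sketch below post-processes text in the `word/TAG` "slashTags" style that Stanford NER can print. It assumes the tagger was run separately and that places carry the `LOCATION` label. The sample string is written by hand for illustration and is not real tagger output, and `locations_from_slash_tags` is a hypothetical helper name.

```python
from itertools import groupby

def locations_from_slash_tags(tagged, label="LOCATION"):
    """Join consecutive tokens tagged `label` into multi-word entities."""
    # Split each token on its last "/" so words containing slashes survive;
    # tokens without any "/" are skipped.
    pairs = [tok.rsplit("/", 1) for tok in tagged.split() if "/" in tok]
    entities = []
    for tag, group in groupby(pairs, key=lambda pair: pair[1]):
        if tag == label:
            entities.append(" ".join(word for word, _ in group))
    return entities

if __name__ == "__main__":
    # Hand-written sample in slashTags style (assumption, not real output).
    sample = ("Stuck/O in/O New/LOCATION York/LOCATION "
              "with/O Paris/PERSON Hilton/PERSON")
    print(locations_from_slash_tags(sample))
```

Because the class label comes from a statistical model that looks at context, "Paris" can be tagged as a person in one sentence and as a location in another. One limitation of this simple grouping is that two different locations appearing back to back would be merged into one entity.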

You can easily find implementations in other programming languages.
