Methods for Geotagging or Geolabelling Text Content

Question

What are some good algorithms for automatically labeling text with the city / region or origin? That is, if a blog is about New York, how can I tell programmatically? Are there packages / papers that claim to do this with any degree of certainty?

I have looked at some TF-IDF-based approaches and proper noun intersections, but so far with no spectacular successes, and I'd appreciate ideas!

The more general question is about assigning texts to topics, given some list of topics.

Simple / naive approaches are preferred to full-on Bayesian approaches, but I'm open.

Recommended answer

You're looking for a named entity recognition system, or NER for short. There are several good toolkits available to help you out. LingPipe in particular has a very decent tutorial. CAGEclass seems to be oriented around NER on geographical place names, but I haven't used it yet.
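To give a feel for the simplest end of the spectrum, here is a minimal sketch of a gazetteer lookup in plain Python. It is not NER proper and it is not how LingPipe or CAGEclass work; it only counts whole-word mentions of known place names and picks the most frequent place. The `GAZETTEER` table and the `guess_place` function are hypothetical names made up for this illustration, and a real gazetteer would be far larger.

```python
import re
from collections import Counter

# Hypothetical toy gazetteer: lowercase surface form -> canonical place.
GAZETTEER = {
    "new york": "New York",
    "nyc": "New York",
    "manhattan": "New York",
    "brooklyn": "New York",
    "london": "London",
    "westminster": "London",
    "paris": "Paris",
}


def guess_place(text, gazetteer=GAZETTEER):
    """Return (place, mention_count) for the most-mentioned place, or None."""
    lowered = text.lower()
    counts = Counter()
    for surface, place in gazetteer.items():
        # Whole-word match so "paris" does not fire inside "comparison".
        pattern = r"\b" + re.escape(surface) + r"\b"
        counts[place] += len(re.findall(pattern, lowered))
    if not counts:
        return None  # empty gazetteer
    place, hits = counts.most_common(1)[0]
    return (place, hits) if hits > 0 else None


if __name__ == "__main__":
    print(guess_place("I moved to Brooklyn last year; NYC rents are high, unlike Paris."))
```

A lookup like this cannot tell Paris, France from Paris, Texas, or a person named Paris from either, which is the kind of ambiguity a trained NER system is meant to help with.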

Here's a nice blog entry about the difficulties of NER with geographical place names.

If you're going with Java, I'd recommend using the LingPipe NER classes. OpenNLP also has some, but the former has better documentation.

If you're looking for some theoretical background, Chavez et al. (2005) have constructed an interesting system and documented it.
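For the more general question of assigning a text to one of a given list of topics, a naive TF-IDF baseline can be sketched in a few lines of standard-library Python. This is only an illustration under assumptions, not the method of any toolkit or paper mentioned above: each topic is represented by a short sample text, the IDF is a smoothed variant, and the names `tokenize` and `assign_topic` are invented for this example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def assign_topic(text, topic_docs):
    """Pick the topic whose sample text is most similar to `text`.

    topic_docs maps a topic label to a sample text describing it.
    Returns the best label, or None if nothing overlaps.
    """
    if not topic_docs:
        return None
    topic_tf = {topic: Counter(tokenize(doc)) for topic, doc in topic_docs.items()}

    # Document frequency: in how many topic texts does each term occur?
    df = Counter()
    for tf in topic_tf.values():
        df.update(tf.keys())
    n = len(topic_tf)
    # Smoothed IDF (an assumption of this sketch) keeps every weight positive.
    idf = {term: math.log((1 + n) / (1 + d)) + 1.0 for term, d in df.items()}

    def vectorize(tf):
        # Terms never seen in any topic text are ignored.
        return {term: count * idf[term] for term, count in tf.items() if term in idf}

    def cosine(a, b):
        dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
        norm_a = math.sqrt(sum(w * w for w in a.values()))
        norm_b = math.sqrt(sum(w * w for w in b.values()))
        if norm_a == 0.0 or norm_b == 0.0:
            return 0.0
        return dot / (norm_a * norm_b)

    query = vectorize(Counter(tokenize(text)))
    scores = {topic: cosine(query, vectorize(tf)) for topic, tf in topic_tf.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0.0 else None


if __name__ == "__main__":
    topics = {
        "New York": "manhattan brooklyn subway yankees broadway",
        "London": "thames tube westminster arsenal",
    }
    print(assign_topic("Took the subway from Brooklyn to a Broadway show", topics))
```

The quality of such a baseline depends almost entirely on the sample texts chosen for each topic, so it is best treated as a starting point for comparison rather than a finished classifier.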
