Methods for Geotagging or Geolabelling Text Content

Question

What are some good algorithms for automatically labeling text with its city, region, or place of origin? For example, if a blog is about New York, how can I tell programmatically? Are there packages or papers that claim to do this with any degree of certainty?

I have looked at some tf-idf based approaches and proper noun intersections, but so far without spectacular success. I'd appreciate ideas!

The more general question is about assigning texts to topics, given some list of topics.

Simple, naive approaches are preferred to full-on Bayesian approaches, but I'm open.

Recommended answer

You're looking for a named entity recognition system, or NER for short. There are several good toolkits available to help you out. LingPipe in particular has a very decent tutorial. CAGEclass seems to be oriented around NER on geographical place names, but I haven't used it yet.
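
To make the idea concrete, here is a minimal sketch of place-name spotting by gazetteer lookup. It is not a trained NER system and does not use LingPipe or any other toolkit; it only illustrates the kind of output such a system produces. The gazetteer contents and the names `GAZETTEER`, `find_places`, and `label_text` are hypothetical, chosen for illustration.

```python
import re
from collections import Counter

# Hypothetical toy gazetteer: lowercase place name -> canonical label.
# A real system would load a large list of place names instead.
GAZETTEER = {
    "new york": "New York",
    "new york city": "New York",
    "nyc": "New York",
    "brooklyn": "New York",
    "london": "London",
    "paris": "Paris",
}
MAX_NAME_TOKENS = max(len(name.split()) for name in GAZETTEER)


def find_places(text):
    """Return a Counter of canonical place labels mentioned in text."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter()
    i = 0
    while i < len(tokens):
        # Try the longest candidate phrase first so "new york city"
        # is matched once instead of as "new york" plus a stray "city".
        for n in range(min(MAX_NAME_TOKENS, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in GAZETTEER:
                counts[GAZETTEER[phrase]] += 1
                i += n
                break
        else:
            # No gazetteer entry starts at this token; move on.
            i += 1
    return counts


def label_text(text):
    """Return the most frequently mentioned place, or None if none match."""
    counts = find_places(text)
    return counts.most_common(1)[0][0] if counts else None
```

A plain lookup like this cannot tell a place from a person or company with the same name, which is the gap a statistical NER model is meant to close.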

Here's a nice blog entry about the difficulties of NER with geographical place names.

If you're going with Java, I'd recommend using the LingPipe NER classes. OpenNLP also has some, but the former has better documentation.

If you're looking for some theoretical background, Chavez et al. (2005) have constructed an interesting system and documented it.
