How to use DBPedia to extract Tags/Keywords from content?


Question


I am exploring how I can use Wikipedia's taxonomy information to extract Tags/Keywords from my content.

I found articles about DBPedia. DBpedia is a community effort to extract structured information from Wikipedia and to make this information available on the Web.

Has anyone used their web services? Do you know how they work and how reliable it is?

Solution

DBpedia is a fantastic, high-quality resource. To turn your content into a set of relevant DBpedia concepts, however, you will need to accurately identify them in your text, which involves at least two steps:

  1. Identify DBpedia concepts in your content: This includes recognizing concept names (and alternate names) in text, and also disambiguating among all possible meanings of each phrase. The term "Sun", for instance, may refer to dozens of possible concepts according to its disambiguation page, including a star, newspapers, and person names. This involves entity identification, classification, and linking.

  2. Identify which of those concepts are interesting: For example, do you want the concept "Definite article" (which the Wikipedia page "The" redirects to) showing up whenever the text includes the term "the"?
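The two steps above can be sketched in a few lines of Python. This is only a toy illustration, not how any of the tools mentioned here work internally: the `CANDIDATES` dictionary, the context words, the `BORING` set, and the function names are all made up for the example, and a real system would build its dictionary from Wikipedia titles, redirects, and disambiguation pages.

```python
import re

# Hypothetical surface-form dictionary: term -> candidate concepts, each with
# a few context words. The entries are illustrative, not real DBpedia data.
CANDIDATES = {
    "sun": {
        "dbr:Sun": {"star", "solar", "planet", "light", "orbit"},
        "dbr:The_Sun_(United_Kingdom)": {"newspaper", "tabloid", "headline", "editor"},
    },
    "the": {"dbr:Definite_article": {"grammar", "article"}},
    "earth": {"dbr:Earth": {"planet", "orbit", "moon"}},
}

# Concepts we decide are not interesting as tags (step 2).
BORING = {"dbr:Definite_article"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def link_concepts(text):
    """Step 1: spot known terms, then disambiguate each one by picking the
    candidate whose context words overlap most with the words in the text."""
    tokens = tokenize(text)
    context = set(tokens)
    linked = []
    for token in tokens:
        candidates = CANDIDATES.get(token)
        if not candidates:
            continue
        best = max(candidates, key=lambda uri: len(candidates[uri] & context))
        if best not in linked:
            linked.append(best)
    return linked


def extract_tags(text):
    """Step 2: keep only the linked concepts that are worth showing as tags."""
    return [uri for uri in link_concepts(text) if uri not in BORING]
```

With this toy dictionary, a sentence about orbits and stars resolves "Sun" to the star, a sentence about headlines and editors resolves it to the newspaper, and "the" is linked in step 1 but dropped in step 2.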

You may want to consider a preexisting text analytics library or service that supports entity linking to DBpedia. One great tool for topic indexing is Maui, which was developed by Alyona Medelyan during her PhD at the University of Waikato. Another great open source solution is Wikipedia Miner by David Milne at the same university.

Two commercial services which provide linking to DBpedia concepts are Zemanta and Extractiv (both allow some level of free use). DBpedia Spotlight is another option. Others which may provide these capabilities are listed at: https://stackoverflow.com/questions/2119279/is-there-a-better-tool-than-opencalais
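As a rough idea of what calling such a service looks like, here is a minimal sketch for DBpedia Spotlight using only the Python standard library. The endpoint URL, the `text` and `confidence` parameters, and the `Resources`, `@surfaceForm`, and `@URI` field names are assumptions based on the public Spotlight web service as I understand it, so check them against the current Spotlight documentation before relying on this. The helper names are my own.

```python
import json
import urllib.parse
import urllib.request

# Assumed public endpoint for English text; verify before use.
SPOTLIGHT_URL = "https://api.dbpedia-spotlight.org/en/annotate"


def build_request(text, confidence=0.5):
    """Build a GET request asking Spotlight for JSON annotations of `text`."""
    query = urllib.parse.urlencode({"text": text, "confidence": confidence})
    return urllib.request.Request(
        SPOTLIGHT_URL + "?" + query,
        headers={"Accept": "application/json"},
    )


def parse_resources(payload):
    """Return (surface form, DBpedia URI) pairs from a decoded JSON response.

    Assumes annotations arrive under a "Resources" key; a response without
    that key is treated as having no annotations.
    """
    return [(r["@surfaceForm"], r["@URI"]) for r in payload.get("Resources", [])]


def annotate(text, confidence=0.5):
    """Send the text to the service and return the linked concepts."""
    with urllib.request.urlopen(build_request(text, confidence), timeout=10) as resp:
        return parse_resources(json.load(resp))
```

Calling `annotate("The Sun is the star at the center of the Solar System.")` needs network access and a reachable service. Raising `confidence` should return fewer, more certain links, which is one way to approach the "which concepts are interesting" step.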

Disclosure: I used to work at Extractiv (now defunct), which was powered by Language Computer Corporation's NLP.
