Building a full-text search index for Jena and Lucene

Views: 354 Published: 2018/4/16 16:33:51 Tags: lucene indexing full-text-search jena

Problem description

I would like to perform a full-text search on a subset of DBpedia (which I have in a TDB store) with Lucene and Jena.

String TDBDirectory = "path" ;
Dataset dataset = TDBFactory.createDataset(TDBDirectory) ;

I don't want to search over all resources, though, only over titles. I think that by building indices over only the needed triples I can search faster. E.g.

<http://de.dbpedia.org/resource/Gurke> <http://www.w3.org/2000/01/rdf-schema#label> "Gurke"@de .

Here I would like to search for "Gurke", but only in triples with the #label property. So my question is: how do I build indices and search only triples with the #label property? I have already looked at http://jena.sourceforge.net/ARQ/lucene-arq.html, but it is either not detailed enough or too difficult for me.

Solution

http://jena.sourceforge.net/ is the old home for Jena; the project is now at http://jena.apache.org/ (how did you manage to find that old page?)

The project recently introduced a replacement for LARQ:

http://jena.apache.org/documentation/query/text-query.html

This is now part of the main codebase. It will be released with the 2.10.2 release; for the moment you must use the development build from https://repository.apache.org/content/repositories/snapshots/org/apache/jena/. You need to either use Fuseki or add it as a dependency of your project.

This new text search subsystem works much better with TDB and Fuseki.
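
As a rough illustration of what a label-only search could look like with that text subsystem, here is a minimal sketch that only builds the SPARQL query string. It assumes the dataset has already been wrapped with a text index that maps rdfs:label, as described on the text-query page linked above. The class name LabelTextQuery and its methods are hypothetical helpers, not part of Jena; the text:query property function and its namespace are taken from the Jena text-query documentation.

```java
public class LabelTextQuery {

    // Namespace of the text:query property function, per the Jena text-query documentation.
    private static final String TEXT_NS = "http://jena.apache.org/text#";
    private static final String RDFS_NS = "http://www.w3.org/2000/01/rdf-schema#";

    /**
     * Escapes backslashes and single quotes so the term can sit inside a
     * single-quoted SPARQL literal. Lucene query syntax in the term is passed
     * through unchanged (hypothetical helper, not a Jena API).
     */
    static String escape(String term) {
        return term.replace("\\", "\\\\").replace("'", "\\'");
    }

    /**
     * Builds a SELECT query that matches the term against indexed rdfs:label
     * values only (hypothetical helper, not a Jena API).
     */
    static String build(String term) {
        return String.join("\n",
            "PREFIX text: <" + TEXT_NS + ">",
            "PREFIX rdfs: <" + RDFS_NS + ">",
            "SELECT ?s ?label WHERE {",
            "  ?s text:query (rdfs:label '" + escape(term) + "') ;",
            "     rdfs:label ?label .",
            "}");
    }

    public static void main(String[] args) {
        // Prints the query for the example from the question.
        System.out.println(build("Gurke"));
    }
}
```

The resulting string would then be run against the text-indexed dataset through Jena's usual query execution API. Because the index only covers rdfs:label in this assumed setup, other triples are never consulted by the text search.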
