Building a full-text search index for Jena and Lucene

Problem description

I would like to perform a full-text search on a subset of DBpedia (which I have in a TDB store) with Lucene and Jena.

String TDBDirectory = "path" ;
Dataset dataset = TDBFactory.createDataset(TDBDirectory) ;

But I don't want to search over all resources, only over titles. I think that by building indices only over the triples I need, I can search faster. E.g.:

<http://de.dbpedia.org/resource/Gurke> <http://www.w3.org/2000/01/rdf-schema#label> "Gurke"@de .

Here I would like to search for "Gurke", but not in any triples other than the ones with the #label property. So my question is: how do I build indices over, and search, only the triples with the #label property? I have already looked at http://jena.sourceforge.net/ARQ/lucene-arq.html but it is either not detailed enough or too difficult for me.

Solution

http://jena.sourceforge.net/ is the old home for Jena -- the project is now at http://jena.apache.org/ (how did you manage to find that old page?)

The project recently introduced a replacement for LARQ.

http://jena.apache.org/documentation/query/text-query.html

and this is now part of the main codebase. It will be released with version 2.10.2; for the moment you must use the development build from https://repository.apache.org/content/repositories/snapshots/org/apache/jena/ . You need to either use Fuseki or add it as a dependency of your project.

This new text search subsystem works much better with TDB and Fuseki.
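
To make concrete what "index only the #label triples" means, here is a minimal, dependency-free sketch in plain Java. It does not use Jena, Lucene, or the text-query API linked above; the class name LabelIndex, its methods, and the example.org resource are hypothetical and only illustrate the idea. It assumes simple N-Triples lines with unescaped literals.

```java
import java.util.*;
import java.util.regex.*;

// Hypothetical illustration only: a tiny in-memory inverted index that
// covers rdfs:label literals and ignores every other triple.
class LabelIndex {
    static final String RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label";

    // Matches lines of the form: <subject> <predicate> "literal"@lang .
    // Escaped quotes inside the literal are not handled (sketch only).
    private static final Pattern TRIPLE = Pattern.compile(
        "^\\s*<([^>]+)>\\s+<([^>]+)>\\s+\"([^\"]*)\"(?:@[A-Za-z0-9-]+|\\^\\^<[^>]+>)?\\s*\\.\\s*$");

    // Builds a map from lower-cased label token to the subjects carrying it.
    static Map<String, Set<String>> build(List<String> lines) {
        Map<String, Set<String>> index = new HashMap<>();
        for (String line : lines) {
            Matcher m = TRIPLE.matcher(line);
            // Skip anything that is not a literal-valued rdfs:label triple.
            if (!m.matches() || !RDFS_LABEL.equals(m.group(2))) {
                continue;
            }
            String subject = m.group(1);
            String label = m.group(3).toLowerCase(Locale.ROOT);
            // Split on anything that is not a Unicode letter or digit.
            for (String token : label.split("[^\\p{L}\\p{N}]+")) {
                if (!token.isEmpty()) {
                    index.computeIfAbsent(token, k -> new TreeSet<>()).add(subject);
                }
            }
        }
        return index;
    }

    // Case-insensitive lookup of a single term.
    static Set<String> search(Map<String, Set<String>> index, String term) {
        return index.getOrDefault(term.toLowerCase(Locale.ROOT), Collections.emptySet());
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
            "<http://de.dbpedia.org/resource/Gurke> <http://www.w3.org/2000/01/rdf-schema#label> \"Gurke\"@de .",
            // Hypothetical non-label triple: it mentions "Gurke" but must not be indexed.
            "<http://example.org/other> <http://www.w3.org/2000/01/rdf-schema#comment> \"Gurke im Salat\"@de .");
        Map<String, Set<String>> index = build(lines);
        System.out.println(search(index, "Gurke"));
    }
}
```

Because the second triple uses a different predicate, it never enters the index, so a search for "Gurke" only returns the resource whose label matches. For real data you would let the text search subsystem maintain a Lucene index instead of a hand-rolled map; the documentation page linked above covers how to set that up.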
