Building a full-text search index for Jena and Lucene
Problem description
I would like to perform a full-text search on a subset of DBpedia (which I have in a TDB store) with Lucene and Jena.
String TDBDirectory = "path" ;
Dataset dataset = TDBFactory.createDataset(TDBDirectory) ;
However, I do not want to search over all resources, only over titles. I think that by building the index only over the triples I need, the search will be faster. For example:
<http://de.dbpedia.org/resource/Gurke> <http://www.w3.org/2000/01/rdf-schema#label> "Gurke"@de .
Here I would like to search for "Gurke", but only in triples that have the #label property, not in any other triples.
So my question is: how do I build an index over, and search, only the triples with the #label property?
I have already looked at http://jena.sourceforge.net/ARQ/lucene-arq.html but it is either not detailed enough or too difficult for me.
http://jena.sourceforge.net/ is the old home for Jena; the project is now at http://jena.apache.org/ (how did you manage to find that old page?)
The project recently introduced a replacement for LARQ:
http://jena.apache.org/documentation/query/text-query.html
This is now part of the main codebase. It will be released with the 2.10.2 release; for the moment you must use the development build from https://repository.apache.org/content/repositories/snapshots/org/apache/jena/. You either need to be using Fuseki or add it as a dependency for your project.
This new text search subsystem works much better with TDB and Fuseki.
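To make the idea from the question concrete (index only the #label triples, then search that index), here is a minimal sketch in plain Java. It is not the Jena text-query or Lucene API. The class name LabelIndexSketch and the methods buildIndex and search are made up for illustration, a regular expression stands in for a real N-Triples parser, a HashMap stands in for the Lucene index, and the second sample triple uses an invented example.org predicate.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only: an in-memory stand-in for a label-only text index.
public class LabelIndexSketch {
    static final String RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label";

    // Simplified pattern for lines of the form: <subject> <predicate> "literal"@lang .
    // Triples whose object is not a literal do not match and are skipped.
    private static final Pattern TRIPLE = Pattern.compile(
        "^\\s*<([^>]*)>\\s+<([^>]*)>\\s+\"((?:[^\"\\\\]|\\\\.)*)\"(?:@[A-Za-z0-9-]+|\\^\\^<[^>]*>)?\\s*\\.\\s*$");

    // Builds a map from lower-cased token to the subjects whose rdfs:label contains it.
    static Map<String, Set<String>> buildIndex(List<String> lines) {
        Map<String, Set<String>> index = new HashMap<>();
        for (String line : lines) {
            Matcher m = TRIPLE.matcher(line);
            // Only triples with the rdfs:label predicate are indexed.
            if (!m.matches() || !RDFS_LABEL.equals(m.group(2))) {
                continue;
            }
            String subject = m.group(1);
            // Split on anything that is not a Unicode letter or digit, so umlauts survive.
            for (String token : m.group(3).toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
                if (!token.isEmpty()) {
                    index.computeIfAbsent(token, k -> new TreeSet<>()).add(subject);
                }
            }
        }
        return index;
    }

    // Looks up a single term, ignoring case.
    static Set<String> search(Map<String, Set<String>> index, String term) {
        return index.getOrDefault(term.toLowerCase(Locale.ROOT), Collections.<String>emptySet());
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            "<http://de.dbpedia.org/resource/Gurke> <http://www.w3.org/2000/01/rdf-schema#label> \"Gurke\"@de .",
            // Invented non-label triple: its text must not end up in the index.
            "<http://de.dbpedia.org/resource/Gurke> <http://example.org/comment> \"Gurke im Text\"@de ."
        );
        Map<String, Set<String>> index = buildIndex(lines);
        System.out.println(search(index, "Gurke")); // subjects whose label contains "Gurke"
        System.out.println(search(index, "Text"));  // appears only in the non-label triple
    }
}
```

With the text search subsystem linked above, the index would live in Lucene rather than in a map, and the restriction to the #label property would be part of the index configuration rather than hand-written filtering; see the linked documentation for the actual setup.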