Fuseki config for 2 datasets + text index: how to use Turtle files?

Problem description

I'm new to Fuseki and want to use 2 TDB datasets for our project: a small one for our own data, and a large one (168M triples, imported from http://data.bnf.fr).

We need to index the data because SPARQL queries using "FILTER(CONTAINS())" don't work on the large dataset ("BnF_text"). Therefore, I've built a text index for "BnF_text", following this post: "Fuseki indexed (Lucene) text search returns no results" (but I had to modify the Turtle config file to get text:query working).
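For readers new to jena-text, the difference between the two query styles can be sketched as follows. Python is only an illustration language here (the poster sent queries from PHP), and `sparql_string_literal`, `text_search_query` and `contains_filter_query` are hypothetical helper names. The `text:query` form assumes the Lucene index described in text_config.ttl, whose default field "text" holds dcterms:title, foaf:familyName and foaf:name; the `FILTER(CONTAINS())` form has a similar intent but has to scan every title.

```python
# Sketch: compare a jena-text index lookup with a brute-force string filter.
# Assumes the Lucene index from text_config.ttl (default field "text").

def sparql_string_literal(value: str) -> str:
    """Return value as a double-quoted SPARQL string literal."""
    escaped = (value.replace("\\", "\\\\")
                    .replace('"', '\\"')
                    .replace("\n", "\\n")
                    .replace("\r", "\\r"))
    return f'"{escaped}"'


def text_search_query(search: str, limit: int = 50) -> str:
    """Answered from the Lucene index via the text:query property function."""
    return f"""PREFIX text: <http://jena.apache.org/text#>
PREFIX dcterms: <http://purl.org/dc/terms/>
SELECT ?s ?title WHERE {{
  ?s text:query ({sparql_string_literal(search)} {limit}) .
  OPTIONAL {{ ?s dcterms:title ?title }}
}}
LIMIT {limit}"""


def contains_filter_query(search: str, limit: int = 50) -> str:
    """Brute-force variant: every dcterms:title is tested by the FILTER."""
    return f"""PREFIX dcterms: <http://purl.org/dc/terms/>
SELECT ?s ?title WHERE {{
  ?s dcterms:title ?title .
  FILTER(CONTAINS(LCASE(STR(?title)), LCASE({sparql_string_literal(search)})))
}}
LIMIT {limit}"""
```

The search string inside `text:query` is passed to Lucene, so Lucene query syntax (wildcards, phrases) applies there; the helper only escapes it for SPARQL.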

It works, but I've encountered a strange problem with "BnF_text": from time to time, the same query returns a timeout, and I can't find any error in the Fuseki logs or the Apache logs.

~~~~~~~ Here are my questions: ~~~~~~~

  • Is there something wrong with my config files?
  • Could the coexistence of the two datasets affect performance?

~~~~~~~ Here are the details of my installation: ~~~~~~~

  • Modified the Java memory limit in the fuseki-server script: set to -Xmx4000M.
  • SPARQL queries are sent via the PHP EasyRDF library.
  • I have 2 config files: $FUSEKI_PATH/text_config.ttl + $FUSEKI_PATH/run/configuration/MY_DATASET.ttl
  • I run fuseki-server with this command: ./fuseki-server --config text_config.ttl

Config files

1) text_config.ttl

@prefix :        <#> .
@prefix rdf:     <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs:    <http://www.w3.org/2000/01/rdf-schema#> .
@prefix tdb:     <http://jena.hpl.hp.com/2008/tdb#> .
@prefix ja:      <http://jena.hpl.hp.com/2005/11/Assembler#> .
@prefix text:    <http://jena.apache.org/text#> .
@prefix fuseki:  <http://jena.apache.org/fuseki#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix dcterms: <http://purl.org/dc/terms/> .

## Initialize TDB --------------------------------

[] ja:loadClass "com.hp.hpl.jena.tdb.TDB" .
tdb:DatasetTDB  rdfs:subClassOf  ja:RDFDataset .
tdb:GraphTDB    rdfs:subClassOf  ja:Model .

## Initialize text query -------------------------------------
[] ja:loadClass       "org.apache.jena.query.text.TextQuery" .
# A TextDataset is a regular dataset with a text index.
text:TextDataset      rdfs:subClassOf   ja:RDFDataset .
# Lucene index
text:TextIndexLucene  rdfs:subClassOf   text:TextIndex .

## ---------------------------------------------------------------
## This URI must be fixed - it's used to assemble the text dataset.

:text_dataset rdf:type     text:TextDataset ;

    text:dataset :tdb_dataset_readwrite ;
    text:index     <#indexLucene> ;
    .

# A TDB datset used for RDF storage ------------------------------
:tdb_dataset_readwrite                    # <= EDIT : instead of <#dataset>  
        a             tdb:DatasetTDB ;
        tdb:location  "TDB_PATH" ;
.

# Text index description ------------------------------------------
<#indexLucene> a text:TextIndexLucene ;
    text:directory <file:LUCENE_PATH> ;
    text:entityMap <#entMap> ;
    text:storeValues true ;
    .

# Mapping in the index ---------------------------------------------
# URI stored in field "uri" 
<#entMap> a text:EntityMap ;
    text:entityField      "uri" ;
    text:defaultField     "text" ;
    text:map (
         [ text:field "text" ; text:predicate dcterms:title ]
         [ text:field "text" ; text:predicate foaf:familyName ]
         [ text:field "text" ; text:predicate foaf:name ]
         ) .

# Fuseki services (http) --------------------------------------------- 

# EDIT : added following lines

:service_tdb_all  a                   fuseki:Service ;
        rdfs:label                    "TDB BnF_text" ;
        fuseki:dataset                :text_dataset ; ### 
        fuseki:name                   "BnF_text" ;
        fuseki:serviceQuery           "query" , "sparql" ;
        fuseki:serviceReadGraphStore  "get" ;
        fuseki:serviceReadWriteGraphStore "data" .

2) MY_DATASET.ttl

@prefix :      <http://base/#> .
@prefix tdb:   <http://jena.hpl.hp.com/2008/tdb#> .
@prefix rdf:   <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix ja:    <http://jena.hpl.hp.com/2005/11/Assembler#> .
@prefix rdfs:  <http://www.w3.org/2000/01/rdf-schema#> .
@prefix fuseki: <http://jena.apache.org/fuseki#> .

:service_tdb_all  a                   fuseki:Service ;
        rdfs:label                    "TDB MY_DATASET" ;
        fuseki:dataset                :tdb_dataset_readwrite ;
        fuseki:name                   "MY_DATASET" ;
        fuseki:serviceQuery           "query" , "sparql" ;
        fuseki:serviceReadGraphStore  "get" ;
        fuseki:serviceReadWriteGraphStore
                "data" ;
        fuseki:serviceUpdate          "update" ;
        fuseki:serviceUpload          "upload" .

:tdb_dataset_readwrite
        a             tdb:DatasetTDB ;
        tdb:location  "MY_DATASET_TDB_PATH" .

Thanks in advance

Recommended answer

Thanks Andy, you were right. The problem came from EasyRDF, not from Fuseki. I found this: https://groups.google.com/d/msg/skosmos-users/WhtZwnsxOFs/MtAocr8vDgAJ, so I changed the timeout in vendor/easyrdf/easyrdf/lib/EasyRdf/Http/Client.php, and everything seems to be OK now. I'm going to run a few more tests and then mark the question as solved.

By "everything seems to be OK now" I mean that the "timeout" message from EasyRdf_Exception has disappeared.
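Because the fix was on the client side, it can help to see where a client-side timeout lives when talking to Fuseki over the standard SPARQL 1.1 Protocol. The sketch below uses Python's standard library rather than EasyRDF, so it does not show EasyRDF's own settings. It assumes Fuseki on its default port 3030 and derives the path /BnF_text/query from `fuseki:name "BnF_text"` and `fuseki:serviceQuery "query"` in text_config.ttl; `build_request`, `bindings_to_rows` and `select` are hypothetical helper names.

```python
import json
import urllib.parse
import urllib.request

# Assumption: Fuseki runs on its default port 3030; the path comes from
# fuseki:name "BnF_text" + fuseki:serviceQuery "query" in text_config.ttl.
ENDPOINT = "http://localhost:3030/BnF_text/query"


def build_request(endpoint: str, query: str) -> urllib.request.Request:
    """SPARQL 1.1 Protocol 'query via URL-encoded POST' (avoids long GET URLs)."""
    body = urllib.parse.urlencode({"query": query}).encode("ascii")
    return urllib.request.Request(
        endpoint,
        data=body,  # a non-None body makes urllib send a POST
        headers={
            "Accept": "application/sparql-results+json",
            "Content-Type": "application/x-www-form-urlencoded",
        },
    )


def bindings_to_rows(results: dict) -> list:
    """Flatten SPARQL JSON results into {variable: value} dicts."""
    return [
        {var: term["value"] for var, term in binding.items()}
        for binding in results["results"]["bindings"]
    ]


def select(endpoint: str, query: str, timeout_s: float = 120.0) -> list:
    """Run a SELECT query with an explicit client-side timeout.

    timeout_s bounds each blocking socket operation (connect, read). It is
    the client's own limit, independent of any timeout configured in Fuseki.
    """
    req = build_request(endpoint, query)
    with urllib.request.urlopen(req, timeout=timeout_s) as resp:
        return bindings_to_rows(json.load(resp))
```

For example, `select(ENDPOINT, "SELECT ?s WHERE { ?s ?p ?o } LIMIT 10", timeout_s=300)` gives a slow text query up to 300 seconds per socket read before the client gives up. This mirrors the situation in the answer: a client-side timeout that is too short surfaces as an error in the client even though Fuseki logs nothing.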
