Select DBpedia resources with at least N occurrences of a selected word in the abstract?


Question

I have this query, which returns some DBpedia resources and their abstracts. How can I filter the results to get only the resources whose abstracts contain at least a certain number of occurrences of a particular word?

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>

select distinct ?resource ?url ?resume where {
   ?resource rdfs:label ?Nom.
   ?resource foaf:isPrimaryTopicOf ?url.
   ?resource dbo:abstract ?resume.
   FILTER langMatches( lang(?Nom), "EN" )
   FILTER langMatches( lang(?resume), "EN" )
   # bif:contains is Virtuoso's full-text search predicate
   ?Nom <bif:contains> "apple".
}

This is the new query, without the BIND function:

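select ?resource ?url
       (strlen(replace(replace(lcase(?resume), 'jobs', '_'), '[^_]', '')) as ?nbr)
where {
   ?resource rdfs:label ?Nom.
   ?resource foaf:isPrimaryTopicOf ?url.
   ?resource dbo:abstract ?resume.
   FILTER langMatches( lang(?Nom), "EN" )
   FILTER langMatches( lang(?resume), "EN" )
   ?Nom <bif:contains> "Apple".
   # The abstract is lowercased, so the word to count must be lowercase too.
   # Without BIND, the counting expression is repeated in the FILTER
   # (GROUP BY/HAVING only apply to aggregates).
   FILTER (strlen(replace(replace(lcase(?resume), 'jobs', '_'), '[^_]', '')) >= 1)
}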

Solution

This won't be perfect, but it should work reasonably well for what you're trying to accomplish. You can use replace to replace every instance of the word you want to count with a single character (e.g., '_'). Then you can use replace again to replace everything except that character with the empty string. That leaves a string like '______', whose length is the number of times the word appeared. The count is approximate because any '_' already present in the abstract is counted too, and the pattern used here only matches a lowercase 'the' with whitespace on both sides. For instance, here's a query that counts 'the' in each abstract and keeps only the resources where 'the' appears at least five times.

select ?x ?nThe {
  values ?x { dbr:Horse dbr:Cat dbr:Dog }
  ?x dbo:abstract ?abs 
  filter langMatches(lang(?abs),'en')
  bind(strlen(replace(replace(?abs, '\\sthe\\s', '_'),'[^_]', '')) as ?nThe)
  filter (?nThe >= 5)
}
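
To see what the nested replace calls count before running them against an endpoint, the same expression can be mirrored locally. The following is only a rough Python sketch: the function name count_word_by_replacement and the sample sentence are made up for illustration, and Python's re module is not guaranteed to behave identically to the regex engine behind a given SPARQL endpoint.

```python
import re


def count_word_by_replacement(text: str, pattern: str) -> int:
    """Mirror strlen(replace(replace(text, pattern, '_'), '[^_]', ''))."""
    # Step 1: every match of the pattern becomes a single underscore.
    marked = re.sub(pattern, "_", text)
    # Step 2: delete every character that is not an underscore.
    only_marks = re.sub(r"[^_]", "", marked)
    # Step 3: the remaining length is the number of matches, plus any
    # underscores that were already present in the text.
    return len(only_marks)


if __name__ == "__main__":
    # Hypothetical sample text, not taken from DBpedia.
    sample = "The cat chased the dog, and the dog chased the the cat."
    # Same pattern as the query: lowercase 'the' with whitespace on both sides.
    print(count_word_by_replacement(sample, r"\sthe\s"))
    # Python-only reference count of lowercase 'the' as a whole word.
    print(len(re.findall(r"\bthe\b", sample)))
```

On this sample the replacement count comes out lower than the reference count, because the two adjacent occurrences in "the the" share a single space and only the first one can match. The leading capitalized "The" is skipped by both, since neither pattern is case-insensitive.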

