Query for best match to a string with SPARQL?

Views: 462 | Published: 2020/10/19 3:02:53 | Tags: string, search, sparql, dbpedia


Problem description

I have a list of movie titles and want to look them up in DBpedia for meta information such as "director". But I have trouble identifying the correct movie with SPARQL, because the titles sometimes don't match exactly.

How can I get the best match for a movie title from DBpedia using SPARQL?

Some problematic examples:

My list: Die Hard: Vengeance vs. DBpedia: Die Hard with a Vengeance

My list: Hachi vs. DBpedia: Hachi: A Dog's Tale

My current approach is to query the DBpedia endpoint for all movies, filter by checking for each single token of the title (with punctuation removed), order by title, and return the first result. E.g.:

SELECT ?resource ?title ?director WHERE {
   ?resource foaf:name ?title .
   ?resource rdf:type schema:Movie .
   ?resource dbo:director ?director .
   FILTER (
      contains(lcase(str(?title)), "die") && 
      contains(lcase(str(?title)),"hard")
   )
}
ORDER BY (?title)
LIMIT 1
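The token filter above can be generated from any title on the list rather than written by hand. The following is a minimal Python sketch under stated assumptions: the helper names `tokenize` and `build_query` are hypothetical, the prefixes (`foaf:`, `schema:`, `dbo:`) are assumed to be predefined by the endpoint as in the query above, and the code only builds the query text without sending it anywhere.

```python
import re


def tokenize(title):
    """Lowercase a title and split it into word tokens, dropping punctuation."""
    return re.findall(r"\w+", title.lower())


def build_query(title, limit=1):
    """Build a query that requires every token of `title` to occur in ?title.

    Tokens consist of word characters only, so they cannot contain quotes
    or backslashes and are safe to embed in a SPARQL string literal.
    """
    tokens = tokenize(title)
    if not tokens:
        raise ValueError("title contains no usable tokens")
    clauses = ['contains(lcase(str(?title)), "%s")' % tok for tok in tokens]
    return (
        "SELECT ?resource ?title ?director WHERE {\n"
        "   ?resource foaf:name ?title .\n"
        "   ?resource rdf:type schema:Movie .\n"
        "   ?resource dbo:director ?director .\n"
        "   FILTER (\n"
        "      " + " &&\n      ".join(clauses) + "\n"
        "   )\n"
        "}\n"
        "ORDER BY (?title)\n"
        "LIMIT " + str(int(limit))
    )


if __name__ == "__main__":
    print(build_query("Die Hard: Vengeance"))
```

Note that for "Die Hard: Vengeance" this also requires the token "vengeance", so it is stricter than the two-token filter shown above.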

This approach is very slow and sometimes fails, e.g.:

SELECT ?resource ?title ?director WHERE {
   ?resource foaf:name ?title .
   ?resource rdf:type schema:Movie .
   ?resource dbo:director ?director .
   FILTER (
      contains(lcase(str(?title)), "hachi") 
   )
}
ORDER BY (?title)
LIMIT 10

where the correct result only comes second:

  resource                                          title                        director
  http://dbpedia.org/resource/Chachi_420            "Chachi 420"@en              http://dbpedia.org/resource/Kamal_Haasan
  http://dbpedia.org/resource/Hachi:_A_Dog's_Tale   "Hachi: A Dog's Tale"@en     http://dbpedia.org/resource/Lasse_Hallström    
  http://dbpedia.org/resource/Hachiko_Monogatari    "Hachikō Monogatari"@en      http://dbpedia.org/resource/Seijirō_Kōyama
  http://dbpedia.org/resource/Thachiledathu_Chundan "Thachiledathu Chundan"@en   http://dbpedia.org/resource/Shajoon_Kariyal
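The ordering in this table follows directly from combining a substring test with an alphabetical sort: "hachi" also occurs inside "Chachi", and "C" sorts before "H". The Python sketch below reproduces that effect on the four titles from the table and contrasts it with a whole-word check. The whole-word check is only an illustrative client-side refinement, not something the queries above do, and both function names are hypothetical.

```python
import re

# The four titles returned by the "hachi" query above.
TITLES = [
    "Thachiledathu Chundan",
    "Hachikō Monogatari",
    "Hachi: A Dog's Tale",
    "Chachi 420",
]


def substring_matches(query, titles):
    """Mimic contains(lcase(...)) followed by ORDER BY ?title."""
    q = query.lower()
    return sorted(t for t in titles if q in t.lower())


def whole_word_matches(query, titles):
    """Keep only titles in which the query occurs as a complete word."""
    pattern = re.compile(r"\b" + re.escape(query) + r"\b", re.IGNORECASE)
    return sorted(t for t in titles if pattern.search(t))


if __name__ == "__main__":
    print(substring_matches("hachi", TITLES))
    print(whole_word_matches("hachi", TITLES))
```

A whole-word test would help with "Hachi", but it would not help with a case like "Die Hard: Vengeance", where the list title simply lacks words that the DBpedia title contains.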

Any ideas on how to solve this problem? Or even better: how can I query for the best matches to a string with SPARQL in general?

Thanks!

Recommended answer

I adapted the regex approach mentioned in the comments and came up with a solution that works pretty well, better than anything I could get with bif:contains:

   SELECT ?resource ?title ?match
          (strlen(str(?title)) AS ?lenTitle)
          (strlen(str(?match)) AS ?lenMatch)
   WHERE {
      ?resource foaf:name ?title .
      ?resource rdf:type schema:Movie .
      ?resource dbo:director ?director .
      # ?match keeps only the tokens that were found in the title,
      # so a longer ?match means more tokens matched.
      BIND( replace(LCASE(CONCAT('x',?title)), "^x(die)*(?:.*?(hard))*(?:.*?(with))*.*$", "$1$2$3") AS ?match )
   }
   # Most matched tokens first, then the shortest title.
   ORDER BY DESC(?lenMatch) ASC(?lenTitle)
   LIMIT 5

It's not perfect, so I'm still open to suggestions.
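To see how the scoring in the query above behaves, the same idea can be tried offline. This is a rough Python sketch, not the answer's implementation: the function names and the sample titles are made up for illustration, and Python's `re` module is not the same regex flavour a SPARQL endpoint uses, so results on unusual titles may differ. It builds the same pattern from a token list, collects the captured tokens into a match string, and sorts by longest match first, then shortest title.

```python
import re


def build_pattern(tokens):
    """Build the pattern from the answer: ^x(t1)*(?:.*?(t2))*(?:.*?(t3))*.*$"""
    first, *rest = tokens
    parts = ["^x(" + re.escape(first) + ")*"]
    parts += ["(?:.*?(" + re.escape(tok) + "))*" for tok in rest]
    return re.compile("".join(parts) + ".*$")


def match_string(pattern, title):
    """Concatenate the tokens captured in `title` (empty string if none)."""
    m = pattern.match("x" + title.lower())
    if m is None:
        return ""
    # Groups that did not take part in the match are None.
    return "".join(g or "" for g in m.groups())


def rank(tokens, titles):
    """Sort titles by longest match string first, then by shortest title."""
    pattern = build_pattern(tokens)
    return sorted(titles, key=lambda t: (-len(match_string(pattern, t)), len(t)))


if __name__ == "__main__":
    sample = [
        "Hard Target",
        "A Good Day to Die Hard",
        "Die Hard 2",
        "Die Hard",
        "Die Hard with a Vengeance",
    ]
    for title in rank(["die", "hard", "with"], sample):
        print(title)
```

As in the query, the first token only counts when the title starts with it, which is why "A Good Day to Die Hard" scores only for "hard" in this sketch.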
