How to find similar content using SPARQL
Question
I'm playing with the idea of using SPARQL to identify conceptual overlap between things.
Take movies, for example (LinkedMDB data). If I have a movie, "The Matrix", and my goal is to list movies that are similar to it, I would probably start by doing the following:
- The Matrix
  - get genre
  - get actors
  - get director
  - get location
  - etc.
And then using the things I identified in the matrix, I would query for things with those properties (pseudo-query)
SELECT movie, genre, director, location, actors
WHERE {
  genre is action or sci-fi .
  director are the Wachowski brothers .
  location is set in a big city .
  OPTIONAL( actors were in the matrix . )
}
Is there something in SPARQL that allows me to check for overlap of properties between different nodes? Or must this be done manually like I've proposed?
Recommended answer
Matching some specific properties
It sounds like you're asking for something along the lines of
select ?similarMovie ?genre ?director ?location ?actor where {
  values ?movie { <http://.../TheMatrix> }
  ?genre    ^:hasGenre    ?movie, ?similarMovie .
  ?director ^:hasDirector ?movie, ?similarMovie .
  ?location ^:hasLocation ?movie, ?similarMovie .
  optional { ?actor ^:hasActor ?movie, ?similarMovie . }
}
That uses the backwards (inverse) path notation ^ and object lists to make it much shorter than:

select ?similarMovie ?genre ?director ?location ?actor where {
  values ?movie { <http://.../TheMatrix> }
  ?movie :hasGenre ?genre .
  ?movie :hasDirector ?director .
  ?movie :hasLocation ?location .
  ?similarMovie :hasGenre ?genre .
  ?similarMovie :hasDirector ?director .
  ?similarMovie :hasLocation ?location .
  optional {
    ?movie :hasActor ?actor .
    ?similarMovie :hasActor ?actor .
  }
}
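To see what both forms of the query compute without a SPARQL endpoint, here is a minimal Python sketch of the same matching rule: a candidate is kept only if, for every required property, it shares at least one value with the reference movie. The triples, the property names and the helper `similar_by_properties` are made up for illustration; they are not LinkedMDB or DBpedia data.

```python
from collections import defaultdict

# Toy (subject, property, value) triples; every name here is hypothetical.
TRIPLES = [
    ("TheMatrix", "hasGenre", "SciFi"),
    ("TheMatrix", "hasDirector", "Wachowskis"),
    ("TheMatrix", "hasLocation", "BigCity"),
    ("MatrixReloaded", "hasGenre", "SciFi"),
    ("MatrixReloaded", "hasDirector", "Wachowskis"),
    ("MatrixReloaded", "hasLocation", "BigCity"),
    ("SpeedRacer", "hasGenre", "Action"),
    ("SpeedRacer", "hasDirector", "Wachowskis"),
    ("SpeedRacer", "hasLocation", "BigCity"),
]

def similar_by_properties(triples, movie, required):
    """Subjects sharing at least one value with `movie` for every property in `required`."""
    values = defaultdict(set)  # (subject, property) -> set of values
    subjects = set()
    for s, p, o in triples:
        values[(s, p)].add(o)
        subjects.add(s)
    return sorted(
        s for s in subjects
        if all(values[(s, p)] & values[(movie, p)] for p in required)
    )

result = similar_by_properties(
    TRIPLES, "TheMatrix", ["hasGenre", "hasDirector", "hasLocation"]
)
print(result)
```

As with the SPARQL queries, the reference movie matches itself, so it appears in its own result list. SpeedRacer is dropped because its genre differs, even though the director and location match.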
For instance, using DBpedia, we can get other films that have the same distributor and cinematographer as The Matrix:
select ?similar ?cinematographer ?distributor where {
  values ?movie { dbpedia:The_Matrix }
  ?cinematographer ^dbpprop:cinematography ?movie, ?similar .
  ?distributor ^dbpprop:distributor ?movie, ?similar .
}
limit 10
The results are all within that same franchise; you get: The Matrix, The Matrix Reloaded, The Matrix Revolutions, The Matrix (franchise), and The Ultimate Matrix Collection.
It's also possible to ask for things that have at least some number of properties in common. How many properties two things need to have in common before they should be considered similar is obviously subjective, will depend on the particular data, and will need some experimentation. For instance, we can ask for films on DBpedia that have more than 35 properties in common with The Matrix with a query like this:
select ?similar where {
  values ?movie { dbpedia:The_Matrix }
  ?similar ?p ?o ;
           a dbpedia-owl:Film .
  ?movie ?p ?o .
}
group by ?similar ?movie
having (count(?p) > 35)
This gives 13 movies (including The Matrix and the other movies in the franchise):
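- V for Vendetta (film)
- The Matrix
- The Postman (film)
- Executive Decision
- The Invasion (film)
- Demolition Man (film)
- The Matrix (franchise)
- The Matrix Reloaded
- Freejack
- Exit Wounds
- The Matrix Revolutions
- Outbreak (film)
- Speed Racer (film)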
Using this kind of approach, you could even use the number of common properties as a measure of similarity. For instance:
select ?similar (count(?p) as ?similarity) where {
  values ?movie { dbpedia:The_Matrix }
  ?similar ?p ?o ;
           a dbpedia-owl:Film .
  ?movie ?p ?o .
}
group by ?similar ?movie
having (count(?p) > 35)
order by desc(?similarity)
similar                     similarity
The Matrix                  206
The Matrix Revolutions      63
The Matrix Reloaded         60
The Matrix (franchise)      55
Demolition Man (film)       41
Speed Racer (film)          40
V for Vendetta (film)       38
The Invasion (film)         38
The Postman (film)          36
Executive Decision          36
Freejack                    36
Exit Wounds                 36
Outbreak (film)             36
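The counting queries above can be mirrored in a few lines of Python, which may help when experimenting with thresholds on a small sample. This is only a sketch over made-up triples (not DBpedia data), and `rank_similar` is a hypothetical helper. Note that `count(?p)` counts one row per shared property/value pair, so a property with two shared values contributes two; `count(distinct ?p)` would count distinct properties instead.

```python
from collections import Counter

# Toy (subject, property, value) triples; illustrative only.
TRIPLES = [
    ("TheMatrix", "genre", "SciFi"),
    ("TheMatrix", "starring", "KeanuReeves"),
    ("TheMatrix", "starring", "CarrieAnneMoss"),
    ("TheMatrix", "distributor", "WarnerBros"),
    ("MatrixReloaded", "genre", "SciFi"),
    ("MatrixReloaded", "starring", "KeanuReeves"),
    ("MatrixReloaded", "starring", "CarrieAnneMoss"),
    ("MatrixReloaded", "distributor", "WarnerBros"),
    ("Constantine", "starring", "KeanuReeves"),
    ("Constantine", "distributor", "WarnerBros"),
    ("Speed", "genre", "Action"),
    ("Speed", "starring", "KeanuReeves"),
]

def rank_similar(triples, movie, threshold):
    """Rank subjects by the number of (property, value) pairs shared with `movie`.

    Keeps only subjects with more than `threshold` shared pairs, like
    `having (count(?p) > threshold)`, and sorts like `order by desc(?similarity)`.
    """
    reference = {(p, o) for s, p, o in triples if s == movie}
    counts = Counter(s for s, p, o in triples if (p, o) in reference)
    kept = [(s, n) for s, n in counts.items() if n > threshold]
    # Highest count first; ties broken alphabetically for a stable order.
    return sorted(kept, key=lambda item: (-item[1], item[0]))

ranking = rank_similar(TRIPLES, "TheMatrix", 1)
for name, score in ranking:
    print(name, score)
```

Because the count is a raw overlap, the reference movie always scores highest against itself, and subjects with many triples have more chances to overlap; that is one reason the threshold needs experimentation on real data.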