How to find similar content using SPARQL


Question

I'm playing with the idea of using SPARQL to identify conceptual overlap between things.

Take movies, for example (LinkedMDB data). If I have a movie, "The Matrix", and my goal is to list movies that are similar to it, I would probably start by doing the following:

  • The Matrix
    • Get genre
    • Get actors
    • Get director
    • Get location

Then, using the things I identified for The Matrix, I would query for things with those properties (pseudo-query):

    SELECT movie, genre, director, location, actors
    WHERE {
      genre is action or sci-fi .
    
      director are the Wachowski brothers .
    
      location is set in a big city .
    
      OPTIONAL( actors were in the matrix . )
    }
    

Is there something in SPARQL that allows me to check for overlap of properties between different nodes? Or must this be done manually, as I've proposed?

Recommended answer

Matching some specific properties

It sounds like you're asking for something along the lines of:

    select ?similarMovie ?genre ?director ?location ?actor where { 
      values ?movie { <http://.../TheMatrix> }
      ?genre    ^:hasGenre    ?movie, ?similarMovie .
      ?director ^:hasDirector ?movie, ?similarMovie .
      ?location ^:hasLocation ?movie, ?similarMovie .
      optional { ?actor ^:hasActor ?movie, ?similarMovie . }
    }
    

That uses the inverse (backwards) path notation ^ and object lists, which makes it much shorter than the equivalent query written out in full:

    select ?similarMovie ?genre ?director ?location ?actor where { 
      values ?movie { <http://.../TheMatrix> }
      ?movie        :hasGenre    ?genre .
      ?movie        :hasDirector ?director .
      ?movie        :hasLocation ?location .
      ?similarMovie :hasGenre    ?genre .
      ?similarMovie :hasDirector ?director .
      ?similarMovie :hasLocation ?location .
      optional { 
        ?movie        :hasActor ?actor .
        ?similarMovie :hasActor ?actor .
      }
    }
    

For instance, using DBpedia, we can get other films that have the same distributor and cinematographer as The Matrix:

    select ?similar ?cinematographer ?distributor where {
      values ?movie { dbpedia:The_Matrix }
      ?cinematographer ^dbpprop:cinematography ?movie, ?similar .
      ?distributor ^dbpprop:distributor ?movie, ?similar .
    }
    limit 10
    

SPARQL results

The results are all within the same franchise; you get: The Matrix, The Matrix Reloaded, The Matrix Revolutions, The Matrix (franchise), and The Ultimate Matrix Collection.
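To make the matching logic above concrete, here is a minimal Python sketch of the same idea on a toy in-memory set of triples. Everything in it (the `TRIPLES` data, the identifiers such as `film_a`, and the `share_all` helper) is hypothetical and only for illustration; it is not DBpedia data and not how a SPARQL engine evaluates the query.

```python
from collections import defaultdict

# Hypothetical toy graph of (subject, property, value) triples.
TRIPLES = [
    ("film_a", "cinematography", "person_1"),
    ("film_a", "distributor", "studio_1"),
    ("film_b", "cinematography", "person_1"),
    ("film_b", "distributor", "studio_1"),
    ("film_c", "cinematography", "person_1"),
    ("film_c", "distributor", "studio_2"),
]


def share_all(triples, movie, properties):
    """Return subjects sharing at least one value with `movie` for every listed property."""
    values = defaultdict(set)  # (subject, property) -> set of values
    for s, p, o in triples:
        values[(s, p)].add(o)
    subjects = {s for s, _, _ in triples}
    return sorted(
        s for s in subjects
        if all(values[(s, p)] & values[(movie, p)] for p in properties)
    )


similar = share_all(TRIPLES, "film_a", ["cinematography", "distributor"])
print(similar)
```

As in the query, the starting film matches itself, so it appears in its own result list. Unlike the query, this sketch returns only the matching subjects, not the shared values.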

It's also possible to ask for things that have at least some number of properties in common. How many properties two things need to share before they should be considered similar is subjective, depends on the particular data, and will take some experimentation. For instance, we can ask for films on DBpedia that share more than 35 property values with The Matrix, using a query like this:

    select ?similar where { 
      values ?movie { dbpedia:The_Matrix }
      ?similar ?p ?o ; a dbpedia-owl:Film .
      ?movie   ?p ?o .
    }
    group by ?similar ?movie
    having (count(?p) > 35)
    

SPARQL results

This gives 13 movies (including The Matrix and the other movies in the franchise):

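• V for Vendetta (film)
• The Matrix
• The Postman (film)
• Executive Decision
• The Invasion (film)
• Demolition Man (film)
• The Matrix (franchise)
• The Matrix Reloaded
• Freejack
• Exit Wounds
• The Matrix Revolutions
• Outbreak (film)
• Speed Racer (film)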

Using this kind of approach, you could even use the number of common properties as a measure of similarity. For instance:

    select ?similar (count(?p) as ?similarity) where { 
      values ?movie { dbpedia:The_Matrix }
      ?similar ?p ?o ; a dbpedia-owl:Film .
      ?movie   ?p ?o .
    }
    group by ?similar ?movie
    having (count(?p) > 35)
    order by desc(?similarity)
    

SPARQL results

    The Matrix             206
    The Matrix Revolutions  63
    The Matrix Reloaded     60
    The Matrix (franchise)  55
    Demolition Man (film)   41
    Speed Racer (film)      40
    V for Vendetta (film)   38
    The Invasion (film)     38
    The Postman (film)      36
    Executive Decision      36
    Freejack                36
    Exit Wounds             36
    Outbreak (film)         36
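The counting behind this ranking can be sketched in a few lines of Python on a toy in-memory graph. This is only an illustration under stated assumptions: the `TRIPLES` data, the identifiers, and the `similarity` helper with its `min_shared` parameter are all hypothetical, and the threshold plays the role of the `having` clause.

```python
from collections import Counter

# Hypothetical toy graph of distinct (subject, property, value) triples.
TRIPLES = [
    ("film_a", "genre", "g1"),
    ("film_a", "director", "d1"),
    ("film_a", "location", "l1"),
    ("film_b", "genre", "g1"),
    ("film_b", "director", "d1"),
    ("film_b", "location", "l2"),
    ("film_c", "genre", "g1"),
    ("film_c", "director", "d2"),
    ("film_d", "genre", "g2"),
]


def similarity(triples, movie, min_shared=0):
    """Rank subjects by how many (property, value) pairs they share with `movie`."""
    pairs = {(p, o) for s, p, o in triples if s == movie}
    counts = Counter(s for s, p, o in triples if (p, o) in pairs)
    # Keep only subjects sharing more than `min_shared` pairs, highest count first.
    return [(s, n) for s, n in counts.most_common() if n > min_shared]


ranking = similarity(TRIPLES, "film_a")
print(ranking)
```

The starting film always scores highest because it shares every pair with itself, which matches the 206 reported for The Matrix above.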
    
