SPARQL full aggregation on a group aggregation


Problem description


I have an ontology in which users can use one of five predicates to express how much they like an item.

The ontology contains specific predicates that have a property called hasSimilarityValue.

I am trying to do the following:

  1. Start with a user, say rs:ania.
  2. Extract all the items that this user has rated before. (This is easy because the ontology already contains triples from the user to the items.)
  3. Extract items similar to those found in step 2 and calculate their similarities. (We use our own approach to calculate the similarities.) The issue is that step 2 yields many rated items, and step 3 extracts and scores items similar to each of them, so an item found in step 3 may be similar to two (or more) items from step 2. Thus we end up with the following:

    user :ania rated item x1
    user :ania rated item x2
    item y is similar by y1 to x1
    item y is similar by y2 to x2
    item z is similar by z1 to x1

y1, y2, and z1 are values between 0 and 1.

The thing is that we need to normalize these values to get the final similarities for item y and item z.

The normalization is simple: group by the item and divide by the maximum number of rated items that contributed to any single item (2 in this example).

So to get the similarity with y, I should compute (y1 + y2) / 2.

To get the similarity with z, I should compute z1 / 2.
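As a minimal sketch of this arithmetic (in Python rather than SPARQL, with made-up values standing in for y1, y2, and z1), the normalization amounts to summing the similarity values per candidate item and dividing every sum by the largest number of contributing rated items. The names `edges` and `normalize` are hypothetical and only serve the illustration.

```python
from collections import defaultdict

# Hypothetical similarity edges: (rated item, candidate item, similarity value).
edges = [
    ("x1", "y", 0.5),   # y1
    ("x2", "y", 0.75),  # y2
    ("x1", "z", 0.25),  # z1
]

def normalize(edges):
    totals = defaultdict(float)  # sum of similarity values per candidate item
    sources = defaultdict(set)   # distinct rated items contributing per candidate
    for rated_item, candidate, value in edges:
        totals[candidate] += value
        sources[candidate].add(rated_item)
    # The maximum of the per-candidate counts is the common denominator.
    max_count = max(len(items) for items in sources.values())
    return {candidate: totals[candidate] / max_count for candidate in totals}

normalized = normalize(edges)
print(normalized)  # {'y': 0.625, 'z': 0.125}
```

Note that z is divided by 2 even though only one rated item contributes to it, which is exactly why the maximum of the counts is needed rather than each group's own count.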

My problem

As you can see, I need to count the items and then find the maximum of this count.

This is the query that calculates everything except the normalization part:

select  ?s  (sum(?weight * ?factor) as ?similarity) ( sum(?weight * ?factor * ?ratings) as ?similarityWithRating) (count(distinct ?x) as ?countOfItemsUsedInDeterminingTheSimilarities) where {

    values (?user) { (rs:ania) }
    values (?ratingPredict) {(rs:ratedBy4Stars)  (rs:ratedBy5Stars)}
    ?user ?ratingPredict ?x.
    ?ratingPredict rs:hasRatingValue ?ratings.
    {
      ?s ?p ?o .
      ?x ?p ?o .
      bind(4/7 as ?weight)
    }
    union
    {
      ?s ?a ?b . ?b ?p ?o .
      ?x ?c ?d . ?d ?p ?o .
      bind(1/7 as ?weight)
    }
    ?p rs:hasSimilarityValue ?factor .
      filter (?s != ?x)
  }
  group by ?s

order by ?s

Now I need to divide each row of the result by the maximum of the count column.

My proposed solution is to run the exact same query twice, once to get the similarities and once to get the maximum, then join the two and do the division (normalization). It works, but it is ugly, and I expect the performance to be a disaster because the same query is evaluated twice. It is a stupid solution, and I would like to ask you for a better one, please.

Here is my stupid solution:

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX rs: <http://www.musicontology.com/rs#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

#select 
#?s   ?similarityWithRating (max(?countOfItemsUsedInDeterminingTheSimilarities) as ?maxNumberOfItemsUsedInDeterminingTheSimilarities)
#where {
 # {
select ?s ?similarity ?similarityWithRating ?countOfItemsUsedInDeterminingTheSimilarities ?maxCountOfItemsUsedInDeterminingTheSimilarities ?finalSimilarity where {
{
select  ?s  (sum(?weight * ?factor) as ?similarity) ( sum(?weight * ?factor * ?ratings) as ?similarityWithRating) (count(distinct ?x) as ?countOfItemsUsedInDeterminingTheSimilarities) where {

    values (?user) { (rs:ania) }
    values (?ratingPredict) {(rs:ratedBy4Stars)  (rs:ratedBy5Stars)}
    ?user ?ratingPredict ?x.
    ?ratingPredict rs:hasRatingValue ?ratings.
    {
      ?s ?p ?o .
      ?x ?p ?o .
      bind(4/7 as ?weight)
    }
    union
    {
      ?s ?a ?b . ?b ?p ?o .
      ?x ?c ?d . ?d ?p ?o .
      bind(1/7 as ?weight)
    }
    ?p rs:hasSimilarityValue ?factor .
      filter (?s != ?x)
  }
  group by ?s
#}

#}
#group by ?s 
order by ?s
} #end first part
{
select (Max(?countOfItemsUsedInDeterminingTheSimilarities) as ?maxCountOfItemsUsedInDeterminingTheSimilarities) where {
select  ?s  (sum(?weight * ?factor) as ?similarity) ( sum(?weight * ?factor * ?ratings) as ?similarityWithRating) (count(distinct ?x) as ?countOfItemsUsedInDeterminingTheSimilarities) where {

    values (?user) { (rs:ania) }
    values (?ratingPredict) {(rs:ratedBy4Stars)  (rs:ratedBy5Stars)}
    ?user ?ratingPredict ?x.
    ?ratingPredict rs:hasRatingValue ?ratings.
    {
      ?s ?p ?o .
      ?x ?p ?o .
      bind(4/7 as ?weight)
    }
    union
    {
      ?s ?a ?b . ?b ?p ?o .
      ?x ?c ?d . ?d ?p ?o .
      bind(1/7 as ?weight)
    }
    ?p rs:hasSimilarityValue ?factor .
      filter (?s != ?x)
  }
  group by ?s
#}

#}
#group by ?s 
order by ?s
}
}#end second part
  bind (?similarityWithRating/?maxCountOfItemsUsedInDeterminingTheSimilarities as ?finalSimilarity)
}
order by desc(?finalSimilarity)

Finally

Here is the data if you want to try it yourself. http://www.mediafire.com/view/r4qlu3uxijs4y30/musicontology

Solution

It's really helpful if the data you provide with these examples is minimal. That means data that doesn't include anything we don't need in order to solve the problem, and that is pretty much as simple as possible. I think that "How to create a Minimal, Complete, and Verifiable example" might be useful for your Stack Overflow questions.

Anyhow, here's some simple data that should be enough for us to work with. There are two users who have made some ratings, and some similarities in the data. Note that I made the similarities directed; you'd probably want them to be bidirectional, but that's not really the main part of this problem.

@prefix : <urn:ex:> .

:user1 :rated :a , :b .

:user2 :rated :b , :c , :d .

:a :similarTo [ :piece :c ; :value 0.1 ] ,
              [ :piece :d ; :value 0.2 ] .

:b :similarTo [ :piece :d ; :value 0.3 ] ,
              [ :piece :e ; :value 0.4 ] .

:c :similarTo [ :piece :e ; :value 0.5 ] ,
              [ :piece :f ; :value 0.6 ] .

:d :similarTo [ :piece :f ; :value 0.7 ] ,
              [ :piece :g ; :value 0.8 ] .

Now, the query just needs to retrieve a user and the pieces that they've rated, along with similar pieces and the actual similarity values. If you group by the user and the similar piece, you end up with groups that each have a single similar piece, a single user, and a bunch of rated pieces with their similarity to the similar piece. Since all the similarity values are in a fixed range (0, 1), you can just average them to get the overall similarity. In this query, I've also added a group_concat to show which rated pieces the similarity value is based on.

prefix : <urn:ex:>

select
    ?user
    (group_concat(?piece) as ?ratedPieces)
    ?similarPiece
    (avg(?similarity_) as ?similarity)
where {
  #-- Find ?pieces that ?user has rated.
  ?user :rated ?piece .

  #-- Find other pieces (?similarPiece) that are
  #-- similar to ?piece, along with the
  #-- similarity value (?similarity_)
  ?piece :similarTo [ :piece ?similarPiece ; :value ?similarity_ ] .
}
group by ?user ?similarPiece

------------------------------------------------------------
| user   | ratedPieces         | similarPiece | similarity |
============================================================
| :user1 | "urn:ex:a"          | :c           | 0.1        | ; a-c[0.1]
| :user1 | "urn:ex:b urn:ex:a" | :d           | 0.25       | ; b-d[0.3], a-d[0.2]
| :user1 | "urn:ex:b"          | :e           | 0.4        | ; b-e[0.4]
| :user2 | "urn:ex:b"          | :d           | 0.3        | ; b-d[0.3]
| :user2 | "urn:ex:c urn:ex:b" | :e           | 0.45       | ; c-e[0.5], b-e[0.4]
| :user2 | "urn:ex:d urn:ex:c" | :f           | 0.65       | ; d-f[0.7], c-f[0.6]
| :user2 | "urn:ex:d"          | :g           | 0.8        | ; d-g[0.8]
------------------------------------------------------------
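
To check the table by hand, here is a small Python sketch (not part of the original answer) that reproduces the grouping over the sample data. It also computes, as an assumption about what the question is after, a variant that divides each group's sum by the largest group size for that user instead of by the group's own size, which is what avg does. The function name `summarize` and the keys `average` and `normalized` are made up for illustration.

```python
from collections import defaultdict

# Sample data from the answer: rated pieces per user and directed similarities.
rated = {"user1": ["a", "b"], "user2": ["b", "c", "d"]}
similar_to = {
    "a": {"c": 0.1, "d": 0.2},
    "b": {"d": 0.3, "e": 0.4},
    "c": {"e": 0.5, "f": 0.6},
    "d": {"f": 0.7, "g": 0.8},
}

def summarize(rated, similar_to):
    # Group similarity values by (user, similar piece), like GROUP BY.
    groups = defaultdict(list)
    for user, pieces in rated.items():
        for piece in pieces:
            for similar_piece, value in similar_to.get(piece, {}).items():
                groups[(user, similar_piece)].append(value)

    # Largest group size per user: the "max of the count column".
    max_count = defaultdict(int)
    for (user, _), values in groups.items():
        max_count[user] = max(max_count[user], len(values))

    # Rounding to two decimals hides floating-point noise such as 0.6 + 0.7.
    return {
        (user, similar_piece): {
            "average": round(sum(values) / len(values), 2),
            "normalized": round(sum(values) / max_count[user], 2),
        }
        for (user, similar_piece), values in groups.items()
    }

result = summarize(rated, similar_to)
for (user, similar_piece), row in sorted(result.items()):
    print(user, similar_piece, row["average"], row["normalized"])
```

The `average` column matches the similarity column in the table above. The two columns differ only for groups with fewer contributing pieces than the user's largest group, for example :user1 and :c, where the average is 0.1 but the max-count variant gives 0.05.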
