Working with SPARQL lists in dotNetRDF - intersection of lists

Question

I'm using dotNetRDF and am having a hard time understanding how to use the provided list helpers.

Currently I'm not using a list, just one item like so:

 paramString.SetParameter("nickname", g.CreateLiteralNode(nicknameString));
 paramString.CommandText =
            @"INSERT DATA 
            { 
                data:person_1 app:nickname @nickname.                   
            }";

But now I need to account for multiple nicknames:

 //doesn't work with array, and there's no "CreateListNode()" 
 //paramString.SetParameter("nicknames", g.CreateLiteralNode(nicknamesArray)); 
 paramString.CommandText =
            @"INSERT DATA 
            { 
                data:person_1 app:nicknames @nicknames.                   
            }";

Later I need to query to check if 2 lists intersect:

queryString.CommandText =
            @"SELECT ?personWithSameNickname WHERE { 

                data:person_1 app:nicknames ?nicknames.

                #here I need to get people that have at 
                #least one nickname in common with data:person_1, 
                #aka at least one intersection in their nickname lists
                ?personWithSameNickname app:nicknames ?nicknames. 
            }";         

I also need the results ordered by the number of intersections so the best match is on top.

How can I accomplish the above? I only found this reference to lists but I can't quite make sense of it since I'm using SPARQL.

Recommended answer

A note on Data Modelling

So firstly, are you sure that when you talk about lists you necessarily mean RDF lists? The distinction is important because it changes the shape of the data and how you accomplish things.

An RDF list is an ordered chain of blank nodes, each holding one value via rdf:first and pointing to the next node via rdf:rest, e.g.

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix : <http://example.org/> .

:root :value [ rdf:first "a" ;
               rdf:rest [ rdf:first "b" ;
                          rdf:rest [ rdf:first "c" ;
                                     rdf:rest rdf:nil ] ] ] .

As you can see, it has a lot of overhead in terms of triples. We can of course simplify the syntax in Turtle like so:

@prefix : <http://example.org/> .

:root :value ( "a" "b" "c" ) .

This is equivalent to the first example; it just hides the explicit triples that a Turtle parser will create when it encounters this syntax. The list extensions included in dotNetRDF are specifically intended for working with RDF lists.

Whereas perhaps what you mean by a list is just some set of values associated with a property e.g.

@prefix : <http://example.org/> .

:root :value "a" ;
      :value "b" ;
      :value "c" .

As you can see this is literally just stating several triples each of which states a value for the property. In Turtle we can simplify this further using the , syntax to avoid repeating the predicate:

@prefix : <http://example.org/> .

:root :value "a" , "b" , "c" .

The downside of this approach is that, since RDF graphs are unordered sets of triples, neither the order of values nor duplicate values can be preserved. If you need either order or duplicates then you will need to use the RDF list approach.

The rest of my answer will show how to do things using either data modelling approach.

How you do this depends on whether you want an RDF list or just a number of values. You are quite right that dotNetRDF does not have any built-in support for handling such things when building parameterised SPARQL.

If you want an RDF list then you need to write your template such that it can take the necessary number of items, e.g.

paramString.CommandText =
        @"INSERT DATA
        {
            data:person_1 app:nicknames [ rdf:first @nick1 ;
                                          rdf:rest [ rdf:first @nick2 ;
                                                     rdf:rest rdf:nil ] ] .
        }";
// SetLiteral binds a plain string value; SetParameter expects an INode
paramString.SetLiteral("nick1", "Rob");
paramString.SetLiteral("nick2", "Bob");
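Writing the nested template by hand gets tedious as the list grows, so it can be generated instead. The following is only a minimal sketch using plain .NET string building; the class name RdfListTemplate, the method Build and the nick0, nick1, ... parameter naming are hypothetical and not part of dotNetRDF.

```csharp
using System;
using System.Text;

static class RdfListTemplate
{
    // Builds the nested blank-node form of an RDF list with `count` placeholders,
    // e.g. "[ rdf:first @nick0 ; rdf:rest [ rdf:first @nick1 ; rdf:rest rdf:nil ] ]".
    // An empty list is simply rdf:nil.
    public static string Build(int count, string prefix = "nick")
    {
        if (count < 0) throw new ArgumentOutOfRangeException(nameof(count));

        var sb = new StringBuilder();
        for (int i = 0; i < count; i++)
        {
            // Open one list cell per item; its rdf:rest is filled in by the next cell
            sb.Append("[ rdf:first @").Append(prefix).Append(i).Append(" ; rdf:rest ");
        }
        sb.Append("rdf:nil");
        for (int i = 0; i < count; i++)
        {
            sb.Append(" ]"); // close each cell that was opened
        }
        return sb.ToString();
    }
}

class Program
{
    static void Main()
    {
        // prints: [ rdf:first @nick0 ; rdf:rest [ rdf:first @nick1 ; rdf:rest rdf:nil ] ]
        Console.WriteLine(RdfListTemplate.Build(2));
    }
}
```

The generated text would go after "data:person_1 app:nicknames " in the CommandText, and each value would then be bound to its placeholder in a loop, for instance with SetLiteral("nick" + i, value), assuming the string overload of SetLiteral is what you want for plain literals.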

You can obviously extend this pattern to deal with shorter or longer lists as necessary. Clearly this requires a lot of work on the part of the user, so if this is what you need then we can certainly look at adding a feature to do this in future releases.

If you are just inserting several values, you can either use a single triple template and simply set the parameter to each value in turn, executing the update each time, e.g.

paramString.CommandText =
        @"INSERT DATA
        {
            data:person_1 app:nicknames @nickname.
        }";
foreach (string nick in nicknames)
{
   // Rebind the same parameter to the next value
   paramString.SetLiteral("nickname", nick);
   // Execute the update
}

Or you can change your template to have a triple for each nickname, here I use the , syntax again to avoid repeating the subject and predicate:

paramString.CommandText =
        @"INSERT DATA
        {
            data:person_1 app:nicknames @nick1 , @nick2 .
        }";
// SetLiteral binds a plain string value; SetParameter expects an INode
paramString.SetLiteral("nick1", "Rob");
paramString.SetLiteral("nick2", "Bob");
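The comma-separated object list can likewise be generated for any number of values. This is a rough sketch in plain .NET; ValueListTemplate, Build and the nick0, nick1, ... names are hypothetical helpers, not dotNetRDF API.

```csharp
using System;
using System.Linq;

static class ValueListTemplate
{
    // Builds "@nick0 , @nick1 , ..." so a single subject/predicate pair
    // can carry `count` object values.
    public static string Build(int count, string prefix = "nick")
    {
        if (count < 1)
        {
            // A triple needs at least one object
            throw new ArgumentOutOfRangeException(nameof(count));
        }
        return string.Join(" , ", Enumerable.Range(0, count).Select(i => "@" + prefix + i));
    }
}

class Program
{
    static void Main()
    {
        // prints: data:person_1 app:nicknames @nick0 , @nick1 , @nick2 .
        Console.WriteLine("data:person_1 app:nicknames " + ValueListTemplate.Build(3) + " .");
    }
}
```

Compared with executing one update per value, this keeps the whole insert in a single update at the cost of building the command text dynamically.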

As with the RDF list approach, you can extend this pattern to more or fewer values as necessary. Again, if this is something you would prefer to have dotNetRDF do for you, we can look at adding it in future releases.

For the RDF lists approach:

queryString.CommandText =
        @"SELECT ?personWithSameNickname WHERE { 
            data:person_1 app:nicknames [ rdf:rest*/rdf:first ?nicknames ].
            ?personWithSameNickname app:nicknames [ rdf:rest*/rdf:first ?nicknames ] .
            FILTER(!SAMETERM(data:person_1, ?personWithSameNickname))
        }"; 

Essentially you just select all the nicknames for your starting node and then do the same for all persons and rely on SPARQL join semantics to find us the intersection.

Note the usage of the rdf:rest*/rdf:first property path to traverse to all the value nodes of the RDF list in order to extract the actual nicknames. Also, since the starting node will intersect with itself, we use !SAMETERM(data:person_1, ?personWithSameNickname) in a FILTER to eliminate the match on itself. You could instead do this in code if you prefer to avoid the FILTER.

If you are just using the multiple triple approach the query is even simpler:

queryString.CommandText =
        @"SELECT ?personWithSameNickname WHERE { 
            data:person_1 app:nicknames ?nicknames .
            ?personWithSameNickname app:nicknames ?nicknames .
            FILTER(!SAMETERM(data:person_1, ?personWithSameNickname))
        }"; 

Again simply select all the nicknames for your starting node and then do the same for all persons and rely on SPARQL join semantics to find us the intersection.

Now if you want to rank people by the number of intersections then we can do this using GROUP BY and ORDER BY and this can be added to either variation of the query. I will use the second variation because the base query is simpler:

queryString.CommandText =
        @"SELECT ?personWithSameNickname (COUNT(?nicknames) AS ?matches) WHERE { 
            data:person_1 app:nicknames ?nicknames .
            ?personWithSameNickname app:nicknames ?nicknames .
            FILTER(!SAMETERM(data:person_1, ?personWithSameNickname))
        }
        GROUP BY ?personWithSameNickname
        ORDER BY DESC(?matches)";

So firstly we add an aggregate to the SELECT; specifically, we want to count the number of nicknames. We then also need to add a GROUP BY on the ?personWithSameNickname variable because we want a group for each person who has intersecting nicknames. This also means that our aggregate will be calculated for each group, so we can then use ORDER BY to rank the matches in descending order.
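To make the join, grouping and ordering concrete, here is an in-memory analogue of the ranking query written with LINQ. It is only an illustration of what the query computes for the multiple-triple model, not how dotNetRDF evaluates it; NicknameRanking, Rank and the sample people are made up for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class NicknameRanking
{
    // Returns (person, matches) pairs ordered by the number of shared nicknames, best first.
    public static List<KeyValuePair<string, int>> Rank(
        string start, IDictionary<string, HashSet<string>> nicknames)
    {
        HashSet<string> own = nicknames[start];
        return nicknames
            .Where(p => p.Key != start)                       // FILTER(!SAMETERM(...))
            .Select(p => new KeyValuePair<string, int>(
                p.Key, p.Value.Count(n => own.Contains(n))))  // join on the value + COUNT per person
            .Where(p => p.Value > 0)                          // no shared value means no result row
            .OrderByDescending(p => p.Value)                  // ORDER BY DESC(?matches)
            .ToList();
    }
}

class Program
{
    static void Main()
    {
        var data = new Dictionary<string, HashSet<string>>
        {
            ["person_1"] = new HashSet<string> { "Rob", "Bob", "Bobby" },
            ["person_2"] = new HashSet<string> { "Bob" },
            ["person_3"] = new HashSet<string> { "Rob", "Bobby", "Robert" },
            ["person_4"] = new HashSet<string> { "Al" },
        };

        // prints:
        // person_3: 2
        // person_2: 1
        foreach (var match in NicknameRanking.Rank("person_1", data))
        {
            Console.WriteLine(match.Key + ": " + match.Value);
        }
    }
}
```

person_4 shares no nickname with person_1 and therefore does not appear at all, just as the SPARQL join produces no row for it.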
