How can I write a SPARQL query that uses similarity measures in Java code?

Question

I would like to know a simple way to write this SPARQL query in Java code:

select ?input
       ?string
       (strlen(?match)/strlen(?string) as ?percent)
where {
  values ?string { "London" "Londn" "London Fog" "Lando" "Land Ho!"
                   "concatenate" "catnap" "hat" "cat" "chat" "chart" "port" "part" }

  values (?input ?pattern ?replacement) {
    ("cat"   "^x[^cat]*([c]?)[^at]*([a]?)[^t]*([t]?).*$"                              "$1$2$3")
    ("Londn" "^x[^Londn]*([L]?)[^ondn]*([o]?)[^ndn]*([n]?)[^dn]*([d]?)[^n]*([n]?).*$" "$1$2$3$4$5")
  }

  bind( replace( concat('x',?string), ?pattern, ?replacement) as ?match )
}
order by ?pattern desc(?percent)

This code comes from the discussion "To use iSPARQL to compare values using similarity measures". Its purpose is to find resources on DBpedia that are similar to a given word. The approach assumes that I know the strings and their lengths in advance. I would like to know how to write this query as a parameterized method that returns the similarity measures regardless of the word and its length.

Recommended answer

Update: ARQ - Writing Property Functions is now part of the standard Jena documentation.

It looks like you'd enjoy having a syntactic extension to SPARQL that performs the more complex portions of your query. For example:

SELECT ?input ?string ?percent WHERE
{
   VALUES ?string { "London" "Londn" "London Fog" "Lando" "Land Ho!"
                    "concatenate" "catnap" "hat" "cat" "chat" "chart" "port" "part" }

   VALUES ?input  { "cat" "londn" }

   ?input <urn:ex:fn#matches> (?string ?percent) .
}
ORDER BY DESC(?percent)

In this example, it's assumed that <urn:ex:fn#matches> is a property function that will automatically perform the matching operation and calculate the similarity.

The Jena documentation does a great job explaining how to write a custom filter function, but (as of 07/08/2014) does little to explain how to implement a custom property function.

I will assume that you can convert your answer into Java code that calculates string similarity, and focus on implementing a property function that can house that code.
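
The similarity calculation itself is left open here. As a minimal sketch, assuming you want the same measure as the hand-written regular expressions in the question (characters of the input found in order, divided by the length of the candidate string), the pattern can be generated from any input word instead of being typed out. The class name SimilaritySketch and the methods esc, buildPattern and percent are hypothetical names chosen for illustration; only java.util.regex is used, and characters outside the Basic Multilingual Plane are not handled.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SimilaritySketch {

    /** Escapes one character so it is safe inside a regex character class. */
    static String esc(char c) {
        return Character.isLetterOrDigit(c) ? String.valueOf(c) : "\\" + c;
    }

    /**
     * Builds a pattern of the same shape as the ones in the question, e.g.
     * "cat" becomes ^x[^cat]*([c]?)[^at]*([a]?)[^t]*([t]?).*$
     */
    static Pattern buildPattern(String input) {
        StringBuilder sb = new StringBuilder("^x");
        for (int i = 0; i < input.length(); i++) {
            // Characters of the input that have not been consumed yet.
            StringBuilder rest = new StringBuilder();
            for (int j = i; j < input.length(); j++) {
                rest.append(esc(input.charAt(j)));
            }
            sb.append("[^").append(rest).append("]*")
              .append("([").append(esc(input.charAt(i))).append("]?)");
        }
        return Pattern.compile(sb.append(".*$").toString(), Pattern.DOTALL);
    }

    /** Returns (number of matched input characters) / (length of candidate). */
    static double percent(String input, String candidate) {
        if (candidate.isEmpty()) {
            return 0.0; // avoid dividing by zero
        }
        // The query prepends 'x' via concat('x', ?string); do the same here.
        Matcher m = buildPattern(input).matcher("x" + candidate);
        if (!m.matches()) {
            return 0.0;
        }
        int matched = 0;
        for (int g = 1; g <= m.groupCount(); g++) {
            matched += m.group(g).length(); // each group holds 0 or 1 character
        }
        return (double) matched / candidate.length();
    }
}
```

Calling SimilaritySketch.percent("cat", "hat") applies the generated pattern to "xhat" and divides the number of captured characters by the length of "hat". A method like this is the kind of logic that could be called from inside the property function.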

Implementing a property function

Every property function is associated with a particular Context. This allows you to limit the availability of the function to be global or associated with a particular dataset.

Assuming you have an implementation of PropertyFunctionFactory (shown later), you can register the function as follows:

Registration

final PropertyFunctionRegistry reg = PropertyFunctionRegistry.chooseRegistry(ARQ.getContext());
reg.put("urn:ex:fn#matches", new MatchesPropertyFunctionFactory());
PropertyFunctionRegistry.set(ARQ.getContext(), reg);

The only difference between global and dataset-specific registration is where the Context object comes from:

final Dataset ds = DatasetFactory.createMem();
final PropertyFunctionRegistry reg = PropertyFunctionRegistry.chooseRegistry(ds.getContext());
reg.put("urn:ex:fn#matches", new MatchesPropertyFunctionFactory());
PropertyFunctionRegistry.set(ds.getContext(), reg);

MatchesPropertyFunctionFactory

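// Imports assume Jena 3.x or later (org.apache.jena.*); Jena 2.x uses com.hp.hpl.jena.* instead.
import org.apache.jena.datatypes.xsd.XSDDatatype;
import org.apache.jena.graph.Node;
import org.apache.jena.graph.NodeFactory;
import org.apache.jena.sparql.core.Var;
import org.apache.jena.sparql.engine.ExecutionContext;
import org.apache.jena.sparql.engine.QueryIterator;
import org.apache.jena.sparql.engine.binding.Binding;
import org.apache.jena.sparql.engine.binding.BindingFactory;
import org.apache.jena.sparql.engine.iterator.QueryIterNullIterator;
import org.apache.jena.sparql.engine.iterator.QueryIterSingleton;
import org.apache.jena.sparql.pfunction.PFuncSimpleAndList;
import org.apache.jena.sparql.pfunction.PropFuncArg;
import org.apache.jena.sparql.pfunction.PropertyFunction;
import org.apache.jena.sparql.pfunction.PropertyFunctionFactory;

public class MatchesPropertyFunctionFactory implements PropertyFunctionFactory {
    @Override
    public PropertyFunction create(final String uri)
    {
        return new PFuncSimpleAndList()
        {
            @Override
            public QueryIterator execEvaluated(final Binding parent, final Node subject, final Node predicate, final PropFuncArg object, final ExecutionContext execCxt)
            {
                /* TODO insert your stuff to perform testing. Note that you'll need
                 * to validate that things like subject/predicate/etc are bound
                 */
                final boolean nonzeroPercentMatch = true; // XXX example-specific kludge
                final Double percent = 0.75; // XXX example-specific kludge
                if( nonzeroPercentMatch ) {
                    // Extend the parent binding with ?percent (the second list argument).
                    final Binding binding =
                                BindingFactory.binding(parent,
                                                       Var.alloc(object.getArg(1)),
                                                       NodeFactory.createLiteral(percent.toString(), XSDDatatype.XSDdecimal));
                    return QueryIterSingleton.create(binding, execCxt);
                }
                else {
                    // No match: contribute no solutions for this subject/object pair.
                    return QueryIterNullIterator.create(execCxt);
                }
            }
        };
    }

}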

Because the property function that we create takes a list as an argument, we use PFuncSimpleAndList as an abstract implementation. Aside from that, most of the magic that happens inside these property functions is creating Bindings and QueryIterators and validating the input arguments.

Validation/Closing Notes

This should be more than enough to get you going on writing your own property function, if that is where you'd like to house your string-matching logic.

What hasn't been shown is input validation. In this answer, I assume that subject and the first list argument (object.getArg(0)) are bound (Node.isConcrete()), and that the second list argument (object.getArg(1)) is not (Node.isVariable()). If your method isn't called in this manner, things will explode. Hardening the method (adding many if-else blocks with condition checks) or supporting alternative use cases (i.e., looking up values for object.getArg(0) if it is a variable) is left to the reader, because it's tedious to demonstrate, easily testable, and readily apparent during implementation.
