Synchronizing Jena OntModels with bnodes

Problem Description

This question relates to rcreswick's question on Serializing Jena OntModel Changes. I have Jena models on two (or more) machines that need to remain synchronized over sockets. The main issue that I need to address is that the models may contain anonymous nodes (bnodes), which can originate in any of the models.

Question: Am I on the right track here, or is there a better, more robust approach that I'm failing to consider?

I can think of four approaches to this problem:

  1. Serialize the complete model: This is prohibitively expensive for synchronizing small updates. Also, since changes can occur on either machine, I can't just replace machine B's model with the serialized model from machine A. I need to merge them.
  2. Serialize a partial model: Use a dedicated model for serialization that only contains the changes that need to be sent over the socket. This approach requires special vocabulary to represent statements that were removed from the model. Presumably, when I serialize the model from machine A to machine B, anonymous node IDs will be unique to machine A but may overlap with IDs for anonymous nodes created on machine B. Therefore, I'll have to rename anonymous nodes and keep a mapping from machine A's anon ids to machine B's ids in order to handle future changes correctly.
  3. Serialize individual statements: This approach requires no special vocabulary, but may not be as robust. Are there issues other than anonymous nodes that I just haven't encountered yet?
  4. Generate globally unique bnode ids (NEW): We can generate globally unique IDs for anonymous nodes by prefixing the ID with a unique machine ID. Unfortunately, I haven't figured out how to tell Jena to use my ID generator instead of its own. This would allow us to serialize individual statements without remapping bnode IDs.
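The labelling scheme in approach 4 does not itself depend on Jena internals. A minimal sketch, assuming every machine is assigned a distinct `machineId` out of band (for example in configuration), could mint labels as shown here; `GlobalBnodeIds` and its methods are hypothetical names, and getting Jena to use these labels for its blank nodes, which is the open part of the question, is not shown.

```java
import java.util.concurrent.atomic.AtomicLong;

/**
 * Hypothetical generator of globally unique blank-node labels.
 * Uniqueness relies on the assumption that every machine has a distinct
 * machineId; the counter only needs to be unique per machine.
 */
final class GlobalBnodeIds {
    private final String machineId;
    private final AtomicLong counter = new AtomicLong();

    GlobalBnodeIds(String machineId) {
        // Keep '-' out of the machine id so "<machine>-<n>" splits unambiguously.
        if (machineId.isEmpty() || machineId.contains("-")) {
            throw new IllegalArgumentException("machineId must be non-empty and contain no '-'");
        }
        this.machineId = machineId;
    }

    /** Returns labels such as "machineA-1", "machineA-2", ... */
    String next() {
        return machineId + "-" + counter.incrementAndGet();
    }

    /** Recovers the originating machine from a label produced by next(). */
    static String originOf(String label) {
        return label.substring(0, label.indexOf('-'));
    }
}
```

The counter lives only in memory, so a restarted machine would reuse labels unless the counter is persisted; using `UUID.randomUUID()` in place of the counter sidesteps that at the cost of longer labels.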

Here's an example to ground this discussion a bit more. Suppose I have a list on machine A represented as:


    _:a rdf:first myns:tom
    _:a rdf:rest rdf:nil

I serialize this model from machine A to machine B. Now, because machine B may already have an (unrelated) anonymous node with id 'a', I remap id 'a' to a new id 'b':


    _:b rdf:first myns:tom
    _:b rdf:rest rdf:nil

Now the list changes on machine A:


    _:a rdf:first myns:tom
    _:a rdf:rest _:b
    _:b rdf:first myns:dick
    _:b rdf:rest rdf:nil

Since machine B has never encountered machine A's id 'b' before, it adds a new mapping from machine A's id 'b' to a new id 'c':


    _:b rdf:first myns:tom
    _:b rdf:rest _:c
    _:c rdf:first myns:dick
    _:c rdf:rest rdf:nil
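Machine B's bookkeeping in this example can be sketched in plain Java, independent of Jena. In this minimal sketch, triples are plain strings, a term counts as a blank node if it starts with `_:`, and `Triple` and `BnodeRemapper` are hypothetical names. The receiver keeps one table from the sender's labels to its own and allocates a fresh local label whenever it sees an unknown remote label.

```java
import java.util.*;

/** Hypothetical string-based triple; blank nodes are written "_:label". */
record Triple(String s, String p, String o) {
    @Override public String toString() { return s + " " + p + " " + o; }
}

/** Hypothetical receiver on machine B that renames one peer's blank nodes into local ones. */
final class BnodeRemapper {
    private final Map<String, String> remoteToLocal = new HashMap<>();
    private final Set<String> usedLocal;   // blank-node labels already taken on this machine
    private char nextLabel = 'a';          // toy allocator ('a', 'b', 'c', ...) just to mirror the example

    BnodeRemapper(Set<String> existingLocalLabels) {
        this.usedLocal = new HashSet<>(existingLocalLabels);
    }

    /** Returns the first label in sequence that is not yet used locally, and reserves it. */
    private String freshLocal() {
        String label;
        do {
            label = "_:" + nextLabel;
            nextLabel++;
        } while (usedLocal.contains(label));
        usedLocal.add(label);
        return label;
    }

    /** IRIs and literals pass through; remote blank nodes get a stable local label. */
    private String map(String term) {
        if (!term.startsWith("_:")) return term;
        return remoteToLocal.computeIfAbsent(term, t -> freshLocal());
    }

    /** Rewrites the subject and object of every incoming triple. */
    List<Triple> apply(List<Triple> incoming) {
        List<Triple> out = new ArrayList<>();
        for (Triple t : incoming) out.add(new Triple(map(t.s()), t.p(), map(t.o())));
        return out;
    }
}
```

Constructing `new BnodeRemapper(Set.of("_:a"))` (machine B already uses `_:a`) and feeding it the two listings from machine A produces the two machine-B listings above: remote `_:a` becomes `_:b` on the first message and is reused on the second, while remote `_:b` becomes `_:c`.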

The problem is further complicated with more than two machines. If there is a third machine C, for example, it may have its own anonymous node 'a' that is different from machine A's anonymous node 'a'. Thus, machine B really does need to keep a map from each of the other machines' anonymous node IDs to its local IDs, not just from remote IDs in general to local IDs. When processing incoming changes, it must take into account where the changes came from to map the IDs correctly.
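Keying the table by sender as well as by label is what keeps machine A's `_:a` and machine C's `_:a` apart. A minimal sketch, assuming each incoming message is tagged with its source machine's name and that blank nodes minted locally never use the `_:L` prefix (the class name `PerSourceBnodeMap` is hypothetical):

```java
import java.util.*;

/** Hypothetical per-peer blank-node table: keys are (source machine, remote label). */
final class PerSourceBnodeMap {
    private record Key(String sourceMachine, String remoteLabel) {}

    private final Map<Key, String> toLocal = new HashMap<>();
    // Local labels are "_:L1", "_:L2", ...; assumed not to clash with locally minted labels.
    private long counter = 0;

    /** Same (machine, label) pair always yields the same local label; different machines never share one. */
    String localFor(String sourceMachine, String remoteLabel) {
        return toLocal.computeIfAbsent(new Key(sourceMachine, remoteLabel), k -> "_:L" + (++counter));
    }
}
```

When machine B later sends a change that mentions one of these nodes, it also needs the reverse lookup (from its local label to each peer's label); that direction is omitted here.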

Recommended Answer

Are you allowed to add your own triples to the model? If so, I would introduce a statement for every bnode, giving each an alternate public id in the form of a URN. You can now start matching bnodes between the two models.
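The answer's suggestion could look roughly like this: every blank node gets one extra triple whose object is a `urn:uuid:` URN, and the two models are matched by joining on that URN rather than on their local blank-node labels. The property name `myns:publicId`, the `Triple` record, and the `PublicIds` helper are assumptions made for this sketch; terms are plain strings, with blank nodes written as `_:label`.

```java
import java.util.*;

/** Hypothetical string-based triple; blank nodes are written "_:label". */
record Triple(String s, String p, String o) {}

/** Hypothetical helper: gives every blank node an extra "public id" triple carrying a URN. */
final class PublicIds {
    // Assumed property name; not part of any standard vocabulary.
    static final String PUBLIC_ID = "myns:publicId";

    /** Returns the input plus one (bnode, PUBLIC_ID, <urn:uuid:...>) triple per blank node lacking one. */
    static List<Triple> tag(List<Triple> model) {
        Set<String> tagged = new HashSet<>();
        Set<String> bnodes = new LinkedHashSet<>();
        for (Triple t : model) {
            if (t.p().equals(PUBLIC_ID)) tagged.add(t.s());
            if (t.s().startsWith("_:")) bnodes.add(t.s());
            if (t.o().startsWith("_:")) bnodes.add(t.o());
        }
        List<Triple> out = new ArrayList<>(model);
        for (String b : bnodes) {
            if (!tagged.contains(b)) {
                out.add(new Triple(b, PUBLIC_ID, "<urn:uuid:" + UUID.randomUUID() + ">"));
            }
        }
        return out;
    }

    /** Maps each public URN to the local blank-node label that carries it. */
    static Map<String, String> index(List<Triple> model) {
        Map<String, String> byUrn = new HashMap<>();
        for (Triple t : model) {
            if (t.p().equals(PUBLIC_ID)) byUrn.put(t.o(), t.s());
        }
        return byUrn;
    }
}
```

Because `tag` skips nodes that already carry a public id, it is safe to run repeatedly, and the URN survives any local renaming of the blank node on the receiving side.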

Blank nodes or not, though, two-way sync has its limits. If you are trying to detect equivalent concurrent changes on both models, strategies like this will only get you so far.

Here's an example. Let's say you are starting a new lawn care company. In order to drum up some business, you and your partner go to a local outdoor event and try to book some discounted trial appointments. The two of you, each armed with a laptop, mingle and record anyone interested. Each record has:

address and zip
phone number
appointment dateTime

Let's say each record is stored as a resource in your model. You might meet the husband while your partner meets the wife of the same household. Whether or not you coincidentally book the same appointment dateTime, the system would be hard-pressed to de-duplicate the entries. Whether you use a bnode for each record or a UUID-based URI, it would not de-dup. The only hope is to use, say, the phone number in some canonical form to synthesize a deterministic URI for the record.
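Synthesizing a deterministic URI from the phone number could be sketched as follows. The canonicalization rule (strip everything but digits and prepend an assumed default country code to 10-digit numbers) is deliberately naive, the `HouseholdUris` name is hypothetical, and the URI is a name-based UUID from `java.util.UUID.nameUUIDFromBytes`, so equal canonical numbers always yield the same URI on either laptop.

```java
import java.nio.charset.StandardCharsets;
import java.util.UUID;

/** Hypothetical: derives the same record URI on every laptop from a canonicalized phone number. */
final class HouseholdUris {
    /** Naive canonical form: digits only, with an assumed default country code for 10-digit numbers. */
    static String canonicalPhone(String raw, String defaultCountryCode) {
        String digits = raw.replaceAll("[^0-9]", "");
        if (digits.length() == 10) digits = defaultCountryCode + digits;  // assumption: NANP-style local numbers
        return "+" + digits;
    }

    /** Name-based (version 3, MD5) UUID, so equal canonical numbers give equal URIs. */
    static String recordUri(String raw, String defaultCountryCode) {
        String canonical = canonicalPhone(raw, defaultCountryCode);
        UUID id = UUID.nameUUIDFromBytes(("tel:" + canonical).getBytes(StandardCharsets.UTF_8));
        return "urn:uuid:" + id;
    }
}
```

With a default country code of `1`, `(555) 123-4567` and `555.123.4567` map to the same URI, so the husband's and the wife's entries collapse into one resource. This only works if both of you actually capture the same number; a real system would want a proper phone-number parsing library for the canonical form.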
