Creating a checksum on an object graph

Question

This question is related to this one (http://stackoverflow.com/questions/5308057/generating-an-safe-hashcode-for-an-objectgraph), but I think it should be asked separately.

I have a complex graph of object instances. Now I would like to create a checksum on this object graph directly in memory to detect whether changes have been made to it since the last time the checksum was saved with the object graph. The checksum calculation should be quick and should not consume too much memory.

As I understand it now, the best solution would probably be to compute a cryptographic hash of a binary-serialized form of the object graph (correct me if I am wrong). But that comes with a few questions:

  1. How should I serialize the object? It must be fast and not consume too much memory. It must also reliably serialize the same way every time. If I use the .NET default serialization, can I really be sure that the created binary stream is always the same if the actual data is the same? I doubt it.
  2. So what would be an alternative way to serialize that doesn't take too long to implement?
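For question 1, one way to get a byte stream that is identical whenever the data is identical is to write the fields yourself. Each field goes out in a fixed order with a fixed-width encoding, rather than through a general-purpose serializer. The following is a minimal Python sketch of that idea, illustrative only since the question is about .NET. The `Node` type, its fields and the `serialize`/`checksum` helpers are hypothetical. The sketch assumes the graph is a tree with no cycles or shared nodes.

```python
import hashlib
import struct
from dataclasses import dataclass, field

# Hypothetical node type: a few primitive members plus child references.
@dataclass
class Node:
    node_id: int
    weight: float
    name: str
    children: list = field(default_factory=list)

def serialize(node: Node, out: bytearray) -> None:
    """Append the node's fields in a fixed order, then recurse into children (preorder)."""
    name_bytes = node.name.encode("utf-8")
    # '<' = little-endian, no padding; q = int64, d = float64, I = uint32 length prefix
    out += struct.pack("<qdI", node.node_id, node.weight, len(name_bytes))
    out += name_bytes
    # Child count first, so different tree shapes cannot produce the same bytes
    out += struct.pack("<I", len(node.children))
    for child in node.children:
        serialize(child, out)

def checksum(root: Node) -> str:
    """SHA-256 of the deterministic byte stream produced by serialize()."""
    buf = bytearray()
    serialize(root, buf)
    return hashlib.sha256(buf).hexdigest()
```

The length prefixes on the string and on the child list stop adjacent variable-length values from running together. Without them, two different graphs could serialize to the same bytes.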

Update:

What do you think about this approach:

  1. Navigate through the graph and, for each object in the graph, create a standard int hashcode using this algorithm (http://stackoverflow.com/questions/263400/what-is-the-best-algorithm-for-an-overridden-system-object-gethashcode), but exclude reference-type members representing nodes in the graph. Add each hashcode to an integer list.
  2. Convert the integer list to a byte array.
  3. Create a hash of the byte array using MD5, CRC or similar.
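The three steps above can be sketched in Python under some assumptions. Each object is represented only by a tuple of integer hashes of its primitive members, with graph references already excluded. `object_hash` uses the common multiply-and-add scheme (seed 17, multiplier 23) wrapped to a signed 32-bit value. It stands in for an unchecked `int` `GetHashCode` override. All names are hypothetical.

```python
import hashlib
import struct
from typing import List, Tuple

def object_hash(primitive_fields: Tuple[int, ...]) -> int:
    """Step 1 for one object: multiply-and-add over its primitive members only,
    wrapped to a signed 32-bit int like unchecked C# arithmetic."""
    h = 17
    for f in primitive_fields:
        h = (h * 23 + f) & 0xFFFFFFFF
    return h - 2**32 if h >= 2**31 else h

def graph_checksum(objects: List[Tuple[int, ...]]) -> str:
    # Step 1: one int hash per object, in traversal order
    hashes = [object_hash(fields) for fields in objects]
    # Step 2: integer list -> byte array (4 bytes each, little-endian, signed)
    data = struct.pack(f"<{len(hashes)}i", *hashes)
    # Step 3: hash the byte array (MD5, as in the question)
    return hashlib.md5(data).hexdigest()
```

The result depends on the order in which the objects are listed.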

The GetHashCode algorithm mentioned should quickly calculate a hashcode that is fairly collision-safe for a single object, taking only its primitive members into account. Based on this, the byte array should also be a fairly collision-safe representation of the object graph, and so should the MD5/CRC hash of it.

Solution

What do you think about this approach:

  • Navigate through the graph and, for each object in the graph, create a standard int hashcode using this algorithm (but exclude reference-type members representing nodes in the graph).
  • Add each hashcode to an integer list
  • Convert the integer list to a byte array
  • Create a hash on the byte array using MD5, CRC or similar

This idea is quite close to what I'd consider best, but it could use some polishing.

Hashing

Considering that you would prefer speed over accuracy, and that an int-sized hashcode for each item leaves plenty of room for avoiding collisions, the choice of hashcode algorithm seems right. Excluding reference types that participate in the graph means we're throwing some information away; see below for more on that.

Improving the node hash

The idea of not taking into account the other nodes connected to the node being hashed is correct, but maybe we can do better than simply throwing all that information away. We don't want to take the hashcodes of other nodes into account (they will be hashed themselves as well). Still, we are throwing away the information provided by the graph edges. The hashcode for a node with internal data X connected to N other nodes should not be the same as the hashcode for a node with data X connected to M other nodes.

If you have a cheap way of taking part of the edge data into account, use it. For example, if the graph is directed, you can add to the hashcode computed for each node the number of edges going out from it to other nodes.
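A minimal sketch of folding the out-degree into the per-node hash follows. It assumes a directed graph stored as a hypothetical adjacency dict (`edges`) plus per-node integer fields (`data`). The out-degree is mixed in as one extra "field" after the primitive members. The neighbours' own hashes are left out, as the text recommends. Hashes here are kept as unsigned 32-bit values for simplicity.

```python
from typing import Dict, List, Tuple

MASK32 = 0xFFFFFFFF

def node_hash(fields: Tuple[int, ...], out_degree: int) -> int:
    """Multiply-and-add over the node's primitive fields, then its out-degree."""
    h = 17
    for f in fields:
        h = (h * 23 + f) & MASK32
    # The edge count is mixed in like one more field; neighbour hashes are
    # deliberately NOT included (those nodes are hashed on their own).
    return (h * 23 + out_degree) & MASK32

def per_node_hashes(data: Dict[str, Tuple[int, ...]],
                    edges: Dict[str, List[str]]) -> Dict[str, int]:
    """Hash every node, using len(edges[n]) as its out-degree (0 if absent)."""
    return {n: node_hash(fields, len(edges.get(n, []))) for n, fields in data.items()}
```

With this, two nodes holding the same data but with different numbers of outgoing edges no longer hash to the same value.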

Aggregating hashcodes

Creating a list of hashcodes is a middle-ground approach between two extremes. At one end, you sum the hashcodes into one long, which is very fast and keeps some additional information compared with summing into an int. At the other, you create a list of hashcodes that follows a total order of the items in the graph. If you expect lots of items in the graph, summing might be more appropriate (I'd try it first and see whether it's collision-free enough). If the graph doesn't have many items (say, fewer than 1000), I'd try the total-order approach first. Remember to allocate enough memory for the list (or simply use an array) when creating it; you already know its final length, so that's a free speed increase.
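The two ends of that spectrum might look like this in Python. `sum_into_long` wraps the running total to 64 bits to mimic an unchecked `long`, and its result does not depend on visiting order. `collect_into_array` preallocates its output because the item count is known in advance. In .NET that would correspond to `new int[count]` or `new List<int>(count)`. Both function names are hypothetical.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF

def sum_into_long(hashes) -> int:
    """Order-independent aggregate: running sum wrapped to 64 bits (kept unsigned here)."""
    total = 0
    for h in hashes:
        total = (total + h) & MASK64
    return total

def collect_into_array(hashes, count: int) -> list:
    """Order-dependent aggregate: fill a preallocated list whose length is known up front."""
    out = [0] * count
    for i, h in enumerate(hashes):
        out[i] = h
    return out
```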

Producing a fixed-size hash

If you have summed the hashcodes into a primitive, this step is not required at all. Otherwise, hashing the list as a byte[] is what I'd consider best. Since hashing the bytes takes very little time compared with creating the list, you may want to use a hash function with a larger output than MD5 or CRC32 to reduce collisions without a practical performance hit.
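Because the packed list is small compared with the work of building it, swapping MD5 for a wider digest costs little. The sketch below uses Python's `hashlib` and `zlib`; the helper names are hypothetical. It packs the per-item hashes as little-endian signed 32-bit integers and digests them with a caller-chosen algorithm, defaulting to SHA-256. A CRC32 variant is included for comparison.

```python
import hashlib
import struct
import zlib

def fixed_size_hash(hashes: list, algorithm: str = "sha256") -> bytes:
    """Pack per-item int32 hashes into bytes and digest them with a fixed-size hash."""
    # '<' = little-endian with no padding; 'i' = signed 32-bit, so every value
    # must lie in [-2**31, 2**31) or struct.pack raises struct.error.
    data = struct.pack(f"<{len(hashes)}i", *hashes)
    return hashlib.new(algorithm, data).digest()

def crc32_of(hashes: list) -> int:
    """For comparison: CRC32 has only 32 bits of output, so collisions become likely much sooner."""
    return zlib.crc32(struct.pack(f"<{len(hashes)}i", *hashes))
```

MD5 yields 16 bytes, SHA-256 32 bytes and SHA-512 64 bytes. The packing step is the same for all of them.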

Improving the final hash quality

After getting this "final" hash, I'd prepend or append to it the number of items in the hashed graph, as a fixed-size hex-encoded string, because:

  • It might help in reducing collisions (how much depends on the nature of the graphs)
  • We already know the number of items in the graph (we just hashed each one of them) so it's an O(1) operation
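Prepending the count takes one line. This sketch uses a hypothetical helper, `final_checksum`, to prepend it as eight zero-padded hex digits. That assumes the graph has fewer than 2**32 items, so the prefix width really is fixed.

```python
def final_checksum(digest: bytes, item_count: int) -> str:
    """Prepend the item count as exactly 8 hex digits to the hex-encoded digest."""
    if not 0 <= item_count < 2**32:
        raise ValueError("item_count must fit in 32 bits for a fixed 8-digit prefix")
    return f"{item_count:08x}" + digest.hex()
```

For example, `final_checksum(hashlib.sha256(data).digest(), len(items))` produces a 72-character string whose first 8 characters encode the item count.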

Defining a total order

If the order in which the items in the graph are processed is not strictly defined, the door is open to false negatives. Two graphs that should hash to the same value may not, because the hash implementation processed the per-item hashes in a different order, even though the graphs are logically equivalent. This problem appears only if you use a list; addition is commutative, so the "add into a long" approach is immune to it.
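The difference in order sensitivity shows up with a couple of toy helpers (hypothetical names). Hashing a packed list gives a different result when the same per-item hashes arrive in a different order, while a wrapped 64-bit sum does not.

```python
import hashlib
import struct

def list_hash(hashes) -> str:
    """Order-dependent: digest of the int32 hashes packed in the order given."""
    return hashlib.sha256(struct.pack(f"<{len(hashes)}i", *hashes)).hexdigest()

def sum_hash(hashes) -> int:
    """Order-independent: addition is commutative, so visiting order is irrelevant."""
    return sum(hashes) & 0xFFFFFFFFFFFFFFFF

visit_1 = [10, 20, 30]
visit_2 = [30, 10, 20]  # the same items reached by a different traversal order
```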

To combat that, you need to process the nodes in the graph in a well-defined order. That might be an order that's easy to produce from the data structure of the nodes (e.g. a preorder traversal of a tree) and/or from other information (e.g. class names or node types for each node, node IDs if such exist, etc.).
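One way to get such an order is sketched below in Python, under some assumptions. Each node carries a hypothetical `node_type` (standing in for its class name) and a stable `node_id`. The pair `(node_type, node_id)` is assumed unique among siblings. `canonical_order` does an iterative preorder traversal, visiting children sorted by that key. It skips nodes it has already seen, so shared nodes and cycles are visited once.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    node_type: str  # hypothetical: e.g. the node's class name
    node_id: int    # hypothetical stable identifier
    children: list = field(default_factory=list)

def canonical_order(root: Node) -> list:
    """Preorder traversal with children visited in (node_type, node_id) order,
    so the result does not depend on how child lists happen to be stored."""
    order, stack, seen = [], [root], set()
    while stack:
        node = stack.pop()
        if id(node) in seen:  # shared node or cycle: visit only once
            continue
        seen.add(id(node))
        order.append(node)
        # Push in descending key order so the smallest key is popped (visited) first
        for child in sorted(node.children, key=lambda c: (c.node_type, c.node_id), reverse=True):
            stack.append(child)
    return order
```

Feeding the per-node hashes to the list-based aggregation in this order removes the false negatives described above. The cost is a sort per node.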

Since preprocessing the graph to produce a total order takes some time, you may want to weigh that against the cost of a false negative, as described above. Also, if the graphs are large enough, this discussion might be moot, because the node-hashcode summation approach may suit your needs better.
