Using XSL to make a hash of an XML file


Question


I am trying to find a way to "hash" the contents of an XML file. At the root of this is a need to compare some text nodes that are passed in to text nodes that I am expecting to make sure that the checksum is the same. The passed-in text nodes have returned from a form submission and I need to ensure that they were not changed (within reason, ruling out collisions).

The architecture is horrible, so please don't ask about it! I am locked in to a given implementation of SharePoint with some very bad custom code that I need to work around.

Is there a well-performing checksum/hash function that can be implemented? I would need to check about 100 text nodes.

Solution

Sounds like you need a position-dependent checksum. Are you asking for an XSLT implementation, or just the algorithm?

Here is an implementation of Fletcher's checksum in C, which should not be very hard to port to XSLT.

Update: Below is an XSLT 2.0 adaptation of Fletcher's checksum. Whether it's fast enough depends on the size of your data and the amount of time you have. I'd be interested to hear how your tests go. To optimize, I would attempt to change xs:integer to xs:int.

Note that I have substituted plain addition for the bitwise OR (|) of the implementation I linked to above. I'm not really qualified to analyze the ramifications of this change in regard to uniformity or non-invertibility, but it seems OK as long as you don't have a smart hacker trying to maliciously bypass your checksum checks.

Do note that because of the above change, this implementation will not give the same results as true implementations of Fletcher's checksum (@MDBiker). So you can't compare the output of this function with that of Java's Fletcher16, for example. However it will always return the same result for the same input (it's deterministic), so you can compare the output of this function on two text strings.

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="2.0" xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:foo="my.foo.org">

    <xsl:variable name="str1">The quick brown fox jumps over the lazy dog.</xsl:variable>
    <xsl:variable name="str2">The quick frown box jumps over the hazy frog.</xsl:variable>

    <xsl:template match="/">
        Checksum 1: <xsl:value-of select="foo:checksum($str1)"/>    
        Checksum 2: <xsl:value-of select="foo:checksum($str2)"/>    
    </xsl:template>

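    <xsl:function name="foo:checksum" as="xs:int">
        <xsl:param name="str" as="xs:string"/>
        <!-- Work on the Unicode code points of the string, starting at index 1 with both sums at 0. -->
        <xsl:variable name="codepoints" select="string-to-codepoints($str)"/>
        <!-- Cast explicitly: foo:fletcher16 yields an xs:integer, the declared result is xs:int. -->
        <xsl:sequence select="xs:int(foo:fletcher16($codepoints, count($codepoints), 1, 0, 0))"/>
    </xsl:function>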

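    <!-- can I change some xs:integers to xs:int and help performance? -->
    <!-- Tail-recursive walk over the code points: $sum1 is the running sum,
         $sum2 the running sum of the running sums, both reduced mod 255. -->
    <xsl:function name="foo:fletcher16" as="xs:integer">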
        <xsl:param name="str" as="xs:integer*"/>
        <xsl:param name="len" as="xs:integer" />
        <xsl:param name="index" as="xs:integer" />
        <xsl:param name="sum1" as="xs:integer" />
        <xsl:param name="sum2" as="xs:integer"/>
        <xsl:choose>
            <xsl:when test="$index gt $len">
                <xsl:sequence select="$sum2 * 256 + $sum1"/>
            </xsl:when>
            <xsl:otherwise>
                <xsl:variable name="newSum1" as="xs:integer"
                    select="($sum1 + $str[$index]) mod 255"/>
                <xsl:sequence select="foo:fletcher16($str, $len, $index + 1, $newSum1,
                        ($sum2 + $newSum1) mod 255)" />
            </xsl:otherwise>
        </xsl:choose>
    </xsl:function>
</xsl:stylesheet>

The output:

    Checksum 1: 65256    
    Checksum 2: 25689
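
A quick way to sanity-check the functions is to run them on short strings that can be traced by hand. The sketch below does that; it assumes the stylesheet listed above has been saved as fletcher.xsl (a hypothetical file name) so that it can be imported. It also illustrates the position-dependence point: 'ab' and 'ba' have the same plain sum of code points but different checksums. Tracing the arithmetic by hand also suggests the addition-for-OR substitution may be harmless in itself: $sum1 never exceeds 254, so $sum2 * 256 + $sum1 sets the same bits as (sum2 << 8) | sum1. Mismatches with other Fletcher-16 implementations are then more likely to come from non-ASCII input (code points versus bytes) or from different initial values.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="2.0" xmlns:foo="my.foo.org" exclude-result-prefixes="foo">

    <!-- Assumption: the stylesheet listed above is saved as fletcher.xsl (hypothetical name). -->
    <xsl:import href="fletcher.xsl"/>
    <xsl:output method="text"/>

    <xsl:template match="/">
        <!-- 'abcde' is a widely published Fletcher-16 test string (check value 0xC8F0 = 51440). -->
        <xsl:text>abcde: </xsl:text>
        <xsl:value-of select="foo:checksum('abcde')"/>
        <xsl:text>&#10;</xsl:text>
        <!-- Same characters, different order: a plain sum of code points cannot tell these apart. -->
        <xsl:text>plain sum ab / ba: </xsl:text>
        <xsl:value-of select="sum(string-to-codepoints('ab')), sum(string-to-codepoints('ba'))"/>
        <xsl:text>&#10;checksum ab / ba: </xsl:text>
        <xsl:value-of select="foo:checksum('ab'), foo:checksum('ba')"/>
        <xsl:text>&#10;</xsl:text>
    </xsl:template>
</xsl:stylesheet>
```

Worked through by hand, this should print 51440 for 'abcde', 195 twice for the plain sums, and 9667 and 9923 for the two checksums.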

A note on usage: You said you needed to run checksum on "the contents of an XML file. At the root of this is a need to compare some text nodes". If you pass a text node to foo:checksum(), it will work fine: its string value will be extracted.
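
As a rough illustration of that usage, here is a minimal sketch that checks each submitted value against a checksum recorded earlier. The input shape is entirely hypothetical (a field element carrying name and checksum attributes), as is the file name fletcher.xsl for the stylesheet above; adapt both to whatever your form actually returns.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="2.0" xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:foo="my.foo.org" exclude-result-prefixes="xs foo">

    <!-- Assumption: the stylesheet from this answer is saved as fletcher.xsl (hypothetical name). -->
    <xsl:import href="fletcher.xsl"/>
    <xsl:output method="text"/>

    <!-- Hypothetical input:
         <fields><field name="title" checksum="51440">abcde</field>...</fields> -->
    <xsl:template match="/">
        <xsl:for-each select="//field[@checksum]">
            <!-- Checksum of the text as it came back from the form. -->
            <xsl:variable name="actual" select="foo:checksum(string(.))"/>
            <xsl:value-of select="@name"/>
            <xsl:text>: </xsl:text>
            <!-- Compare against the checksum recorded before the form was sent out. -->
            <xsl:value-of select="if ($actual = xs:integer(@checksum))
                                  then 'unchanged' else 'CHANGED'"/>
            <xsl:text>&#10;</xsl:text>
        </xsl:for-each>
    </xsl:template>
</xsl:stylesheet>
```

Each field is reported on its own line, so with about 100 text nodes it is easy to see which ones no longer match.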

FYI, I ran a performance test that calculates the checksum of the text nodes in a 535KB XML input file. Here is the initial template I used:

<xsl:template match="/">
    Checksum of input: <xsl:value-of
      select="sum(for $t in //text() return foo:checksum($t)) mod 65536"/>    
</xsl:template>

It finished in 0.8s, using Saxon PE.
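
One caveat about that test template: adding the per-node checksums together gives a total that does not depend on the order of the text nodes, because addition commutes. If the order of the nodes matters to you, a possible variant is to checksum the joined text instead. The sketch below shows both side by side; the file name fletcher.xsl and the newline separator are my assumptions, and I have not timed the joined version. It recurses once per character over one long string, so it may depend on how well your processor handles tail calls.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="2.0" xmlns:foo="my.foo.org" exclude-result-prefixes="foo">

    <!-- Assumption: the stylesheet from this answer is saved as fletcher.xsl (hypothetical name). -->
    <xsl:import href="fletcher.xsl"/>
    <xsl:output method="text"/>

    <xsl:template match="/">
        <!-- As in the test above: per-node checksums added together (node order does not matter). -->
        <xsl:text>Sum of per-node checksums: </xsl:text>
        <xsl:value-of select="sum(for $t in //text() return foo:checksum($t)) mod 65536"/>
        <xsl:text>&#10;</xsl:text>
        <!-- Variant: one checksum over all text nodes joined in document order. -->
        <xsl:text>Checksum of joined text: </xsl:text>
        <xsl:value-of select="foo:checksum(string-join(//text(), '&#10;'))"/>
        <xsl:text>&#10;</xsl:text>
    </xsl:template>
</xsl:stylesheet>
```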

Alternatively:

If the amount of text is not very large, it would probably be faster and more accurate to simply compare the strings themselves (instead of checksums) to each other. But maybe you can't get access to both text nodes at the same time, due to your architecture restrictions... I'm not clear on that from your description.
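
If you can get at both sets of text, the direct comparison could look roughly like the sketch below. It assumes the trusted copy is available as expected.xml next to the stylesheet (a hypothetical file name) and that the submitted document is the transformation input.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">

    <xsl:output method="text"/>

    <!-- Assumption: the trusted copy is stored as expected.xml next to this stylesheet. -->
    <xsl:variable name="expected"
        select="for $t in doc('expected.xml')//text() return string($t)"/>

    <xsl:template match="/">
        <!-- String values of the submitted text nodes, in document order. -->
        <xsl:variable name="submitted" select="for $t in //text() return string($t)"/>
        <xsl:choose>
            <xsl:when test="deep-equal($expected, $submitted)">
                <xsl:text>All text nodes match.&#10;</xsl:text>
            </xsl:when>
            <xsl:otherwise>
                <!-- List every position where the two sequences disagree. -->
                <xsl:text>Differences at text node position(s): </xsl:text>
                <xsl:value-of select="for $i in 1 to max((count($expected), count($submitted)))
                                      return if (string($expected[$i]) ne string($submitted[$i]))
                                             then $i else ()"/>
                <xsl:text>&#10;</xsl:text>
            </xsl:otherwise>
        </xsl:choose>
    </xsl:template>
</xsl:stylesheet>
```

Unlike a checksum, this cannot report a false match, which is what I meant by "more accurate".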
