Memory efficient XSLT for transforming large XML files


Question

This question is related to a recent answer by michael.hor257k, which is in turn related to an answer by Dimitre Novatchev.

When I used the stylesheet from the above-mentioned answer (by michael.hor257k) on a large XML file (around 60MB; a sample XML is shown below), the transformation completed successfully.

I then tried another stylesheet, slightly different from michael.hor257k's. It is intended to group each element that has a sectPr child together with its following siblings (up to the next following-sibling element with a sectPr child), and to do so recursively (i.e., to group elements at every depth of the input XML).

Sample input XML:

<body>
    <p/>
    <p>
        <sectPr/>
    </p>
    <p/>
    <p/>
    <tbl/>
    <p>
        <sectPr/>
    </p>
    <p/>
</body>

The stylesheet I tried:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

    <xsl:output method="xml" indent="yes"/>

    <xsl:template match="*">
        <xsl:copy>
            <xsl:apply-templates select="*[1] | *[sectPr]"/>
        </xsl:copy>
        <xsl:apply-templates select="following-sibling::*[1][not(sectPr)]"/>
    </xsl:template>

    <xsl:template match="*[sectPr]">
        <myTag>
            <xsl:copy>
                <xsl:apply-templates select="*[1] | *[sectPr]"/>
            </xsl:copy>
            <xsl:apply-templates select="following-sibling::*[1][not(sectPr)]"/>
        </myTag>
    </xsl:template>

</xsl:stylesheet>

To my surprise, I encountered an OutOfMemoryError when transforming an XML file of around 60MB.

I think I do not understand the trick behind the XSLTs provided by michael.hor257k and Dimitre Novatchev, which do not cause memory exceptions.

What is the big difference between my stylesheet and the ones in the above-mentioned answers that makes mine fail with an OutOfMemoryError? And how can I update my stylesheet to be memory efficient?

Recommended answer

Lingamurthy CS,

Please add back the <xsl:strip-space elements="*"/> declaration, which you removed from the original solution. It strips every whitespace-only text node from the source XML document.
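For illustration, here is a minimal sketch of the stylesheet from the question with the declaration restored. The two templates are unchanged. Placing the declaration right after xsl:output is only a choice made here; it is a top-level element and the original solution may have it elsewhere.

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

    <xsl:output method="xml" indent="yes"/>

    <!-- Drop whitespace-only text nodes from the source tree while it is built -->
    <xsl:strip-space elements="*"/>

    <!-- Ordinary element: copy it, descend into its first child and into any
         children that start a group, then move on to the next sibling unless
         that sibling starts a new group -->
    <xsl:template match="*">
        <xsl:copy>
            <xsl:apply-templates select="*[1] | *[sectPr]"/>
        </xsl:copy>
        <xsl:apply-templates select="following-sibling::*[1][not(sectPr)]"/>
    </xsl:template>

    <!-- Element with a sectPr child: open a group and pull the following
         non-sectPr siblings into it -->
    <xsl:template match="*[sectPr]">
        <myTag>
            <xsl:copy>
                <xsl:apply-templates select="*[1] | *[sectPr]"/>
            </xsl:copy>
            <xsl:apply-templates select="following-sibling::*[1][not(sectPr)]"/>
        </myTag>
    </xsl:template>

</xsl:stylesheet>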

Not stripping these nodes may significantly increase the number of nodes and the memory needed to hold them. In your case, holding the XML document in memory takes almost twice as much as holding the same document with these nodes stripped.
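To see the effect on a given input, a small diagnostic stylesheet along the following lines could be used. It is only a sketch and not part of the original solution. It counts the elements and the whitespace-only text nodes in the source tree.

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

    <xsl:output method="text"/>

    <xsl:template match="/">
        <!-- Every element in the document -->
        <xsl:text>elements: </xsl:text>
        <xsl:value-of select="count(//*)"/>
        <!-- Text nodes that contain nothing but whitespace -->
        <xsl:text>&#10;whitespace-only text nodes: </xsl:text>
        <xsl:value-of select="count(//text()[not(normalize-space())])"/>
        <xsl:text>&#10;</xsl:text>
    </xsl:template>

</xsl:stylesheet>

For the sample input in the question, assuming it is indented exactly as shown and the parser passes the whitespace through to the processor, this should report 10 elements and 12 whitespace-only text nodes, so indentation alone more than doubles the node count. With <xsl:strip-space elements="*"/> added, the second count should drop to 0.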

I ran your transformation without problems, but with the nodes stripped it runs 20% faster on MS XslCompiledTransform.

Then I ran your transformation with Saxon 9.1J, once as published in the question and once with <xsl:strip-space elements="*"/> added, because Saxon also shows the memory consumption of the transformation. Both runs were successful. In the first case the number of nodes processed was 9525004 and 340MB of RAM was used; the transformation took 5.3 sec. In the second case the number of nodes was 4336366 and 215MB of RAM was used; the transformation ran in 5.06 sec.
