java.lang.OutOfMemoryError while transforming XML in a huge directory


Question

I want to transform XML files using XSLT2, in a huge directory with a lot of levels. There are more than 1 million files, each file is 4 to 10 kB. After a while I always receive java.lang.OutOfMemoryError: Java heap space.

My command is: java -Xmx3072M -XX:+UseConcMarkSweepGC -XX:+CMSClassUnloadingEnabled -XX:MaxPermSize=512M ...

Adding more memory via -Xmx is not a good solution.

Here is my code:

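public void pushDocuments(File dir) {
    // Walk the directory tree recursively, indexing every regular file
    for (File file : dir.listFiles()) {
        if (file.isDirectory()) {
            pushDocuments(file);
        } else {
            indexFiles.index(file);
        }
    }
}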

public void index(File file) {
    ByteArrayOutputStream outputStream = new ByteArrayOutputStream();

    try {
        xslTransformer.xslTransform(outputStream, file);
        outputStream.flush();
        outputStream.close();
    } catch (IOException e) {
        System.err.println(e.toString());
    }
}

The XSLT transformation, using net.sf.saxon.s9api:

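public void xslTransform(ByteArrayOutputStream outputStream, File xmlFile) {
    // proc (Processor) and transformer (XsltTransformer) are fields,
    // so the same XsltTransformer is reused for every file
    try {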
        XdmNode source = proc.newDocumentBuilder().build(new StreamSource(xmlFile));
        Serializer out = proc.newSerializer();
        out.setOutputStream(outputStream);
        transformer.setInitialContextNode(source);
        transformer.setDestination(out);
        transformer.transform();

        out.close();
    } catch (SaxonApiException e) {
        System.err.println(e.toString());
    }
}

Recommended answer

My usual recommendation with the Saxon s9api interface is to reuse the XsltExecutable object, but to create a new XsltTransformer for each transformation. The XsltTransformer caches documents you have read in case they are needed again, which is not what you want in this case.
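
Here is a minimal sketch of the compile-once, new-transformer-per-file pattern. To keep it runnable with nothing but the JDK, it swaps s9api for the standard JAXP API. javax.xml.transform.Templates plays the role of XsltExecutable, because it is compiled once and reused. A Transformer obtained from it plays the role of XsltTransformer, with one per document. The class name XsltBatch and the tiny stylesheet are hypothetical. The JDK's built-in processor only supports XSLT 1.0, so the stylesheet is 1.0.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Templates;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper: Templates ~ s9api XsltExecutable, Transformer ~ XsltTransformer
class XsltBatch {

    // Hypothetical XSLT 1.0 stylesheet: outputs the string value of the root element
    static final String XSLT =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:template match='/'><xsl:value-of select='/*'/></xsl:template>"
      + "</xsl:stylesheet>";

    private final Templates templates;

    XsltBatch(String stylesheet) throws TransformerException {
        // Compile the stylesheet once; a Templates object is reusable and thread-safe
        TransformerFactory factory = TransformerFactory.newInstance();
        this.templates = factory.newTemplates(new StreamSource(new StringReader(stylesheet)));
    }

    String transform(String xml) throws TransformerException {
        // A fresh Transformer for each document: nothing it holds outlives this call
        Transformer transformer = templates.newTransformer();
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        XsltBatch batch = new XsltBatch(XSLT);
        String[] docs = {"<a>one</a>", "<b>two</b>"};
        for (String doc : docs) {
            System.out.println(batch.transform(doc));
        }
    }
}
```

For the files in the question, you would pass new StreamSource(xmlFile) instead of wrapping a string in a StringReader. For XSLT 2.0 through this API, Saxon has to sit behind TransformerFactory, for example by instantiating net.sf.saxon.TransformerFactoryImpl directly. In s9api terms, the equivalent is to compile the XsltExecutable once and call its load() method to get a new XsltTransformer for each file.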

As an alternative, you could call xsltTransformer.getUnderlyingController().clearDocumentPool() after each transformation.

(Please note, you can ask Saxon questions at saxonica.plan.io, which gives a good chance we [Saxonica] will notice them and answer them. You can also ask them here and tag them "saxon", which means we'll probably respond to the question at some point, though not always immediately. If you ask on StackOverflow with no product-specific tags, it's entirely hit-and-miss whether anyone will notice the question.)
