Apache POI much quicker using HSSF than XSSF - what next?


Problem description

I've been having some issues parsing .xlsx files with Apache POI: I am getting java.lang.OutOfMemoryError: Java heap space in my deployed app. I'm only processing files under 5MB with around 70,000 rows, so my suspicion from reading a number of other questions is that something is amiss.

As suggested in this comment (http://stackoverflow.com/questions/26067405/heap-space-error-with-apache-poi-xssf#comment40844721_26067405), I decided to run SSPerformanceTest.java with the suggested variables to see if there is anything wrong with my code or setup. The results show a significant difference between HSSF (.xls) and XSSF (.xlsx):

1) HSSF 50000 50 1: Elapsed 1 seconds

2) SXSSF 50000 50 1: Elapsed 5 seconds

3) XSSF 50000 50 1: Elapsed 15 seconds

The FAQ specifically says:

If you can't run that with 50,000 rows and 50 columns in all of HSSF, XSSF and SXSSF in under 3 seconds (ideally a lot less!), the problem is with your environment.

Next, it says to run XLS2CSV.java, which I have done. Feeding in the XSSF file generated above (with 50000 rows and 50 columns) takes around 15 seconds, the same amount of time it took to write the file.

Is something wrong with my environment, and if so how do I investigate further?

Stats from VisualVM show the used heap shooting up to 1.2 GB during processing. Surely this is far too high, considering that it is an extra gigabyte on top of the heap in use before processing began?
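To cross-check a VisualVM reading like this from inside the application, heap usage can be sampled before and after the parsing step. The following is only a minimal sketch using the JDK's Runtime API; the class name HeapProbe and the stand-in allocation in main are hypothetical, and in real use the Runnable would wrap the POI parsing call. The numbers are approximate, because System.gc() is only a hint and the collector may run at any time.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Apache POI.
class HeapProbe {

    /** Approximate heap currently in use, in bytes. */
    public static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    /** Runs the task and returns the approximate change in used heap, in bytes. */
    public static long heapGrowth(Runnable task) {
        System.gc(); // only a hint; the JVM may ignore it
        long before = usedHeapBytes();
        task.run();
        return usedHeapBytes() - before;
    }

    public static void main(String[] args) {
        // Stand-in workload: retain roughly 16 MB so the growth is visible.
        List<long[]> retained = new ArrayList<>();
        long growth = heapGrowth(() -> {
            for (int i = 0; i < 16; i++) {
                retained.add(new long[128 * 1024]); // about 1 MB per array
            }
        });
        System.out.printf("max heap: %d MB, growth: %d MB, arrays kept: %d%n",
                Runtime.getRuntime().maxMemory() >> 20, growth >> 20, retained.size());
    }
}
```

Comparing the reported maximum heap on the development machine (started with -Xmx2g) against the limit of the production instance shows how much headroom the same workload would have after deployment.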

Note: The heap space exception mentioned above only happens in production (on Google App Engine) and only for .xlsx files. However, the tests mentioned in this question have all been run on my development machine with -Xmx2g. I'm hoping that if I can fix the problem on my development setup, it will use less memory when I deploy.

Stack trace from app engine:

Caused by: java.lang.OutOfMemoryError: Java heap space
      at org.apache.xmlbeans.impl.store.Cur.createElementXobj(Cur.java:260)
      at org.apache.xmlbeans.impl.store.Cur$CurLoadContext.startElement(Cur.java:2997)
      at org.apache.xmlbeans.impl.store.Locale$SaxHandler.startElement(Locale.java:3211)
      at org.apache.xmlbeans.impl.piccolo.xml.Piccolo.reportStartTag(Piccolo.java:1082)
      at org.apache.xmlbeans.impl.piccolo.xml.PiccoloLexer.parseAttributesNS(PiccoloLexer.java:1802)
      at org.apache.xmlbeans.impl.piccolo.xml.PiccoloLexer.parseOpenTagNS(PiccoloLexer.java:1521)

Solution

I faced the same kind of issue when reading a bulky .xlsx file with Apache POI, and I came across excel-streaming-reader on GitHub.

This library serves as a wrapper around POI's streaming API while preserving the syntax of the standard POI API.

It can help you read large files.
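To illustrate why a streaming approach needs far less heap than the XMLBeans object tree seen in the stack trace, here is a rough sketch that uses only the JDK, not excel-streaming-reader or POI. It relies on the fact that an .xlsx file is a zip archive whose sheets are XML parts, and reads one sheet with SAX so that no row is retained after it has been seen. The class name StreamingRowCounter, the helper sampleWorkbook, and the entry name xl/worksheets/sheet1.xml are assumptions for this example; the sample archive contains a single sheet part and is not a complete workbook that Excel would open.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class: streams one sheet part instead of loading it as a tree.
class StreamingRowCounter {

    /** Counts <row> elements in the named sheet part; returns 0 if the part is missing. */
    public static int countRows(InputStream xlsx, String sheetEntry) throws Exception {
        final int[] rows = {0};
        try (ZipInputStream zip = new ZipInputStream(xlsx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (!entry.getName().equals(sheetEntry)) {
                    continue; // skip other parts of the archive
                }
                SAXParserFactory factory = SAXParserFactory.newInstance();
                factory.setNamespaceAware(true);
                // Uploaded files are untrusted input, so refuse DOCTYPE declarations.
                factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
                factory.newSAXParser().parse(zip, new DefaultHandler() {
                    @Override
                    public void startElement(String uri, String localName,
                                             String qName, Attributes attributes) {
                        if ("row".equals(localName)) {
                            rows[0]++; // a real reader would process the row here, then drop it
                        }
                    }
                });
                break; // the requested sheet has been read
            }
        }
        return rows[0];
    }

    /** Builds a tiny in-memory archive with one sheet part, for demonstration only. */
    public static byte[] sampleWorkbook(int rowCount) throws IOException {
        StringBuilder xml = new StringBuilder(
                "<worksheet xmlns=\"http://schemas.openxmlformats.org/spreadsheetml/2006/main\">"
                + "<sheetData>");
        for (int r = 1; r <= rowCount; r++) {
            xml.append("<row r=\"").append(r).append("\"><c r=\"A").append(r)
               .append("\"><v>").append(r).append("</v></c></row>");
        }
        xml.append("</sheetData></worksheet>");

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(out)) {
            zip.putNextEntry(new ZipEntry("xl/worksheets/sheet1.xml"));
            zip.write(xml.toString().getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        byte[] workbook = sampleWorkbook(70_000);
        int rows = countRows(new ByteArrayInputStream(workbook), "xl/worksheets/sheet1.xml");
        System.out.println(rows + " rows");
    }
}
```

The trade-off is that a SAX handler only sees the raw XML: resolving shared strings, styles, and dates is left to the caller, which is the part a wrapper library takes care of while still exposing POI-style row and cell objects.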
