How to load a large xlsx file with Apache POI?

Question

I have a large .xlsx file (141 MB, containing 293413 rows with 62 columns each) on which I need to perform some operations.

I am having problems with loading this file (OutOfMemoryError), as POI has a large memory footprint on XSSF (xlsx) workbooks.

This SO question (http://stackoverflow.com/questions/6069847/java-lang-outofmemoryerror-java-heap-space-while-reading-excel-with-apache-poi) is similar, and the solution presented there is to increase the VM's allocated/maximum memory.

It seems to work for that kind of file size (9 MB), but for me it simply doesn't work, even if I allocate all available system memory. (Well, that's no surprise, considering my file is over 15 times larger.)

I'd like to know if there is any way to load the workbook so that it won't consume all the memory, yet without resorting to processing the XSSF workbook's underlying XML directly. (In other words, keeping a pure POI solution.)

If there isn't, though, you are welcome to say so ("There isn't.") and point me toward an "XML" solution.

Recommended answer

I was in a similar situation in a webserver environment. The typical size of the uploads was ~150k rows, and it wouldn't have been good to consume a ton of memory for a single request. The Apache POI Streaming API works well for this, but it requires a total redesign of your read logic. I already had a bunch of read logic using the standard API that I didn't want to have to redo, so I wrote this instead: https://github.com/monitorjbl/excel-streaming-reader

It's not entirely a drop-in replacement for the standard XSSFWorkbook class, but if you're just iterating through rows it behaves similarly:

import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;

import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.Row;

import com.monitorjbl.xlsx.StreamingReader;

InputStream is = new FileInputStream(new File("/path/to/workbook.xlsx"));
StreamingReader reader = StreamingReader.builder()
        .rowCacheSize(100)    // number of rows to keep in memory (defaults to 10)
        .bufferSize(4096)     // buffer size to use when reading InputStream to file (defaults to 1024)
        .sheetIndex(0)        // index of sheet to use (defaults to 0)
        .read(is);            // InputStream or File for XLSX file (required)

// Rows are read from the stream as the loop advances, not all at once
for (Row r : reader) {
  for (Cell c : r) {
    System.out.println(c.getStringCellValue());
  }
}

There are some caveats to using it; due to the way XLSX sheets are structured, not all data is available in the current window of the stream. However, if you're just trying to read simple data out from the cells, it works pretty well for that.
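To illustrate the caveat about the stream window, here is a minimal sketch of the "XML" route using only the JDK's StAX parser. It is not how the library above is implemented, just an illustration under stated assumptions: the class name SheetStreamSketch, the method streamRows and the SAMPLE data are all made up for this example, and in a real .xlsx (a zip package) the input stream would come from a worksheet part such as xl/worksheets/sheet1.xml. It shows that one row at a time can be held in memory, and that string cells (t="s") only carry an index into the shared strings table, which lives in a separate part of the package.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.function.Consumer;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class SheetStreamSketch {
    // Hand-written stand-in for a worksheet part (hypothetical data, not from a real file).
    static final String SAMPLE =
        "<worksheet xmlns=\"http://schemas.openxmlformats.org/spreadsheetml/2006/main\"><sheetData>"
      + "<row r=\"1\"><c r=\"A1\" t=\"s\"><v>0</v></c><c r=\"B1\"><v>42</v></c></row>"
      + "<row r=\"2\"><c r=\"A2\" t=\"s\"><v>1</v></c><c r=\"B2\"><v>3.5</v></c></row>"
      + "</sheetData></worksheet>";

    // Streams the sheet XML and hands each finished row to onRow.
    // Only the current row's text is buffered, never the whole sheet.
    static void streamRows(InputStream sheetXml, Consumer<String> onRow) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, false); // worksheet parts need no DTD
        XMLStreamReader xml = factory.createXMLStreamReader(sheetXml);
        StringBuilder row = new StringBuilder();
        String ref = null;   // cell reference, e.g. "A1"
        String type = null;  // cell type attribute; "s" means shared string
        while (xml.hasNext()) {
            int event = xml.next();
            if (event == XMLStreamConstants.START_ELEMENT) {
                switch (xml.getLocalName()) {
                    case "row":
                        row.setLength(0); // start a fresh row buffer
                        break;
                    case "c":
                        ref = xml.getAttributeValue(null, "r");
                        type = xml.getAttributeValue(null, "t");
                        break;
                    case "v":
                        String raw = xml.getElementText(); // reads up to </v>
                        if (row.length() > 0) row.append(' ');
                        // A shared-string cell holds only an index; the text itself
                        // is in another part of the package, outside this stream.
                        row.append(ref).append('=').append("s".equals(type) ? "shared#" + raw : raw);
                        break;
                    default:
                        break;
                }
            } else if (event == XMLStreamConstants.END_ELEMENT && "row".equals(xml.getLocalName())) {
                onRow.accept(row.toString());
            }
        }
        xml.close();
    }

    public static void main(String[] args) throws XMLStreamException {
        InputStream in = new ByteArrayInputStream(SAMPLE.getBytes(StandardCharsets.UTF_8));
        streamRows(in, System.out::println);
    }
}
```

Passing a callback instead of returning a list keeps memory use independent of the number of rows, which is the point of streaming. Resolving the shared#N placeholders would require reading the shared strings part first, which is the kind of thing a streaming reader has to handle for you.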
