处理大型xlsx文件 [英] Processing large xlsx file

查看:292
本文介绍了处理大型xlsx文件的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我需要自动调整大(30k +行)xlsx文件中的所有行。

I need to auto-fit all rows in large (30k+ rows) xlsx file.

以下代码通过apache poi适用于小文件,但是出去了大单上的 OutOfMemoryError

The following code via apache poi works on small files, but goes out with OutOfMemoryError on large ones:

Workbook workbook = WorkbookFactory.create(inputStream);
Sheet sheet = workbook.getSheetAt(0);

for (Row row : sheet) {
    row.setHeight((short) -1);
}

workbook.write(outputStream);

更新:不幸的是,增加堆大小不是一个选项 - OutOfMemoryError 出现在 -Xmx1024m ,而且30k行不是上限。

Update: Unfortunately, increasing heap size is not an option - OutOfMemoryError appears at -Xmx1024m and 30k rows is not an upper limit.

推荐答案

尝试使用事件API。请参阅事件API(仅限HSSF) XSSF和SAX(事件API)。来自该页面的两个引用:

Try using the event API. See Event API (HSSF only) and XSSF and SAX (Event API) in the POI documentation for details. A couple of quotes from that page:

HSSF:


事件API比User API更新。它适用于愿意学习一些低级API结构的中间开发人员。它使用起来相对简单,但需要对Excel文件的各个部分有基本的了解(或者愿意学习)。提供的优点是您可以读取内存占用相对较小的XLS。

The event API is newer than the User API. It is intended for intermediate developers who are willing to learn a little bit of the low level API structures. Its relatively simple to use, but requires a basic understanding of the parts of an Excel file (or willingness to learn). The advantage provided is that you can read an XLS with a relatively small memory footprint.

XSSF:


如果内存占用是一个问题,然后对于XSSF,您可以获取基础XML数据,并自己处理它。这适用于愿意学习.xlsx文件的一些低级结构的中级开发人员,以及在java中处理XML的人。它使用起来比较简单,但需要对文件结构有基本的了解。提供的优势是您可以读取内存占用相对较小的XLSX文件。

If memory footprint is an issue, then for XSSF, you can get at the underlying XML data, and process it yourself. This is intended for intermediate developers who are willing to learn a little bit of low level structure of .xlsx files, and who are happy processing XML in java. Its relatively simple to use, but requires a basic understanding of the file structure. The advantage provided is that you can read a XLSX file with a relatively small memory footprint.

对于输出,一种可能的方法在博客文章 Streaming xlsx files 。 (基本上,使用XSSF生成容器XML文件,然后将实际内容作为纯文本流式传输到xlsx zip存档的相应xml部分。)

For output, one possible approach is described in the blog post Streaming xlsx files. (Basically, use XSSF to generate a container XML file, then stream the actual content as plain text into the appropriate xml part of the xlsx zip archive.)

这篇关于处理大型xlsx文件的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆