处理大型 xlsx 文件 [英] Processing large xlsx file

查看:46
本文介绍了处理大型 xlsx 文件的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我需要自动适应大(30k+ 行)xlsx 文件中的所有行.

I need to auto-fit all rows in large (30k+ rows) xlsx file.

通过 apache poi 的以下代码适用于小文件,但在大文件上会出现 OutOfMemoryError:

The following code via apache poi works on small files, but goes out with OutOfMemoryError on large ones:

Workbook workbook = WorkbookFactory.create(inputStream);
Sheet sheet = workbook.getSheetAt(0);

for (Row row : sheet) {
    row.setHeight((short) -1);
}

workbook.write(outputStream);

更新:不幸的是,增加堆大小不是一个选项 - OutOfMemoryError 出现在 -Xmx1024m 并且 30k 行不是上限.

Update: Unfortunately, increasing heap size is not an option - OutOfMemoryError appears at -Xmx1024m and 30k rows is not an upper limit.

推荐答案

尝试使用事件 API.请参阅事件 API(仅限 HSSF)XSSF 和 SAX(事件 API) POI 文档中的详细信息.来自该页面的几句话:

Try using the event API. See Event API (HSSF only) and XSSF and SAX (Event API) in the POI documentation for details. A couple of quotes from that page:

HSSF:

事件 API 比用户 API 更新.它适用于愿意学习一些低级 API 结构的中级开发人员.它使用起来相对简单,但需要对 Excel 文件的各个部分有基本的了解(或愿意学习).提供的优点是您可以读取占用相对较小内存的 XLS.

The event API is newer than the User API. It is intended for intermediate developers who are willing to learn a little bit of the low level API structures. Its relatively simple to use, but requires a basic understanding of the parts of an Excel file (or willingness to learn). The advantage provided is that you can read an XLS with a relatively small memory footprint.

XSSF:

如果内存占用是一个问题,那么对于 XSSF,您可以获取底层 XML 数据并自行处理.这适用于愿意学习一些 .xlsx 文件的低级结构并乐于用 Java 处理 XML 的中级开发人员.它使用起来相对简单,但需要对文件结构有基本的了解.提供的优点是您可以以相对较小的内存占用读取 XLSX 文件.

If memory footprint is an issue, then for XSSF, you can get at the underlying XML data, and process it yourself. This is intended for intermediate developers who are willing to learn a little bit of low level structure of .xlsx files, and who are happy processing XML in java. Its relatively simple to use, but requires a basic understanding of the file structure. The advantage provided is that you can read a XLSX file with a relatively small memory footprint.

对于输出,博客文章中描述了一种可能的方法流式传输 xlsx 文件.(基本上,使用 XSSF 生成容器 XML 文件,然后将实际内容作为纯文本流式传输到 xlsx zip 存档的相应 xml 部分.)

For output, one possible approach is described in the blog post Streaming xlsx files. (Basically, use XSSF to generate a container XML file, then stream the actual content as plain text into the appropriate xml part of the xlsx zip archive.)

这篇关于处理大型 xlsx 文件的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆