用于阅读的 Apache POI 流 (SXSSF) [英] Apache POI Streaming (SXSSF) for Reading

查看:63
本文介绍了用于阅读的 Apache POI 流 (SXSSF)的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我需要读取大型 Excel 文件并将它们的数据导入我的应用程序.

I need to read large excel files and import their data to my application.

由于POI占用大量堆工作,经常抛出OutOfMemory错误,我发现有Streaming API用于串行处理excel数据时尚(而不是将文件完全加载到内存中)

Since POI takes up a large amount of heap to work, often throwing OutOfMemory errors, I found out that there is a Streaming API for handling excel data in a serial fashion (rather than loading the file completely into memory)

我创建了一个带有单个工作表的 xlsx 工作簿,并在单元格中输入了多个值,并想出了以下代码来尝试读取它:

I created a xlsx workbook, with a single worksheet, and typed in several values in cells and came up with the following code to attempt reading it:

public static void main(String[] args) throws Throwable {
    // keep 100 rows in memory, exceeding rows will be flushed to disk
    SXSSFWorkbook wb = new SXSSFWorkbook(new XSSFWorkbook(new FileInputStream("C:\\test\\tst.xlsx")));
    SXSSFSheet sheet = (SXSSFSheet) wb.getSheetAt(0);
    Row row = sheet.getRow(0);
    //row is always null
    while(row.iterator().hasNext()){ //-> NullPointerException
        System.out.println(row.getCell(0).getStringCellValue());
    }
}

然而,尽管能够正确获取其工作表,但它始终带有空 (null) 行.

However, despite being able to get its worksheets properly, it always comes with empty (null) rows.

我在互联网上研究并发现了几个 Streaming API 的例子,但它们都不是关于读取现有文件,它们都是关于生成 excel 文件.

I have researched and found out several examples of the Streaming API in the internet, but none of them are about reading existing files, they're all about generating excel files.

是否真的可以从流中现有的 .xlsx 文件中读取数据?

Is it actually possible to read data from existing .xlsx files in a stream?

推荐答案

在进一步挖掘之后,我发现了这个 图书馆:

After digging up some more, I found out this library:

如果您过去曾使用 Apache POI 读取 Excel 文件,您可能会注意到它的内存效率不是很高.读取整个工作簿会导致严重的内存使用高峰,这可能会对服务器造成严重破坏.

If you've used Apache POI in the past to read in Excel files, you probably noticed that it's not very memory efficient. Reading in an entire workbook will cause a severe memory usage spike, which can wreak havoc on a server.

Apache 必须读取整个工作簿的原因有很多,但其中大部分都与库允许您使用随机地址进行读写有关.如果(且仅当)您只想以快速且高效的内存方式读取 Excel 文件的内容,您可能不需要此功能.不幸的是,POI 库中唯一用于读取流式工作簿的内容需要您的代码使用类似 SAX 的解析器.该 API 中缺少所有友好的类,例如 Row 和 Cell.

There are plenty of good reasons for why Apache has to read in the whole workbook, but most of them have to do with the fact that the library allows you to read and write with random addresses. If (and only if) you just want to read the contents of an Excel file in a fast and memory effecient way, you probably don't need this ability. Unfortunately, the only thing in the POI library for reading a streaming workbook requires your code to use a SAX-like parser. All of the friendly classes like Row and Cell are missing from that API.

该库充当该流 API 的包装器,同时保留了标准 POI API 的语法.请继续阅读,看看它是否适合您.

This library serves as a wrapper around that streaming API while preserving the syntax of the standard POI API. Read on to see if it's right for you.

InputStream is = new FileInputStream(new File("/path/to/workbook.xlsx"));
StreamingReader reader = StreamingReader.builder()
        .rowCacheSize(100)    // number of rows to keep in memory (defaults to 10)
        .bufferSize(4096)     // buffer size to use when reading InputStream to file (defaults to 1024)
        .sheetIndex(0)        // index of sheet to use (defaults to 0)
        .sheetName("sheet1")  // name of sheet to use (overrides sheetIndex)
        .read(is);            // InputStream or File for XLSX file (required)

还有 SAX Event API,它读取文档并通过事件解析其内容.

There is also SAX Event API, which reads the document and parse its contents through events.

如果内存占用是一个问题,那么对于 XSSF,您可以获取底层 XML 数据并自行处理.这适用于愿意学习一些 .xlsx 文件的低级结构并乐于用 Java 处理 XML 的中级开发人员.它使用起来相对简单,但需要对文件结构有基本的了解.提供的优点是您可以以相对较小的内存占用读取 XLSX 文件.

If memory footprint is an issue, then for XSSF, you can get at the underlying XML data, and process it yourself. This is intended for intermediate developers who are willing to learn a little bit of low level structure of .xlsx files, and who are happy processing XML in java. Its relatively simple to use, but requires a basic understanding of the file structure. The advantage provided is that you can read a XLSX file with a relatively small memory footprint.

这篇关于用于阅读的 Apache POI 流 (SXSSF)的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆