Apache POI Excel workbook creation taking a long time

Question

I've noticed that the workbook creation statement for xlsx files with Apache POI v3.10, e.g.

Workbook wb = WorkbookFactory.create(inputStream);

or

Workbook wb = new XSSFWorkbook(inputStream);

...takes a long time (~30 seconds), even though the file has only 72 rows and 10 columns (365KB).

It's not a blocker, but it seems excessive. I'm wondering whether I'm doing something wrong or not doing something I should be doing. Instantiating an xls file with the same data (but only 25KB) takes only 1 or 2 seconds. If this is normal, could someone just let me know?

This is the workbook creation code I'm using:

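LOG.info("Loading Excel Workbook...");
Workbook workbook;
try {
    // dataStream is the InputStream obtained from the uploaded file
    workbook = WorkbookFactory.create(dataStream);
} catch (InvalidFormatException e) {
    // Keep the original exception as the cause so the stack trace is not lost
    throw new IOException("Invalid file format ==> " + e.getMessage(), e);
}
LOG.info("Workbook loaded.");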

Just to be clear, dataStream is an InputStream. The 30-second delay occurs between the first and second log statements. As I said above, I've tried replacing the factory call with new XSSFWorkbook(dataStream), but the delay remains.

Edit-2:

I ran a standalone test that does nothing except initialize the workbook using 1) a File and 2) an InputStream, where the source is the xlsx file I've been having trouble with. Both completed in ~2 seconds.

I should have added some background earlier. I'm using Google App Engine. The input stream that I'm giving to POI comes from a file uploaded to the server. App Engine doesn't support Servlet 3.0 (for handling file uploads), so I have to use the Apache Commons FileUpload library to retrieve the file data. Ultimately, the data I get is an InputStream retrieved from FileItemStream#openStream(). This is what I supply to POI.

So I don't know whether this is a problem with App Engine, or whether POI doesn't like the flavor of InputStream that FileItemStream returns. Incidentally, I can't try the initialization with a File instead of an InputStream, because App Engine doesn't allow writes to the file system.

Recommended answer

I would do some profiling using one of the available profiling tools, e.g. JVisualVM, Dynatrace, JProfiler, etc.

Only then will you know for sure where the time is spent in your code. It might turn out to be some unexpected place, in which case you would be chasing the wrong cause here.

For example, you might be receiving the InputStream from somewhere else, and it might actually be a download of external content over the Internet. If the line is slow, all the reading simply takes ages. Or it might be something in the disk setup, or a memory shortage where lots of GC is running because you are near the limit.
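If a profiler is awkward to attach in the hosting environment, a rough alternative is to measure how much of the delay is spent blocked inside the stream itself. The sketch below is only an illustration: TimedInputStream is a hypothetical helper (not part of POI or Commons FileUpload) that wraps any InputStream and accumulates the time spent in read calls. The main method uses an in-memory stream as a stand-in for the real upload stream.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical helper: records how long callers spend inside read() calls. */
final class TimedInputStream extends FilterInputStream {
    private long nanosInRead; // total time spent inside read calls
    private long bytesRead;   // total bytes delivered to the caller

    TimedInputStream(InputStream in) {
        super(in);
    }

    @Override
    public int read() throws IOException {
        long start = System.nanoTime();
        int b = super.read();
        nanosInRead += System.nanoTime() - start;
        if (b >= 0) {
            bytesRead++;
        }
        return b;
    }

    // read(byte[]) in FilterInputStream delegates here, so it is counted too.
    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        long start = System.nanoTime();
        int n = super.read(buf, off, len);
        nanosInRead += System.nanoTime() - start;
        if (n > 0) {
            bytesRead += n;
        }
        return n;
    }

    long getBytesRead() {
        return bytesRead;
    }

    long getMillisInRead() {
        return nanosInRead / 1_000_000;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the real upload stream (assumption for this demo).
        TimedInputStream timed =
                new TimedInputStream(new ByteArrayInputStream(new byte[365 * 1024]));
        long start = System.nanoTime();
        byte[] buf = new byte[8192];
        while (timed.read(buf, 0, buf.length) != -1) {
            // A parser would consume the bytes here.
        }
        long totalMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println("bytes read: " + timed.getBytesRead());
        System.out.println("ms in read: " + timed.getMillisInRead() + " of " + totalMs + " ms total");
    }
}
```

In the question's code, the wrapped stream would be passed to WorkbookFactory.create(...) in place of dataStream. If most of the 30 seconds shows up as time in read, the stream is the bottleneck rather than the parsing. Time spent in skip() is not counted by this sketch.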

Another option would be to extract the smallest possible snippet of code that reproduces the problem. Then you can see what else you need to remove to make it run faster.
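One way to shrink the problem, sketched here under the assumption that the upload stream is the suspect, is to copy the whole stream into memory first and time that step separately from the parse. BufferedUpload and readFully are hypothetical names used only for illustration, and the in-memory stream in main stands in for the real upload.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

final class BufferedUpload {
    /** Reads the whole stream into memory so transfer time can be measured on its own. */
    static byte[] readFully(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the upload stream (assumption for this demo).
        InputStream dataStream = new ByteArrayInputStream(new byte[365 * 1024]);

        long t0 = System.nanoTime();
        byte[] bytes = readFully(dataStream);
        long t1 = System.nanoTime();
        System.out.println("Read " + bytes.length + " bytes in "
                + (t1 - t0) / 1_000_000 + " ms");

        // With POI on the classpath, the parse step could then be timed separately:
        // Workbook workbook = WorkbookFactory.create(new ByteArrayInputStream(bytes));
    }
}
```

If the read phase accounts for most of the delay, the workbook parsing is not the culprit. This approach also needs no file system access, at the cost of holding the whole upload in memory.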
