Apache POI Excel工作簿的创建需要很长时间 [英] Apache POI Excel workbook creation taking a long time

查看:540
本文介绍了Apache POI Excel工作簿的创建需要很长时间的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我注意到带有Apache POI v3.10的xlsx文件的工作簿创建语句,例如`

I've noticed that the workbook creation statement for xlsx files with Apache POI v3.10 e.g. `

Workbook wb = WorkbookFactory.create(inputStream) 

Workbook wb = new XSSFWorkbook(inputStream)

...花费的时间很长(〜30秒),文件只有72行,每列10列(365KB).

...is taking a long time (~30 seconds) and the file only has 72 rows with 10 columns (365KB).

这不是问题,但似乎有点过头了.我想知道我是在做错什么还是没有做应该做的事情.具有相同数据(但仅25KB)的xls文件的实例化仅需要1或2秒.如果这是正常现象,那么有人可以让我知道.

It's not a problem, but it just seems a bit excessive. I'm wondering if I'm doing anything wrong or not doing something I should be doing. Instantiation of an xls file with the same data (but only 25KB) only takes 1 or 2 seconds. If this is normal, then could someone just let me know.

这是我正在使用的工作簿创建代码:

This is the workbook creation code I'm using:

LOG.info("Loading Excel Workbook...");
Workbook workbook;
try {
    workbook = WorkbookFactory.create(dataStream);
} catch (InvalidFormatException e) {
    throw new IOException("Invalid file format ==> " + e.getMessage());
}
LOG.info("Workbook loaded.");

请注意,dataStreamInputStream. 30秒的延迟发生在第一条日志语句和第二条日志语句之间.正如我之前所说,我曾尝试用new XSSFWorkbook(dataStream)替换工厂,但延迟仍然存在.

Just to be clear, dataStream is an InputStream. The 30 second delay occurs between the first and second log statements. As I said previously, I've tried replacing the factory with new XSSFWorkbook(dataStream) but the delay remains.

编辑2 :

我运行了一个独立测试,除了使用1)File和2)InputStream进行工作簿初始化外,该测试什么都不做,其中源是我一直遇到问题的xlsx文件.他们都在约2秒内完成.

I ran a standalone test which does nothing except the workbook initialization using 1) a File, and also 2) an InputStream where the source is the xlsx file I've been having trouble with. They both completed in ~2 seconds.

我应该早些添加一些背景知识.我正在使用Google App Engine.我提供给POI的输入流是从上传到服务器的文件中检索的. App Engine不支持Servlet 3.0(用于处理文件上传),因此我必须使用Apache Commons FileUpload lib来检索文件数据.最终,我得到的数据是从

I should have added some background earlier. I'm using the Google App Engine. The input stream that I'm giving to POI is retrieved from a file upload to the server. App Engine doesn't support Servlet 3.0 (for handling file uploads) so I have to use Apache Commons FileUpload lib to retrieve the file data. Ultimately, the data I get is an InputStream retrieved from FileItemStream#openStream(). This is what I supply to POI.

因此,我不知道这是否是App Engine的问题,或者POI是否不喜欢FileItemStream返回的InputStream的风格.顺便说一句,我无法使用File而不是InputStream进行初始化,因为App Engine不允许写入文件系统.

So, I don't know if this is a problem with the App Engine, or if POI doesn't like the flavor of the InputStream that FileItemStream is returning. Incidentally, I cannot try the initialization with a File instead of a InputStream because App Engine doesn't allow writes to the file system.

推荐答案

我会使用一种可用的分析工具进行一些分析,例如JVisualVM,Dynatrace,JProfiler,..

I would do some profiling using one of the available profiling tools, e.g. JVisualVM, Dynatrace, JProfiler, ..

只有这样,您才能确定时间在代码中所花费的时间,毕竟这可能是一个出乎意料的地方,您将在这里追逐错误的马.

Only then you know for sure where the time is spent in your code, it might be some unexpected place after all and you would be chasing after the wrong horse here.

即您可能会从其他地方接收到InputStream,并且实际上可能是通过Internet从某些外部内容下载的内容,并且该线路的速度可能很慢,因此所有阅读都需要很长时间.否则可能是磁盘设置不足或内存不足而正在运行大量GC的原因,因为您已接近极限,...

I.e. you might receive the InputStream from somewhere else and it might be actually a download from some external content via the Internet and the line might be slow and thus all the reading just takes ages. Or it might be something with the disk-setup or memory shortage where lots of GC is running because you are near the limit, ...

另一种选择是提取可能重现该代码的最小代码段,然后您可以看到还需要删除哪些内容以使其运行更快.

One other option would be to extract the smallest possible snippet of code which reproduces this, then you can see what else you need to remove to make it run faster.

这篇关于Apache POI Excel工作簿的创建需要很长时间的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆