将大文本文件数据写入excel [英] Write large text file data into excel

查看:75
本文介绍了将大文本文件数据写入excel的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在阅读一个用一些分隔符分隔的文本文件.

I am reading a text file separated with some delimiters.

我的文本文件内容示例

avc def efg jksjd
1 2 3 5
3 4 6 0

Avc def efg jksjd
1 2 3 5
3 4 6 0

一行一行并使用 hashmap 将其保存在内存中,其中行号作为整数类型的键,并且文本文件的每一行作为 List 对象

line by line and holding it in memory using hashmap having line numbers as key of integer type and each line of text file as List object

考虑一下,我的地图会存储这样的信息

Consider, my map would store information like this

整数列表

1 [Avc def efg jksjd]

1 [Avc def efg jksjd]

我正在使用 Apache POI 写入 excel.使用 Apache POI 写入 excel 时,我遵循这种方法,这是我的代码片段

I am using Apache POI to write into excel. When writing into excel using Apache POI, I am following this approach, here is my code snippet

HSSFWorkbook workbook = new HSSFWorkbook();
HSSFSheet sheet = workbook.createSheet("Sample sheet");
Map<Integer, List<Object>> excelDataHolder = new LinkedHashMap<Integer, List<Object>>();
int rownum = 0;
for (Integer key : keyset) {
            Row row = sheet.createRow(rownum++);
            List<Object> objList = excelHolder.get(key);//excelHolder is my map
            int cellnum = 0;
            for (Object obj : objList) {
                Cell cell = row.createCell(cellnum++);
                    cell.setCellValue((Date) obj);
            }
}

如果要写入 excel 的行数/记录数较少,则此方法效果很好.想象一下,如果记录以十亿计,或者如果文本文件有更多行,假设为 100 000.我想,我的方法失败了,因为createRow 和 createCell 在堆中创建超过 100 000 个对象.无论java到excel api,我认为写入它(excel)是基于相同的方法,即如上所示的集合迭代.我也用 aspose 做了一些例子,结果我猜 aspose 也有同样的问题.

This works quite well if the number of lines/records to be written into excel are less. Imagine, if the records are in billion number or if the text file has more lines assume in 100 000. I think, my approach fails, because createRow and createCell creates more than 100 000 objects in heap. Whatever the java to excel api, I think writing into it(excel) is based on the same approach i.e.., iteration of collection as shown above. I did some examples with aspose as well, as a result aspose also have the same problem I guess.

  • createRow 和 createCell 是否在每次调用时都创建新对象?
  • 如果是,替代方案是什么?.我将如何编写大数据以提高性能?

推荐答案

最新版本的 apache-poi 具有 sxssf.来自网站的无耻复制

A recent version of apache-poi has sxssf. Shameless copy from website

SXSSF(包:org.apache.poi.xssf.streaming)是一个API兼容的非常大的电子表格时使用的 XSSF 的流扩展必须生产,而且堆空间有限.SXSSF 达到其低通过限制对某个行内的行的访问来占用内存滑动窗口,而 XSSF 可以访问文档中的所有行.不再在窗口中的旧行变得不可访问,因为它们被写入磁盘.

SXSSF (package: org.apache.poi.xssf.streaming) is an API-compatible streaming extension of XSSF to be used when very large spreadsheets have to be produced, and heap space is limited. SXSSF achieves its low memory footprint by limiting access to the rows that are within a sliding window, while XSSF gives access to all rows in the document. Older rows that are no longer in the window become inaccessible, as they are written to the disk.

我曾用它来创建包含 150 万行的电子表格.

I had used it for creating spreadsheet with 1.5 million rows.

这篇关于将大文本文件数据写入excel的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆