解析超大 Excel 2007 文件的最佳语言 [英] Best language to parse extremely large Excel 2007 files

查看:23
本文介绍了解析超大 Excel 2007 文件的最佳语言的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我的老板习惯于对我们的数据库执行查询,这些数据库返回数万行并将它们保存到 excel 文件中.作为实习生,我经常需要编写脚本来处理这些文件中的信息.到目前为止,我已经尝试了 VBScript 和 Powershell 来满足我的脚本需求.这两者都可能需要几分钟才能执行最简单的任务,这意味着脚本完成后将花费一天 8 小时的大部分时间.

My boss has a habit of performing queries on our databases that return tens of thousands of rows and saving them into excel files. I, being the intern, constantly have to write scripts that work with the information from these files. Thus far I've tried VBScript and Powershell for my scripting needs. Both of these can take several minutes to perform even the simplest of tasks, which would mean that the script when finished would take most of an 8 hour day.

我现在的解决方法只是编写一个 PowerShell 脚本,从 xlsx 文件中删除所有逗号和换行符,将 .xlsx 文件保存到 .csv,然后让 Java 程序处理数据收集和输出,完成后让我的脚本清理 .csv 文件.对于我当前的项目,这会在几秒钟内运行,但我不禁想知道我的下一个项目是否有更优雅的替代方案.有什么建议吗?

My workaround right now is simply to write a PowerShell script that removes all of the commas and newline characters from an xlsx file, saves the .xlsx files to .csv, and then have a Java program handle the data gathering and output, and have my script clean up the .csv files when finished. This runs in a matter of seconds for my current project, but I can't help but wonder if there's a more elegant alternative for my next one. Any suggestions?

推荐答案

在处理 .xlsx 文件时,我不断收到各种奇怪的错误.

I kept getting all kinds of weird errors when working with .xlsx files.

这是一个使用 Apache POI 遍历 .xlsx 文件的简单示例.另请参阅升级到 POI 3.5,包括将现有的 HSSF Usermodel 代码转换为 SS Usermodel(对于 XSSF 和HSSF).

Here's a simple example of using Apache POI to traverse an .xlsx file. See also Upgrading to POI 3.5, including converting existing HSSF Usermodel code to SS Usermodel (for XSSF and HSSF).

import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.DateUtil;
import org.apache.poi.ss.usermodel.FormulaEvaluator;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.ss.usermodel.Workbook;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;

public class XlsxReader {

    public static void main(String[] args) throws IOException {
        InputStream myxls = new FileInputStream("test.xlsx");
        Workbook book = new XSSFWorkbook(myxls);
        FormulaEvaluator eval =
            book.getCreationHelper().createFormulaEvaluator();
        Sheet sheet = book.getSheetAt(0);
        for (Row row : sheet) {
            for (Cell cell : row) {
                printCell(cell, eval);
                System.out.print("; ");
            }
            System.out.println();
        }
        myxls.close();
    }

    private static void printCell(Cell cell, FormulaEvaluator eval) {
        switch (cell.getCellType()) {
            case Cell.CELL_TYPE_BLANK:
                System.out.print("EMPTY");
                break;
            case Cell.CELL_TYPE_STRING:
                System.out.print(cell.getStringCellValue());
                break;
            case Cell.CELL_TYPE_NUMERIC:
                if (DateUtil.isCellDateFormatted(cell)) {
                    System.out.print(cell.getDateCellValue());
                } else {
                    System.out.print(cell.getNumericCellValue());
                }
                break;
            case Cell.CELL_TYPE_BOOLEAN:
                System.out.print(cell.getBooleanCellValue());
                break;
            case Cell.CELL_TYPE_FORMULA:
                System.out.print(cell.getCellFormula());
                break;
            default:
                System.out.print("DEFAULT");
        }
    }
}

这篇关于解析超大 Excel 2007 文件的最佳语言的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆