Best language to parse extremely large Excel 2007 files


Question

My boss has a habit of running queries on our databases that return tens of thousands of rows and saving the results to Excel files. As the intern, I constantly have to write scripts that work with the information in these files. So far I've tried VBScript and PowerShell for my scripting needs. Both can take several minutes to perform even the simplest tasks, which means a finished script would take most of an 8-hour day to run.

My workaround right now is a PowerShell script that removes all commas and newline characters from an .xlsx file and saves it as .csv; a Java program then handles the data gathering and output, and the script cleans up the .csv files when finished. This runs in a matter of seconds for my current project, but I can't help wondering whether there's a more elegant alternative for my next one. Any suggestions?
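The Java half of the workaround above can stay small precisely because the PowerShell step strips commas and line breaks from the cells: each line of the .csv is then exactly one row, and a plain split on ',' is safe. A minimal sketch under those assumptions; the class name CsvScan, the default file name export.csv and the UTF-8 encoding are illustrative assumptions (Excel's own CSV export may use the system code page instead), and a CSV with quoted fields would need a real parser.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.function.Consumer;

public class CsvScan {

    // Streams rows one at a time so memory use does not grow with file size.
    // Assumes the cleanup step already removed commas and line breaks from
    // every cell, so each line is one row and a plain split on ',' is safe.
    public static int forEachRow(Reader in, Consumer<String[]> action) throws IOException {
        int rows = 0;
        try (BufferedReader reader = new BufferedReader(in)) {
            String line;
            while ((line = reader.readLine()) != null) {
                // limit -1 keeps trailing empty cells: "2,Gadget," -> 3 fields
                action.accept(line.split(",", -1));
                rows++;
            }
        }
        return rows;
    }

    public static void main(String[] args) throws IOException {
        // "export.csv" is a hypothetical default; UTF-8 is an assumption about the export.
        String path = args.length > 0 ? args[0] : "export.csv";
        int n = forEachRow(Files.newBufferedReader(Paths.get(path), StandardCharsets.UTF_8),
                row -> System.out.println(String.join(" | ", row)));
        System.out.println(n + " rows");
    }
}
```

Because rows are handed to a callback instead of being collected in a list, memory use stays flat however many rows the export has.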

Solution

In reply to the comment "I kept getting all kinds of weird errors when working with .xlsx files":

Here's a simple example of using Apache POI to traverse an .xlsx file. See also Upgrading to POI 3.5, including converting existing HSSF Usermodel code to SS Usermodel (for XSSF and HSSF).

import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.DateUtil;
import org.apache.poi.ss.usermodel.FormulaEvaluator;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.ss.usermodel.Workbook;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;

public class XlsxReader {

    public static void main(String[] args) throws IOException {
        // try-with-resources closes the stream and the workbook even if reading fails.
        // XSSFWorkbook parses the whole workbook into memory; for very large files
        // POI's event API (XSSFReader) streams the sheet XML instead.
        try (InputStream myxls = new FileInputStream("test.xlsx");
             Workbook book = new XSSFWorkbook(myxls)) {
            FormulaEvaluator eval =
                book.getCreationHelper().createFormulaEvaluator();
            Sheet sheet = book.getSheetAt(0);
            for (Row row : sheet) {
                for (Cell cell : row) {
                    printCell(cell, eval);
                    System.out.print("; ");
                }
                System.out.println();
            }
        }
    }

    private static void printCell(Cell cell, FormulaEvaluator eval) {
        // POI 4.0+ returns the CellType enum here; POI 3.x returned int
        // constants such as Cell.CELL_TYPE_BLANK, which have since been removed.
        switch (cell.getCellType()) {
            case BLANK:
                System.out.print("EMPTY");
                break;
            case STRING:
                System.out.print(cell.getStringCellValue());
                break;
            case NUMERIC:
                // Excel stores dates as numbers; the cell style marks them as dates.
                if (DateUtil.isCellDateFormatted(cell)) {
                    System.out.print(cell.getDateCellValue());
                } else {
                    System.out.print(cell.getNumericCellValue());
                }
                break;
            case BOOLEAN:
                System.out.print(cell.getBooleanCellValue());
                break;
            case FORMULA:
                // Prints the formula text; eval.evaluate(cell) would compute its result instead.
                System.out.print(cell.getCellFormula());
                break;
            default:
                // ERROR and any other cell types
                System.out.print("DEFAULT");
        }
    }
}
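XSSFWorkbook builds the entire workbook in memory, which is what tends to hurt with extremely large files. POI's own answer to that is its event API (XSSFReader); as a dependency-free alternative, and a swapped-in technique rather than part of this answer, the sketch below relies on the fact that an .xlsx file is a ZIP archive of XML parts and streams them with the JDK's StAX parser. Assumptions, labeled in the code too: the first sheet is the part xl/worksheets/sheet1.xml (typical of files Excel writes, but the authoritative mapping lives in xl/workbook.xml and its relationships), and values come back as raw stored text, so dates are serial numbers, booleans are 1/0 and formula cells give their cached result. XlsxStream and forEachRow are hypothetical names.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class XlsxStream {

    // Zero-based column index from a cell reference: "A1" -> 0, "AB12" -> 27.
    static int columnIndex(String ref) {
        int col = 0;
        for (int i = 0; i < ref.length() && Character.isLetter(ref.charAt(i)); i++) {
            col = col * 26 + (ref.charAt(i) - 'A' + 1);
        }
        return col - 1;
    }

    // Reads xl/sharedStrings.xml; cells with t="s" store an index into this list.
    static List<String> sharedStrings(ZipFile zip, XMLInputFactory f)
            throws IOException, XMLStreamException {
        List<String> strings = new ArrayList<>();
        ZipEntry entry = zip.getEntry("xl/sharedStrings.xml");
        if (entry == null) return strings; // workbook has no shared text
        try (InputStream in = zip.getInputStream(entry)) {
            XMLStreamReader r = f.createXMLStreamReader(in);
            StringBuilder current = new StringBuilder();
            boolean phonetic = false; // <rPh> runs hold reading hints, not cell text
            while (r.hasNext()) {
                int ev = r.next();
                if (ev == XMLStreamConstants.START_ELEMENT) {
                    String name = r.getLocalName();
                    if (name.equals("si")) current.setLength(0);
                    else if (name.equals("rPh")) phonetic = true;
                    else if (name.equals("t") && !phonetic) current.append(r.getElementText());
                } else if (ev == XMLStreamConstants.END_ELEMENT) {
                    String name = r.getLocalName();
                    if (name.equals("si")) strings.add(current.toString());
                    else if (name.equals("rPh")) phonetic = false;
                }
            }
            r.close();
        }
        return strings;
    }

    // Streams the first worksheet, handing each row to the action as a list of cell texts.
    public static int forEachRow(String path, Consumer<List<String>> action)
            throws IOException, XMLStreamException {
        XMLInputFactory f = XMLInputFactory.newInstance();
        f.setProperty(XMLInputFactory.SUPPORT_DTD, false); // spreadsheet parts need no DTDs
        int rows = 0;
        try (ZipFile zip = new ZipFile(path)) {
            List<String> shared = sharedStrings(zip, f);
            // Assumption: the first sheet is stored under this part name.
            ZipEntry sheet = zip.getEntry("xl/worksheets/sheet1.xml");
            if (sheet == null) throw new IOException("no xl/worksheets/sheet1.xml in " + path);
            try (InputStream in = zip.getInputStream(sheet)) {
                XMLStreamReader r = f.createXMLStreamReader(in);
                List<String> row = new ArrayList<>();
                String type = null, value = null;
                int col = 0;
                while (r.hasNext()) {
                    int ev = r.next();
                    if (ev == XMLStreamConstants.START_ELEMENT) {
                        switch (r.getLocalName()) {
                            case "row": row = new ArrayList<>(); break;
                            case "c":
                                type = r.getAttributeValue(null, "t");
                                String ref = r.getAttributeValue(null, "r");
                                col = ref == null ? row.size() : columnIndex(ref);
                                value = null;
                                break;
                            case "v": value = r.getElementText(); break; // stored or cached value
                            case "t": value = (value == null ? "" : value) + r.getElementText(); break; // inline string
                            default: break;
                        }
                    } else if (ev == XMLStreamConstants.END_ELEMENT) {
                        String name = r.getLocalName();
                        if (name.equals("c") && value != null) {
                            while (row.size() < col) row.add(""); // fill skipped empty cells
                            row.add("s".equals(type) ? shared.get(Integer.parseInt(value)) : value);
                        } else if (name.equals("row")) {
                            action.accept(row);
                            rows++;
                        }
                    }
                }
                r.close();
            }
        }
        return rows;
    }
}
```

For example, XlsxStream.forEachRow("test.xlsx", row -> System.out.println(String.join("; ", row))) prints the first sheet one row at a time. Only the shared-string table is held in memory, so a sheet with many distinct strings still costs memory in proportion to that table.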
