Excel to text conversion: properly handling formulas and empty cells


Problem description

I'm trying to convert an Excel file into a tab-separated text file via Apache POI. The Excel file has some cells that contain formulas and some empty cells.

Here's a sample of the original Excel file:

Here's an excerpt of the final output:

'US' 'USORACLEAP' SYSTEMREFERENCE SUPPLIERID SUPPLIERNAME CLASSIFICATION VENDOR_SITE_CODE SUPPLIERADDRESS1 SUPPLIERADDRESS2 STATE ZIPCODE COUNTRY SOURCE INVOICENUM INVOICEDATE PAYMENTDATE LINE_DESC GL_COMPANY GL_CODE GL_DESCR COSTCENTER CC_DESCR CURRENCY_CODE CHECK_NUMBER NUM_DOCS SPEND TERM PAYMENT_METHOD SYSTEM_APPROVED PO_DISTRIBUTION_ID WALKER_COST_CENTER RGL_LEDGER_ENTITY
US US Oracle AP RANDBETWEEN(3000,100000)"TEXT"&D2 VENDOR"TEXT"&D3"TEXT"&D3"TEXT"&D3 ONTARIO RIGHT(D2,5) US TEXT"&D3TEXT"&D3 RANDBETWEEN(43831, 44150) RANDBETWEEN(44105,44135) TEXT"&D3 RIGHT(000"),(9RANDBETWEEN)3) RANDBETWEEN(55000, 60000)"TEXT"&D3"TEXT"&D3"TEXT"&D3 USD RANDBETWEEN(2000000,2100000) RANDBETWEEN(1,01000))/100 IMMEDIATE CHECK TEXT"&D2 X2
US US Oracle AP 31836TEXT"&D3 1099TEXT"&D4TEXT"&D4TEXT"&D4 NY RIGHT(D3,5) USTEXT";&D4 "TEXT"&D4 RANDBETWEEN(43831,44150) RANDBETWEEN(44105,44135)"TEXT"&D4 RIGHT("000"&RANDBETWEEN), RANDBETWEEN(44105,44135), RANDBETWEEN(1,005, 44135),60000) "TEXT "&D4 &D4 &D4 "TEXT &D4 USD RANDBETWEEN(2000000,2100000) RANDBETWEEN(1,4) RANDBETWEEN(1,10100000)GSUEDCM03 AF2
US US Oracle AP 3504TEXT"&D4 VENDORTEXT"&D5TEXT"&D5TEXT"&D5 NY RIGHT(D4,5) USTEXT";&D5 "TEXT"&D5 RANDBETWEEN(43831,44150) RANDBETWEEN(44105,44135)"TEXT"&D5 RIGHT("000"&RANDBETWEEN), RANDBETWEEN(44105,44135), RANDBETWEEN(1,005, 44135),60000) "TEXT "&D5 &D5 "TEXT "&D5 "TEXT &D5 USD RANDBETWEEN(2000000,2100000) RANDBETWEEN(1,4) RANDBETWEEN(1,10100000)GSUEIT001 AF3
US US Oracle AP 3504TEXT"&D5 VENDORTEXT"&D6TEXT"&D6TEXT"&D6 NY RIGHT(D5,5) USTEXT";&D6 "TEXT"&D6 RANDBETWEEN(43831,44150) RANDBETWEEN(44105,44135)"TEXT"&D6 RIGHT("000"&RANDBETWEEN), RANDBETWEEN(44105,44135), RANDBETWEEN(1,005, 44135),60000) "TEXT "&D6 &D6 &D6 "TEXT &D6 USD RANDBETWEEN(2000000,2100000) RANDBETWEEN(1,4) RANDBETWEEN(1,10100000)GSUEIT001 AF4
US US Oracle AP 3504 TEXT &D6 VENDOR"TEXT &D7 TEXT"&D7 TEXT"&D7 NY RIGHT(D6,5) US TEXT";&D7 "TEXT"&D7 RANDBETWEEN(43831,44150) RANDBETWEEN(44105,44135)"TEXT"&D7 RIGHT("000"&RANDBETWEEN), RANDBETWEEN(44105,44135), RANDBETWEEN(1,005, 44135),60000) "TEXT "&D7 &D7 &D7 "TEXT &D7 USD RANDBETWEEN(2000000,2100000) RANDBETWEEN(1,4) RANDBETWEEN(1,10100000)GSUEIT001 AF5

As you can see, the first row contains the column headers. Some of the cells (D1, for example) were written out as the formula text instead of the calculated value. The third column doesn't have any values, so everything after it shifted one position to the left in the text file.

Here's the code:

    private void convertXlsToText(InputStream inputStream, String delimiter, File targetFile) throws IOException {
        StringBuilder sb = new StringBuilder();
        setMinInflateRatio(0);
        try (Workbook wb = create(inputStream)) {
            Sheet firstSheet = wb.getSheetAt(0);

            for (Row nextRow : firstSheet) {
                Iterator<Cell> cellIterator = nextRow.cellIterator();
                while (cellIterator.hasNext()) {
                    Cell cell = cellIterator.next();
                    switch (cell.getCellType()) {
                        case STRING:
                            sb.append(cell.getStringCellValue()).append(delimiter);
                            break;
                        case BOOLEAN:
                            sb.append(cell.getBooleanCellValue()).append(delimiter);
                            break;
                        case NUMERIC:
                            sb.append(cell.getNumericCellValue()).append(delimiter);
                            break;
                        case FORMULA:
                            sb.append(cell.getCellFormula()).append(delimiter);
                            break;
                        default:
                            sb.append(EMPTY).append(delimiter);
                    }
                }
                sb.append(DEFAULT_LINE_END);
            }
        }

        dumpStringBuilderToFile(sb, targetFile);
    }

Can someone please point out what changes I should make in my code to fix the alignment and the formula issue? PS: I'm using TAB (\t) as my delimiter.

UPDATE: Here's the updated code after the suggestions.

    private void convertXlsToText(InputStream inputStream, String delimiter, File targetFile) throws IOException {
        StringBuilder sb = new StringBuilder();
        setMinInflateRatio(0);
        try (Workbook wb = create(inputStream)) {
            Sheet firstSheet = wb.getSheetAt(0);
            FormulaEvaluator evaluator = wb.getCreationHelper().createFormulaEvaluator();
            DataFormatter formatter = new DataFormatter();
            for (Row nextRow : firstSheet) {
                Iterator<Cell> cellIterator = nextRow.cellIterator();
                while (cellIterator.hasNext()) {
                    Cell cell = cellIterator.next();
                    if (cell != null) {
                        sb.append(format("%-20s", formatter.formatCellValue(cell, evaluator))).append(delimiter);
                    } else {
                        sb.append(format("%-20s", EMPTY)).append(delimiter);
                    }
                }
                sb.append(DEFAULT_LINE_END);
            }
        }

        dumpStringBuilderToFile(sb, targetFile);
    }

Recommended answer

If the requirement is to write Excel data into a text file, then all cell values need to be obtained as String. A convenient way to do so is the DataFormatter of Apache POI. With DataFormatter you get cell values as they are shown in the Excel sheet, that is, with number formats and date formats applied. And if you use DataFormatter together with a FormulaEvaluator, formulas are evaluated and the evaluated values are converted to String.
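To see why the displayed value matters, here is a minimal sketch in plain Java that does not use Apache POI at all. It only mimics what raw cell values look like: a numeric cell hands back a double, and Excel stores dates as serial day numbers. The class and method names (RawVersusDisplayed, fromExcelSerial, asInteger) and the "0" number pattern are assumptions made for illustration; DataFormatter does this kind of work for you based on each cell's own format.

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.time.LocalDate;
import java.util.Locale;

class RawVersusDisplayed {
    // In Excel's 1900 date system, serial numbers for dates from 1900-03-01 onward
    // count days from 1899-12-30.
    static LocalDate fromExcelSerial(long serial) {
        return LocalDate.of(1899, 12, 30).plusDays(serial);
    }

    // Formats a double without a fraction, as an integer-formatted cell would be shown.
    static String asInteger(double value) {
        return new DecimalFormat("0", DecimalFormatSymbols.getInstance(Locale.US)).format(value);
    }

    public static void main(String[] args) {
        double raw = 31836;                         // what a numeric cell returns as a double
        System.out.println(String.valueOf(raw));    // 31836.0
        System.out.println(asInteger(raw));         // 31836
        System.out.println(fromExcelSerial(44105)); // 2020-10-01
    }
}
```

Appending the raw double to a StringBuilder therefore writes 31836.0, and a date cell would be written as its serial number, which is rarely what a text export should contain.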

To avoid losing empty cells, you need to determine the cell count first, because the cell iterator skips cells that are not present in the row. For example, the cell count of the header row can be used as the cell count for every further row.
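The left shift can be reproduced without any Excel file. The following is a minimal sketch in plain Java that models a row as a sparse map from column index to value; the class SparseRowDemo and its methods are hypothetical and only stand in for a row whose third cell was never created.

```java
import java.util.Map;
import java.util.TreeMap;

class SparseRowDemo {
    // Joins only the cells that exist, which is what iterating over present cells does.
    static String joinPresentCells(Map<Integer, String> row, String delimiter) {
        return String.join(delimiter, row.values());
    }

    // Walks every column index up to columnCount and emits "" for missing cells.
    static String joinAllColumns(Map<Integer, String> row, int columnCount, String delimiter) {
        StringBuilder sb = new StringBuilder();
        for (int c = 0; c < columnCount; c++) {
            sb.append(row.getOrDefault(c, ""));
            if (c < columnCount - 1) sb.append(delimiter);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hypothetical row: column index 2 was never filled in.
        Map<Integer, String> row = new TreeMap<>();
        row.put(0, "US");
        row.put(1, "US Oracle AP");
        row.put(3, "31836");

        System.out.println(joinPresentCells(row, "\t").split("\t", -1).length);  // 3
        System.out.println(joinAllColumns(row, 4, "\t").split("\t", -1).length); // 4
    }
}
```

With the index-based loop every row yields the same number of fields, so the values stay under their headers.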

So the whole code would be as simple as this:

import org.apache.poi.ss.usermodel.*;
import java.io.*;

class ExcelToText {
 static final String DEFAULT_LINE_END = System.getProperty("line.separator");

 static void convertXlsToText(InputStream inputStream, String delimiter, OutputStream outputStream) throws Exception {
  StringBuilder sb = new StringBuilder();
  Workbook workbook = WorkbookFactory.create(inputStream);
  // DataFormatter returns each cell as it is displayed in Excel (number and date formats applied)
  DataFormatter dataFormatter = new DataFormatter(java.util.Locale.US);
  // with an evaluator, formatCellValue returns the formula result instead of the formula text
  FormulaEvaluator formulaEvaluator = workbook.getCreationHelper().createFormulaEvaluator();
  String cellValue = "";
  Sheet sheet = workbook.getSheetAt(0);
  // the header row determines the column count for all rows
  Row headerRow = sheet.getRow(0);
  int cellCount = 0;
  if (headerRow != null) {
   // getLastCellNum() is the index of the last cell plus one
   cellCount = headerRow.getLastCellNum();
  }
  if (cellCount > 0) {
   for (Row row : sheet) {
    // loop by column index so that missing cells are not skipped
    for (int c = 0; c < cellCount; c++) {
     // a missing cell is returned as a blank cell, which formats to an empty string
     Cell cell = row.getCell(c, Row.MissingCellPolicy.CREATE_NULL_AS_BLANK);
     cellValue = dataFormatter.formatCellValue(cell, formulaEvaluator);
     sb.append(cellValue);
     if (c < cellCount-1) sb.append(delimiter);
    }
    sb.append(DEFAULT_LINE_END);
   }
  }
  workbook.close();
  BufferedWriter bw = new BufferedWriter(new OutputStreamWriter(outputStream, java.nio.charset.StandardCharsets.UTF_8));
  bw.append(sb);
  bw.flush();
  bw.close();
 }

 public static void main(String[] args) throws Exception {
  convertXlsToText(new FileInputStream("./Excel.xlsx"), "\t", new FileOutputStream("./Data.txt"));
 }
}

No CellType checking and no extra formula evaluation is needed.

Regarding your other requirement: a delimited text file should contain only the real content, separated by the delimiter. The content itself should not be manipulated. So prepending spaces to the content or padding it with spaces up to a fixed width is not a good idea in my opinion. If the delimiter is a tab, for example, then only the tab stops set in the text viewer should affect how the file looks. Additional spaces only get in the way.
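A small plain-Java sketch of that point, under the assumption that a consumer reads the file back by splitting each line on the tab character. The class PaddingDemo and its methods are hypothetical; joinPadded mirrors the format("%-20s", ...) call from the updated code in the question.

```java
import java.util.Arrays;
import java.util.List;

class PaddingDemo {
    // Joins values with the delimiter only; the content is left untouched.
    static String joinPlain(List<String> values, String delimiter) {
        return String.join(delimiter, values);
    }

    // Pads every value to 20 characters before joining.
    static String joinPadded(List<String> values, String delimiter) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < values.size(); i++) {
            sb.append(String.format("%-20s", values.get(i)));
            if (i < values.size() - 1) sb.append(delimiter);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> values = Arrays.asList("US", "", "31836");
        List<String> plain = Arrays.asList(joinPlain(values, "\t").split("\t", -1));
        List<String> padded = Arrays.asList(joinPadded(values, "\t").split("\t", -1));
        System.out.println(plain.equals(values));  // true
        System.out.println(padded.equals(values)); // false
    }
}
```

The padded variant no longer round-trips: every field comes back with trailing spaces, and the empty cell becomes a string of 20 spaces.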
