Apache POI: How to add a custom DataFormatter for handling 13-digit integers as strings, not numbers


Problem description

I'm building an XLSX processor that transforms an XLSX file into a CSV file. Because the files can get quite big, I'm using the event-based approach with XSSFSheetXMLHandler.

This works perfectly fine, but my XLSX files contain long numbers (13 digits) which are unique identification numbers, not real numbers. When I run my code on a Windows machine it extracts the numbers correctly, but on a Linux machine it converts them to E-notation.

For example: the source value is 7401075293087. On Windows this is correctly extracted into my CSV, but on Linux the value comes through as 7.40108E+12.

The problem with the XSSFSheetXMLHandler is that it reads the XLSX under the covers and then fires events that are caught by a SheetContentsHandler that you need to implement. One of the methods in the SheetContentsHandler is a cell method with the signature: cell(String cellReference, String formattedValue, XSSFComment comment)

As you can see, this method receives the cell value already formatted (so in my case it receives "7.40108E+12"). All the rest of the logic happens under the covers.

Based on my investigations, I believe the solution lies in defining a custom DataFormatter that specifically treats 13-digit integers as strings instead of formatting them in E-notation.

Unfortunately, my plan didn't work as expected and I couldn't find any help online. Below is an extract of my code. I tried the following in the processSheet method:

     Locale locale = new Locale.Builder().setLanguage("en").setRegion("ZA").build(); 
     DataFormatter formatter = new DataFormatter(locale);
     Format format = new MessageFormat("{0,number,full}");
     formatter.addFormat("#############", format);

Here's an extract of my code:

The main body of the code:

 public void process(String Filename)throws IOException, OpenXML4JException, ParserConfigurationException, SAXException {
     ReadOnlySharedStringsTable strings = new ReadOnlySharedStringsTable(this.xlsxPackage);
     XSSFReader xssfReader = new XSSFReader(this.xlsxPackage);
     StylesTable styles = xssfReader.getStylesTable();
     XSSFReader.SheetIterator iter = (XSSFReader.SheetIterator) xssfReader.getSheetsData();
     while (iter.hasNext()) {
          InputStream stream = iter.next();
          String sheetName = iter.getSheetName();
          outStream = new FileOutputStream(Filename);
          logger.info(sheetName);
          this.output = new  PrintWriter(Filename);
          processSheet(styles, strings, new SheetToCSV(), stream);
          logger.info("Done with Sheet   :"+sheetName);
          output.flush();
          stream.close();
          outStream.close();
          output.close();
         ++index; 
     }
 } 

 public void processSheet(StylesTable styles,ReadOnlySharedStringsTable strings,SheetContentsHandler sheetHandler, InputStream sheetInputStream)
         throws IOException, ParserConfigurationException, SAXException {

     InputSource sheetSource = new InputSource(sheetInputStream);
     try {
         XMLReader sheetParser = SAXHelper.newXMLReader();
         ContentHandler handler = new XSSFSheetXMLHandler(styles, null, strings, sheetHandler, formatter, false);
         sheetParser.setContentHandler(handler);
         sheetParser.parse(sheetSource);
      } catch(ParserConfigurationException e) {
         throw new RuntimeException("SAX parser appears to be broken - " + e.getMessage());
      }
 }

And here's the custom handler:

private class SheetToCSV implements SheetContentsHandler {
         private boolean firstCellOfRow = false;
         private int currentRow = -1;
         private int currentCol = -1;

     private void outputMissingRows(int number) {

         for (int i=0; i<number; i++) {
             for (int j=0; j<minColumns; j++) {
                 output.append(',');
             }
             output.append('\n');
         }
     }

     public void startRow(int rowNum) {
         // If there were gaps, output the missing rows
         outputMissingRows(rowNum-currentRow-1);
         // Prepare for this row
         firstCellOfRow = true;
         currentRow = rowNum;
         currentCol = -1;
     }

     public void endRow(int rowNum) {
         // Ensure the minimum number of columns
         for (int i=currentCol; i<minColumns; i++) {
             output.append(',');
         }
         output.append('\n');
     }

     public void cell(String cellReference, String formattedValue,
             XSSFComment comment) {
         logger.info("CellRef :: Formatted Value   :"+cellReference+" :: "+formattedValue);              
         if (firstCellOfRow) {
             firstCellOfRow = false;
         } else {
             output.append(',');
         }

         // gracefully handle missing CellRef here in a similar way as XSSFCell does
         if(cellReference == null) {
             cellReference = new CellRangeAddress(currentRow, currentCol, currentCol, currentCol).formatAsString();
         }

         // Did we miss any cells?
         int thisCol = (new CellReference(cellReference)).getCol();
         int missedCols = thisCol - currentCol - 1;
         for (int i=0; i<missedCols; i++) {
             output.append(',');
         }
         currentCol = thisCol;

         // Number or string?
         try {
             Double.parseDouble(formattedValue);
             output.append(formattedValue);
         } catch (NumberFormatException e) {
             //formattedValue = formattedValue.replaceAll("\\t", "");
             //formattedValue = formattedValue.replaceAll("\\n", "");
             //formattedValue = formattedValue.trim();
             output.append('"');
             output.append(formattedValue.replace("\"", "\\\"").trim());
             output.append('"');
         }
     }

     public void headerFooter(String text, boolean isHeader, String tagName) {
         // Skip, no headers or footers in CSV
     }

    @Override
    public void ovveriddenFormat(String celRef, int formatIndex,
            String formatedString) {
        // TODO Auto-generated method stub

    }

 }

Solution

I cannot reproduce this if the file is generated using Excel and the cells containing the 13-digit numbers are formatted using number format 0 or #, not General.

But what is meant by "running on a Linux machine"? If I create the *.xlsx file using LibreOffice Calc and the cells containing the 13-digit numbers are formatted using number format General, then Calc shows them as 13-digit numbers but Excel does not. To show all 13 digits in Excel, the cells must be formatted using number format 0 or #.

The Apache POI DataFormatter is made to work like Excel does, and Excel shows values of 12 digits or more in scientific notation when they are formatted using General.

You could change this behavior using:

...
    public void processSheet(
            StylesTable styles,
            ReadOnlySharedStringsTable strings,
            SheetContentsHandler sheetHandler, 
            InputStream sheetInputStream) throws IOException, SAXException {
        DataFormatter formatter = new DataFormatter();
        formatter.addFormat("General", new java.text.DecimalFormat("#.###############"));
...
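
To see what that replacement format does on its own, here is a minimal sketch that uses only java.text and does not touch Apache POI. The class name GeneralFormatSketch and the method formatPlain are made up for illustration, and pinning the symbols to Locale.ROOT is an assumption added here so the output does not depend on the machine's default locale; the answer itself uses the default locale.

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.util.Locale;

// Hypothetical helper class, not part of Apache POI.
public class GeneralFormatSketch {

    // Same pattern as the one registered for "General" in the answer:
    // no grouping, no exponent, up to 15 optional fraction digits.
    private static final String PATTERN = "#.###############";

    // Formats a numeric cell value without switching to E-notation.
    public static String formatPlain(double value) {
        // Locale.ROOT is an assumption made here to keep '.' as the
        // decimal separator regardless of the default locale.
        DecimalFormat format =
                new DecimalFormat(PATTERN, DecimalFormatSymbols.getInstance(Locale.ROOT));
        return format.format(value);
    }

    public static void main(String[] args) {
        System.out.println(formatPlain(7401075293087d)); // prints 7401075293087
        System.out.println(formatPlain(1.5d));           // prints 1.5
    }
}
```

Because the pattern has no exponent part, the 13-digit identifier comes out with all its digits, which is the string the cell method of the SheetContentsHandler would then receive for cells formatted as General.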
