Apache POI: How to add a custom DataFormatter for handling 13-digit integers as strings, not numbers


Problem description

I'm building an XLSX processor that transforms an XLSX file into a CSV file. Because the files can get quite big, I'm using the event-based approach with XSSFSheetXMLHandler.

This works perfectly fine, but my XLSX files contain long numbers (13 digits) which are unique identification numbers, not real numbers. When I run my code on a Windows machine it extracts the numbers correctly, but when I run it on a Linux machine it converts them to E-notation.

For example: the source value is 7401075293087. On Windows this is correctly extracted into my CSV, but on Linux the value comes through as 7.40108E+12.

The problem with XSSFSheetXMLHandler is that it reads the XLSX under the covers and then fires events that are caught by a SheetContentsHandler that you need to implement. One of the methods in SheetContentsHandler is a cell method with the signature: cell(String cellReference, String formattedValue, XSSFComment comment)

As you can see, this method already receives the formatted cell value (so in my case it receives "7.40108E+12"). All the rest of the logic happens under the covers.

Based on my investigations, I believe the solution lies in defining a custom DataFormatter that specifically treats 13-digit integers as strings instead of formatting them in E-notation.

Unfortunately my plan didn't work as expected and I couldn't find any help online. I tried the following in the processSheet method:

     Locale locale = new Locale.Builder().setLanguage("en").setRegion("ZA").build(); 
     DataFormatter formatter = new DataFormatter(locale);
     Format format = new MessageFormat("{0,number,full}");
     formatter.addFormat("#############", format);

Here's an extract of my code.

Main body:

 public void process(String Filename)throws IOException, OpenXML4JException, ParserConfigurationException, SAXException {
     ReadOnlySharedStringsTable strings = new ReadOnlySharedStringsTable(this.xlsxPackage);
     XSSFReader xssfReader = new XSSFReader(this.xlsxPackage);
     StylesTable styles = xssfReader.getStylesTable();
     XSSFReader.SheetIterator iter = (XSSFReader.SheetIterator) xssfReader.getSheetsData();
     while (iter.hasNext()) {
          InputStream stream = iter.next();
          String sheetName = iter.getSheetName();
          outStream = new FileOutputStream(Filename);
          logger.info(sheetName);
          this.output = new  PrintWriter(Filename);
          processSheet(styles, strings, new SheetToCSV(), stream);
          logger.info("Done with Sheet   :"+sheetName);
          output.flush();
          stream.close();
          outStream.close();
          output.close();
         ++index; 
     }
 } 

 public void processSheet(StylesTable styles,ReadOnlySharedStringsTable strings,SheetContentsHandler sheetHandler, InputStream sheetInputStream)
         throws IOException, ParserConfigurationException, SAXException {

     InputSource sheetSource = new InputSource(sheetInputStream);
     try {
         XMLReader sheetParser = SAXHelper.newXMLReader();
         ContentHandler handler = new XSSFSheetXMLHandler(styles, null, strings, sheetHandler, formatter, false);
         sheetParser.setContentHandler(handler);
         sheetParser.parse(sheetSource);
      } catch(ParserConfigurationException e) {
         throw new RuntimeException("SAX parser appears to be broken - " + e.getMessage());
      }
 }

Here is the custom handler:

private class SheetToCSV implements SheetContentsHandler {
         private boolean firstCellOfRow = false;
         private int currentRow = -1;
         private int currentCol = -1;

     private void outputMissingRows(int number) {

         for (int i=0; i<number; i++) {
             for (int j=0; j<minColumns; j++) {
                 output.append(',');
             }
             output.append('\n');
         }
     }

     public void startRow(int rowNum) {
         // If there were gaps, output the missing rows
         outputMissingRows(rowNum-currentRow-1);
         // Prepare for this row
         firstCellOfRow = true;
         currentRow = rowNum;
         currentCol = -1;
     }

     public void endRow(int rowNum) {
         // Ensure the minimum number of columns
         for (int i=currentCol; i<minColumns; i++) {
             output.append(',');
         }
         output.append('\n');
     }

     public void cell(String cellReference, String formattedValue,
             XSSFComment comment) {
         logger.info("CellRef :: Formatted Value   :"+cellReference+" :: "+formattedValue);              
         if (firstCellOfRow) {
             firstCellOfRow = false;
         } else {
             output.append(',');
         }

         // gracefully handle missing CellRef here in a similar way as XSSFCell does
         if(cellReference == null) {
             cellReference = new CellRangeAddress(currentRow, currentCol, currentCol, currentCol).formatAsString();
         }

         // Did we miss any cells?
         int thisCol = (new CellReference(cellReference)).getCol();
         int missedCols = thisCol - currentCol - 1;
         for (int i=0; i<missedCols; i++) {
             output.append(',');
         }
         currentCol = thisCol;

         // Number or string?
         try {
             Double.parseDouble(formattedValue);
             output.append(formattedValue);
         } catch (NumberFormatException e) {
             //formattedValue = formattedValue.replaceAll("\\t", "");
             //formattedValue = formattedValue.replaceAll("\\n", "");
             //formattedValue = formattedValue.trim();
             output.append('"');
             output.append(formattedValue.replace("\"", "\\\"").trim());
             output.append('"');
         }
     }

     public void headerFooter(String text, boolean isHeader, String tagName) {
         // Skip, no headers or footers in CSV
     }

    @Override
    public void ovveriddenFormat(String celRef, int formatIndex,
            String formatedString) {
        // TODO Auto-generated method stub

    }

 }

Recommended answer

I cannot reproduce this if the file is generated using Excel and the cells containing the 13-digit numbers are formatted using number format 0 or #, not General.

But what is meant by "running on a Linux machine"? If I create the *.xlsx file using LibreOffice Calc, with the cells containing the 13-digit numbers formatted using number format General, then Calc will show them as 13-digit numbers but Excel will not. To show all 13 digits in Excel, the cells must be formatted using number format 0 or #.

The Apache POI DataFormatter is made to work like Excel does, and Excel shows values of 12 digits or more in scientific notation when they are formatted using General.

You could change this behavior using:

...
    public void processSheet(
            StylesTable styles,
            ReadOnlySharedStringsTable strings,
            SheetContentsHandler sheetHandler, 
            InputStream sheetInputStream) throws IOException, SAXException {
        DataFormatter formatter = new DataFormatter();
        formatter.addFormat("General", new java.text.DecimalFormat("#.###############"));
...
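To see what the replacement format does on its own, here is a minimal sketch that uses only java.text, without POI. The class name GeneralFormatDemo and the method formatLikeAnswer are hypothetical names chosen for illustration, and the Locale.ROOT symbols are an assumption added so that the decimal separator does not depend on the machine's default locale. The pattern string is the one from the snippet above.

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.util.Locale;

public class GeneralFormatDemo {

    // Formats a value with the same pattern the answer registers for "General".
    // Locale.ROOT is an assumption of this sketch, not part of the answer.
    static String formatLikeAnswer(double value) {
        DecimalFormat format = new DecimalFormat(
                "#.###############",
                DecimalFormatSymbols.getInstance(Locale.ROOT));
        return format.format(value);
    }

    public static void main(String[] args) {
        double id = 7401075293087d;

        // Plain double-to-string conversion switches to E-notation.
        System.out.println(String.valueOf(id));    // prints 7.401075293087E12

        // The pattern has no exponent part, so all 13 digits are kept.
        System.out.println(formatLikeAnswer(id));  // prints 7401075293087
    }
}
```

The pattern contains no exponent and no grouping separator, so integer digits are written out in full and up to 15 fraction digits are shown only when present. In the question's code the customized DataFormatter still has to be the instance passed to the XSSFSheetXMLHandler constructor for it to take effect.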
