异常读取 XLSB 文件 Apache POI java.io.CharConversionException [英] Exception reading XLSB File Apache POI java.io.CharConversionException

查看:53
本文介绍了异常读取 XLSB 文件 Apache POI java.io.CharConversionException的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在开发一个使用Apache POI读取excel xlsb文件的Java应用程序,但在读取它时出现异常,我的代码如下:

import java.io.IOException;导入 java.io.InputStream;导入 org.apache.poi.xssf.eventusermodel.XSSFReader;导入 org.apache.poi.xssf.model.SharedStringsTable;导入 org.apache.poi.xssf.usermodel.XSSFRichTextString;导入 org.apache.poi.openxml4j.exceptions.InvalidFormatException;导入 org.apache.poi.openxml4j.exceptions.OpenXML4JException;导入 org.apache.poi.openxml4j.opc.Package;导入 org.xml.sax.Attributes;导入 org.xml.sax.ContentHandler;导入 org.xml.sax.InputSource;导入 org.xml.sax.SAXException;导入 org.xml.sax.XMLReader;导入 org.xml.sax.helpers.DefaultHandler;导入 org.xml.sax.helpers.XMLReaderFactory;导入 java.util.Iterator;公共类普鲁巴{公共静态无效主(字符串 [] args){String direccion = "C:/Documents and Settings/RSalasL/My Documents/New Folder/masstigeoct12.xlsb";包装 pkg;尝试 {pkg = Package.open(direction);XSSFReader r = 新 XSSFReader(pkg);SharedStringsTable sst = r.getSharedStringsTable();XMLReader 解析器 = fetchSheetParser(sst);迭代器sheet = r.getSheetsData();while(sheets.hasNext()) {System.out.println("正在处理新工作表:\n");InputStream sheet = sheet.next();InputSource sheetSource = new InputSource(sheet);parser.parse(sheetSource);sheet.close();System.out.println("");}} catch (InvalidFormatException e) {//TODO 自动生成的 catch 块e.printStackTrace();} catch (IOException e) {//TODO 自动生成的 catch 块e.printStackTrace();} catch (OpenXML4JException e) {//TODO 自动生成的 catch 块e.printStackTrace();} catch (SAXException e) {//TODO 自动生成的 catch 块e.printStackTrace();}}public void processAllSheets(String filename) 抛出异常 {包 pkg = Package.open(filename);XSSFReader r = 新 XSSFReader( pkg );SharedStringsTable sst = r.getSharedStringsTable();XMLReader 解析器 = fetchSheetParser(sst);迭代器sheet = r.getSheetsData();while(sheets.hasNext()) {System.out.println("正在处理新工作表:\n");InputStream sheet = sheet.next();InputSource sheetSource = new InputSource(sheet);parser.parse(sheetSource);sheet.close();System.out.println("");}}公共静态 XMLReader fetchSheetParser(SharedStringsTable sst) 抛出 SAXException {XMLReader 解析器 =XMLReaderFactory.createXMLReader(org.apache.xerces.parsers.SAXParser");ContentHandler handler = new SheetHandler(sst);parser.setContentHandler(handler);返回解析器;}私有静态类 SheetHandler 扩展 DefaultHandler {私人 SharedStringsTable sst;私人字符串 lastContents;私有布尔 nextIsString;私人 SheetHandler(SharedStringsTable sst){this.sst = sst;}public void startElement(String uri, String localName, String name,属性属性)抛出 SAXException {//c =>细胞if(name.equals("c")) {//打印单元格引用System.out.print(attributes.getValue("r") + " - ");//判断该值是否是 SST 中的索引String cellType = attributes.getValue("t");if(cellType != null && cellType.equals("s")) {nextIsString = true;} 别的 {nextIsString = false;}}//清除内容缓存lastContents = "";}public void endElement(String uri, String localName, String name)抛出 SAXException {//根据需要处理最后的内容.//现在开始,因为 characters() 可能会被多次调用如果(nextIsString){int idx = Integer.parseInt(lastContents);lastContents = new XSSFRichTextString(sst.getEntryAt(idx)).toString();nextIsString = false;}//v =>一个单元格的内容//看到字符串内容后的输出if(name.equals("v")) {System.out.println(lastContents);}}public void characters(char[] ch, int start, int length)抛出 SAXException {lastContents += new String(ch, start, length);}}}

例外是这样的:

java.io.CharConversionException:不支持大于 4 个字节的字符:字节 0x83 表示长度超过 4 个字节在 org.apache.xmlbeans.impl.piccolo.xml.UTF8XMLDecoder.decode(UTF8XMLDecoder.java:162)在 org.apache.xmlbeans.impl.piccolo.xml.XMLStreamReader$FastStreamDecoder.read(XMLStreamReader.java:762)在 org.apache.xmlbeans.impl.piccolo.xml.XMLStreamReader.read(XMLStreamReader.java:162)在 org.apache.xmlbeans.impl.piccolo.xml.PiccoloLexer.yy_refill(PiccoloLexer.java:3474)在 org.apache.xmlbeans.impl.piccolo.xml.PiccoloLexer.yylex(PiccoloLexer.java:3958)在 org.apache.xmlbeans.impl.piccolo.xml.Piccolo.yylex(Piccolo.java:1290)在 org.apache.xmlbeans.impl.piccolo.xml.Piccolo.yyparse(Piccolo.java:1400)在 org.apache.xmlbeans.impl.piccolo.xml.Piccolo.parse(Piccolo.java:714)在 org.apache.xmlbeans.impl.store.Locale$SaxLoader.load(Locale.java:3439)在 org.apache.xmlbeans.impl.store.Locale.parseToXmlObject(Locale.java:1270)在 org.apache.xmlbeans.impl.store.Locale.parseToXmlObject(Locale.java:1257)在 org.apache.xmlbeans.impl.schema.SchemaTypeLoaderBase.parse(SchemaTypeLoaderBase.java:345)在 org.openxmlformats.schemas.spreadsheetml.x2006.main.WorkbookDocument$Factory.parse(来源不明)在 org.apache.poi.xssf.eventusermodel.XSSFReader$SheetIterator.<init>(XSSFReader.java:207)在 org.apache.poi.xssf.eventusermodel.XSSFReader$SheetIterator.<init>(XSSFReader.java:166)在 org.apache.poi.xssf.eventusermodel.XSSFReader.getSheetsData(XSSFReader.java:160)在 EDManager.Prueba.main(Prueba.java:36)

该文件有 2 张纸,一张有 329 行 3 列,另一张有 566 行 3 列,我只想读取文件以查找第二张纸中是否有值.

解决方案

Apache POI 不支持 .xlsb 文件格式用于文本提取以外的任何内容.Apache POI 将很乐意提供完整的读或写支持 .xls 文件(通过 HSSF)和 .xlsx 文件(通过 XSSF),或两者(通过公共 SS UserModel 接口).

但是,生成操作不支持 .xlsb 格式 - 这是两者之间的一种非常奇怪的混合,所涉及的大量工作意味着没有人愿意自愿/赞助所需的工作.

从 Apache POI 3.15 beta3/3.16 开始,Apache POI 为 .xlsb 提供的是 .xlsb 文件的文本提取器 - XSSFBEventBasedExcelExtractor.您可以使用它从文件中获取文本,或者通过一些调整将其转换为 CSV 之类的内容

要获得完整的读/写支持,您需要将文件转换为 .xls(如果行/列的数量不多)或 .xlsx(如果有).如果你真的很想提供帮助,你可以查看 XSSFBEventBasedExcelExtractor 的源代码,然后尝试贡献补丁以添加对 POI 的全面支持!

(此外,我认为您的特定 .xlsb 文件部分损坏,但即使不是这样,Apache POI 仍然不支持文本提取以外的任何其他内容,抱歉)

Im developing a Java aplication that reads an excel xlsb file using Apache POI, but I got an exception while reading it, my code is as follows:

import java.io.IOException;
import java.io.InputStream;

import org.apache.poi.xssf.eventusermodel.XSSFReader;
import org.apache.poi.xssf.model.SharedStringsTable;
import org.apache.poi.xssf.usermodel.XSSFRichTextString;
import org.apache.poi.openxml4j.exceptions.InvalidFormatException;
import org.apache.poi.openxml4j.exceptions.OpenXML4JException;
import org.apache.poi.openxml4j.opc.Package;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLReaderFactory;

import java.util.Iterator;

public class Prueba {

    public static void main (String [] args){

        String direccion = "C:/Documents and Settings/RSalasL/My Documents/New Folder/masstigeoct12.xlsb";

        Package pkg;
        try {
            pkg = Package.open(direccion);
            XSSFReader r = new XSSFReader(pkg);
            SharedStringsTable sst = r.getSharedStringsTable();

            XMLReader parser = fetchSheetParser(sst);

            Iterator<InputStream> sheets = r.getSheetsData();
            while(sheets.hasNext()) {
                System.out.println("Processing new sheet:\n");
                InputStream sheet = sheets.next();
                InputSource sheetSource = new InputSource(sheet);
                parser.parse(sheetSource);
                sheet.close();
                System.out.println("");
            }

        } catch (InvalidFormatException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        } catch (IOException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        } catch (OpenXML4JException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        } catch (SAXException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }

    }

    public void processAllSheets(String filename) throws Exception {
        Package pkg = Package.open(filename);
        XSSFReader r = new XSSFReader( pkg );
        SharedStringsTable sst = r.getSharedStringsTable();

        XMLReader parser = fetchSheetParser(sst);

        Iterator<InputStream> sheets = r.getSheetsData();
        while(sheets.hasNext()) {
            System.out.println("Processing new sheet:\n");
            InputStream sheet = sheets.next();
            InputSource sheetSource = new InputSource(sheet);
            parser.parse(sheetSource);
            sheet.close();
            System.out.println("");
        }
    }


    public static XMLReader fetchSheetParser(SharedStringsTable sst) throws SAXException {
        XMLReader parser =
            XMLReaderFactory.createXMLReader(
                    "org.apache.xerces.parsers.SAXParser"
            );
        ContentHandler handler = new SheetHandler(sst);
        parser.setContentHandler(handler);
        return parser;
    }

    private static class SheetHandler extends DefaultHandler {
        private SharedStringsTable sst;
        private String lastContents;
        private boolean nextIsString;

        private SheetHandler(SharedStringsTable sst) {
            this.sst = sst;
        }

        public void startElement(String uri, String localName, String name,
                Attributes attributes) throws SAXException {
            // c => cell
            if(name.equals("c")) {
                // Print the cell reference
                System.out.print(attributes.getValue("r") + " - ");
                // Figure out if the value is an index in the SST
                String cellType = attributes.getValue("t");
                if(cellType != null && cellType.equals("s")) {
                    nextIsString = true;
                } else {
                    nextIsString = false;
                }
            }
            // Clear contents cache
            lastContents = "";
        }

        public void endElement(String uri, String localName, String name)
                throws SAXException {
            // Process the last contents as required.
            // Do now, as characters() may be called more than once
            if(nextIsString) {
                int idx = Integer.parseInt(lastContents);
                lastContents = new XSSFRichTextString(sst.getEntryAt(idx)).toString();
            nextIsString = false;
            }

            // v => contents of a cell
            // Output after we've seen the string contents
            if(name.equals("v")) {
                System.out.println(lastContents);
            }
        }

        public void characters(char[] ch, int start, int length)
                throws SAXException {
            lastContents += new String(ch, start, length);
        }
    }

}

And the exception is this:

java.io.CharConversionException: Characters larger than 4 bytes are not supported: byte 0x83 implies a length of more than 4 bytes
    at org.apache.xmlbeans.impl.piccolo.xml.UTF8XMLDecoder.decode(UTF8XMLDecoder.java:162)
    at org.apache.xmlbeans.impl.piccolo.xml.XMLStreamReader$FastStreamDecoder.read(XMLStreamReader.java:762)
    at org.apache.xmlbeans.impl.piccolo.xml.XMLStreamReader.read(XMLStreamReader.java:162)
    at org.apache.xmlbeans.impl.piccolo.xml.PiccoloLexer.yy_refill(PiccoloLexer.java:3474)
    at org.apache.xmlbeans.impl.piccolo.xml.PiccoloLexer.yylex(PiccoloLexer.java:3958)
    at org.apache.xmlbeans.impl.piccolo.xml.Piccolo.yylex(Piccolo.java:1290)
    at org.apache.xmlbeans.impl.piccolo.xml.Piccolo.yyparse(Piccolo.java:1400)
    at org.apache.xmlbeans.impl.piccolo.xml.Piccolo.parse(Piccolo.java:714)
    at org.apache.xmlbeans.impl.store.Locale$SaxLoader.load(Locale.java:3439)
    at org.apache.xmlbeans.impl.store.Locale.parseToXmlObject(Locale.java:1270)
    at org.apache.xmlbeans.impl.store.Locale.parseToXmlObject(Locale.java:1257)
    at org.apache.xmlbeans.impl.schema.SchemaTypeLoaderBase.parse(SchemaTypeLoaderBase.java:345)
    at org.openxmlformats.schemas.spreadsheetml.x2006.main.WorkbookDocument$Factory.parse(Unknown Source)
    at org.apache.poi.xssf.eventusermodel.XSSFReader$SheetIterator.<init>(XSSFReader.java:207)
    at org.apache.poi.xssf.eventusermodel.XSSFReader$SheetIterator.<init>(XSSFReader.java:166)
    at org.apache.poi.xssf.eventusermodel.XSSFReader.getSheetsData(XSSFReader.java:160)
    at EDManager.Prueba.main(Prueba.java:36)

The file has 2 sheets, one with 329 rows and 3 columns and the other with 566 rows and 3 columns, I just want to read the file to find if a value is in the second sheet.

解决方案

Apache POI doesn't support the .xlsb file format for anything other than text extraction. Apache POI will happily provide full read or write support .xls files (via HSSF) and .xlsx files (via XSSF), or both (via the common SS UserModel interface).

However, the .xlsb format is not supported for generatl operations - it's a very odd hybrid between the two, and the large amount of work involved has meant no-one has been willing to volunteer/sponsor the work required.

What Apache POI does offer for .xlsb, as of Apache POI 3.15 beta3 / 3.16, is a text extractor for .xlsb files - XSSFBEventBasedExcelExtractor. You can use that to get the text out of your file, or with a few tweaks convert it to something like CSV

For full read/write support, you'll need to convert your file to either .xls (if it doesn't have very large numbers of rows/columns), or .xlsx (if it does). If you're really really keen to help though, you could review the source code for XSSFBEventBasedExcelExtractor, then have a go at contributing patches to add full support to POI for it!

(Additionally, I think from the exception that your particular .xlsb file is partly corrupt, but even if it wasn't it still wouldn't be supported by Apache POI for anything other than text extraction, sorry)

这篇关于异常读取 XLSB 文件 Apache POI java.io.CharConversionException的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆