Exception reading XLSB File Apache POI java.io.CharConversionException

Problem description

I'm developing a Java application that reads an Excel .xlsb file using Apache POI, but I get an exception while reading it. My code is as follows:

import java.io.IOException;
import java.io.InputStream;

import org.apache.poi.xssf.eventusermodel.XSSFReader;
import org.apache.poi.xssf.model.SharedStringsTable;
import org.apache.poi.xssf.usermodel.XSSFRichTextString;
import org.apache.poi.openxml4j.exceptions.InvalidFormatException;
import org.apache.poi.openxml4j.exceptions.OpenXML4JException;
import org.apache.poi.openxml4j.opc.Package;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLReaderFactory;

import java.util.Iterator;

public class Prueba {

    public static void main (String [] args){

        String direccion = "C:/Documents and Settings/RSalasL/My Documents/New Folder/masstigeoct12.xlsb";

        Package pkg;
        try {
            pkg = Package.open(direccion);
            XSSFReader r = new XSSFReader(pkg);
            SharedStringsTable sst = r.getSharedStringsTable();

            XMLReader parser = fetchSheetParser(sst);

            Iterator<InputStream> sheets = r.getSheetsData();
            while(sheets.hasNext()) {
                System.out.println("Processing new sheet:\n");
                InputStream sheet = sheets.next();
                InputSource sheetSource = new InputSource(sheet);
                parser.parse(sheetSource);
                sheet.close();
                System.out.println("");
            }

        } catch (InvalidFormatException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        } catch (IOException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        } catch (OpenXML4JException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        } catch (SAXException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }

    }

    public void processAllSheets(String filename) throws Exception {
        Package pkg = Package.open(filename);
        XSSFReader r = new XSSFReader( pkg );
        SharedStringsTable sst = r.getSharedStringsTable();

        XMLReader parser = fetchSheetParser(sst);

        Iterator<InputStream> sheets = r.getSheetsData();
        while(sheets.hasNext()) {
            System.out.println("Processing new sheet:\n");
            InputStream sheet = sheets.next();
            InputSource sheetSource = new InputSource(sheet);
            parser.parse(sheetSource);
            sheet.close();
            System.out.println("");
        }
    }


    public static XMLReader fetchSheetParser(SharedStringsTable sst) throws SAXException {
        XMLReader parser =
            XMLReaderFactory.createXMLReader(
                    "org.apache.xerces.parsers.SAXParser"
            );
        ContentHandler handler = new SheetHandler(sst);
        parser.setContentHandler(handler);
        return parser;
    }

    private static class SheetHandler extends DefaultHandler {
        private SharedStringsTable sst;
        private String lastContents;
        private boolean nextIsString;

        private SheetHandler(SharedStringsTable sst) {
            this.sst = sst;
        }

        public void startElement(String uri, String localName, String name,
                Attributes attributes) throws SAXException {
            // c => cell
            if(name.equals("c")) {
                // Print the cell reference
                System.out.print(attributes.getValue("r") + " - ");
                // Figure out if the value is an index in the SST
                String cellType = attributes.getValue("t");
                if(cellType != null && cellType.equals("s")) {
                    nextIsString = true;
                } else {
                    nextIsString = false;
                }
            }
            // Clear contents cache
            lastContents = "";
        }

        public void endElement(String uri, String localName, String name)
                throws SAXException {
            // Process the last contents as required.
            // Do now, as characters() may be called more than once
            if(nextIsString) {
                int idx = Integer.parseInt(lastContents);
                lastContents = new XSSFRichTextString(sst.getEntryAt(idx)).toString();
                nextIsString = false;
            }

            // v => contents of a cell
            // Output after we've seen the string contents
            if(name.equals("v")) {
                System.out.println(lastContents);
            }
        }

        public void characters(char[] ch, int start, int length)
                throws SAXException {
            lastContents += new String(ch, start, length);
        }
    }

}

And the exception is this:

java.io.CharConversionException: Characters larger than 4 bytes are not supported: byte 0x83 implies a length of more than 4 bytes
    at org.apache.xmlbeans.impl.piccolo.xml.UTF8XMLDecoder.decode(UTF8XMLDecoder.java:162)
    at org.apache.xmlbeans.impl.piccolo.xml.XMLStreamReader$FastStreamDecoder.read(XMLStreamReader.java:762)
    at org.apache.xmlbeans.impl.piccolo.xml.XMLStreamReader.read(XMLStreamReader.java:162)
    at org.apache.xmlbeans.impl.piccolo.xml.PiccoloLexer.yy_refill(PiccoloLexer.java:3474)
    at org.apache.xmlbeans.impl.piccolo.xml.PiccoloLexer.yylex(PiccoloLexer.java:3958)
    at org.apache.xmlbeans.impl.piccolo.xml.Piccolo.yylex(Piccolo.java:1290)
    at org.apache.xmlbeans.impl.piccolo.xml.Piccolo.yyparse(Piccolo.java:1400)
    at org.apache.xmlbeans.impl.piccolo.xml.Piccolo.parse(Piccolo.java:714)
    at org.apache.xmlbeans.impl.store.Locale$SaxLoader.load(Locale.java:3439)
    at org.apache.xmlbeans.impl.store.Locale.parseToXmlObject(Locale.java:1270)
    at org.apache.xmlbeans.impl.store.Locale.parseToXmlObject(Locale.java:1257)
    at org.apache.xmlbeans.impl.schema.SchemaTypeLoaderBase.parse(SchemaTypeLoaderBase.java:345)
    at org.openxmlformats.schemas.spreadsheetml.x2006.main.WorkbookDocument$Factory.parse(Unknown Source)
    at org.apache.poi.xssf.eventusermodel.XSSFReader$SheetIterator.<init>(XSSFReader.java:207)
    at org.apache.poi.xssf.eventusermodel.XSSFReader$SheetIterator.<init>(XSSFReader.java:166)
    at org.apache.poi.xssf.eventusermodel.XSSFReader.getSheetsData(XSSFReader.java:160)
    at EDManager.Prueba.main(Prueba.java:36)

The file has 2 sheets: one with 329 rows and 3 columns, the other with 566 rows and 3 columns. I just want to read the file to find out whether a value is in the second sheet.

Solution

Apache POI doesn't support the .xlsb file format for anything other than text extraction. Apache POI will happily provide full read and write support for .xls files (via HSSF) and .xlsx files (via XSSF), or both (via the common SS UserModel interface).

However, the .xlsb format is not supported for general operations: it's a very odd hybrid between the two, and the large amount of work involved has meant no one has been willing to volunteer or sponsor the work required.
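Since a .xlsb file is a ZIP container like .xlsx, one way to avoid handing it to XSSFReader by mistake is to look at the part names before opening it. The sketch below uses only the JDK. It assumes the workbook part has the conventional names `xl/workbook.bin` (binary) or `xl/workbook.xml` (XML), which is a heuristic rather than something the format guarantees; the class and method names are made up for illustration.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.zip.ZipFile;

/** Guesses a workbook's flavour from the part names inside its ZIP container. */
class WorkbookSniffer {

    /**
     * Returns "xlsb", "xlsx", or "unknown".
     * Assumption: the workbook part uses the conventional names checked below.
     * A legacy .xls file is not a ZIP at all, so ZipFile throws for it.
     */
    static String sniff(Path file) throws IOException {
        try (ZipFile zip = new ZipFile(file.toFile())) {
            if (zip.getEntry("xl/workbook.bin") != null) {
                return "xlsb"; // binary workbook part, not XML
            }
            if (zip.getEntry("xl/workbook.xml") != null) {
                return "xlsx"; // XML workbook part
            }
            return "unknown";
        }
    }

    public static void main(String[] args) throws IOException {
        // Usage: java WorkbookSniffer path/to/workbook
        System.out.println(sniff(Paths.get(args[0])));
    }
}
```

If this reports "xlsb", the file needs either the text extractor or a conversion step rather than XSSFReader.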

What Apache POI does offer for .xlsb, as of Apache POI 3.15 beta3 / 3.16, is a text extractor for .xlsb files: XSSFBEventBasedExcelExtractor. You can use that to get the text out of your file or, with a few tweaks, convert it to something like CSV.
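As a rough illustration of the "convert it to CSV" idea, the sketch below post-processes text that has already been pulled out of the file. It assumes the extractor's getText() output has one row per line with cells separated by tabs; check that against your POI version, since it is an assumption here. The class and method names are hypothetical, and the sample string merely stands in for real extractor output so the code runs with the JDK alone.

```java
import java.util.ArrayList;
import java.util.List;

/** Turns tab-separated rows (assumed extractor output) into CSV rows. */
class ExtractedTextToCsv {

    /** Quotes a field only when it contains a comma, a quote, or a line break. */
    static String escape(String field) {
        if (field.contains(",") || field.contains("\"")
                || field.contains("\n") || field.contains("\r")) {
            return "\"" + field.replace("\"", "\"\"") + "\"";
        }
        return field;
    }

    /** Converts one tab-separated row into one CSV row. */
    static String rowToCsv(String tabSeparatedRow) {
        List<String> cells = new ArrayList<>();
        // The -1 limit keeps trailing empty cells instead of dropping them.
        for (String cell : tabSeparatedRow.split("\t", -1)) {
            cells.add(escape(cell));
        }
        return String.join(",", cells);
    }

    public static void main(String[] args) {
        // Hypothetical sample standing in for the text returned by
        // XSSFBEventBasedExcelExtractor.getText() (needs POI on the classpath).
        String extracted = "A1\tB1\tC1\nhello, world\t42\t";
        for (String row : extracted.split("\n")) {
            System.out.println(rowToCsv(row));
        }
    }
}
```

For the original goal of checking whether a value appears in a sheet, a plain search over the same extracted rows may be enough and avoids the CSV step entirely.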

For full read/write support, you'll need to convert your file to either .xls (if it doesn't have very large numbers of rows/columns), or .xlsx (if it does). If you're really keen to help though, you could review the source code for XSSFBEventBasedExcelExtractor, then have a go at contributing patches to add full .xlsb support to POI!

(Additionally, I think from the exception that your particular .xlsb file is partly corrupt, but even if it weren't, it still wouldn't be supported by Apache POI for anything other than text extraction, sorry.)
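Whatever the state of this particular file, the stack trace shows the workbook part being fed to an XML parser that decodes UTF-8. Binary record data is generally not well-formed UTF-8, and a byte such as 0x83 cannot start a UTF-8 sequence. The sketch below shows that behaviour with the JDK's own strict decoder. This is not the Piccolo decoder from the stack trace, and the three sample bytes are a hypothetical stand-in for binary content, so treat it as an illustration of the failure mode, not a reproduction.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

/** Checks whether a byte sequence is well-formed UTF-8. */
class StrictUtf8Check {

    /** Returns true only if every byte sequence decodes cleanly as UTF-8. */
    static boolean isValidUtf8(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)      // fail instead of substituting
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] xml = "<?xml version=\"1.0\"?>".getBytes(StandardCharsets.UTF_8);
        byte[] binary = {(byte) 0x83, 0x01, 0x00}; // hypothetical binary bytes
        System.out.println(isValidUtf8(xml));    // prints "true"
        System.out.println(isValidUtf8(binary)); // prints "false"
    }
}
```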
