Character encoding in Excel spreadsheet (and what Java charset to use to decode it)

Question

I am using the JExcel library to read Excel spreadsheets. Each cell on the spreadsheet may contain localization strings in any of something like 44 languages (English, Portuguese, French, Chinese, etc.). Today I don't tell the API anything about the encoding it's supposed to use. It's handling the Chinese OK, but it always screws up Portuguese and German. Somehow the default encoding (MacRoman on my dev box, UTF-8 in production) is failing to properly interpret the strings it pulls out of the Excel workbook. There has to be something wrong with how JExcel is interpreting the character encoding of the file.

That said...

Are all the strings in an Excel workbook encoded with the same character set?

Is there workbook metadata I can query to find out what this character set is (I haven't found it yet)?

If I run all the cells through something like jchardet (http://jchardet.sourceforge.net/), is it likely to be able to divine the character encoding for the whole workbook (this is pretty much predicated on the first question being "yes, all strings in a given workbook are encoded with the same character set")?

So many questions, so little time.

Recommended answer

Well, I didn't get an answer directly, but Matt's discovery of a spec points the way towards an actual answer: http://sc.openoffice.org/excelfileformat.pdf

In the meantime, my problem went away by just setting the encoding to always be "Cp1252". I'm not sure exactly why, but I'm not looking a gift horse in the mouth, so to speak, and am moving on.

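    import jxl.Workbook;
    import jxl.WorkbookSettings;
    // Decode 8-bit strings as Windows-1252 rather than the platform default charset
    WorkbookSettings workbookSettings = new WorkbookSettings();
    workbookSettings.setEncoding("Cp1252");
    Workbook workbook = Workbook.getWorkbook(theFile, workbookSettings);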
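
A possible explanation, offered as an assumption rather than something confirmed in this thread: the BIFF8 format described in that spec can store a string either as 16-bit characters or in a "compressed" form with one byte per character. Chinese text would need the 16-bit form, while Portuguese and German text fits in single bytes, and single bytes are exactly where the choice of charset matters. The sketch below uses only the JDK (no JExcel) to show the same bytes decoded two ways. The class name `EncodingDemo`, the helper `decode`, and the sample word are hypothetical, chosen for illustration.

```java
import java.nio.charset.Charset;

public class EncodingDemo {
    // Hypothetical helper: decode raw bytes using the named charset.
    static String decode(byte[] raw, String charsetName) {
        return new String(raw, Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        // "Açúcar" stored one byte per character:
        // 0xE7 is 'ç' and 0xFA is 'ú' in Cp1252 (Windows-1252).
        byte[] raw = {0x41, (byte) 0xE7, (byte) 0xFA, 0x63, 0x61, 0x72};

        // Decoded as Cp1252, the accented letters survive.
        System.out.println(decode(raw, "Cp1252"));

        // Decoded as UTF-8, 0xE7 and 0xFA are not valid sequences here,
        // so the decoder substitutes U+FFFD replacement characters.
        System.out.println(decode(raw, "UTF-8"));
    }
}
```

If this is what is happening, it would account for accented Latin text breaking under both MacRoman and UTF-8 defaults while Chinese text is unaffected.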

I'll call this one answered.