Microsoft Excel mangles Diacritics in .csv files?

Question

I am programmatically exporting data (using PHP 5.2) into a .csv test file.
Example data: Numéro 1 (note the accented e). The data is utf-8 (no prepended BOM).

When I open this file in MS Excel it displays as NumÃ©ro 1.

I am able to open this in a text editor (UltraEdit) which displays it correctly. UE reports the character is decimal 233.

How can I export text data in a .csv file so that MS Excel will correctly render it, preferably without forcing the use of the import wizard, or non-default wizard settings?

Recommended answer

A correctly formatted UTF8 file can have a Byte Order Mark as its first three octets. These are the hex values 0xEF, 0xBB, 0xBF. These octets serve to mark the file as UTF8 (since they are not relevant as "byte order" information). If this BOM does not exist, the consumer/reader is left to infer the encoding type of the text. Readers that are not UTF8 capable will read the bytes as some other encoding such as Windows-1252 and display the characters ï»¿ at the start of the file.
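To make the misreading concrete, here is a minimal Python sketch (the question itself uses PHP; Python is used here only for illustration, and the helper name `misread_as_cp1252` is hypothetical). It decodes UTF8 bytes as Windows-1252, which reproduces both the mangled "é" from the question and the three stray characters a BOM turns into.

```python
import codecs


def misread_as_cp1252(text: str) -> str:
    """Encode text as UTF-8, then decode the bytes as Windows-1252.

    This imitates a reader that assumes a single-byte encoding.
    """
    return text.encode("utf-8").decode("cp1252")


# "é" is code point 233 (what UltraEdit reports), but in UTF-8 it is
# stored as the two bytes 0xC3 0xA9.
assert ord("é") == 233
assert "é".encode("utf-8") == b"\xc3\xa9"

# A single-byte reader shows each of those two bytes as its own character.
print(misread_as_cp1252("Numéro 1"))  # NumÃ©ro 1

# The UTF-8 BOM is the three octets 0xEF 0xBB 0xBF.
assert codecs.BOM_UTF8 == b"\xef\xbb\xbf"
print(codecs.BOM_UTF8.decode("cp1252"))  # ï»¿
```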

There is a known bug where Excel, upon opening UTF8 CSV files via file association, assumes that they are in a single-byte encoding, disregarding the presence of the UTF8 BOM. This cannot be fixed by any system default codepage or language setting. The BOM will not clue in Excel - it just won't work. (A minority report claims that the BOM sometimes triggers the "Import Text" wizard.) This bug appears to exist in Excel 2003 and earlier. Most reports (amidst the answers here) say that this is fixed in Excel 2007 and newer.

Note that you can always* correctly open UTF8 CSV files in Excel using the "Import Text" wizard, which allows you to specify the encoding of the file you're opening. Of course this is much less convenient.

Readers of this answer are most likely in a situation where they don't particularly support Excel < 2007, but are sending raw UTF8 text to Excel, which is misinterpreting it and sprinkling your text with à and other similar Windows-1252 characters. Adding the UTF8 BOM is probably your best and quickest fix.
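As a rough sketch of that fix, the Python snippet below builds CSV bytes that start with the UTF8 BOM. The function name `csv_bytes_with_bom` and the sample rows are made up for illustration; Python's `utf-8-sig` codec is what prepends the three BOM octets when encoding. The same bytes can be written from any language, including PHP, by emitting 0xEF 0xBB 0xBF before the CSV text.

```python
import csv
import io


def csv_bytes_with_bom(rows) -> bytes:
    """Serialize rows as CSV and return UTF-8 bytes prefixed with a BOM."""
    buffer = io.StringIO(newline="")  # keep the csv module's \r\n untouched
    csv.writer(buffer).writerows(rows)
    # "utf-8-sig" writes 0xEF 0xBB 0xBF first, then the UTF-8 text.
    return buffer.getvalue().encode("utf-8-sig")


# Hypothetical sample data based on the question's example value.
data = csv_bytes_with_bom([["id", "label"], [1, "Numéro 1"]])
assert data.startswith(b"\xef\xbb\xbf")
```

Writing `data` to a file opened in binary mode (`"wb"`) produces the .csv file. Whether a given Excel version honors the BOM on double-click is exactly the version-dependent behavior described above, so it is worth testing against the versions your users have.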

If you are stuck with users on older Excels, and Excel is the only consumer of your CSVs, you can work around this by exporting UTF16 instead of UTF8. Excel 2000 and 2003 will double-click-open these correctly. (Some other text editors can have issues with UTF16, so you may have to weigh your options carefully.)
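A UTF16 export could look something like the following Python sketch. Several details are assumptions rather than anything the answer specifies: the function name `csv_bytes_utf16` is hypothetical, little-endian byte order with a leading BOM is one common choice, and the tab delimiter is a frequently reported requirement for Excel to split UTF16 files into columns that you should verify against your own Excel versions.

```python
import codecs
import csv
import io


def csv_bytes_utf16(rows, delimiter: str = "\t") -> bytes:
    """Serialize rows and return UTF-16LE bytes with a leading BOM.

    The tab default is an assumption, not something the answer states.
    """
    buffer = io.StringIO(newline="")
    csv.writer(buffer, delimiter=delimiter).writerows(rows)
    # Write the BOM explicitly so the byte order does not depend on the platform.
    return codecs.BOM_UTF16_LE + buffer.getvalue().encode("utf-16-le")


# Hypothetical sample data based on the question's example value.
data = csv_bytes_utf16([["id", "label"], [1, "Numéro 1"]])
assert data[:2] == b"\xff\xfe"
```

As the answer notes, other consumers of the file may not handle UTF16 well, so this only makes sense when Excel is the sole reader.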

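* Except when you can't: (at least) Excel 2011 for Mac's Import Wizard does not actually always work with all encodings, regardless of what you tell it. </anecdotal-evidence> :)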
