How do I detect file encoding?

Views: 79 | Published: 2019/6/11 0:01:00 | C++


Question

I need to open an existing file and write to it. The file may or may not be Unicode. Is there a way to detect the encoding so that I do not alter it when I write to it? For info, I am using _wfopen_s() to open the file.

Only append mode seems to retain the existing encoding, but I need to replace the contents of the file.

I guess I could open the file in binary mode and check for a BOM, but is there an easier/better way?
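Checking for a BOM in binary mode is not much code. The following is a minimal sketch, assuming portable standard C++ (`std::ifstream` opened with `std::ios::binary` in place of `_wfopen_s()` with `"rb"`); the names `Bom`, `read_prefix` and `detect_bom` are hypothetical. Note that the UTF-32 LE signature `FF FE 00 00` has to be tested before the UTF-16 LE signature `FF FE`, because the former begins with the latter.

```cpp
#include <algorithm>
#include <cstddef>
#include <fstream>
#include <initializer_list>
#include <string>
#include <vector>

// Hypothetical result type for the BOM check.
enum class Bom { None, Utf8, Utf16LE, Utf16BE, Utf32LE, Utf32BE };

// Read up to n bytes from the start of the file in binary mode
// (no newline translation, no decoding). Returns fewer bytes if the
// file is shorter, and an empty vector if it cannot be opened.
std::vector<unsigned char> read_prefix(const std::string& path, std::size_t n) {
    std::vector<unsigned char> buf(n);
    std::ifstream in(path, std::ios::binary);
    in.read(reinterpret_cast<char*>(buf.data()), static_cast<std::streamsize>(n));
    buf.resize(static_cast<std::size_t>(in.gcount()));
    return buf;
}

// Compare the leading bytes against the known byte order marks.
Bom detect_bom(const std::vector<unsigned char>& b) {
    auto starts = [&b](std::initializer_list<unsigned char> sig) {
        return b.size() >= sig.size() && std::equal(sig.begin(), sig.end(), b.begin());
    };
    if (starts({0xEF, 0xBB, 0xBF})) return Bom::Utf8;
    if (starts({0xFF, 0xFE, 0x00, 0x00})) return Bom::Utf32LE;  // before UTF-16 LE
    if (starts({0x00, 0x00, 0xFE, 0xFF})) return Bom::Utf32BE;
    if (starts({0xFF, 0xFE})) return Bom::Utf16LE;
    if (starts({0xFE, 0xFF})) return Bom::Utf16BE;
    return Bom::None;
}
```

Usage: `Bom bom = detect_bom(read_prefix("file.txt", 4));`. To keep the encoding when replacing the contents, the same BOM would be written first, followed by the new text encoded the same way. `Bom::None` only means there is no BOM; the file may still be UTF-8, ASCII or an ANSI code page.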

Thanks.

What I have tried:

Using append mode, but I need to overwrite the file.

Recommended Answer

UTF-16 encoded files should always contain a byte order mark (Byte order mark - Wikipedia[^]). For UTF-8 files the BOM is optional.

So check for a BOM first. If there is none, check whether the content is valid UTF-8.

On Windows you can use the MultiByteToWideChar function (Windows)[^] for this check: called with CP_UTF8 and the MB_ERR_INVALID_CHARS flag, it fails on invalid UTF-8 input. You probably have to call it anyway to convert the UTF-8 text to the UTF-16 used by Windows.

Another option is using the ICU converter library (Using Converters - ICU User Guide[^]).

There are also projects that provide converters and validation functions, such as UTF8-CPP: UTF-8 with C++ in a Portable Way[^].

Or write your own according to the allowed code points. I once found a sample implementation based on the Unicode recommendations, but I can no longer find it.
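A hand-written check can follow the table of well-formed UTF-8 byte sequences in the Unicode Standard (Table 3-7), which also rejects overlong encodings, UTF-16 surrogates (U+D800..U+DFFF) and values above U+10FFFF. This is a minimal sketch, not the implementation the answer refers to; the name `is_valid_utf8` is hypothetical and the whole file is assumed to have been read into a `std::string` in binary mode.

```cpp
#include <cstddef>
#include <string>

// Returns true if `s` consists only of well-formed UTF-8 sequences,
// following Table 3-7 of the Unicode Standard.
bool is_valid_utf8(const std::string& s) {
    const std::size_t n = s.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char c = static_cast<unsigned char>(s[i]);
        if (c <= 0x7F) { ++i; continue; }  // ASCII byte

        std::size_t len = 0;
        unsigned char lo = 0x80, hi = 0xBF;  // allowed range of the 2nd byte
        if (c >= 0xC2 && c <= 0xDF) len = 2;
        else if (c == 0xE0) { len = 3; lo = 0xA0; }  // no overlong 3-byte forms
        else if (c >= 0xE1 && c <= 0xEC) len = 3;
        else if (c == 0xED) { len = 3; hi = 0x9F; }  // no surrogates
        else if (c >= 0xEE && c <= 0xEF) len = 3;
        else if (c == 0xF0) { len = 4; lo = 0x90; }  // no overlong 4-byte forms
        else if (c >= 0xF1 && c <= 0xF3) len = 4;
        else if (c == 0xF4) { len = 4; hi = 0x8F; }  // nothing above U+10FFFF
        else return false;  // 0x80..0xC1 and 0xF5..0xFF never start a sequence

        if (n - i < len) return false;  // truncated sequence at end of data
        const unsigned char c1 = static_cast<unsigned char>(s[i + 1]);
        if (c1 < lo || c1 > hi) return false;
        for (std::size_t k = 2; k < len; ++k) {
            const unsigned char ck = static_cast<unsigned char>(s[i + k]);
            if (ck < 0x80 || ck > 0xBF) return false;  // must be a continuation byte
        }
        i += len;
    }
    return true;
}
```

A file stored in an ANSI code page such as Windows-1252 almost always fails this check as soon as it contains an accented character, because a lone byte like `0xE9` is not a complete UTF-8 sequence.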

Note that every check will report a plain ASCII file as valid UTF-8, because ASCII is a subset of UTF-8. So it may be necessary to check first whether the file contains any bytes >= 0x80.
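The pre-check mentioned above only has to look for a byte with the high bit set. A minimal sketch, assuming the file content is already in a `std::string`; the name `has_non_ascii` is hypothetical.

```cpp
#include <algorithm>
#include <string>

// True if any byte is >= 0x80. Without such bytes the data is plain ASCII,
// which is valid UTF-8 and also valid in the common ANSI code pages, so the
// content alone cannot tell these encodings apart.
bool has_non_ascii(const std::string& bytes) {
    return std::any_of(bytes.begin(), bytes.end(), [](char ch) {
        return static_cast<unsigned char>(ch) >= 0x80;
    });
}
```

If this returns false and there is no BOM, the file can be treated as ASCII; only when it returns true does the UTF-8 validity check actually distinguish UTF-8 from an ANSI code page.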

See Handling simple text files in C/C++[^]. It shows how you can identify the encoding from the first few bytes of the file.
