Why isn't UTF-8 allowed as the "ANSI" code page?

Question

The Windows _setmbcp function allows any valid code page...

(except UTF-7 and UTF-8, which are not supported)

OK, not supporting UTF-7 makes sense: Characters have non-unique representations and that introduces complexity and security risks.

But why not UTF-8?

As I understand it, the "ANSI" versions of the Windows API functions convert their arguments to UTF-16, call the equivalent "W" function, and convert any strings in the output to "ANSI". This is what I've been doing manually. So why can't Windows do it for me?
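The round trip described above can be sketched in a few lines. This is only an illustration of the idea, not how Windows implements it: `wide_upper` and `ansi_upper` are hypothetical names, Python's `str` stands in for a UTF-16 string, and Python codec names stand in for Windows code page identifiers.

```python
# Illustrative sketch only: models the "A" -> "W" -> "A" round trip.
# wide_upper and ansi_upper are hypothetical names, not Windows APIs.


def wide_upper(text: str) -> str:
    """Stand-in for a "W" function: operates on Unicode text."""
    return text.upper()


def ansi_upper(data: bytes, code_page: str = "cp1252") -> bytes:
    """Stand-in for an "A" function that thunks through the wide version."""
    text = data.decode(code_page)    # "ANSI" bytes -> Unicode
    result = wide_upper(text)        # call the wide implementation
    return result.encode(code_page)  # Unicode -> "ANSI" bytes


if __name__ == "__main__":
    print(ansi_upper("café".encode("cp1252")))          # code page 1252
    print(ansi_upper("café".encode("utf-8"), "utf-8"))  # same idea with UTF-8
```

In this model, UTF-8 is just another value of `code_page`, which is the intuition behind the question.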

Recommended answer

The "ANSI" code page is basically legacy, from the Windows 9x era. All modern software should be Unicode-based (that is, UTF-16) anyway.

Basically, when the ANSI code page machinery was originally designed, UTF-8 hadn't even been invented, so support for multi-byte encodings was rather haphazard (i.e. most ANSI code pages are single-byte, with the exception of some East Asian code pages, which are one-or-two-byte). Adding support for "proper" multi-byte encodings was probably deemed not worth the effort when all new development should be done in UTF-16 anyway.
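To make the single-byte versus one-or-two-byte distinction concrete, here is a small sketch that counts how many bytes each character takes in a few encodings. The helper `bytes_per_char` is a hypothetical name, and the Python codecs `cp1252` and `cp932` are used as stand-ins for the corresponding Windows code pages.

```python
# Illustrative sketch: bytes needed per character in different encodings.
# bytes_per_char is a hypothetical helper, not part of any Windows API.


def bytes_per_char(text: str, encoding: str) -> list[int]:
    """Return the encoded length, in bytes, of each character in text."""
    return [len(ch.encode(encoding)) for ch in text]


if __name__ == "__main__":
    print(bytes_per_char("Aé", "cp1252"))     # [1, 1]        single-byte code page
    print(bytes_per_char("Aあ", "cp932"))      # [1, 2]        one-or-two-byte code page
    print(bytes_per_char("Aéあ😀", "utf-8"))   # [1, 2, 3, 4]  up to four bytes
```

Code that assumes a character is at most two bytes long holds for the first two cases but not for UTF-8, which is one plausible reason supporting it would have required extra work.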
