C standard: character set and string encoding specification


Question

I find the C standard (C99 and C11) vague with respect to character/string code positions and encoding rules:

First, the standard defines the source character set and the execution character set. Essentially it provides a set of glyphs, but does not associate any numerical values with them. So what is the default character set?

I'm not asking about encoding here, only about the mapping from glyph/repertoire to numeric value/code point. The standard does define universal character names in terms of ISO/IEC 10646, but does it say that this is the default character set?

As an extension to the above: I couldn't find anything that says which characters the numeric escape sequences \0 and \x represent.

From the C standards (C99 and C11; I didn't check ANSI C) I gathered the following about character and string literals:

 +---------+-----+------------+----------------------------------------------+
 | Literal | Std | Type       | Meaning                                      |
 +---------+-----+------------+----------------------------------------------+
 | '...'   | C99 | int        | An integer character constant is a  sequence |
 |         |     |            | of one or more multibyte characters          |
 | L'...'  | C99 | wchar_t    | A wide character constant is a sequence of   |
 |         |     |            | one or more multibyte characters             |
 | u'...'  | C11 | char16_t   | A wide character constant is a sequence of   |
 |         |     |            | one or more multibyte characters             |
 | U'...'  | C11 | char32_t   | A wide character constant is a sequence of   |
 |         |     |            | one or more multibyte characters             |
 | "..."   | C99 | char[]     | A character string literal is a sequence of  |
 |         |     |            | zero or more multibyte characters            |   
 | L"..."  | C99 | wchar_t[]  | A wide string literal is a sequence of zero  |
 |         |     |            | or more multibyte characters                 | 
 | u8"..." | C11 | char[]     | A UTF-8 string literal is a sequence of zero |
 |         |     |            | or more multibyte characters                 | 
 | u"..."  | C11 | char16_t[] | A wide string literal is a sequence of zero  |
 |         |     |            | or more multibyte characters                 | 
 | U"..."  | C11 | char32_t[] | A wide string literal is a sequence of zero  |
 |         |     |            | or more multibyte characters                 | 
 +---------+-----+------------+----------------------------------------------+

However, I couldn't find anything about the encoding rules for these literals. The u8 prefix does seem to hint at UTF-8 encoding, but I don't think it's explicitly mentioned anywhere. Also, for the other types, is the encoding undefined or implementation-dependent?

I'm not too familiar with the UNIX specification. Does it impose any additional constraints on these rules?

Also, if anyone can tell me which character set/encoding scheme GCC and MSVC use, that would help too.

Recommended answer

C does not mandate a particular character set. There is no such thing as a "default character set"; the choice is implementation-defined, although on most modern systems it is ASCII or UTF-8.
