Unicode stored in C char
Question
I'm learning the C language on Linux and I've come across a slightly weird situation.
As far as I know, standard C's char data type is ASCII, 1 byte (8 bits). That should mean it can hold only ASCII characters.
In my program I use char input[], which is filled by the getchar function, like this pseudocode:
char input[20];
int z, i;
for (i = 0; i < 20; i++)
{
    z = getchar();
    input[i] = z;
}
The weird thing is that it works not only for ASCII characters, but for any character I can imagine, such as @&@{čřžŧ¶'`[łĐŧđж←^€~[←^Ø{&}, in the input.
My question is: how is this possible? It seems to be one of the many beautiful exceptions in C, but I would really appreciate an explanation. Is it a matter of the OS, the compiler, or some hidden extra feature of the language?
Thanks.
Recommended answer
There is no magic here. The C language gives you access to the raw bytes as they are stored in the computer's memory; a char is just a one-byte integer and is not tied to ASCII. If your terminal is using UTF-8 (which is likely), non-ASCII characters take more than one byte in memory. When you display them again, it is your terminal's code that converts these byte sequences into a single displayed character.
Just change your code to print the strlen of the strings, and you will see what I mean.
To properly handle UTF-8 non-ASCII characters in C you have to use some library to handle them for you, such as GLib, Qt, or many others.