Unicode stored in C char


Question

I'm learning the C language on Linux now and I've come across a slightly weird situation.

As far as I know, standard C's char data type is ASCII, 1 byte (8 bits). That should mean it can hold only ASCII characters.

In my program I use char input[], which is filled by the getchar function, as in this pseudocode:

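char input[20];
int z, i;
for (i = 0; i < 20; i++)
{
    z = getchar();       /* returns an int so that EOF can be told apart from a byte */
    if (z == EOF)
        break;           /* stop when the input ends before 20 bytes arrive */
    input[i] = (char)z;  /* stores one byte, not necessarily one whole character */
}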

The weird thing is that it works not only for ASCII characters, but for any character I can imagine, such as @&@{čřžŧ¶'`[łĐŧđж←^€〜[←^ Ø{&}, entered on CZ input.

My question is: how is this possible? It seems to be one of the many beautiful exceptions in C, but I would really appreciate an explanation. Is it a matter of the OS, the compiler, or some hidden additional super-feature of the language?

Thanks.

Recommended answer

There is no magic here. The C language gives you access to the raw bytes as they are stored in the computer's memory. A char is simply one byte; the C standard does not tie it to ASCII. If your terminal is using UTF-8 (which is likely), each non-ASCII character takes more than one byte in memory, so a single typed character can fill several elements of input. When you display the bytes again, it is the terminal's code that converts each of these sequences back into a single displayed character.
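To make the raw bytes visible, here is a minimal sketch that renders each byte of a string as hexadecimal. The helper name bytes_to_hex and its buffer-based interface are assumptions made for illustration; they are not part of the original question or of any library.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes the bytes of s into out as space-separated
   two-digit hex values and returns the number of bytes in s.
   Output is truncated if out is too small. */
size_t bytes_to_hex(const char *s, char *out, size_t out_size)
{
    size_t n = strlen(s);
    size_t used = 0;

    if (out_size > 0)
        out[0] = '\0';
    /* Each byte needs at most 4 bytes of room: space, two digits, '\0'. */
    for (size_t i = 0; i < n && used + 4 <= out_size; i++) {
        if (i > 0)
            out[used++] = ' ';
        /* Cast to unsigned char: plain char may be signed, and bytes
           above 0x7F would otherwise be sign-extended. */
        used += (size_t)snprintf(out + used, out_size - used, "%02x",
                                 (unsigned)(unsigned char)s[i]);
    }
    return n;
}
```

Calling bytes_to_hex on "A" yields the single value 41, while the UTF-8 encoding of č (the two bytes 0xC4 0x8D) yields c4 8d: one character on screen, two elements of the char array.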

Just change your code to print the strlen of the strings, and you will see what I mean.
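The difference between bytes and characters can be sketched as follows. strlen counts bytes up to the terminating '\0' (so the array must be null-terminated before calling it), whereas the hypothetical helper utf8_count below counts characters by skipping UTF-8 continuation bytes, which have the bit pattern 10xxxxxx. It assumes the input is valid UTF-8 and does no validation.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: counts code points in a string that is assumed
   to be valid UTF-8. Every byte that is NOT a continuation byte
   (10xxxxxx) starts a new code point. */
size_t utf8_count(const char *s)
{
    size_t count = 0;

    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            count++;
    }
    return count;
}
```

For the three characters čřž encoded as UTF-8, strlen reports 6 while utf8_count reports 3; for pure ASCII text the two numbers agree.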

To properly handle UTF-8 non-ASCII characters in C, you have to use some library to handle them for you, such as GLib, Qt, or one of many others.
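To give a rough idea of the work such a library does for you, here is a hand-written sketch of decoding a single UTF-8 sequence into a code point. This is not the GLib or Qt API; the function name utf8_decode is an assumption, and the sketch deliberately omits checks a real library performs (overlong encodings, surrogates, values above U+10FFFF).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decodes the first code point of s into *cp and
   returns the number of bytes consumed, or 0 if the sequence is
   malformed. An empty string decodes as code point 0 with length 1. */
size_t utf8_decode(const char *s, uint32_t *cp)
{
    const unsigned char *p = (const unsigned char *)s;
    size_t len;
    uint32_t value;

    /* The lead byte announces how long the sequence is. */
    if (p[0] < 0x80)                { len = 1; value = p[0]; }
    else if ((p[0] & 0xE0) == 0xC0) { len = 2; value = p[0] & 0x1F; }
    else if ((p[0] & 0xF0) == 0xE0) { len = 3; value = p[0] & 0x0F; }
    else if ((p[0] & 0xF8) == 0xF0) { len = 4; value = p[0] & 0x07; }
    else return 0;  /* stray continuation byte or invalid lead byte */

    /* Each continuation byte contributes its low 6 bits. */
    for (size_t i = 1; i < len; i++) {
        if ((p[i] & 0xC0) != 0x80)
            return 0;  /* also catches a premature '\0' */
        value = (value << 6) | (p[i] & 0x3F);
    }
    *cp = value;
    return len;
}
```

With this sketch, the three bytes 0xE2 0x82 0xAC decode to U+20AC (the euro sign) and the two bytes 0xC4 0x8D decode to U+010D (č). For production code, a maintained library remains the safer choice.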
