Multi-byte chars


Question

I've been reading the C standard online and I'm puzzled as to what multibyte
chars are. Wide chars I believe would be characters for languages such as
Cantonese or Japanese. I know the ASCII character set specifies that each
character such as 'b' or 'B' is an 8 bit character. So what's a multibyte
character?
Also how would you use the function parameter main (char argc, char
**argv) if that's correct?

Bill

-----= Posted via Newsfeeds.Com, Uncensored Usenet News =-----
http://www.newsfeeds.com - The #1 Newsgroup Service in the World!
-----== Over 80,000 Newsgroups - 16 Different Servers! =-----

Recommended Answer

Bill Cunningham wrote:
> I've been reading the C standard online and I'm puzzled as to what
> multibyte chars are.
A multibyte character is a "sequence of one or more bytes representing a
member of the extended character set of either the source or the execution
environment", if I have the quote from 3.7.2 right.
> Wide chars I believe would be characters for
> languages such as Cantonese or Japanese.
C isn't as specific as that. See 3.7.3.
> I know the ASCII character set
> specifies that each character such as 'b' or 'B' is an 8 bit character.
7 bits, not 8. ASCII is a 7-bit code.

<snip>

--
Richard Heathfield : bi****@eton.powernet.co.uk
"Usenet is a strange place." - Dennis M Ritchie, 29 July 1999.
C FAQ: http://www.eskimo.com/~scs/C-faq/top.html
K&R answers, C books, etc: http://users.powernet.co.uk/eton

Bill Cunningham <so**@some.net> wrote:
> I've been reading the C standard online and I'm puzzled as to what multibyte
> chars are. Wide chars I believe would be characters for languages such as
> Cantonese or Japanese. I know the ASCII character set specifies that each
> character such as 'b' or 'B' is an 8 bit character. So what's a multibyte
> character?
A single logical character that requires more than one byte to express.
For example, consider the UTF-8 encoding format for ISO 10646: normal
ASCII characters (between \x00 and \x7f) are encoded as a single byte
with the same value. Other characters are encoded as multiple bytes,
each of which has the top bit set; the first byte is in the range \xc0
to \xfd and indicates the number of bytes that follow, subsequent bytes
are in the range \x80 to \xbf. UTF-8 encoded characters can be any
length between one and six bytes. So 'A' is encoded as \x41 but '©'
(the copyright sign) is encoded as \xc2\xa9.

Multibyte encodings can be very space efficient, but they are difficult
to process since different characters have different lengths. Wide
characters, on the other hand, are intended to be efficient for
processing, but not necessarily space efficient. Wide characters are
integers that are large enough so that every logical character can be
represented in just one wide character.

-Larry Jones

If I get a bad grade, it'll be YOUR fault for not doing the work for me!
-- Calvin

<la************@eds.com> wrote in message news:nv**********@cvg-65-27-189-87.cinci.rr.com...
> Bill Cunningham <so**@some.net> wrote:
>> I've been reading the C standard online and I'm puzzled as to what multibyte
>> chars are. Wide chars I believe would be characters for languages such as
>> Cantonese or Japanese. I know the ASCII character set specifies that each
>> character such as 'b' or 'B' is an 8 bit character. So what's a multibyte
>> character?
> A single logical character that requires more than one byte to express.
> For example, consider the UTF-8 encoding format for ISO 10646: normal
> ASCII characters (between \x00 and \x7f) are encoded as a single byte
> with the same value.
My understanding is that the standard requires 'A' == L'A', by the fact
that the basic character set must be a subset of the extended
character set. Does this, together with what you mentioned above, mean that a
character set whose code values differ from ASCII's can't be the basic
set on an implementation where the code values of Unicode are used as those
of the extended set?
--
Jun, Woong (my******@hanmail.net)
Dept. of Physics, Univ. of Seoul