What is a multibyte character set?

Question

Does the term "multibyte" refer to a charset whose characters can be, but don't have to be, wider than 1 byte (e.g. UTF-8), or does it refer to character sets whose characters are always wider than 1 byte (e.g. UTF-16)? In other words: what does someone mean when they talk about multibyte character sets?

Answer

The term is ambiguous, but in my internationalization work we typically avoided using "multibyte character sets" to refer to Unicode-based encodings. Generally, we used the term only for legacy encoding schemes that use one or more bytes to define each character (excluding encodings that need only one byte per character).

Shift-JIS, JIS, EUC-JP, and EUC-KR, along with the Chinese encodings, are typically included.
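To make "one or more bytes per character" concrete, here is a minimal sketch using Python's built-in codecs. The helper name `byte_widths` and the sample strings are assumptions chosen for illustration, and `gbk` merely stands in for "Chinese encodings"; none of this comes from the answer itself.

```python
def byte_widths(text, encoding):
    """Return the encoded length, in bytes, of each character in text."""
    return [len(ch.encode(encoding)) for ch in text]


# One ASCII letter followed by one CJK character per encoding.
samples = {
    "shift_jis": "Aあ",
    "euc_jp": "Aあ",
    "euc_kr": "A한",
    "gbk": "A中",
}

for encoding, text in samples.items():
    # ASCII stays one byte wide; the CJK character takes two bytes.
    print(encoding, byte_widths(text, encoding))

# In Shift-JIS a trail byte can share a value with an ASCII character:
# the second byte of katakana "ソ" is 0x5C, the same value as a backslash.
so_bytes = "ソ".encode("shift_jis")
print(so_bytes.hex())  # 835c
```

The last two lines hint at why scanning such text is awkward: a byte seen in isolation may be a whole character or the second half of one, and only the bytes before it can tell you which.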

Most legacy encodings, with some exceptions, need some kind of state machine (or, more simply, a page-swapping model) to process, and moving backward in a text stream is complicated and error-prone. UTF-8 and UTF-16 do not suffer from this problem: a UTF-8 byte can be classified as a lead or continuation byte with a bitmask, and a UTF-16 code unit can be checked against the surrogate ranges, so moving backward and forward in a non-pathological document can be done safely without major complexity.
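The bitmask and surrogate-range tests mentioned above can be sketched as follows. This is an illustrative sketch that assumes well-formed input and a starting index that is already on a character boundary; the function names `utf8_prev_start` and `utf16_prev_start` are hypothetical.

```python
def utf8_prev_start(buf, i):
    """Return the start index of the UTF-8 character that ends just before i.

    Assumes buf is well-formed UTF-8 and 0 < i <= len(buf).
    """
    i -= 1
    # Continuation bytes have the form 10xxxxxx, so test the top two bits.
    while i > 0 and (buf[i] & 0xC0) == 0x80:
        i -= 1
    return i


def utf16_prev_start(units, i):
    """Return the start index of the UTF-16 character that ends just before i.

    units is a sequence of 16-bit code units; assumes 0 < i <= len(units).
    """
    i -= 1
    # A low surrogate (0xDC00..0xDFFF) preceded by a high surrogate
    # (0xD800..0xDBFF) is the second half of a surrogate pair.
    if (i > 0 and 0xDC00 <= units[i] <= 0xDFFF
            and 0xD800 <= units[i - 1] <= 0xDBFF):
        i -= 1
    return i


text = "a€😀"  # 1-, 3-, and 4-byte characters in UTF-8

utf8 = text.encode("utf-8")
print(utf8_prev_start(utf8, len(utf8)))  # 4

raw = text.encode("utf-16-le")
units = [int.from_bytes(raw[k:k + 2], "little") for k in range(0, len(raw), 2)]
print(utf16_prev_start(units, len(units)))  # 2
```

Neither function needs to know anything about the text before the current position, which is the property the stateful legacy encodings lack.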

A few legacy encodings, for languages like Thai and Vietnamese, have some of the complexity of multibyte character sets but are really just built on combining characters, and aren't generally lumped in with the broad term "multibyte."
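As a rough illustration of that last point, the sketch below uses Python's `tis_620` codec for Thai. Every code point takes exactly one byte, yet a consonant and its tone mark still form one visible unit. The sample string is an assumption for demonstration, and counting non-`Mn` code points is only a crude approximation of user-perceived characters.

```python
import unicodedata

# U+0E01 THAI CHARACTER KO KAI followed by U+0E48 THAI CHARACTER MAI EK,
# a tone mark that is drawn above the preceding consonant.
text = "\u0e01\u0e48"

raw = text.encode("tis_620")
print(len(text), len(raw))  # 2 2 (one byte per code point)

# The tone mark is a nonspacing mark with a nonzero combining class.
print(unicodedata.category(text[1]))       # Mn
print(unicodedata.combining(text[1]) > 0)  # True

# Crude count of visible units: skip nonspacing marks.
visible = sum(1 for ch in text if unicodedata.category(ch) != "Mn")
print(visible)  # 1
```

So the byte-to-code-point mapping is trivial here; the complexity sits one level up, in deciding which code points belong together on screen.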
