What is character encoding and why should I bother with it?


Question

I'm confused about the concept of character encoding.

What are Unicode, GBK, etc.? How does a programming language use them?

Do I need to bother knowing about them? Is there a simpler or faster way of programming without having to trouble myself with them?

Recommended answer

(Note that I'm using some of these terms loosely/colloquially, for a simpler explanation that still hits the key points.)

A byte is 8 bits, so it can only have 256 distinct values.

Since many character sets contain more than 256 characters, one cannot in general simply say that each character is a byte.

Therefore, there must be mappings that describe how to turn each character in a character set into a sequence of bytes. Some characters might be mapped to a single byte, but others will have to be mapped to multiple bytes.

Those mappings are encodings, because they tell you how to encode characters into sequences of bytes.
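To make the idea of a mapping concrete, here is a minimal Java sketch that encodes a few characters with two different encodings and shows the resulting bytes in hex. The class name EncodingLengths and the helper toHex are made up for this illustration, and UTF-8 and UTF-16BE were picked only because every JVM is required to support them; other encodings such as GBK work the same way where the JVM provides them.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingLengths {
    // Hypothetical helper: encodes text with the given charset and
    // returns the resulting bytes as space-separated hex pairs.
    public static String toHex(String text, Charset charset) {
        StringBuilder sb = new StringBuilder();
        for (byte b : text.getBytes(charset)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            // Mask to 0..255 so negative byte values print as two hex digits.
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Unicode escapes keep this source file pure ASCII, so the result
        // does not depend on the encoding the compiler assumes for the file.
        String[] labels = {"U+0041 (Latin A)", "U+00E9 (e with acute)", "U+4E2D (CJK 'middle')"};
        String[] samples = {"A", "\u00E9", "\u4E2D"};

        for (int i = 0; i < samples.length; i++) {
            System.out.println(labels[i]);
            System.out.println("  UTF-8:    " + toHex(samples[i], StandardCharsets.UTF_8));
            System.out.println("  UTF-16BE: " + toHex(samples[i], StandardCharsets.UTF_16BE));
        }
    }
}
```

The same character becomes a different byte sequence depending on the encoding: U+4E2D is three bytes (E4 B8 AD) in UTF-8 but two bytes (4E 2D) in UTF-16BE, while the letter A is one byte in UTF-8 and two in UTF-16BE.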

As for Unicode, at a very high level, Unicode is an attempt to assign a single, unique number to every character. Obviously that number has to be something wider than a byte, since there are more than 256 characters :) Java stores text as UTF-16, an encoding of Unicode built from 16-bit units (this is why Java characters are 16 bits wide and have integer values from 0 to 65535). Characters whose Unicode number does not fit in 16 bits are stored as a pair of Java characters. When you get the byte representation of a Java character, you have to tell the JVM the encoding you want to use so it will know how to choose the byte sequence for the character.
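As a rough illustration of that last point, the sketch below reads the 16-bit value of a Java char, asks for its bytes with an explicitly named encoding, and shows what happens when bytes are decoded with a different encoding than the one used to produce them. The class and method names (CharBytesDemo, codeUnit, utf8Length, roundTrip, misdecode) are invented for this example; only String.getBytes(Charset), the String(byte[], Charset) constructor, and StandardCharsets come from the standard library.

```java
import java.nio.charset.StandardCharsets;

public class CharBytesDemo {
    // The integer value of a Java char, always in the range 0..65535.
    public static int codeUnit(char c) {
        return (int) c;
    }

    // Number of bytes the text occupies once encoded as UTF-8.
    public static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Encode and decode with the same encoding: the text survives intact.
    public static String roundTrip(String s) {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Encode as UTF-8 but decode as ISO-8859-1: each byte is read as a
    // separate character, which is how garbled text typically arises.
    public static String misdecode(String s) {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        char c = '\u4E2D'; // written as an escape so the source stays ASCII
        System.out.println("char value:       " + codeUnit(c));
        System.out.println("UTF-8 byte count: " + utf8Length(String.valueOf(c)));
        System.out.println("round trip ok:    " + roundTrip("\u4E2D").equals("\u4E2D"));
        System.out.println("misdecoded chars: " + misdecode("\u00E9").length());
    }
}
```

Passing the encoding explicitly, rather than calling getBytes() with no argument and relying on the platform default, is a design choice that keeps the result the same from machine to machine.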
