Character Set Special Characters

Question

  • Is iso-8859-1 a proper subset of utf-8?
  • What about iso-8859-n?
  • What about windows-1252?

If the answer to any of the above is no, which characters are disjoint? I'm testing some logic that detects charsets and want to write tests to verify that the detection works properly.

Recommended answer

Is iso-8859-1 a proper subset of utf-8?

The character repertoire of ISO-8859-1 (the first 256 characters of Unicode) is a proper subset of that of UTF-8 (every Unicode character).

However, the characters U+0080 to U+00FF are encoded differently in the two encodings, so at the byte level ISO-8859-1 is not a subset of UTF-8.

  • ISO-8859-1 assigns each of these characters a single byte, from 80 to FF.
  • UTF-8 encodes the same characters as two-byte sequences, from C2 80 to C3 BF.
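
For a detection test, the byte-level difference can be checked directly. The following is a minimal Python sketch; the helper names `encode_both` and `is_valid_utf8` are made up for illustration and are not part of any library.

```python
def encode_both(ch):
    """Return (ISO-8859-1 bytes, UTF-8 bytes) for a single character."""
    return ch.encode("iso-8859-1"), ch.encode("utf-8")


def is_valid_utf8(data):
    """Report whether the byte string decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# ASCII characters are byte-identical in both encodings.
ascii_latin1, ascii_utf8 = encode_both("A")

# U+0080..U+00FF take one byte in ISO-8859-1 but two bytes in UTF-8.
first_latin1, first_utf8 = encode_both("\u0080")
last_latin1, last_utf8 = encode_both("\u00ff")

# A lone high byte such as ISO-8859-1 "é" (0xE9) is not valid UTF-8.
latin1_e_acute = "\u00e9".encode("iso-8859-1")
print(first_utf8.hex(), last_utf8.hex(), is_valid_utf8(latin1_e_acute))
```

A single high byte is never valid UTF-8, which makes it a convenient negative test case. Longer ISO-8859-1 strings can occasionally form valid UTF-8 by coincidence, so a detector probably cannot rely on this check alone.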

What about iso-8859-n?

These are 15 different encodings that together contain 614 distinct characters. Some of these characters occur in multiple "parts" of ISO 8859, and some don't. You'll have to be more specific.

I see that your question is tagged ISO-8859-2. The characters that are in -2 but not in -1 are:

ĂăĄąĆćČčĎďĐđĘęĚěĹĺĽľŁłŃńŇňŐőŔŕŘřŚśŞşŠšŢţŤťŮůŰűŹźŻżŽžˇ˘˙˛˝
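
A list like this can be regenerated rather than copied by hand. The sketch below assumes Python's built-in `iso-8859-1` and `iso-8859-2` codecs; the helper name `repertoire` is hypothetical.

```python
def repertoire(encoding):
    """Collect every character that some single byte decodes to."""
    chars = set()
    for value in range(256):
        try:
            chars.add(bytes([value]).decode(encoding))
        except UnicodeDecodeError:
            pass  # this byte is unassigned in the encoding
    return chars


# Characters representable in ISO-8859-2 but not in ISO-8859-1,
# sorted by Unicode code point.
only_in_latin2 = "".join(
    sorted(repertoire("iso-8859-2") - repertoire("iso-8859-1"))
)
print(len(only_in_latin2), only_in_latin2)
```

Swapping the two encoding names gives the opposite difference, and the same helper should work for any other single-byte codec that Python ships.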

What about windows-1252?

Windows-1252 is just like ISO-8859-1 except that it replaces the rarely used control characters in the 0x80-0x9F range with printable characters. The characters that are in windows-1252 but not in ISO-8859-1 are:

ŒœŠšŸŽžƒˆ˜–—‘’‚“”„†‡•…‰‹›€™
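
To turn this into test data, the 0x80-0x9F range can be probed byte by byte. This is a rough sketch that relies on Python's `cp1252` codec; the variable names are illustrative only.

```python
printable = {}   # byte value -> windows-1252 character
unassigned = []  # byte values that Python's cp1252 codec leaves undefined

for value in range(0x80, 0xA0):
    try:
        printable[value] = bytes([value]).decode("cp1252")
    except UnicodeDecodeError:
        unassigned.append(value)

# The same bytes decode to C1 control characters under ISO-8859-1.
euro_as_latin1 = bytes([0x80]).decode("iso-8859-1")

print("".join(printable.values()))
print([hex(v) for v in unassigned])
```

Python's codec treats five bytes in this range as undefined. Other implementations, such as the WHATWG Encoding Standard used by browsers, map those bytes to control characters instead, so a test suite may want to pin down which behaviour the detector under test expects.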
