ISO-8859-1 encoding and binary data preservation
Problem description
I read in a comment to an answer by @Esailija to a question of mine that
ISO-8859-1 is the only encoding to fully retain the original binary data, with exact byte<->codepoint matches
I also read in this answer by @AaronDigulla that:
In Java, ISO-8859-1 (a.k.a ISO-Latin1) is a 1:1 mapping
This, however, fails (as shown here):
// \u00F6 is ö
System.out.println(Arrays.toString("\u00F6".getBytes("utf-8")));
// prints [-61, -74]
System.out.println(Arrays.toString("\u00F6".getBytes("ISO-8859-1")));
// prints [-10]
Questions
- I admit I do not quite get it: why does it not get the bytes in the code above?
- Most importantly, where is this (byte-preserving behavior of ISO-8859-1) specified? Links to source, or the JLS, would be nice. Is it the only encoding with this property?
- Is it related to ISO-8859-1 being the default default?
See also this question for nice counter examples from other charsets.
Recommended answer
"\u00F6" is not a byte array. It's a string containing a single char. Execute the following test instead:
// Requires: import java.util.Arrays;
public static void main(String[] args) throws Exception {
    byte[] b = new byte[] {(byte) 0x00, (byte) 0xf6};
    String s = new String(b, "ISO-8859-1"); // decoding: bytes -> chars
    byte[] b2 = s.getBytes("ISO-8859-1"); // encoding: chars -> bytes
    System.out.println("Are the bytes equal : " + Arrays.equals(b, b2)); // true
}
To check that this is true for any byte, just improve the code and loop through all the bytes:
// Requires: import java.util.Arrays;
public static void main(String[] args) throws Exception {
    // Fill the array with every possible byte value (0x00 to 0xFF).
    byte[] b = new byte[256];
    for (int i = 0; i < b.length; i++) {
        b[i] = (byte) i;
    }
    String s = new String(b, "ISO-8859-1"); // decoding
    byte[] b2 = s.getBytes("ISO-8859-1"); // encoding
    System.out.println("Are the bytes equal : " + Arrays.equals(b, b2));
}
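As for the snippet in the question, its output is arguably consistent with a 1:1 mapping: Java's byte type is signed, so the byte 0xF6 prints as -10. The following is a minimal sketch of that reading; the class name Latin1SignedByteDemo and the helper unsignedLatin1Byte are illustrative names, not part of the original posts.

```java
import java.nio.charset.StandardCharsets;

class Latin1SignedByteDemo {
    // Hypothetical helper: encodes the string as ISO-8859-1 and returns
    // the first byte as an unsigned value in the range 0..255.
    public static int unsignedLatin1Byte(String s) {
        byte[] encoded = s.getBytes(StandardCharsets.ISO_8859_1);
        return encoded[0] & 0xFF; // mask off the sign extension
    }

    public static void main(String[] args) {
        String s = "\u00F6"; // ö, code point U+00F6
        byte signed = s.getBytes(StandardCharsets.ISO_8859_1)[0];
        System.out.println(signed);                               // -10
        System.out.println(unsignedLatin1Byte(s));                // 246
        System.out.println(unsignedLatin1Byte(s) == s.charAt(0)); // true
    }
}
```

Read as an unsigned value, the single encoded byte equals the code point of the char, which is what the byte <-> code point claim describes.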
ISO-8859-1 is a standard encoding. So the language used (Java, C# or whatever) doesn't matter.
Here's a Wikipedia reference that claims that every byte is covered:
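In 1992, the IANA registered the character map ISO_8859-1:1987, more commonly known by its preferred MIME name of ISO-8859-1 (note the extra hyphen over ISO 8859-1), a superset of ISO 8859-1, for use on the Internet. This map assigns the C0 and C1 control characters to the unassigned code values thus provides for 256 characters via every possible 8-bit value.
(emphasis mine)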
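The claim that every 8-bit value is covered can be checked byte by byte, and contrasted with a charset where it does not hold. The sketch below assumes Java's built-in ISO_8859_1 and UTF_8 charsets; the class CharsetRoundTripDemo and its methods are hypothetical names introduced only for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class CharsetRoundTripDemo {
    // All 256 possible byte values, in ascending unsigned order.
    public static byte[] allBytes() {
        byte[] b = new byte[256];
        for (int i = 0; i < b.length; i++) {
            b[i] = (byte) i;
        }
        return b;
    }

    // True if decoding and then re-encoding with cs returns the same bytes.
    public static boolean roundTrips(byte[] data, Charset cs) {
        return Arrays.equals(data, new String(data, cs).getBytes(cs));
    }

    // True if byte value i decodes to the char with code point i, for all i.
    public static boolean isIdentityMapping(Charset cs) {
        String decoded = new String(allBytes(), cs);
        if (decoded.length() != 256) {
            return false;
        }
        for (int i = 0; i < 256; i++) {
            if (decoded.charAt(i) != i) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] data = allBytes();
        System.out.println(roundTrips(data, StandardCharsets.ISO_8859_1));  // true
        System.out.println(isIdentityMapping(StandardCharsets.ISO_8859_1)); // true
        // In UTF-8 the bytes 0x80..0xFF are malformed on their own, so the
        // decoder substitutes U+FFFD and the original bytes are lost.
        System.out.println(roundTrips(data, StandardCharsets.UTF_8));       // false
    }
}
```

This only demonstrates the behavior for the two charsets tested; it does not show that ISO-8859-1 is the only charset whose round trip preserves bytes.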