String.getBytes(charset) has errors for EBCDIC-charset


Question

Converting a string to EBCDIC via String.getBytes(charset) yields at least one wrong result: the character "a" becomes 0x3f but should be 0x81.

import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;

public static void convert() throws UnsupportedEncodingException {
    String data="abcABC";
    String ebcdic = "IBM-1047";
    String ascii  = "ISO-8859-1";

    System.out.printf("Charset %s is supported: %s\n", ebcdic, Charset.isSupported(ebcdic));
    String result= new String(data.getBytes(ebcdic));
    System.out.printf("EBCDIC: %s\n",asHex(result.getBytes()));

    System.out.printf("Charset %s is supported: %s\n", ascii, Charset.isSupported(ascii));
    result= new String(data.getBytes(ascii));
    System.out.printf("ASCII: %s\n",asHex(result.getBytes()));
}

public static String asHex(byte[] buf) {
    char[] HEX_CHARS = "0123456789abcdef".toCharArray();
    char[] chars = new char[2 * buf.length];
    for (int i = 0; i < buf.length; ++i)
    {
        chars[2 * i] = HEX_CHARS[(buf[i] & 0xF0) >>> 4];
        chars[2 * i + 1] = HEX_CHARS[buf[i] & 0x0F];
    }
    return new String(chars);
}

The result is:


  • Charset IBM-1047 is supported: true
  • EBCDIC: 3f8283c1c2c3
  • Charset ISO-8859-1 is supported: true
  • ASCII: 616263414243

Is there anything I can do about this?

Recommended answer

When you call

data.getBytes(ebcdic)

you are encoding the text in data into EBCDIC bytes. You then create a String from these bytes as if they were text in your system's default character encoding. This causes breakage, because the bytes are not guaranteed to be valid text in any encoding other than EBCDIC. Here, the EBCDIC byte 0x81 for "a" presumably has no mapping in the default charset, so it decodes to a replacement character and is written back out as "?" (0x3f).
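The failure can be reproduced without relying on whatever the platform default happens to be. The sketch below is illustrative only: it assumes the default charset on the asker's machine was windows-1252 (the question does not say so; it is a guess that matches the reported output), and the class and method names are made up for this example. It also assumes the JDK in use ships the IBM-1047 charset, as the question's output indicates.

```java
import java.nio.charset.Charset;

public class RoundTripDemo {
    // Assumption: windows-1252 stands in for the platform default charset.
    public static final Charset ASSUMED_DEFAULT = Charset.forName("windows-1252");
    public static final Charset EBCDIC = Charset.forName("IBM-1047");

    // Renders each byte as two lowercase hex digits.
    public static String toHex(byte[] buf) {
        StringBuilder sb = new StringBuilder(2 * buf.length);
        for (byte b : buf) {
            sb.append(String.format("%02x", b & 0xFF));
        }
        return sb.toString();
    }

    // Mirrors the question's code: encode to EBCDIC, decode the bytes with a
    // different charset, then encode again with that same charset.
    public static byte[] lossyRoundTrip(String text) {
        byte[] ebcdicBytes = text.getBytes(EBCDIC);
        String misdecoded = new String(ebcdicBytes, ASSUMED_DEFAULT);
        return misdecoded.getBytes(ASSUMED_DEFAULT);
    }

    public static void main(String[] args) {
        // The EBCDIC bytes themselves are correct.
        System.out.println("direct:     " + toHex("abcABC".getBytes(EBCDIC)));
        // Passing them through a String in another charset loses 0x81.
        System.out.println("round trip: " + toHex(lossyRoundTrip("abcABC")));
    }
}
```

Under these assumptions the encoding step is fine, and the damage happens only when the EBCDIC bytes are reinterpreted as text in another charset: byte 0x81 is unassigned in windows-1252, while 0x82, 0x83, 0xc1, 0xc2 and 0xc3 happen to map to characters that survive re-encoding.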

To fix this, keep the bytes as bytes:

byte[] result= data.getBytes(ebcdic);
System.out.printf("EBCDIC: %s\n",asHex(result));
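For completeness, here is a minimal self-contained sketch of the same idea: the encoded bytes stay in a byte[], and the charset is always named explicitly when converting in either direction. The class name EbcdicBytes and the encode/decode helpers are hypothetical names chosen for this example, not part of the original answer, and the sketch assumes the JDK provides the IBM-1047 charset.

```java
import java.nio.charset.Charset;

public class EbcdicBytes {
    // Encodes text into bytes using the named charset.
    public static byte[] encode(String text, String charsetName) {
        return text.getBytes(Charset.forName(charsetName));
    }

    // Decodes bytes back into text using the same named charset,
    // never the platform default.
    public static String decode(byte[] bytes, String charsetName) {
        return new String(bytes, Charset.forName(charsetName));
    }

    // Renders each byte as two lowercase hex digits.
    public static String asHex(byte[] buf) {
        StringBuilder sb = new StringBuilder(2 * buf.length);
        for (byte b : buf) {
            sb.append(String.format("%02x", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String data = "abcABC";
        byte[] ebcdic = encode(data, "IBM-1047");
        System.out.printf("EBCDIC: %s%n", asHex(ebcdic));
        System.out.printf("ASCII: %s%n", asHex(encode(data, "ISO-8859-1")));
        // Decoding with the matching charset recovers the original text.
        System.out.printf("Decoded: %s%n", decode(ebcdic, "IBM-1047"));
    }
}
```

Using Charset objects instead of charset-name strings in getBytes and the String constructor also avoids the checked UnsupportedEncodingException; an unknown name fails in Charset.forName instead.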
