Java Charset problem on Linux


Problem description

Problem: I have a string containing special characters, which I convert to bytes and back again. The conversion works properly on Windows, but on Linux the special character is not converted properly. The default charset on Linux is UTF-8, as reported by Charset.defaultCharset().displayName().

However, if I run on Linux with the option -Dfile.encoding=ISO-8859-1, it works properly.

How can I make it work with the UTF-8 default charset, without setting the -D option in a Unix environment?

Edit: I use jdk1.6.13.

Edit: the code snippet works with cs = "ISO-8859-1"; or cs = "UTF-8"; on Windows, but not on Linux.

    String x = "½";
    System.out.println(x);
    byte[] ba = x.getBytes(Charset.forName(cs));
    for (byte b : ba) {
        System.out.println(b);
    }
    String y = new String(ba, Charset.forName(cs));
    System.out.println(y);

~regards daed

Solution

Your characters are probably being corrupted by the compilation process and you're ending up with junk data in your class file.

Regarding "if I run on Linux with option -Dfile.encoding=ISO-8859-1 it works properly":

The "file.encoding" property is not required by the J2SE platform specification; it's an internal detail of Sun's implementations and should not be examined or modified by user code. It's also intended to be read-only; it's technically impossible to support the setting of this property to arbitrary values on the command line or at any other time during program execution.

In short, don't use -Dfile.encoding=...

    String x = "½";

Since U+00BD (½) is represented by different byte values in different encodings:

    Encoding         Bytes (hex)
    windows-1252     BD
    UTF-8            C2 BD
    ISO-8859-1       BD
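The byte values for U+00BD can be checked with a few lines of Java. This is a minimal sketch; the class name `HalfBytes` and the helper `hex` are made up for illustration. The string is written with a Unicode escape so that the result does not depend on how the source file itself is encoded.

```java
import java.nio.charset.Charset;

class HalfBytes {
    // Encode the text with the named charset and render the bytes as uppercase hex.
    static String hex(String text, String charsetName) {
        byte[] bytes = text.getBytes(Charset.forName(charsetName));
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Unicode escape for U+00BD: pure ASCII in the source file.
        String half = "\u00bd";
        for (String name : new String[] {"windows-1252", "UTF-8", "ISO-8859-1"}) {
            System.out.println(name + " " + hex(half, name));
        }
    }
}
```

Running it should print `BD` for windows-1252 and ISO-8859-1 and `C2 BD` for UTF-8.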

...you need to tell your compiler which encoding your source file uses:

    javac -encoding ISO-8859-1 Foo.java
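To see why a mismatch between the file's real encoding and the encoding assumed when reading it produces junk, here is a rough simulation using plain `String` decoding. It is not what javac does internally (javac may also emit warnings or errors for malformed input); it only illustrates the decoding step. The class and method names are hypothetical.

```java
import java.nio.charset.Charset;

class SourceDecodingMismatch {
    static final Charset LATIN1 = Charset.forName("ISO-8859-1");
    static final Charset UTF8 = Charset.forName("UTF-8");

    // Decode file bytes the way a reader that assumes `assumed` would.
    static String decode(byte[] fileBytes, Charset assumed) {
        return new String(fileBytes, assumed);
    }

    // Render each char of the string as U+XXXX for inspection.
    static String codePoints(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", (int) s.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] savedAsLatin1 = {(byte) 0xBD};            // "½" in an ISO-8859-1 file
        byte[] savedAsUtf8 = {(byte) 0xC2, (byte) 0xBD}; // "½" in a UTF-8 file

        // A lone 0xBD is malformed UTF-8, so it decodes to the replacement character.
        System.out.println(codePoints(decode(savedAsLatin1, UTF8)));  // U+FFFD
        // Each UTF-8 byte is read as a separate Latin-1 character.
        System.out.println(codePoints(decode(savedAsUtf8, LATIN1)));  // U+00C2 U+00BD
        // Matching encodings round-trip correctly.
        System.out.println(codePoints(decode(savedAsLatin1, LATIN1))); // U+00BD
    }
}
```

Writing the literal as `"\u00bd"` avoids the issue altogether, because the source file then contains only ASCII.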

Now we get to this one:

    System.out.println(x);

System.out is a PrintStream, so it encodes the characters to the system default encoding before emitting the bytes, much like this:

    System.out.write(x.getBytes(Charset.defaultCharset()));

That may or may not work as you expect on some platforms - the byte encoding must match the encoding the console is expecting for the characters to show up correctly.
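If the console's encoding is known, one option is to wrap `System.out` in a `PrintStream` with an explicit encoding instead of relying on the default. The sketch below assumes the terminal expects UTF-8, which may not hold on your system; `ExplicitConsoleEncoding` and `printerFor` are illustrative names, not part of any library.

```java
import java.io.OutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

class ExplicitConsoleEncoding {
    // Wrap a stream in an auto-flushing PrintStream that encodes with the named charset.
    static PrintStream printerFor(OutputStream out, String encoding) {
        try {
            return new PrintStream(out, true, encoding);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unsupported encoding: " + encoding, e);
        }
    }

    public static void main(String[] args) {
        // Assumption: the terminal is configured for UTF-8.
        PrintStream console = printerFor(System.out, "UTF-8");
        console.println("\u00bd");
    }
}
```

This only controls the bytes the program emits; whether the character displays correctly still depends on the terminal's own settings.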
