Java byte to String encoding problem on Linux


Problem description


I am implementing a piece of software that works like this:

I have a Linux server running a vt100 terminal application that outputs text. My program telnets to the server and reads/parses bits of the text into relevant data. The relevant data is sent to a small client, run by a webserver, that outputs the data on an HTML page.

My problem is that certain special characters like "åäö" are output as question marks (classic).

Background:
My program reads a byte stream using Apache Commons TelnetClient. The byte stream is converted into a String, then the relevant bits are substring'ed and put back together with separator characters. After this, the new string is converted back into a byte array and sent over a Socket to the client run by the webserver. This client creates a string from the received bytes and prints it on standard output, which the webserver reads and turns into HTML.

Step 1: byte[] --> String --> byte[] --> [send to client]

Step 2: byte[] --> String --> [print output]

Problem:
When I run my Java program on Windows, all characters, including "åäö", are output correctly on the resulting HTML page. However, if I run the program on Linux, all special characters are converted into "?" (question mark).

The webserver and the client are currently running on Windows (step 2).

Code:
The program basically works like this:

My program:

byte[] data = telnetClient.readData(); // Assume method works and returns a byte[] array of text.

// I have my reasons to append the characters one at a time using a StringBuffer.
StringBuffer buf = new StringBuffer();
for (byte b : data) {
    buf.append((char) (b & 0xFF));
}

String text = buf.toString();

// ...
// Relevant bits are substring'ed and put back into the String.
// ...

ServerSocket serverSocket = new ServerSocket(...);
Socket socket = serverSocket.accept();
serverSocket.close();

socket.getOutputStream().write(text.getBytes());
socket.getOutputStream().flush();

The client run by webserver:

Socket socket = new Socket(...);

byte[] data = readData(socket); // Assume this reads the bytes correctly.

String output = new String(data);

System.out.println(output);

Assume the synchronization between the reads and writes works.

Thoughts:
I have tried different ways of encoding and decoding the byte array, with no results. I am a little new to charset encoding issues and would like some pointers. The default charset on Windows, "WINDOWS 1252", seems to let the special characters through all the way from server to webserver, but when the program runs on a Linux computer the default charset is different. I have tried running "Charset.defaultCharset().forName()" and it shows that my Linux computer is set to "US-ASCII". I thought that Linux defaulted to "UTF-8"?

What should I do to get my program to work on Linux?

Solution

It's generally a bad idea to rely on the platform default encoding, especially for a network communication protocol.
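
To see why the default matters, here is a minimal sketch (the class name DefaultCharsetPitfall and the helper encodeAs are made up for illustration) that encodes "åäö" with two different charsets. ISO-8859-1 is used as a stand-in for a Windows-style single-byte default, since it maps these three characters to the same bytes as windows-1252. With US-ASCII, the characters cannot be represented, so String.getBytes substitutes the byte for "?", which matches the symptom in the question.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class, not part of the question's program.
final class DefaultCharsetPitfall {

    // Encodes text with an explicitly chosen charset.
    static byte[] encodeAs(String text, Charset charset) {
        return text.getBytes(charset);
    }

    public static void main(String[] args) {
        String text = "\u00e5\u00e4\u00f6"; // the characters å, ä, ö

        // Whatever the JVM picked up from the environment; varies per machine.
        System.out.println("Platform default: " + Charset.defaultCharset());

        // US-ASCII cannot represent these characters, so each becomes '?' (byte 63).
        System.out.println(Arrays.toString(encodeAs(text, StandardCharsets.US_ASCII)));
        // prints [63, 63, 63]

        // A single-byte Latin charset keeps them as one byte each.
        System.out.println(Arrays.toString(encodeAs(text, StandardCharsets.ISO_8859_1)));
        // prints [-27, -28, -10]
    }
}
```

Calling text.getBytes() with no argument behaves like one of these lines depending on the machine, which is why the same program can work on Windows and fail on Linux.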

Both new String() and String.getBytes() are overloaded to allow you to specify the encoding. Since you control encoding as well as decoding, simply use UTF-8 (hardcoded).
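
As a rough sketch of that advice (the class Utf8RoundTrip and its two helpers are hypothetical names, not part of the question's code), the sending side would encode with an explicit charset and the receiving side would decode with the same one:

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper illustrating explicit UTF-8 on both ends of the socket.
final class Utf8RoundTrip {

    // Sender side: replaces text.getBytes() in the server program.
    static byte[] encode(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // Receiver side: replaces new String(data) in the client program.
    static String decode(byte[] data) {
        return new String(data, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "\u00e5\u00e4\u00f6"; // the characters å, ä, ö

        byte[] wire = encode(original);   // bytes that would be written to the socket
        String received = decode(wire);   // string rebuilt from the received bytes

        System.out.println(wire.length);               // prints 6 (two bytes per character)
        System.out.println(original.equals(received)); // prints true
    }
}
```

This only covers the socket hop. The bytes arriving from the telnet session still have to be decoded with whatever encoding the terminal application actually emits, and the final System.out.println on the client still depends on how standard output is encoded there.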

Also check your code for uses of FileReader, FileWriter, InputStreamReader and OutputStreamWriter, all of which potentially rely on the platform default encoding (before Java 11 the first two did so exclusively, which makes them pretty useless).
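
For stream-based I/O the same idea applies: InputStreamReader and OutputStreamWriter both have constructors that take a Charset. The sketch below uses in-memory streams in place of the socket streams, and the class and method names are invented for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

// Hypothetical demo: readers and writers with an explicit charset.
final class ExplicitCharsetStreams {

    // Writes text through an OutputStreamWriter that is told to use UTF-8.
    static byte[] writeText(String text) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(sink, StandardCharsets.UTF_8)) {
            writer.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toByteArray();
    }

    // Reads all characters through an InputStreamReader that is told to use UTF-8.
    static String readText(byte[] data) {
        StringBuilder result = new StringBuilder();
        try (Reader reader = new InputStreamReader(
                new ByteArrayInputStream(data), StandardCharsets.UTF_8)) {
            char[] buffer = new char[1024];
            int count;
            while ((count = reader.read(buffer)) != -1) {
                result.append(buffer, 0, count);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return result.toString();
    }

    public static void main(String[] args) {
        String original = "\u00e5\u00e4\u00f6"; // the characters å, ä, ö
        String copy = readText(writeText(original));
        System.out.println(original.equals(copy)); // prints true
    }
}
```

In the real program, socket.getOutputStream() and socket.getInputStream() would take the place of the byte array streams.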
