Java byte to String encoding problem on Linux

Problem description

I am implementing a piece of software that works like this:

I have a Linux server running a vt100 terminal application that outputs text. My program telnets to the server and reads/parses bits of the text into relevant data. The relevant data is sent to a small client run by a webserver that outputs the data on an HTML page.

My problem is that certain special characters like "åäö" are output as question marks (classic).

Background:
My program reads a byte stream using Apache Commons TelnetClient. The byte stream is converted into a String, then the relevant bits are substring'ed and put back together with separator characters. After this the new string is converted back into a byte array and sent using a Socket to the client run by the webserver. This client creates a string from the received bytes and prints it out on standard output, which the webserver reads and outputs HTML from.

Step 1: byte[] --> String --> byte[] --> [send to client]

Step 2: byte[] --> String --> [print output]

Problem:
When I run my Java program on Windows all characters, including "åäö", are output correctly on the resulting HTML page. However, if I run the program on Linux all special characters are converted into "?" (question mark).

The webserver and the client are currently running on Windows (step 2).

Code:
The program basically works like this:

My program:

byte[] data = telnetClient.readData(); // Assume method works and returns a byte[] array of text.

// I have my reasons to append the characters one at a time using a StringBuffer.
StringBuffer buf = new StringBuffer();
for (byte b : data) {
    buf.append((char) (b & 0xFF));
}

String text = buf.toString();

// ...
// Relevant bits are substring'ed and put back into the String.
// ...

ServerSocket serverSocket = new ServerSocket(...);
Socket socket = serverSocket.accept();
serverSocket.close();

socket.getOutputStream().write(text.getBytes());
socket.getOutputStream().flush();

The client run by webserver:

Socket socket = new Socket(...);

byte[] data = readData(socket); // Assume this reads the bytes correctly.

String output = new String(data);

System.out.println(output);

Assume the synchronization between the reads and writes works.

Thoughts:
I have tried different ways of encoding and decoding the byte array, with no results. I am a little new to charset encoding issues and would like to get some pointers. The default charset on Windows, "WINDOWS 1252", seems to let the special characters through all the way from server to webserver, but when the program is run on a Linux computer the default charset is different. I have tried to run a "Charset.defaultCharset().forName()" and it shows that my Linux computer is set to "US-ASCII". I thought that Linux defaulted to "UTF-8"?

What should I do to get my program to work on Linux?

Recommended answer

It's generally a bad idea to rely on the platform default encoding, especially for a network communication protocol.
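To illustrate why the default matters, here is a minimal sketch of what happens to "åäö" when it is encoded and decoded again with different charsets. The class name DefaultCharsetDemo and the helper roundTrip are made up for this illustration and are not part of the question's code. With US-ASCII (the default reported on the Linux machine), String.getBytes replaces every unmappable character with "?", which would explain the output described in the question.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DefaultCharsetDemo {

    // Hypothetical helper: encodes the text with the given charset,
    // then decodes the resulting bytes with the same charset.
    static String roundTrip(String text, Charset charset) {
        return new String(text.getBytes(charset), charset);
    }

    public static void main(String[] args) {
        // "åäö" written with Unicode escapes so the source file encoding does not matter.
        String text = "\u00e5\u00e4\u00f6";

        // What the no-argument getBytes() / new String(byte[]) would use on this machine.
        System.out.println("Platform default: " + Charset.defaultCharset());

        // US-ASCII cannot represent these characters, so each one becomes '?'.
        System.out.println(roundTrip(text, StandardCharsets.US_ASCII).equals("???")); // true

        // ISO-8859-1 and UTF-8 both preserve them.
        System.out.println(roundTrip(text, StandardCharsets.ISO_8859_1).equals(text)); // true
        System.out.println(roundTrip(text, StandardCharsets.UTF_8).equals(text));      // true
    }
}
```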

Both new String() and String.getBytes() are overloaded to allow you to specify the encoding. Since you control encoding as well as decoding, simply use UTF-8 (hardcoded).
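A minimal sketch of that suggestion follows. The class Utf8Wire and the helpers toWire and fromWire are hypothetical names used only here. In the question's code the equivalent change would presumably be text.getBytes(StandardCharsets.UTF_8) on the sending side and new String(data, StandardCharsets.UTF_8) in the client.

```java
import java.nio.charset.StandardCharsets;

public class Utf8Wire {

    // Sender side (hypothetical helper): always encode as UTF-8,
    // regardless of the platform default.
    static byte[] toWire(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // Receiver side (hypothetical helper): decode with the same charset
    // the sender used.
    static String fromWire(byte[] data) {
        return new String(data, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String text = "\u00e5\u00e4\u00f6"; // "åäö"

        byte[] wire = toWire(text);
        System.out.println(wire.length);                 // 6: two bytes per character in UTF-8
        System.out.println(fromWire(wire).equals(text)); // true on any platform
    }
}
```

Note that this only covers the bytes sent between the two programs. The client's System.out.println also goes through an encoding step, so the stream the webserver reads from may need the same treatment.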

Also check your code for uses of FileReader, FileWriter, InputStreamReader and OutputStreamWriter, all of which potentially rely on the platform default encoding (the first two exclusively, at least before Java 11, which makes them pretty useless).
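For the stream wrappers, the charset can be passed to the constructor. The sketch below uses in-memory streams so that it is self-contained. The class name ExplicitCharsetStreams and both helper methods are assumptions made for this example; in the real program the underlying streams would be the socket's streams.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

public class ExplicitCharsetStreams {

    // Hypothetical helper: writes one line through an OutputStreamWriter
    // that is told explicitly to encode as UTF-8.
    static byte[] writeLine(String line) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(out, StandardCharsets.UTF_8)) {
            writer.write(line);
            writer.write('\n');
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    // Hypothetical helper: reads one line through an InputStreamReader
    // that is told explicitly to decode as UTF-8.
    static String readLine(byte[] data) {
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new ByteArrayInputStream(data), StandardCharsets.UTF_8))) {
            return reader.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String text = "\u00e5\u00e4\u00f6"; // "åäö"
        byte[] bytes = writeLine(text);
        System.out.println(bytes.length);                  // 7: six UTF-8 bytes plus the newline
        System.out.println(readLine(bytes).equals(text));  // true
    }
}
```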
