Convert ANSI characters to UTF-8 in Java
Question
Is there a way to convert an ANSI string to UTF-8 using Java?
I have a custom serializer that uses the readUTF and writeUTF methods of the DataInputStream and DataOutputStream classes to deserialize and serialize strings. If I receive a string that is encoded in ANSI and is very long (about 100000 characters), I get this error:
Caused by: java.io.UTFDataFormatException: encoded string too long: 106958 bytes
However, in my JUnit tests I am able to create a string of 120000 'a's and it works perfectly.
I have checked the following posts but I am still getting errors.
Recommended answer
This error is not caused by character encoding. It means the length of the UTF data is wrong.
I just realized this is a write error, not a read error.
The UTF length field written by writeUTF is only 2 bytes, so it can describe at most 65535 (64K) encoded bytes. You are trying to write about 100K, which is not going to work.
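The limit can be checked directly with a small experiment. The sketch below is illustrative only: the class name WriteUtfLimitDemo and the helper canWriteUtf are hypothetical, not part of the original serializer. It uses plain ASCII 'a' characters, which encode to one byte each, so the string length equals the encoded byte count.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UTFDataFormatException;
import java.io.UncheckedIOException;
import java.util.Arrays;

class WriteUtfLimitDemo {

    // Hypothetical helper: reports whether writeUTF accepts a string made of
    // `length` ASCII 'a' characters (one encoded byte per character).
    static boolean canWriteUtf(int length) {
        char[] chars = new char[length];
        Arrays.fill(chars, 'a');
        String s = new String(chars);

        try (DataOutputStream out = new DataOutputStream(new ByteArrayOutputStream())) {
            out.writeUTF(s);
            return true;
        } catch (UTFDataFormatException e) {
            // Thrown when the encoded form does not fit the 2-byte length prefix.
            return false;
        } catch (IOException e) {
            // Not expected for an in-memory stream; rethrow unchecked.
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(canWriteUtf(65535)); // true
        System.out.println(canWriteUtf(65536)); // false
    }
}
```

Because a pure ASCII string fails as soon as it exceeds 65535 characters, the source encoding of the text is not the deciding factor. Only the encoded byte count matters.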
This limit is hardcoded in writeUTF and there is no way to get around it:
    if (utflen > 65535)
        throw new UTFDataFormatException(
            "encoded string too long: " + utflen + " bytes");
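writeUTF itself cannot go past this limit, but a serializer that controls both ends of the stream could avoid writeUTF for long strings. This is a different technique from the one in the answer. It is a minimal sketch, assuming it is acceptable to change the wire format: the string is written as a 4-byte length followed by standard UTF-8 bytes. The names LongStringCodec, writeLongString, readLongString and roundTrip are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class LongStringCodec {

    // Writes a 4-byte length prefix followed by the UTF-8 bytes of the string.
    static void writeLongString(DataOutputStream out, String s) throws IOException {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        out.writeInt(bytes.length);
        out.write(bytes);
    }

    // Reads a string previously written by writeLongString.
    static String readLongString(DataInputStream in) throws IOException {
        int length = in.readInt();
        if (length < 0) {
            throw new IOException("negative string length: " + length);
        }
        byte[] bytes = new byte[length];
        in.readFully(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Convenience for experiments: serializes and deserializes in memory.
    static String roundTrip(String s) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (DataOutputStream out = new DataOutputStream(buffer)) {
                writeLongString(out, s);
            }
            try (DataInputStream in =
                     new DataInputStream(new ByteArrayInputStream(buffer.toByteArray()))) {
                return readLongString(in);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Data written this way is not readable with readUTF, which expects a 2-byte length and Java's modified UTF-8, so the reader and the writer would both have to switch to these helpers.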