Convert ANSI characters to UTF-8 in Java
Is there a way to convert an ANSI string to UTF using Java?
I have a custom serializer that uses the readUTF and writeUTF methods of the DataInputStream and DataOutputStream classes to deserialize and serialize strings. If I receive a string that is encoded in ANSI and is very long (around 100000 characters), I get this error:
Caused by: java.io.UTFDataFormatException: encoded string too long: 106958 bytes
However, in my JUnit tests I am able to create a string with 120000 'a's and it works perfectly.
I have checked the following posts but am still getting errors:
- Converting UTF-8 to ISO-8859-1 in Java - how to keep it as single byte
- How do I replace accented Latin characters in Ruby?
This error is not caused by character encoding. It means the length of the UTF data is wrong.
EDIT: Just realized this is a writing error, not a reading error.
The UTF length field is only 2 bytes, so it can hold at most 65535 encoded bytes (just under 64K). You are trying to write more than 100K bytes, which is not going to work.
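To see the limit in isolation, here is a minimal sketch. The class name WriteUtfLimitDemo and the helper canWriteUtf are made up for illustration; only DataOutputStream.writeUTF is the real API. It tries to write strings of plain 'a' characters, each of which encodes to one byte, and reports whether writeUTF accepts them.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UTFDataFormatException;
import java.io.UncheckedIOException;
import java.util.Arrays;

public class WriteUtfLimitDemo {

    /** Hypothetical helper: true if writeUTF accepts a string of n 'a' characters. */
    public static boolean canWriteUtf(int n) {
        char[] chars = new char[n];
        Arrays.fill(chars, 'a'); // each 'a' encodes to exactly one byte
        String s = new String(chars);
        try (DataOutputStream out = new DataOutputStream(new ByteArrayOutputStream())) {
            out.writeUTF(s);
            return true;
        } catch (UTFDataFormatException e) {
            // Thrown when the encoded form is longer than 65535 bytes.
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(canWriteUtf(65535));  // true: exactly at the limit
        System.out.println(canWriteUtf(65536));  // false: one byte over
        System.out.println(canWriteUtf(120000)); // false
    }
}
```

Note that the check is on encoded bytes, not characters: a string with non-ASCII characters can hit the limit with far fewer than 65535 characters, because those characters take two or three bytes each.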
This limit is hardcoded in writeUTF and there is no way to get around it:
if (utflen > 65535)
    throw new UTFDataFormatException(
        "encoded string too long: " + utflen + " bytes");
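While writeUTF itself cannot be made to accept longer strings, a serializer that controls both ends of the stream could use its own framing instead. The following is only a sketch of one common approach, assuming you are free to change the wire format: write a 4-byte length with writeInt, followed by the standard UTF-8 bytes. The class LongStringIO and its methods are hypothetical names, not part of the JDK. The result is not compatible with readUTF, which expects a 2-byte length and modified UTF-8.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class LongStringIO {

    /** Writes a 4-byte length followed by the string's standard UTF-8 bytes. */
    public static void writeLongString(DataOutput out, String s) throws IOException {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        out.writeInt(bytes.length);
        out.write(bytes);
    }

    /** Reads a string written by writeLongString. */
    public static String readLongString(DataInput in) throws IOException {
        int length = in.readInt();
        if (length < 0) {
            throw new IOException("negative string length: " + length);
        }
        byte[] bytes = new byte[length];
        in.readFully(bytes); // blocks until all bytes are read or EOF is hit
        return new String(bytes, StandardCharsets.UTF_8);
    }

    /** Convenience for trying it out: serialize to memory and read it back. */
    public static String roundTrip(String s) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            writeLongString(new DataOutputStream(buffer), s);
            DataInputStream in =
                new DataInputStream(new ByteArrayInputStream(buffer.toByteArray()));
            return readLongString(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With this framing the practical limit becomes the maximum Java array size rather than 65535 bytes. When reading from an untrusted source, it would be wise to cap the accepted length before allocating the buffer.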