Encoding difficulties


Question

I'm having some encoding problems with code I'm working on. An encrypted string is received and decoded as ISO-8859-1. This string is then stored in a DB that uses UTF-8 encoding. When the string is retrieved it is still ISO-8859-1, and that causes no problems. The issue is that I also need to be able to retrieve this string as UTF-8, and I haven't been successful at that.

I've tried to convert the string from ISO-8859-1 to UTF-8 after retrieving it from the DB, using this method:

private String convertIsoToUtf8(String isoLatin) {
    try {
        return new String(isoLatin.getBytes("ISO_8859_1"), "UTF_8");
    } catch (UnsupportedEncodingException e) {
        return isoLatin;
    }
}

Unfortunately, the special characters are just displayed as question marks in this case.

Original string: Test æøå
Example output after retrieving from DB and converting to UTF-8: Test ???

Update: After reading the link provided in the comment, I managed to get it right. Since the DB is already UTF-8 encoded, all I needed to do was this:

return new String(isoLatin.getBytes("UTF-8"));


Recommended answer

When you already have a String object, it is usually too late to correct any encoding issues, since some information may already have been lost: think of characters that can't be mapped one-to-one onto Java's internal UTF-16 representation.
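As a minimal sketch of why lost information cannot be recovered afterwards, the example below pushes the sample text through a charset that cannot represent all of its characters. The class name `LossyConversionDemo`, the helper `roundTrip`, and the choice of US-ASCII as the lossy charset are assumptions made for illustration. They are not taken from the question, and this is not necessarily the exact step where the asker's data was damaged.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class LossyConversionDemo {

    // Encodes the text with the given charset and decodes the bytes again.
    // Characters the charset cannot represent are replaced while encoding.
    static String roundTrip(String text, Charset charset) {
        return new String(text.getBytes(charset), charset);
    }

    public static void main(String[] args) {
        // The sample text from the question, written with Unicode escapes so
        // that the encoding of this source file cannot affect the result.
        String original = "Test \u00E6\u00F8\u00E5";

        // UTF-8 can represent every character, so the round trip is lossless.
        System.out.println(roundTrip(original, StandardCharsets.UTF_8).equals(original)); // true

        // US-ASCII cannot represent the last three letters; each becomes '?'.
        System.out.println(roundTrip(original, StandardCharsets.US_ASCII)); // Test ???
    }
}
```

Once the `?` substitution has happened, the resulting String carries no trace of which characters were there originally, so no later conversion can bring them back.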

The correct place to handle character encoding is the moment you get your Strings: when reading input from a file (set the correct encoding on your InputStreamReader), when converting the byte[] you got from decryption, when reading from the database (this should be handled by your JDBC driver), etc.
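The following sketch shows what naming the charset at the point of decoding might look like for two of the cases mentioned above: a byte[] (for example, the result of decryption) and an InputStream. The class `DecodeAtBoundary` and its methods `decode` and `readAll` are hypothetical names, and the byte arrays stand in for real decrypted data or file contents.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class DecodeAtBoundary {

    // Turns raw bytes into a String, naming the charset the bytes are in.
    static String decode(byte[] raw, Charset charset) {
        return new String(raw, charset);
    }

    // Reads a whole stream as text, naming the charset on the reader.
    static String readAll(InputStream in, Charset charset) throws IOException {
        StringBuilder text = new StringBuilder();
        try (Reader reader = new InputStreamReader(in, charset)) {
            char[] buffer = new char[1024];
            int count;
            while ((count = reader.read(buffer)) != -1) {
                text.append(buffer, 0, count);
            }
        }
        return text.toString();
    }

    public static void main(String[] args) throws IOException {
        // The sample text from the question as ISO-8859-1 bytes.
        byte[] latin1 = {0x54, 0x65, 0x73, 0x74, 0x20,
                (byte) 0xE6, (byte) 0xF8, (byte) 0xE5};
        // The same text as UTF-8 bytes.
        byte[] utf8 = {0x54, 0x65, 0x73, 0x74, 0x20,
                (byte) 0xC3, (byte) 0xA6, (byte) 0xC3, (byte) 0xB8,
                (byte) 0xC3, (byte) 0xA5};

        String fromBytes = decode(latin1, StandardCharsets.ISO_8859_1);
        String fromStream = readAll(new ByteArrayInputStream(utf8), StandardCharsets.UTF_8);

        // Different byte sequences, same String once each is decoded correctly.
        System.out.println(fromBytes.equals(fromStream)); // true
    }
}
```

After decoding, a String is no longer "in" ISO-8859-1 or UTF-8. The charset only matters again when the text is turned back into bytes.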

Also take care to handle the encoding correctly when doing the reverse. While relying on the default encoding might seem to work most of the time, sooner or later you may run into issues that are difficult or impossible to resolve (as you have now).
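For the reverse direction, a minimal sketch might pass the charset explicitly whenever a String is turned back into bytes, instead of relying on the platform default. `EncodeExplicitly`, `encode`, and `write` are hypothetical names, and the `ByteArrayOutputStream` is only a stand-in for a real file or socket.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class EncodeExplicitly {

    // Converts text to bytes in a named charset rather than the default one.
    static byte[] encode(String text, Charset charset) {
        return text.getBytes(charset);
    }

    // Writes text to a stream in a named charset.
    static void write(OutputStream out, String text, Charset charset) throws IOException {
        Writer writer = new OutputStreamWriter(out, charset);
        writer.write(text);
        writer.flush(); // flush only; closing the stream is left to the caller
    }

    public static void main(String[] args) throws IOException {
        String text = "Test \u00E6\u00F8\u00E5";

        // The same String produces different bytes depending on the charset.
        System.out.println(encode(text, StandardCharsets.ISO_8859_1).length); // 8
        System.out.println(encode(text, StandardCharsets.UTF_8).length);      // 11

        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        write(sink, text, StandardCharsets.UTF_8);
        System.out.println(sink.size()); // 11
    }
}
```

With the no-argument `getBytes()`, the result depends on the JVM's default charset, so the same code can behave differently from one machine to another.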

P.S.: Also keep in mind which tool you are using to display your output: some consoles won't display UTF-16 or UTF-8, the editor you use to view your files has its own encoding settings that you should check, etc. Sometimes your output is correct and just can't be displayed correctly.
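One way to separate a display problem from a data problem is to print code points and bytes as plain ASCII hex, which any console can show. The helper class `InspectText` and its methods below are hypothetical, offered as a sketch rather than as part of the original answer.

```java
import java.nio.charset.StandardCharsets;

final class InspectText {

    // Formats bytes as space-separated upper-case hex pairs.
    static String toHex(byte[] bytes) {
        StringBuilder hex = new StringBuilder();
        for (byte b : bytes) {
            if (hex.length() > 0) {
                hex.append(' ');
            }
            hex.append(String.format("%02X", b & 0xFF));
        }
        return hex.toString();
    }

    // Formats the code points of a String as U+XXXX values.
    static String toCodePoints(String text) {
        StringBuilder out = new StringBuilder();
        text.codePoints().forEach(cp -> {
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(String.format("U+%04X", cp));
        });
        return out.toString();
    }

    public static void main(String[] args) {
        String text = "\u00E6\u00F8\u00E5";
        System.out.println(toCodePoints(text));                                 // U+00E6 U+00F8 U+00E5
        System.out.println(toHex(text.getBytes(StandardCharsets.UTF_8)));      // C3 A6 C3 B8 C3 A5
        System.out.println(toHex(text.getBytes(StandardCharsets.ISO_8859_1))); // E6 F8 E5
    }
}
```

If the code points are the expected ones but the console still shows question marks, the String itself is fine and the problem lies in how the output is encoded or displayed.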
