Eclipse wrong Java properties UTF-8 encoding


Question


I have a Java EE project in which I use message properties files. The encoding of those files is set to UTF-8. In the files I use German umlauts like ä, ö, ü. The problem is that these characters are sometimes replaced with Unicode escapes like \uFFFD\uFFFD, but not every character is affected. Right now I have a case where ä and ü are both replaced with \uFFFD\uFFFD, but not at every occurrence of ä and ü.

The Git diff shows me something like this:

 mail.adresses=E-Mail hinzufügen:
-mail.adresses.multiple=E-Mails durch Kommata getrennt hinzufügen.
+mail.adresses.multiple=E-Mails durch Kommata getrennt hinzuf\uFFFD\uFFFDgen.
 mail.title=Einladungs-E-Mail
 box.preview=Vorschau
 box.share.text=Sie können jetzt die ausgewählten Bilder mit Ihren Freunden teilen.
@@ -6880,7 +6880,7 @@ browser.cancel=Abbrechen
 browser.selectImage=übernehmen
 browser.starImage=merken
 browser.removeImage=Löschen
-browser.searchForSimilarImages=ähnliche
+browser.searchForSimilarImages=\uFFFD\uFFFDhnliche
 browser.clear_drop_box=löschen

Also, lines that I have not touched show up as changed. I don't understand why I get this behavior. What could be the cause of the problem above?

My system:

  • Antergos / Arch Linux

    • System encoding UTF-8

      Python 3.5.0 (default, Sep 20 2015, 11:28:25) 
      [GCC 5.2.0] on linux
      Type "help", "copyright", "credits" or "license" for more information.
      >>> import sys
      >>> sys.getdefaultencoding()
      'utf-8'
      

  • Eclipse Mars 1

    • Text file encoding UTF-8
    • Properties file encoding UTF-8
  • Tomcat 8
  • Java JDK 8

If I use another editor such as Atom to edit those message properties files, I don't run into this problem.

I also noticed in one case that if I copy the original value browser.searchForSimilarImages=ähnliche from the Git diff and use it to replace the wrong value browser.searchForSimilarImages=\uFFFD\uFFFDhnliche in Eclipse, I get the correct umlauts in the message properties file.

Answer

Root cause:

By default, Eclipse uses the ISO 8859-1 character encoding for properties files (read here), so if the file contains any character outside ISO 8859-1, it will not be processed as expected.
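The same ISO 8859-1 default also applies on the Java side: java.util.Properties.load(InputStream) decodes bytes as ISO 8859-1, while load(Reader) uses whatever charset the reader was created with. The following is a minimal sketch of that difference, not a diagnosis of the Eclipse behavior in the question; the class and method names are made up for illustration, and the sample key is taken from the diff above.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

// Hypothetical demo class: shows how the same UTF-8 bytes are read
// differently depending on which Properties.load overload is used.
class PropertiesEncodingDemo {

    // load(InputStream) always decodes the bytes as ISO 8859-1.
    static String loadAsLatin1(byte[] content, String key) {
        Properties props = new Properties();
        try {
            props.load(new ByteArrayInputStream(content));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props.getProperty(key);
    }

    // load(Reader) uses the charset the Reader was created with, here UTF-8.
    static String loadAsUtf8(byte[] content, String key) {
        Properties props = new Properties();
        try (Reader reader = new InputStreamReader(
                new ByteArrayInputStream(content), StandardCharsets.UTF_8)) {
            props.load(reader);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props.getProperty(key);
    }

    public static void main(String[] args) {
        String key = "browser.searchForSimilarImages";
        // The a-umlaut is written as an escape so the example does not
        // depend on the encoding of this source file.
        byte[] utf8File = (key + "=\u00e4hnliche\n").getBytes(StandardCharsets.UTF_8);

        // The umlaut arrives as two characters (U+00C3, U+00A4).
        System.out.println(loadAsLatin1(utf8File, key).length()); // 9
        // The umlaut arrives as one character (U+00E4).
        System.out.println(loadAsUtf8(utf8File, key).length());   // 8
    }
}
```

If the application loads UTF-8 properties through the InputStream overload, the umlauts are misread even when the file on disk is correct, which is why the escaped form is often used for properties files.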

Solution 1

If you use Eclipse, you will notice that it implicitly converts special characters into their \uXXXX equivalents. Try copying

会意字 / 會意字

into a properties file opened in Eclipse.

EDIT: As per comment from OP

Update the encoding in Eclipse as shown below. If you set the encoding to UTF-32, you can even see Chinese characters, which you generally cannot see otherwise.

How to change the encoding of properties files in Eclipse: see this Eclipse Bugzilla bug for more details. It discusses several other possibilities and in the end suggests what I have highlighted below.

Chinese characters can be seen in Eclipse after encoding is set properly:

Solution 2

If the above doesn't work consistently for you (it does work for me, and I never see encoding issues), try an Eclipse plugin that handles the encoding of properties or other files, for example Eclipse ResourceBundle Editor or Extended Resource-Bundle editor.

I would recommend using Eclipse ResourceBundle Editor.

Solution 3

Another way to change the encoding of a file is the Edit --> Set Encoding option. It really matters because it changes the default character set and file encoding. Experiment by changing the encoding with Edit --> Set Encoding and then running the following Java sysouts: System.out.println("Default Charset=" + Charset.defaultCharset()); and System.out.println(System.getProperty("file.encoding"));
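The two sysouts above can be wrapped in a small runnable class, sketched here. The class and method names are made up for illustration; the two calls are the ones quoted above.

```java
import java.nio.charset.Charset;

// Hypothetical helper class: reports the JVM's default charset and the
// file.encoding system property, as suggested in Solution 3.
class ShowDefaultEncoding {

    static String describe() {
        return "Default Charset=" + Charset.defaultCharset()
                + ", file.encoding=" + System.getProperty("file.encoding");
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

Run it from Eclipse before and after changing the encoding to see whether the reported values differ in your setup. On Java 8, starting the JVM with -Dfile.encoding=ISO-8859-1 is another way to watch the values change.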


As an aside: 1

Convert the properties file so that its content is valid ISO 8859-1 by using native2ascii - Native-to-ASCII Converter.

What native2ascii does: it converts every non-ASCII character into its \uXXXX equivalent. This is a good tool because you do not need to look up the \uXXXX equivalent of each special character.

Usage for UTF-8: native2ascii -encoding utf8 e:\a.txt e:\b.txt
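To illustrate the kind of output the tool produces, here is a minimal sketch of the escaping step in Java. It is not the native2ascii implementation. It assumes that every character above the ASCII range is escaped, and the class and method names are made up for illustration.

```java
// Hypothetical helper class: escapes characters outside ASCII as
// backslash-u sequences with four hex digits, the form properties files accept.
class UnicodeEscaper {

    static String escapeNonAscii(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c < 0x80) {
                out.append(c); // plain ASCII stays as-is
            } else {
                // Each UTF-16 code unit becomes one escape, so characters
                // outside the BMP end up as two escapes (a surrogate pair).
                out.append(String.format("\\u%04x", (int) c));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The input umlaut is written as an escape so the example does not
        // depend on the encoding of this source file.
        System.out.println(escapeNonAscii("browser.searchForSimilarImages=\u00e4hnliche"));
    }
}
```

The result contains only ASCII characters, so it reads the same whether it is interpreted as ISO 8859-1 or UTF-8.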


As an aside: 2

Every computer program, whether an IDE, application server, web server, browser, etc., understands only bits, so it needs to know how to interpret those bits to make sense of them, because the same bits can represent different characters depending on the encoding used. That is where "encoding" comes into the picture: it gives each character a unique identifier so that all computer programs, across different operating systems, know exactly how to interpret it.

So, if you have written a file using some encoding scheme, let's say UTF-8, and then read it with any editor that also uses UTF-8, you can expect it to be displayed correctly.
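A small Java sketch of this point: the same two bytes give different text depending on the charset used to decode them. Decoding the UTF-8 bytes of ä as US-ASCII happens to yield two replacement characters (\uFFFD\uFFFD), the same pattern seen in the question. This shows one way such a pattern can arise, not a confirmed explanation of the Eclipse behavior. The class and method names are made up for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: decodes the same bytes with different charsets.
class DecodeDemo {

    static String decode(byte[] bytes, Charset charset) {
        // new String(bytes, charset) substitutes U+FFFD for undecodable input.
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // The a-umlaut encoded as UTF-8 is the two bytes 0xC3 0xA4.
        byte[] utf8Bytes = "\u00e4".getBytes(StandardCharsets.UTF_8);

        String asUtf8 = decode(utf8Bytes, StandardCharsets.UTF_8);        // one char, U+00E4
        String asLatin1 = decode(utf8Bytes, StandardCharsets.ISO_8859_1); // two chars, U+00C3 U+00A4
        String asAscii = decode(utf8Bytes, StandardCharsets.US_ASCII);    // two chars, both U+FFFD

        System.out.println(asUtf8.length());   // 1
        System.out.println(asLatin1.length()); // 2
        System.out.println(asAscii.equals("\uFFFD\uFFFD")); // true
    }
}
```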

Please do read this answer of mine for more details, though it is written from a browser-server perspective.
