UTF-8 characters in Eclipse: maybe a Windows copy-and-paste issue

Problem description

I'm trying to internationalise an Android application. I have a set of strings that I've written in English, and I'm using Google Translate to convert them to the target language.

Then I copy and paste the translated text into Eclipse; however, it's displayed incorrectly there. For example, I start with the English

Bearings, as degrees East of true North

which translates to

De paliers, comme degrés Est du nord vrai

and when I paste it into Eclipse, I get

De paliers, comme degrÃ©s Est du nord vrai

I've checked, and the encoding of the strings file is UTF-8. I've also checked by pasting the translation into Notepad, and I get the correct characters, which leads me to suspect that it's something to do with Eclipse and Windows 7. Has anyone got any ideas or a workaround (e.g., will editing the XML file outside of Eclipse, in Notepad for example, work)?

Recommended answer

Your string is UTF-8 (the Ã symbol gives it away), but Eclipse is probably interpreting your file as Cp1252. Right-click the file and check which content encoding Eclipse is using. Unless it has been modified, the encoding is generally inherited from the container, which on Windows usually defaults to Cp1252; the containers are the project, the workspace, and the whole Eclipse installation settings, in that order. Some files, however, such as XML, are treated according to their content (XML has a header declaring the encoding used).
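To make the misreading concrete, here is a minimal Python sketch (Python is used only for illustration). It assumes the file on disk holds UTF-8 bytes and that the editor decodes them with Python's `cp1252` codec; the helper `misread_as_cp1252` and the `bearing` resource name are hypothetical. The last lines show that an XML parser which honours the declared encoding recovers the text correctly.

```python
import xml.etree.ElementTree as ET

def misread_as_cp1252(text: str) -> str:
    """Hypothetical helper: store text as UTF-8, then read the bytes back as Cp1252."""
    utf8_bytes = text.encode("utf-8")  # what actually sits in the file
    # Python's cp1252 codec raises UnicodeDecodeError on the five undefined
    # bytes (0x81, 0x8D, 0x8F, 0x90, 0x9D); the bytes of é are unaffected.
    return utf8_bytes.decode("cp1252")  # what a Cp1252-configured editor shows

print(misread_as_cp1252("De paliers, comme degrés Est du nord vrai"))
# De paliers, comme degrÃ©s Est du nord vrai

# An XML parser takes the encoding from the declaration instead of guessing.
xml_bytes = ('<?xml version="1.0" encoding="utf-8"?>'
             '<string name="bearing">degrés</string>').encode("utf-8")
print(ET.fromstring(xml_bytes).text)  # degrés
```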

Update

If you've verified that Eclipse is actually interpreting the file as UTF-8, then this points to a double conversion. In Cp1252, Ã has the byte code 0xC3 and © has the byte code 0xA9. If you look at a UTF-8 charset table, you will find that the character é is encoded as the two bytes 0xC3 0xA9. When data is interpreted, some conversions are made automatically if the source and destination encodings are known (e.g., when Java Strings, which are always UTF-16 internally, are written out in another encoding). The problem arises when one of the encodings is unknown (your case) and the converter has to guess, normally by falling back to the default system encoding. This is when things start getting messed up.
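The byte codes quoted above can be checked with a few lines of Python; this sketch uses only the standard `cp1252` and `utf-8` codecs, and the variable names are illustrative.

```python
# In Cp1252, Ã and © are each a single byte.
a_tilde = "Ã".encode("cp1252")         # b'\xc3'
copyright_sign = "©".encode("cp1252")  # b'\xa9'

# In UTF-8, é is the two-byte sequence 0xC3 0xA9.
e_acute = "é".encode("utf-8")

print(e_acute.hex(" "))                      # c3 a9
print(a_tilde + copyright_sign == e_acute)   # True: same bytes, two readings
```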

You can end up with Ã© stored as UTF-8 if the original source was indeed UTF-8 but was interpreted as Cp1252. The original sequence 0xC3 0xA9 (Ã© in Cp1252, or é in UTF-8) is converted to 0xC3 0x83 (Ã in UTF-8) followed by 0xC2 0xA9 (© in UTF-8).
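A minimal sketch of that double conversion, and of undoing it, assuming the text was damaged by exactly one UTF-8 to Cp1252 to UTF-8 round. Both function names are hypothetical; if the text went through more rounds, or was partly edited afterwards, the reversal below would not be enough.

```python
def double_encode(text: str) -> bytes:
    """Simulate UTF-8 bytes misread as Cp1252 and then written out again as UTF-8."""
    misread = text.encode("utf-8").decode("cp1252")  # 'é' -> 'Ã©'
    return misread.encode("utf-8")                  # 'Ã©' -> C3 83 C2 A9

def undo_double_encoding(data: bytes) -> str:
    """Reverse one round: read as UTF-8, go back to the Cp1252 bytes, re-read as UTF-8."""
    return data.decode("utf-8").encode("cp1252").decode("utf-8")

damaged = double_encode("é")
print(damaged.hex(" "))               # c3 83 c2 a9
print(undo_double_encoding(damaged))  # é
```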

How can the source encoding be detected if it isn't specified? Normally, it can't. That's why most UTF-8 encoders perform this double conversion if you feed their output back to them (converting from Cp1252 to UTF-8, then to UTF-8 again when the previous output is fed in but interpreted as Cp1252), unless the document carries some mark that tells the encoder which encoding it uses (such as a BOM, which, incidentally, Eclipse does not support).
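The ambiguity can be illustrated in Python: the same UTF-8 bytes also decode without error as Cp1252, so nothing in the bytes themselves says which reading is right; only an explicit marker such as a BOM settles it. `sniff_encoding` is a hypothetical helper for this sketch, not an Eclipse or library API, and it deliberately returns `None` rather than guessing.

```python
import codecs
from typing import Optional

def sniff_encoding(data: bytes) -> Optional[str]:
    """Hypothetical helper: report UTF-8 only when a BOM says so, otherwise don't guess."""
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"  # Python codec that also strips the BOM when decoding
    return None             # no marker: the encoding cannot be known from the bytes

raw = "degrés".encode("utf-8")
# Both decodings succeed, so an error-free decode proves nothing.
print(raw.decode("utf-8"), raw.decode("cp1252"))  # degrés degrÃ©s
print(sniff_encoding(raw))                        # None
print(sniff_encoding(codecs.BOM_UTF8 + raw))      # utf-8-sig
```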
