Does the Scala compiler work with UTF-8 encoded source files?
Question
I have a very simple bit of Scala code:
var str = "≤"
for( ch <- str ) { printf("%d, %x", ch.toInt, ch.toInt) ; println }
println
str = "\u2264" ;
for( ch <- str ) { printf("%d, %x", ch.toInt, ch.toInt) ; println }
In case that doesn't show properly in your browser, the first string contains one character between the double quotes: the less-than-or-equal-to sign, U+2264.
The program outputs:
8218, 201a
226, e2
167, a7
8804, 2264
Clearly the first string is 3 characters long at run time, not 1 character long as it is in the source file.
The source file is stored in UTF-8. A hex dump shows that it is encoded properly, the first string being 22 E2 89 A4 22. I'm using Eclipse and the Scala plugin for Eclipse.
- Does the Scala compiler accept input files encoded in UTF-8?
- If so, why does my program produce unexpected results?
Accepted answer
Answering my own question:
Does the Scala compiler work with UTF-8 encoded files?
Yes, but only if it knows they are UTF-8 encoded. In the absence of any other evidence, it uses Java's file.encoding property. (Thanks to @AndreasNeumann for this part of the answer.)
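To see which encoding a JVM falls back on, a minimal sketch like the following can help. It only reports the settings of the JVM it runs in, so it is informative only if launched the same way as the compiler; the object name ShowEncoding is made up for this illustration. On recent JDKs the default charset may already be UTF-8.

```scala
import java.nio.charset.Charset

object ShowEncoding {
  // Builds a short report of this JVM's encoding-related settings.
  def report(): String = {
    val prop = Option(System.getProperty("file.encoding")).getOrElse("(not set)")
    s"file.encoding = $prop\ndefault charset = ${Charset.defaultCharset().name()}"
  }

  def main(args: Array[String]): Unit = println(report())
}
```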
Why did my program not behave as I expected?
Because my file.encoding property was set to MacRoman. Even though I had told Eclipse that the file is UTF-8, this information was not communicated to the Scala compiler. Thus the compiler interpreted the 3-byte sequence E2 89 A4 as three characters according to the MacRoman encoding: a low single quote (which looks a lot like a comma), an "a" with a circumflex, and a section sign. The Unicode code points for these three characters are U+201A U+00E2 U+00A7, which explains the output of my program.
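The misinterpretation can be reproduced outside the compiler. The sketch below decodes the same three bytes twice, once as UTF-8 and once as MacRoman, and prints the resulting code points. It assumes the JDK exposes MacRoman under the charset name "x-MacRoman", which may not hold on minimal runtimes; the object and method names are made up for this illustration.

```scala
import java.nio.charset.Charset

object DecodeDemo {
  // UTF-8 bytes of U+2264, taken from the hex dump in the question.
  val bytes: Array[Byte] = Array(0xE2, 0x89, 0xA4).map(_.toByte)

  // Decodes the bytes with the named charset and lists each resulting char as U+XXXX.
  def decode(charsetName: String): String =
    new String(bytes, Charset.forName(charsetName))
      .map(ch => f"U+${ch.toInt}%04X")
      .mkString(" ")

  def main(args: Array[String]): Unit = {
    println("UTF-8:    " + decode("UTF-8"))
    // Assumption: "x-MacRoman" is available; skip the second line if it is not.
    if (Charset.isSupported("x-MacRoman"))
      println("MacRoman: " + decode("x-MacRoman"))
  }
}
```

Where the MacRoman charset is available, the second line should list the same three code points as the program output in the question.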
How do you fix it?
On the command line for scalac, use the option -encoding UTF-8. In Eclipse you can use the preferences (options) for the Scala plugin to add this option. (Thanks to @Jesper for this part of the answer.) You can also use the -D option, either on the scalac command line or via the JAVA_OPTS environment variable, to set the file.encoding property. (See the answer of @AndreasNeumann for details.)
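For builds that do not go through Eclipse, the same compiler flag can be set in the build definition. The fragment below is a sketch that assumes an sbt project; sbt is not mentioned in the original answer, so treat it as one possible way to pass the option rather than the method described here.

```scala
// build.sbt (assumed sbt project, not part of the original answer)
// Passes "-encoding UTF-8" to scalac for every compilation in this project.
scalacOptions ++= Seq("-encoding", "UTF-8")
```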
If you use the Scala IDE for Eclipse, there are at least three things you can do.
- One is to set the default encoding for all your workspaces under General >> Workspace in Eclipse's global preferences (or options), as shown in Iulian Dragos's answer.
- Another is to open the project properties (right-click on the project in the Package Explorer and select Properties) and, under the Resource preferences, select UTF-8 as the Text file encoding.
- Finally, you can add -encoding UTF-8 under "Additional command line parameters" under Compiler >> Scala in the preferences (or options). You can set this as a global preference (or option) or as a project-specific property setting.