Does the Scala compiler work with UTF-8 encoded source files?

Question

I have a very simple bit of Scala code:

 var str = "≤"
 for (ch <- str) { printf("%d, %x", ch.toInt, ch.toInt); println() }
 println()
 str = "\u2264"
 for (ch <- str) { printf("%d, %x", ch.toInt, ch.toInt); println() }

In case that doesn't show properly in your browser, the first string contains a single character between the double quotes: the less-than-or-equal-to sign, U+2264.

The program outputs:

8218, 201a
226, e2
167, a7

8804, 2264

Clearly the first string is 3 characters long at run time, not 1 character long as it is in the source file.

The source file is stored in UTF-8. A hex dump shows that it is encoded properly, the first string being 22 E2 89 A4 22. I'm using Eclipse and the Scala plugin for Eclipse.

  • Does the Scala compiler accept input files encoded in UTF-8?
  • If so, why does my program produce unexpected results?

Recommended answer

Answering my own question:

Does the Scala compiler work with UTF-8 encoded files?

Yes, but only if it knows they are UTF-8 encoded. In the absence of any other evidence, it uses Java's file.encoding property. (Thanks to @AndreasNeumann for this part of the answer.)
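
To see which default the compiler would fall back on, the property can be printed from the same JVM setup. This is only a minimal sketch: the object name ShowDefaultEncoding and the helper describe are made up for illustration, and it assumes the program runs under the same Java settings as the compiler.

```scala
import java.nio.charset.Charset

// Hypothetical helper, not part of the original answer.
object ShowDefaultEncoding {
  // Reports the file.encoding system property and the JVM's default charset.
  def describe(): String = {
    val prop = Option(System.getProperty("file.encoding")).getOrElse("<unset>")
    s"file.encoding=$prop, defaultCharset=${Charset.defaultCharset().name()}"
  }

  def main(args: Array[String]): Unit = println(describe())
}
```

If the reported value is not UTF-8, a UTF-8 source file containing non-ASCII literals is likely to be misread unless the encoding is specified explicitly.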

Why did my program not behave as I expected?

Because my file.encoding property was set to MacRoman. Even though I had told Eclipse that the file is UTF-8, this information was not communicated to the Scala compiler. Thus the compiler interpreted the 3-byte sequence E2 89 A4 as three separate characters according to the MacRoman encoding: a low single quotation mark (which looks a lot like a comma), an "a" with a circumflex, and a section sign. The Unicode code points for these three characters are U+201A, U+00E2, and U+00A7, which explains the output of my program.
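
The misinterpretation can be reproduced outside the compiler by decoding the same three bytes with both charsets. The following is a rough sketch, not the compiler's actual code path: the names DecodeDemo and codePoints are invented for illustration, and it assumes the JVM ships the extended charset registered as "x-MacRoman" (the sketch skips that step if it is missing).

```scala
import java.nio.charset.{Charset, StandardCharsets}

// Hypothetical demo object, not part of the original answer.
object DecodeDemo {
  // The bytes between the quotes in the hex dump: E2 89 A4.
  val bytes: Array[Byte] = Array(0xE2, 0x89, 0xA4).map(_.toByte)

  // Formats each UTF-16 char of the string as U+XXXX.
  def codePoints(s: String): String =
    s.map(ch => f"U+${ch.toInt}%04X").mkString(" ")

  def main(args: Array[String]): Unit = {
    // Decoded as UTF-8, the three bytes form one character.
    println(codePoints(new String(bytes, StandardCharsets.UTF_8)))

    // Decoded as MacRoman, each byte becomes its own character.
    // Assumption: "x-MacRoman" is available on this JVM.
    if (Charset.isSupported("x-MacRoman"))
      println(codePoints(new String(bytes, Charset.forName("x-MacRoman"))))
    else
      println("x-MacRoman charset not available on this JVM")
  }
}
```

With UTF-8 the result is the single code point U+2264; with MacRoman the three code points described above would be expected.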

How do you fix the problem?

On the command line for scalac, use the option -encoding UTF-8. In Eclipse you can use the preferences (options) for the Scala plugin to add this option. (Thanks to @Jesper for this part of the answer.) You can also use the -D option, either on the scalac command line or via the JAVA_OPTS environment variable, to set the file.encoding property. (See the answer of @AndreasNeumann for details.)

If you use the Scala IDE for Eclipse, there are at least three things you can do.

  • One is to set the default encoding for all your workspaces under General >> Workspace in Eclipse's global preferences (or options), as shown in Iulian Dragos's answer.
  • In the project properties (right-click on the project in the Package Explorer and select Properties), under the Resource preferences, select UTF-8 as the Text file encoding.
  • Finally, you can add -encoding UTF-8 under additional command line parameters under Compiler >> Scala in the preferences (or options). You can set this as a global preference (or option) or as a project specific property setting.
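
Whichever setting is used, a quick way to check that it took effect is to compare the literal against its escaped form, as the question's code does implicitly. This is a sketch only; the object name EncodingCheck and its members are hypothetical.

```scala
// Hypothetical check, not part of the original answer.
object EncodingCheck {
  // Typed directly in the source: its value depends on how the compiler decodes the file.
  val literal = "≤"
  // Written as an escape: independent of the source file encoding.
  val escaped = "\u2264"

  // True only if the compiler read the source file with the right encoding.
  def sourceDecodedCorrectly: Boolean = literal == escaped

  def main(args: Array[String]): Unit =
    if (sourceDecodedCorrectly) println("source file was decoded as expected")
    else {
      val seen = literal.map(c => f"${c.toInt}%04x").mkString(" ")
      println(s"encoding mismatch: literal has ${literal.length} chars ($seen)")
    }
}
```

If the file is saved as UTF-8 but compiled under a different encoding, the literal and the escape differ and the mismatch branch reports what the compiler actually saw.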
