How to ensure that Strings are in UTF-8?


Question

How can I convert the string "the surveyÂ’s rules" to UTF-8 in Scala?

I tried these approaches, but neither of them works:

scala> val text = "the surveyÂ’s rules"
text: String = the surveyÂ’s rules

scala> scala.io.Source.fromBytes(text.getBytes(), "UTF-8").mkString
res17: String = the surveyÂ’s rules

scala> new String(text.getBytes(),"UTF8")
res21: String = the surveyÂ’s rules

OK, I solved it this way: not by converting the string, but by reading the file with an explicit codec.

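import java.io.File
import java.nio.charset.CodingErrorAction
import scala.io.{Codec, Source}

// Decode as US-ASCII and silently drop any byte that is not valid ASCII.
implicit val codec: Codec = Codec("US-ASCII").onMalformedInput(CodingErrorAction.IGNORE).onUnmappableCharacter(CodingErrorAction.IGNORE)

val src = Source.fromFile(new File(folderDestination + name + ".csv"))
val src2 = Source.fromFile(new File(folderDestination + name + ".csv"))

// CSVReader is not defined in this snippet; it is presumably the one from the scala-csv library.
val reader = CSVReader.open(src.reader())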
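
To make the effect of that codec easier to see, here is a minimal sketch that applies the same configuration to an in-memory byte array instead of a file. The object name `LossyAsciiDemo`, the helper `decodeLossy`, and the sample phrase with a typographic apostrophe (U+2019) are assumptions made for illustration; they do not come from the question.

```scala
import java.nio.charset.{CodingErrorAction, StandardCharsets}
import scala.io.{Codec, Source}

object LossyAsciiDemo {
  // Same codec configuration as in the question: invalid input is dropped, not reported.
  val lossyAscii: Codec = Codec("US-ASCII")
    .onMalformedInput(CodingErrorAction.IGNORE)
    .onUnmappableCharacter(CodingErrorAction.IGNORE)

  // Decode raw bytes with the lossy codec (hypothetical helper).
  def decodeLossy(bytes: Array[Byte]): String =
    Source.fromBytes(bytes)(lossyAscii).mkString

  def main(args: Array[String]): Unit = {
    // Assumed sample: the phrase with a typographic apostrophe, stored as UTF-8.
    val utf8Bytes = "the survey\u2019s rules".getBytes(StandardCharsets.UTF_8)
    println(decodeLossy(utf8Bytes))                        // non-ASCII bytes are discarded
    println(new String(utf8Bytes, StandardCharsets.UTF_8)) // decoded with the matching charset
  }
}
```

Under these assumptions the first line loses the apostrophe entirely, so this approach hides the encoding problem rather than fixing it. Decoding with the charset the file was actually written in keeps the character.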


Recommended answer

Note that when you call text.getBytes() without arguments, you get an array of bytes representing the string in your platform's default encoding. On Windows, for example, that could be some single-byte encoding; on Linux it may already be UTF-8.
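
The dependence on the default encoding can be checked with a small sketch like the following. The object name `DefaultCharsetDemo`, the `hex` helper, and the sample word are illustrative assumptions; the output of the no-argument call varies by machine, so no fixed result is claimed for it.

```scala
import java.nio.charset.{Charset, StandardCharsets}

object DefaultCharsetDemo {
  // Render a byte array as space-separated hex pairs (hypothetical helper).
  def hex(bytes: Array[Byte]): String =
    bytes.map(b => f"${b & 0xff}%02x").mkString(" ")

  def main(args: Array[String]): Unit = {
    val text = "caf\u00e9" // "café": the last character is outside ASCII
    println(s"Default charset: ${Charset.defaultCharset()}")
    // Depends on the platform default, so the result differs between machines.
    println(s"getBytes():      ${hex(text.getBytes())}")
    // With an explicit charset the result is the same everywhere.
    println(s"ISO-8859-1:      ${hex(text.getBytes(StandardCharsets.ISO_8859_1))}")
    println(s"UTF-8:           ${hex(text.getBytes(StandardCharsets.UTF_8))}")
  }
}
```

The same character takes one byte in ISO-8859-1 and two bytes in UTF-8, which is why bytes produced under one encoding and decoded under another come out garbled.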

To be correct, specify the exact encoding in the getBytes() call. For Java 7 and later, do this:

import java.nio.charset.StandardCharsets

val bytes = text.getBytes(StandardCharsets.UTF_8)

For Java 6, do this:

import java.nio.charset.Charset

val bytes = text.getBytes(Charset.forName("UTF-8"))

Then bytes will contain the UTF-8-encoded text.
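
As a rough illustration of why the charset has to match on both sides, the sketch below encodes a string to UTF-8 and decodes it again, once with the matching charset and once with a wrong one. The object name `Utf8RoundTrip`, its helpers, and the sample phrase with a typographic apostrophe (U+2019) are assumptions for the example, not part of the original answer.

```scala
import java.nio.charset.StandardCharsets

object Utf8RoundTrip {
  // Encode with an explicit charset instead of the platform default.
  def encode(text: String): Array[Byte] = text.getBytes(StandardCharsets.UTF_8)

  // Decode with the same charset that produced the bytes.
  def decode(bytes: Array[Byte]): String = new String(bytes, StandardCharsets.UTF_8)

  def main(args: Array[String]): Unit = {
    val original = "the survey\u2019s rules" // assumed sample text
    val bytes = encode(original)

    // Matching charsets: the round trip preserves the string.
    println(decode(bytes) == original)

    // Mismatched charset: each UTF-8 byte becomes its own character.
    val garbled = new String(bytes, StandardCharsets.ISO_8859_1)
    println(garbled == original)
  }
}
```

Note that a String in Scala or Java has no encoding of its own; the encoding only matters at the point where text is turned into bytes or bytes into text.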
