How to detect when bytes can't be converted to string in Go?
Question
There are invalid byte sequences that can't be converted to Unicode strings. How do I detect that when converting []byte to string in Go?
Recommended answer
As Tim Cooper noted, you can test UTF-8 validity with utf8.Valid.
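As a minimal sketch of that check, the helper below refuses to convert bytes that are not valid UTF-8. The names toValidString and errInvalidUTF8 are hypothetical, introduced only for illustration; utf8.Valid is the real function from the unicode/utf8 package.

```go
package main

import (
	"errors"
	"fmt"
	"unicode/utf8"
)

// errInvalidUTF8 is a hypothetical sentinel error used only in this sketch.
var errInvalidUTF8 = errors.New("input is not valid UTF-8")

// toValidString is a hypothetical helper: it converts b to a string
// only when b consists entirely of valid UTF-8 sequences.
func toValidString(b []byte) (string, error) {
	if !utf8.Valid(b) {
		return "", errInvalidUTF8
	}
	return string(b), nil
}

func main() {
	inputs := [][]byte{
		[]byte("héllo"), // valid UTF-8
		{0xff, 0xfe},    // not valid UTF-8
	}
	for _, b := range inputs {
		s, err := toValidString(b)
		fmt.Printf("%q %v\n", s, err)
	}
}
```

If you already hold a string rather than a []byte, utf8.ValidString performs the same check without a conversion.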
But! You might be thinking that converting non-UTF-8 bytes to a Go string is impossible. In fact, "In Go, a string is in effect a read-only slice of bytes"; it can contain bytes that aren't valid UTF-8, which you can print, access via indexing, or even round-trip back to a []byte (to Write, say).
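A small sketch of that round trip, assuming nothing beyond the standard library. The helper name roundTrip is hypothetical; it only exists to make the conversion explicit.

```go
package main

import (
	"bytes"
	"fmt"
)

// roundTrip is a hypothetical helper that converts b to a string and back.
// No decoding happens in either direction, so the bytes are preserved.
func roundTrip(b []byte) []byte {
	return []byte(string(b))
}

func main() {
	raw := []byte{'a', 0xff, 'b'} // 0xff is not valid UTF-8
	s := string(raw)

	fmt.Println(len(s))       // len counts bytes, not runes
	fmt.Printf("%#x\n", s[1]) // indexing yields the raw byte
	fmt.Println(bytes.Equal(roundTrip(raw), raw))
}
```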
There are two places in the language where Go does UTF-8 decoding of strings for you:
- when you do for i, r := range s, the r is a Unicode code point as a value of type rune
- when you do the conversion []rune(s), Go decodes the whole string to runes
In both cases, invalid UTF-8 is replaced with U+FFFD, the replacement character reserved for uses like this. More is in the spec sections on for statements and on conversions between strings and other types. These conversions never panic, so you only need to actively check for UTF-8 validity if it's relevant to your application, for example if you want to return an error on mis-encoded input.
Since that behavior is baked into the language, you can expect it from libraries, too. U+FFFD is utf8.RuneError, and it is returned by the decoding functions in utf8 when they hit invalid input.
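To illustrate the library side, here is a hedged sketch that locates the first invalid byte in a string. It relies on utf8.DecodeRuneInString, which reports invalid input as utf8.RuneError with a width of 1 (the constant is spelled RuneError in the standard library). The helper name firstInvalid is hypothetical. Checking the width matters because a correctly encoded U+FFFD also decodes to utf8.RuneError, but with a width of 3.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// firstInvalid is a hypothetical helper: it returns the byte index of the
// first invalid UTF-8 sequence in s, or -1 if s is entirely valid.
func firstInvalid(s string) int {
	for i := 0; i < len(s); {
		r, size := utf8.DecodeRuneInString(s[i:])
		// RuneError with width 1 signals an invalid byte; a real,
		// correctly encoded U+FFFD has width 3 and is not an error.
		if r == utf8.RuneError && size == 1 {
			return i
		}
		i += size
	}
	return -1
}

func main() {
	fmt.Println(firstInvalid("ab\xffc"))
	fmt.Println(firstInvalid("ok \uFFFD ok"))
}
```

An index like this can be useful when an error message should point at the offending position rather than just reject the input.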
Here's a sample program showing what Go does with a []byte holding invalid UTF-8:
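package main

import "fmt"

func main() {
	a := []byte{0xff} // 0xff never appears in valid UTF-8
	s := string(a)    // the conversion succeeds; s holds the raw byte
	fmt.Println(s)

	// Ranging over a string decodes UTF-8; the invalid byte yields U+FFFD.
	for _, r := range s {
		fmt.Println(r)
	}

	// Converting to []rune decodes the whole string the same way.
	rs := []rune(s)
	fmt.Println(rs)
}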
Output will look different in different environments, but in the Playground it looks like:
�
65533
[65533]