How to detect when bytes can't be converted to string in Go?

Question

There are invalid byte sequences that can't be converted to Unicode strings. How do I detect that when converting []byte to string in Go?

Recommended answer

You can, as Tim Cooper noted, test UTF-8 validity with utf8.Valid.
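As a minimal sketch of that check, the helper below refuses to convert bytes that fail utf8.Valid. The names bytesToString and errInvalidUTF8 are hypothetical, chosen only for this illustration; utf8.Valid itself is from the standard unicode/utf8 package.

```go
package main

import (
	"errors"
	"fmt"
	"unicode/utf8"
)

// errInvalidUTF8 is a hypothetical sentinel error used only in this sketch.
var errInvalidUTF8 = errors.New("input is not valid UTF-8")

// bytesToString (hypothetical helper) converts b to a string
// only when b consists entirely of valid UTF-8.
func bytesToString(b []byte) (string, error) {
	if !utf8.Valid(b) {
		return "", errInvalidUTF8
	}
	return string(b), nil
}

func main() {
	inputs := [][]byte{[]byte("héllo"), {0xff, 0xfe}}
	for _, in := range inputs {
		s, err := bytesToString(in)
		if err != nil {
			fmt.Printf("%v: %v\n", in, err)
			continue
		}
		fmt.Printf("%q\n", s)
	}
}
```

If the data is already a string, utf8.ValidString performs the same check without a conversion.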

But! You might be thinking that converting non-UTF-8 bytes to a Go string is impossible. In fact, "In Go, a string is in effect a read-only slice of bytes"; it can contain bytes that aren't valid UTF-8 which you can print, access via indexing, or even round-trip back to a []byte (to Write, say).

There are two places in the language where Go decodes a string as UTF-8 for you:


  • when you do for i, r := range s the r is a Unicode code point as a value of type rune
  • when you do the conversion []rune(s), Go decodes the whole string to runes

In both these instances invalid UTF-8 is replaced with U+FFFD, the replacement character reserved for uses like this. More is in the spec sections on for statements and conversions between strings and other types. These conversions never panic, so you only need to actively check for UTF-8 validity if it's relevant to your application, for example if you want to return an error on mis-encoded input.

Since that behavior's baked into the language, you can expect it from libraries, too. U+FFFD is available as the constant utf8.RuneError and is what the decoding functions in utf8 return for invalid input.
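One way to use this, sketched under the assumption that you want the position of the bad byte rather than a yes/no answer: decode rune by rune with utf8.DecodeRune and look for utf8.RuneError (the standard library's name for U+FFFD) returned with a width of 1. A correctly encoded U+FFFD in the input decodes with width 3, so the two cases can be told apart. The function name firstInvalid is hypothetical.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// firstInvalid (hypothetical helper) returns the byte offset of the first
// invalid UTF-8 sequence in b, or -1 if b is entirely valid.
func firstInvalid(b []byte) int {
	for i := 0; i < len(b); {
		r, size := utf8.DecodeRune(b[i:])
		// RuneError with size 1 signals an encoding error; a genuine,
		// correctly encoded U+FFFD is returned with size 3.
		if r == utf8.RuneError && size == 1 {
			return i
		}
		i += size
	}
	return -1
}

func main() {
	fmt.Println(firstInvalid([]byte("ok\xffno")))     // contains a stray 0xff byte
	fmt.Println(firstInvalid([]byte("ok\uFFFDfine"))) // contains a real U+FFFD
}
```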

Here's a sample program showing what Go does with a []byte holding invalid UTF-8:

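package main

import "fmt"

func main() {
    a := []byte{0xff} // 0xff never appears in valid UTF-8
    s := string(a)    // the conversion keeps the byte unchanged
    fmt.Println(s)    // writes the raw byte 0xff
    for _, r := range s {
        fmt.Println(r) // range decodes the invalid byte as U+FFFD (65533)
    }
    rs := []rune(s) // the conversion to []rune does the same
    fmt.Println(rs)
}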

Output will look different in different environments, but in the Playground it looks like

�
65533
[65533]
