How to read Unicode characters from a data source

Views: 103 Published: 2021/5/11 20:07:34 go

Problem description

The code below implements an io.Reader over a string (following all the rules of the Read contract). It reads the data source correctly when the text contains only characters whose UTF-8 encoding is one byte long:

package main

import (
    "fmt"
    "io"
)

type MyStringData struct {
    str       string
    readIndex int
}

func (myStringData *MyStringData) Read(p []byte) (n int, err error) {

    // convert `str` string to slice of bytes
    strBytes := []byte(myStringData.str)

    // if `readIndex` is GTE source length, return `EOF` error
    if myStringData.readIndex >= len(strBytes) {
        return 0, io.EOF // `0` bytes read
    }

    // get next readable limit (exclusive)
    nextReadLimit := myStringData.readIndex + len(p)

    if nextReadLimit >= len(strBytes) {
        nextReadLimit = len(strBytes)
        err = io.EOF
    }

    // get next bytes to copy and set `n` to its length
    nextBytes := strBytes[myStringData.readIndex:nextReadLimit]
    n = len(nextBytes)

    // copy all bytes of `nextBytes` into `p` slice
    copy(p, nextBytes)

    // increment `readIndex` to `nextReadLimit`
    myStringData.readIndex = nextReadLimit

    // return values
    return
}

func main() {

    // create data source
    src := MyStringData{str: "Hello Amazing World!"} // 学中文

    p := make([]byte, 3) // slice of length `3`

    // read `src` until an error is returned
    for {
        // read up to `len(p)` bytes from `src` into `p`
        n, err := src.Read(p)
        fmt.Printf("%d bytes read, data:%s\n", n, p[:n])

        // handle error
        if err == io.EOF {
            fmt.Println("--end-of-file--")
            break
        } else if err != nil {
            fmt.Println("Oops! some error occurred!", err)
            break
        }
    }
}

Output:

$
$
$ go run src/../Main.go
3 bytes read, data:Hel
3 bytes read, data:lo 
3 bytes read, data:Ama
3 bytes read, data:zin
3 bytes read, data:g W
3 bytes read, data:orl
2 bytes read, data:d!
--end-of-file--
$
$

However, when the text contains characters whose UTF-8 encoding is longer than one byte, the same code splits those characters across Read calls, as with the following source:

  src := MyStringData{str: "Hello Amazing World!学中文"}

Output:

$
$
$ go run src/../Main.go
3 bytes read, data:Hel
3 bytes read, data:lo 
3 bytes read, data:Ama
3 bytes read, data:zin
3 bytes read, data:g W
3 bytes read, data:orl
3 bytes read, data:d!�
3 bytes read, data:���
3 bytes read, data:���
2 bytes read, data:��
--end-of-file--
$
$
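
The replacement characters (�) in this output come from printing byte slices that are not valid UTF-8 on their own: each of 学, 中 and 文 takes 3 bytes, and the 3-byte reads do not line up with those boundaries. The sketch below is only an illustration of that effect; chunkValidity is a hypothetical helper, not part of the original code, and it assumes the chunk size is positive.

```go
package main

import (
    "fmt"
    "unicode/utf8"
)

// chunkValidity splits s into consecutive chunks of at most size bytes and
// reports, for each chunk, whether it is valid UTF-8 on its own.
// Hypothetical helper for illustration only.
func chunkValidity(s string, size int) []bool {
    if size <= 0 {
        return nil // assumption: a non-positive size yields no chunks
    }
    b := []byte(s)
    var valid []bool
    for start := 0; start < len(b); start += size {
        end := start + size
        if end > len(b) {
            end = len(b)
        }
        valid = append(valid, utf8.Valid(b[start:end]))
    }
    return valid
}

func main() {
    // The tail of the failing input, starting where the seventh read begins.
    s := "d!学中文"
    fmt.Println(len(s), utf8.RuneCountInString(s)) // byte count, rune count
    fmt.Println(chunkValidity(s, 3))
}
```

Here the string is 11 bytes but only 5 runes, and none of the four chunks is valid UTF-8 by itself, which matches the four garbled lines at the end of the output.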

Following the comments that suggested strings.NewReader(), the code was modified as shown below to read one rune at a time with ReadRune():

// create data source (requires "strings" in the import block)
src := strings.NewReader("Hello Amazing World!学中文") // 学中文

// p := make([]byte, 3) // slice of length `3`

// read `src` until an error is returned
for {
    // read the next rune from `src`; `n` is its encoded size in bytes
    ch, n, err := src.ReadRune()
    // n, err := src.Read(p)

    // handle error before printing: at EOF, ReadRune returns no rune
    if err == io.EOF {
        fmt.Println("--end-of-file--")
        break
    } else if err != nil {
        fmt.Println("Oops! some error occurred!", err)
        break
    }

    fmt.Printf("%d bytes read, data:%c\n", n, ch)
}

How can Unicode characters be read without a character (for example, 学) being split across two Read calls?
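
One possible approach, sketched here under assumptions rather than taken from an accepted answer, is to wrap the reader in bufio.NewReader and call ReadRune. The bufio.Reader buffers bytes from the underlying Read calls, so a rune is decoded only once all of its bytes are available. The helpers readRunes and decodeInSmallReads are hypothetical names introduced for illustration; iotest.OneByteReader stands in for a source that returns very small reads, and a value such as &MyStringData{...} could be passed to readRunes in the same way.

```go
package main

import (
    "bufio"
    "fmt"
    "io"
    "strings"
    "testing/iotest"
)

// readRunes reads r one rune at a time and returns the decoded runes
// together with the encoded size in bytes of each one.
// Hypothetical helper for illustration only.
func readRunes(r io.Reader) ([]rune, []int, error) {
    br := bufio.NewReader(r) // buffers bytes across underlying Read calls
    var runes []rune
    var sizes []int
    for {
        ch, size, err := br.ReadRune()
        if err == io.EOF {
            return runes, sizes, nil // end of input is not treated as a failure
        }
        if err != nil {
            return runes, sizes, err
        }
        runes = append(runes, ch)
        sizes = append(sizes, size)
    }
}

// decodeInSmallReads feeds s to readRunes through a reader that returns
// a single byte per Read call, the worst case for splitting characters.
func decodeInSmallReads(s string) ([]rune, []int, error) {
    return readRunes(iotest.OneByteReader(strings.NewReader(s)))
}

func main() {
    runes, sizes, err := decodeInSmallReads("d!学中文")
    if err != nil {
        fmt.Println("read error:", err)
        return
    }
    for i, ch := range runes {
        fmt.Printf("%d bytes read, data:%c\n", sizes[i], ch)
    }
}
```

With this input the loop reports 1 byte each for d and !, and 3 bytes each for 学, 中 and 文, even though the underlying reader delivers one byte at a time. If fixed-size byte reads are required instead, the caller would have to keep any incomplete trailing bytes and prepend them to the next chunk; that variant is not shown here.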
