Reading files with a BOM in Go

Question

I need to read Unicode files that may or may not contain a byte-order mark (BOM). I could of course check the first few bytes of the file myself and discard a BOM if I find one. But before I do, is there a standard way of doing this, either in the core libraries or in a third-party package?

Recommended answer

As far as I recall, there is no standard way (and the standard library would really be the wrong layer to implement such a check in), so here are two examples of how you could deal with it yourself.

One is to use a buffered reader on top of your data stream:

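package main

import (
    "bufio"
    "log"
    "os"
)

func main() {
    fd, err := os.Open("filename")
    if err != nil {
        log.Fatal(err)
    }
    defer fd.Close()

    br := bufio.NewReader(fd)
    // Decode the first rune; a UTF-8 encoded BOM decodes to U+FEFF.
    // Note: an empty file makes ReadRune return io.EOF here.
    r, _, err := br.ReadRune()
    if err != nil {
        log.Fatal(err)
    }
    if r != '\uFEFF' {
        // Not a BOM -- put the rune back
        if err := br.UnreadRune(); err != nil {
            log.Fatal(err)
        }
    }
    // Now work with br as you would do with fd
    // ...
}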

Another approach, which works with objects implementing the io.Seeker interface, is to read the first three bytes and, if they are not a BOM, call Seek() to go back to the beginning, as in:

package main

import (
    "io"
    "log"
    "os"
)

func main() {
    fd, err := os.Open("filename")
    if err != nil {
        log.Fatal(err)
    }
    defer fd.Close()

    // Read the first three bytes, the length of a UTF-8 encoded BOM.
    // Note: io.ReadFull returns an error for files shorter than three bytes.
    var bom [3]byte
    _, err = io.ReadFull(fd, bom[:])
    if err != nil {
        log.Fatal(err)
    }
    if bom[0] != 0xef || bom[1] != 0xbb || bom[2] != 0xbf {
        _, err = fd.Seek(0, io.SeekStart) // Not a BOM -- seek back to the beginning
        if err != nil {
            log.Fatal(err)
        }
    }
    // The next read operation on fd will read real data
    // ...
}

This is possible because instances of *os.File (what os.Open() returns) support seeking and hence implement io.Seeker. Note that this is not the case for, say, the Body reader of an HTTP response, since you can't "rewind" it. bufio.Reader works around this limitation of non-seekable streams by performing some buffering (obviously); that is what allows you to call UnreadRune() on it.

Note that both examples assume the file we're dealing with is encoded in UTF-8. If you need to deal with another (or an unknown) encoding, things get more complicated.
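
As a rough illustration of why other encodings complicate matters: each Unicode encoding form has its own BOM byte sequence, and the UTF-32 little-endian BOM starts with the same two bytes as the UTF-16 little-endian one. The sketch below, using a hypothetical detectBOM helper, only identifies a BOM in the first bytes of the input. It does not decode the content, and input without a BOM still leaves the encoding unknown.

```go
package main

import (
	"bytes"
	"fmt"
)

// bomSignatures lists the BOM byte sequences for the Unicode encoding forms.
// Longer signatures come first because the UTF-32LE BOM begins with the
// same two bytes as the UTF-16LE BOM.
var bomSignatures = []struct {
	name string
	sig  []byte
}{
	{"UTF-32BE", []byte{0x00, 0x00, 0xFE, 0xFF}},
	{"UTF-32LE", []byte{0xFF, 0xFE, 0x00, 0x00}},
	{"UTF-8", []byte{0xEF, 0xBB, 0xBF}},
	{"UTF-16BE", []byte{0xFE, 0xFF}},
	{"UTF-16LE", []byte{0xFF, 0xFE}},
}

// detectBOM (hypothetical helper) reports the encoding implied by a BOM at
// the start of head, plus the BOM length in bytes. It returns "" and 0 when
// no known BOM is found.
func detectBOM(head []byte) (string, int) {
	for _, s := range bomSignatures {
		if bytes.HasPrefix(head, s.sig) {
			return s.name, len(s.sig)
		}
	}
	return "", 0
}

func main() {
	// FF FE followed by 'h' encoded as a little-endian 16-bit unit.
	name, n := detectBOM([]byte{0xFF, 0xFE, 'h', 0x00})
	fmt.Println(name, n) // prints: UTF-16LE 2
}
```

Even this is ambiguous in one corner case: UTF-16LE text whose first character after the BOM is U+0000 has the same leading bytes as a UTF-32LE BOM, which is part of what makes the general problem harder.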
