Reading files with a BOM in Go
Question
I need to read Unicode files that may or may not contain a byte-order mark. I could of course check the first few bytes of the file myself, and discard a BOM if I find one. But before I do, is there any standard way of doing this, either in the core libraries or a third party?
Recommended answer
No standard way, IIRC (and the standard library would really be the wrong layer to implement such a check in), so here are two examples of how you could deal with it yourself.
One is to use a buffered reader above your data stream:
    package main

    import (
        "bufio"
        "log"
        "os"
    )

    func main() {
        fd, err := os.Open("filename")
        if err != nil {
            log.Fatal(err)
        }
        defer fd.Close()

        br := bufio.NewReader(fd)
        r, _, err := br.ReadRune()
        if err != nil {
            log.Fatal(err) // Note: an empty file yields io.EOF here
        }
        if r != '\uFEFF' {
            // Not a BOM -- put the rune back
            if err := br.UnreadRune(); err != nil {
                log.Fatal(err)
            }
        }
        // Now work with br as you would do with fd
        // ...
    }
Another approach, which works with objects implementing the io.Seeker interface, is to read the first three bytes and, if they are not a BOM, call Seek() to return to the beginning:
    package main

    import (
        "io"
        "log"
        "os"
    )

    func main() {
        fd, err := os.Open("filename")
        if err != nil {
            log.Fatal(err)
        }
        defer fd.Close()

        var bom [3]byte
        _, err = io.ReadFull(fd, bom[:])
        // A file shorter than three bytes cannot hold a BOM; that is not a failure.
        if err != nil && err != io.EOF && err != io.ErrUnexpectedEOF {
            log.Fatal(err)
        }
        if err != nil || bom[0] != 0xef || bom[1] != 0xbb || bom[2] != 0xbf {
            // Not a BOM -- seek back to the beginning
            if _, err = fd.Seek(0, io.SeekStart); err != nil {
                log.Fatal(err)
            }
        }
        // The next read operation on fd will read real data
        // ...
    }
This is possible because instances of *os.File (what os.Open() returns) support seeking and hence implement io.Seeker. Note that this is not the case for, say, the Body reader of an HTTP response, since you can't "rewind" it. bufio.Reader works around this limitation of non-seekable streams by performing some buffering (obviously), which is what allows you to call UnreadRune() on it.
Note that both examples assume the file is encoded in UTF-8. If you need to deal with other (or unknown) encodings, things get more complicated.