Unmarshal an ISO-8859-1 XML input in Go
Problem description
When your XML input isn't encoded in UTF-8, the Unmarshal function of the xml package seems to require a CharsetReader.
Where do you find such a thing?
Expanding on @anschel-schaffer-cohen's suggestion and @mjibson's comment, using the go-charset package as mentioned above allows you to use these three lines
decoder := xml.NewDecoder(reader)
decoder.CharsetReader = charset.NewReader // translates the declared charset to UTF-8
err = decoder.Decode(&parsed)
to achieve the required result. Just remember to tell charset where its data files are by setting
charset.CharsetDir = ".../src/code.google.com/p/go-charset/datafiles"
at some point when the app starts up.
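For readers who want to see what the CharsetReader hook does without any third-party package, here is a minimal sketch that handles only ISO-8859-1, using just the standard library. It is not how go-charset works internally. The names latin1Reader, latin1CharsetReader, decodeNote and the note struct are made up for illustration. It relies on the fact that every ISO-8859-1 byte has the same numeric value as its Unicode code point.

```go
package main

import (
	"bufio"
	"bytes"
	"encoding/xml"
	"fmt"
	"io"
	"strings"
	"unicode/utf8"
)

// latin1Reader (hypothetical helper) re-encodes an ISO-8859-1 byte stream as UTF-8.
type latin1Reader struct {
	src     io.ByteReader
	buf     [utf8.UTFMax]byte
	pending []byte // UTF-8 bytes not yet handed to the caller
}

func (l *latin1Reader) Read(p []byte) (int, error) {
	n := 0
	for n < len(p) {
		if len(l.pending) == 0 {
			b, err := l.src.ReadByte()
			if err != nil {
				return n, err // may return bytes read so far together with the error
			}
			// An ISO-8859-1 byte maps to the code point with the same value.
			size := utf8.EncodeRune(l.buf[:], rune(b))
			l.pending = l.buf[:size]
		}
		c := copy(p[n:], l.pending)
		l.pending = l.pending[c:]
		n += c
	}
	return n, nil
}

// latin1CharsetReader has the signature xml.Decoder.CharsetReader expects.
// It supports ISO-8859-1 only and rejects every other label.
func latin1CharsetReader(label string, input io.Reader) (io.Reader, error) {
	switch strings.ToLower(label) {
	case "iso-8859-1", "latin1", "latin-1":
		br, ok := input.(io.ByteReader)
		if !ok {
			br = bufio.NewReader(input)
		}
		return &latin1Reader{src: br}, nil
	}
	return nil, fmt.Errorf("unsupported charset %q", label)
}

// note is an illustrative target type, not taken from the question.
type note struct {
	Body string `xml:"body"`
}

// decodeNote mirrors the three lines from the answer, with the custom reader plugged in.
func decodeNote(data []byte) (note, error) {
	var parsed note
	decoder := xml.NewDecoder(bytes.NewReader(data))
	decoder.CharsetReader = latin1CharsetReader
	err := decoder.Decode(&parsed)
	return parsed, err
}

func main() {
	// \xe9 is "é" in ISO-8859-1; it is not valid UTF-8 on its own.
	input := []byte("<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><note><body>caf\xe9</body></note>")
	parsed, err := decodeNote(input)
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	fmt.Println(parsed.Body) // prints "café"
}
```

The decoder only calls CharsetReader when the XML declaration names an encoding other than UTF-8, so UTF-8 input passes through untouched. For anything beyond this one charset, a full conversion package such as go-charset is the more practical choice.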
EDIT
Instead of setting charset.CharsetDir as above, it's more sensible to just import the data files. They are then treated as an embedded resource:
import (
"code.google.com/p/go-charset/charset"
_ "code.google.com/p/go-charset/data"
...
)
go install
will then just do its thing. This also avoids the deployment headache (where and how do I find the data files relative to the executing app?).
Importing with an underscore just runs the package's init() function, which loads the required data into memory.
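The same blank-import mechanism can be seen in the standard library, which makes for a small runnable illustration. In the sketch below, crypto/sha256 is imported only for its side effect: its init() registers SHA-256 with the crypto package, much as the go-charset data package presumably registers its data files. The helper name sha256Hex is made up for this example.

```go
package main

import (
	"crypto"
	_ "crypto/sha256" // blank import: only init() runs, registering SHA-256 with package crypto
	"encoding/hex"
	"fmt"
)

// sha256Hex (hypothetical helper) hashes s using the registered implementation,
// without ever naming the sha256 package directly.
func sha256Hex(s string) (string, error) {
	if !crypto.SHA256.Available() {
		return "", fmt.Errorf("SHA-256 is not registered; is the blank import missing?")
	}
	h := crypto.SHA256.New()
	h.Write([]byte(s))
	return hex.EncodeToString(h.Sum(nil)), nil
}

func main() {
	digest, err := sha256Hex("")
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(digest)
}
```

Because nothing in the file refers to the package by name, the underscore is required. Without it the compiler rejects the import as unused.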