Go encoding transform issue


Problem description


I have the following code in Go:

import (
    "io/ioutil"
    "log"
    "net/http"

    "code.google.com/p/go.text/encoding/charmap"
    "code.google.com/p/go.text/transform"
)

...

res, err := http.Get(url)
if err != nil {
    log.Println("Cannot read", url)
    log.Println(err)
    continue
}
defer res.Body.Close()

The page I load contains non-UTF-8 characters, so I try to use transform:

utfBody := transform.NewReader(res.Body, charmap.Windows1251.NewDecoder())

But the problem is that it returns an error even in this simple scenario:

bytes, err := ioutil.ReadAll(utfBody)
log.Println(err)
if err == nil {
    log.Println(bytes)
}

transform: short destination buffer

The bytes variable does end up holding some data, but in my real code I use goquery:

doc, err := goquery.NewDocumentFromReader(utfBody)

goquery sees the error and fails, returning no data.

I tried passing "chunks" of res.Body to transform.NewReader and figured out that it works well as long as res.Body contains no non-UTF-8 data. When it contains a non-UTF-8 byte, it fails with the error above.

I'm quite new to Go and don't really understand what's going on or how to deal with this.
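
The observation about non-UTF-8 bytes can be checked directly with the standard library's unicode/utf8 package. The following is only a minimal sketch: the helper name isValidUTF8 and the sample byte slices are made up for illustration and are not part of the original code.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// isValidUTF8 reports whether body consists entirely of valid UTF-8.
// (Hypothetical helper, introduced only for this illustration.)
func isValidUTF8(body []byte) bool {
	return utf8.Valid(body)
}

func main() {
	// Plain ASCII is always valid UTF-8.
	ascii := []byte("hello")

	// The Russian word "Привет" encoded as Windows-1251 (sample data).
	cp1251 := []byte{0xCF, 0xF0, 0xE8, 0xE2, 0xE5, 0xF2}

	fmt.Println(isValidUTF8(ascii))
	fmt.Println(isValidUTF8(cp1251))
}
```

A check like this only shows whether a body needs decoding at all. It does not identify which legacy encoding the bytes are in.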

Solution

Without the whole code along with an example URL it's hard to tell what exactly is going wrong here.

That said, I can recommend the golang.org/x/net/html/charset package for this, as it supports both guessing the character set and converting to UTF-8.

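import (
    "io/ioutil"
    "net/http"

    "golang.org/x/net/html/charset"
)

// fetchUtf8Bytes downloads url and returns its body converted to UTF-8.
func fetchUtf8Bytes(url string) ([]byte, error) {
    res, err := http.Get(url)
    if err != nil {
        return nil, err
    }
    // Close the body on every return path so the connection is released.
    defer res.Body.Close()

    contentType := res.Header.Get("Content-Type") // Optional, better guessing
    utf8reader, err := charset.NewReader(res.Body, contentType)
    if err != nil {
        return nil, err
    }

    // Reading from utf8reader yields the body already decoded to UTF-8.
    return ioutil.ReadAll(utf8reader)
}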

Complete example: http://play.golang.org/p/olcBM9ughv
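
To see which charset a server actually declares in the Content-Type header that is passed along as a hint, the standard library's mime.ParseMediaType can extract the parameter. This is only a sketch: the helper name declaredCharset and the sample header values are assumptions for illustration, and charset.NewReader does its own detection and does not need this step.

```go
package main

import (
	"fmt"
	"mime"
	"strings"
)

// declaredCharset returns the lowercased charset parameter of a
// Content-Type header value, or "" if it is absent or unparsable.
// (Hypothetical helper, introduced only for this illustration.)
func declaredCharset(contentType string) string {
	_, params, err := mime.ParseMediaType(contentType)
	if err != nil {
		return ""
	}
	return strings.ToLower(params["charset"])
}

func main() {
	// Sample header values; a real one would come from res.Header.Get("Content-Type").
	fmt.Println(declaredCharset("text/html; charset=Windows-1251"))
	fmt.Println(declaredCharset("text/html") == "")
}
```

Logging this value next to the URL can help when a page is decoded incorrectly, since a missing or wrong charset declaration is a common cause.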
