How do I make raw Unicode-encoded content readable?

Problem description

I used net/http to request a web API, and the server returned a JSON response. When I print the response body, it is displayed as raw ASCII content (\uXXXX sequences). I tried using bufio.ScanRunes to parse the content, but it failed.

I also tried writing a simple server that returns a Unicode string, and that worked well.

Here is the core code:

func (c ClientInfo) Request(method string, url string, form url.Values) string {
    req, err := http.NewRequest(method, url, strings.NewReader(c.Encode(form)))
    if err != nil {
        fmt.Println(err)
        return ""
    }
    req.Header = c.Header
    req.AddCookie(&c.Cookie)
    resp, err := http.DefaultClient.Do(req)
    if err != nil {
        fmt.Println(err)
        return ""
    }
    // Close the body only after the error check: resp is nil when Do fails.
    defer resp.Body.Close()

    // Reads the body rune by rune; \uXXXX escapes are plain ASCII and pass through unchanged.
    scanner := bufio.NewScanner(resp.Body)
    scanner.Split(bufio.ScanRunes)

    var buf bytes.Buffer
    for scanner.Scan() {
        buf.WriteString(scanner.Text())
    }
    rv := buf.String()
    fmt.Println(rv)
    return rv
}

Here is the example output:

{"forum":{"id":"3251718","name":"\u5408\u80a5\u5de5\u4e1a\u5927\u5b66\u5ba3\u57ce\u6821\u533a","first_class":"\u9ad8\u7b49\u9662\u6821","second_class":"\u5b89\u5fbd\u9662\u6821","is_like":"0","user_level":"1","level_id":"1","level_name":"\u7d20\u672a\u8c0b\u9762","cur_score":"0","levelup_score":"5","member_num":"80329","is_exists":"1","thread_num":"108762","post_num":"3445881","good_classify":[{"class_id":"0","class_name":"\u5168\u90e8"},{"class_id":"1","class_name":"\u516c\u544a\u7c7b"},{"class_id":"2","class_name":"\u5427\u53cb\u4e13\u533a"},{"class_id":"4","class_name":"\u6d3b\u52a8\u4e13\u533a"},{"class_id":"6","class_name":"\u793e\u56e2\u73ed\u7ea7"},{"class_id":"5","class_name":"\u8d44\u6e90\u5171\u4eab"},{"class_id":"8","class_name":"\u6e29\u99a8\u751f\u6d3b\u7c7b"},{"class_id":"7","class_name":"\u54a8\u8be2\u65b0\u95fb\u7c7b"},{"class_id":"3","class_name":"\u98ce\u91c7\u5c55\u793a\u533a"}],"managers":[{"id":"793092593","name":"yi\u62b9\u660e\u5a9a\u7684\u5fe7\u4f24"},

...

Solution

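That is just the standard way JSON escapes Unicode characters: a \uXXXX sequence encodes one UTF-16 code unit in hexadecimal, which for Chinese characters like these is simply the character's code point (\u5168\u90e8 is 全部). The response is valid JSON; it only looks like raw ASCII because the characters are escaped, and any JSON decoder turns them back into the original text.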

Unmarshal it to see the unquoted text (the json package will unquote it):

func main() {
    var i interface{}
    err := json.Unmarshal([]byte(src), &i)
    fmt.Println(err, i)
}

const src = `{"forum":{"id":"3251718","name":"\u5408\u80a5\u5de5\u4e1a\u5927\u5b66\u5ba3\u57ce\u6821\u533a","first_class":"\u9ad8\u7b49\u9662\u6821","second_class":"\u5b89\u5fbd\u9662\u6821","is_like":"0","user_level":"1","level_id":"1","level_name":"\u7d20\u672a\u8c0b\u9762","cur_score":"0","levelup_score":"5","member_num":"80329","is_exists":"1","thread_num":"108762","post_num":"3445881","good_classify":[{"class_id":"0","class_name":"\u5168\u90e8"},{"class_id":"1","class_name":"\u516c\u544a\u7c7b"},{"class_id":"2","class_name":"\u5427\u53cb\u4e13\u533a"},{"class_id":"4","class_name":"\u6d3b\u52a8\u4e13\u533a"},{"class_id":"6","class_name":"\u793e\u56e2\u73ed\u7ea7"},{"class_id":"5","class_name":"\u8d44\u6e90\u5171\u4eab"},{"class_id":"8","class_name":"\u6e29\u99a8\u751f\u6d3b\u7c7b"},{"class_id":"7","class_name":"\u54a8\u8be2\u65b0\u95fb\u7c7b"},{"class_id":"3","class_name":"\u98ce\u91c7\u5c55\u793a\u533a"}]}}`

Output (trimmed) (try it on the Go Playground):

<nil> map[forum:map[levelup_score:5 is_exists:1 post_num:3445881 good_classify:[map[class_id:0 class_name:全部] map[class_id:1 class_name:公告类] map[class_id:2 class_name:吧友专区] map[class_id:4 class_name:活动专区] map[class_id:6 class_name:社团班级] map[class_id:5 class_name:资源共享] map[class_id:8 class_name:温馨生活类] map[class_name:咨询新闻类 class_id:7] map[class_id:3 class_name:风采展示区]] id:3251718 is_like:0 cur_score:0

If you just want to unquote a fragment, you may use strconv.Unquote():

fmt.Println(strconv.Unquote(`"\u7d20\u672a\u8c0b"`))

Output (try it on the Go Playground):

素未谋 <nil>

Note that strconv.Unquote() expects its argument to be enclosed in quotes. That's why I used a raw string literal: it lets me include the quotes, and it also keeps the compiler itself from interpreting (unquoting) the Unicode escapes.
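The difference between the two kinds of literal, and how to unquote a fragment that is held in a variable rather than written in source, can be sketched like this. unquoteFragment is a hypothetical helper, and it assumes the fragment contains no unescaped double quote.

```go
package main

import (
	"fmt"
	"strconv"
)

// unquoteFragment (hypothetical) wraps an escaped fragment, e.g. one cut
// out of a larger body at runtime, in double quotes so that
// strconv.Unquote accepts it. Assumes no unescaped " inside frag.
func unquoteFragment(frag string) (string, error) {
	return strconv.Unquote(`"` + frag + `"`)
}

func main() {
	// Raw literal: the compiler keeps the six bytes \ u 7 d 2 0 as-is.
	raw := `\u7d20`
	// Interpreted literal: the compiler already turns the escape into 素.
	interp := "\u7d20"
	fmt.Println(len(raw), len(interp)) // 6 3

	s, err := unquoteFragment(raw)
	fmt.Println(s, err) // 素 <nil>
}
```

Keep in mind that strconv.Unquote follows Go's escape rules, not JSON's: it rejects a JSON surrogate pair such as \ud83d\ude00, which json.Unmarshal decodes without trouble, so for arbitrary JSON fragments the json package is the safer choice.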

See related question: How to convert escape characters in HTML tags?
