"missing word in phrase: charset not supported" when using the mail package
Question
I'm trying to parse emails with the mail package, and I get errors like the ones below. Is this a bug in the mail package, or something I should handle myself?
missing word in phrase: charset not supported: gb18030
charset not supported: koi8-r
missing word in phrase: charset not supported: ks_c_5601-1987
How can I fix them? I think I should use a charset package, but I'm not sure how. Here's what an email header looks like:
Received: from smtpbg303.qq.com ([184.105.206.26]) by mx-ha.gmx.net
	(mxgmxus001) with ESMTPS (Nemesis) id 0MAOx2-1X2yNC2ZFC-00BaVU for
	<sormester@lobbyist.com>; Sat, 14 Jun 2014 18:11:48 +0200
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=qq.com; s=s201307;
	t=1402762305; bh=imEvSr8IPsqWTXU63xUHRv+wuQG+Tcz2mPP9ai4rrE4=;
	h=X-QQ-FEAT:X-QQ-SSF:X-HAS-ATTACH:X-QQ-BUSINESS-ORIGIN:
	X-Originating-IP:In-Reply-To:References:X-QQ-STYLE:X-QQ-mid:From:To:Subject:Mime-Version:Content-Type:Content-Transfer-Encoding:Date:
	X-Priority:Message-ID:X-QQ-MIME:X-Mailer:X-QQ-Mailer:
	X-QQ-ReplyHash:X-QQ-SENDSIZE:X-QQ-FName:X-QQ-LocalIP;
	b=QXs4CveboS8nG6htN9W6amC3X+F7X3ZtFrt6jrjWI+RmbvqBuTCVmX9IlaqCX84H8
	n14x2Wp7x4kDYcNRqhe+HjTpf715TTQXc4d40b9e38frC/5qIhpMtYNsD8iEJwRzHW
	U3xi8Yq7OCIB303fIpytx8tOjexQpZKSHbJ7ecX0=
X-QQ-FEAT: zaIfg0hwV2pIDflZYPQUsuPPXG5wtRVHJU6PiOYLBBA=
X-QQ-SSF: 00010000000000F000000000000000L
X-HAS-ATTACH: no
X-QQ-BUSINESS-ORIGIN: 2
X-Originating-IP: 180.155.99.102
In-Reply-To: <trinity-b7c6d611-52fd-4afa-b739-2deb243532a6-1402761364579@3capp-mailcom-lxa05>
References: <97e07dab7c2d1a005ed928c4350690e0@hotels-desk.co.uk>,
	<tencent_105D3DC11702F53465C0025D@qq.com>
	<trinity-b7c6d611-52fd-4afa-b739-2deb243532a6-1402761364579@3capp-mailcom-lxa05>
X-QQ-STYLE:
X-QQ-mid: webmail474t1402762303t356131
From: "=?gb18030?B?08bTzg==?=" <38438nx@qq.com>
To: "=?gb18030?B?V2lsaGVsbSBLdW1tZXI=?=" <sormester@lobbyist.com>
Subject: =?gb18030?B?u9i4tKO6ILvYuLSjulBhbGFjZSBXZXN0bWluc3Rl?=
	=?gb18030?B?cjogMDEtMDctMjAxNCAtIDA0LTA3LTIwMTQ=?=
Mime-Version: 1.0
Content-Type: multipart/alternative;
	boundary="----=_NextPart_539C743F_08A07490_0157E268"
Content-Transfer-Encoding: 8Bit
Date: Sun, 15 Jun 2014 00:11:43 +0800
X-Priority: 3
Message-ID: <tencent_573A737E73016B9F5A3D10C1@qq.com>
X-QQ-MIME: TCMime 1.0 by Tencent
X-Mailer: QQMail 2.x
X-QQ-Mailer: QQMail 2.x
X-QQ-ReplyHash: 170675637
X-QQ-SENDSIZE: 520
X-QQ-FName: 7B2EFFAD16B8462B84D3499A4CC7DDEF
X-QQ-LocalIP: 163.177.66.155
Envelope-To: <sormester@lobbyist.com>
X-GMX-Antispam: 0 (Mail was not recognized as spam); Detail=V3;
X-GMX-Antivirus: 0 (no virus found)
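Each =?gb18030?B?...?= token in these headers is an RFC 2047 encoded-word of the form =?charset?encoding?text?=, where the encoding is B (base64) or Q (a quoted-printable variant). The charset name travels inside the header text itself, which is where "gb18030" in the error comes from. The following is a minimal standard-library sketch that takes one encoded-word apart, assuming only the B form; splitEncodedWord is a hypothetical helper, not part of any package.

```go
package main

import (
	"encoding/base64"
	"errors"
	"fmt"
	"strings"
)

// splitEncodedWord (hypothetical helper) takes apart one RFC 2047
// encoded-word of the form =?charset?encoding?text?= and base64-decodes
// the text. Only the "B" encoding is handled in this sketch.
func splitEncodedWord(word string) (string, []byte, error) {
	if len(word) < 4 || !strings.HasPrefix(word, "=?") || !strings.HasSuffix(word, "?=") {
		return "", nil, errors.New("not an encoded-word")
	}
	parts := strings.Split(word[2:len(word)-2], "?")
	if len(parts) != 3 {
		return "", nil, errors.New("malformed encoded-word")
	}
	charset, enc, text := parts[0], parts[1], parts[2]
	if !strings.EqualFold(enc, "B") {
		return "", nil, fmt.Errorf("encoding %q not handled in this sketch", enc)
	}
	payload, err := base64.StdEncoding.DecodeString(text)
	if err != nil {
		return "", nil, err
	}
	return charset, payload, nil
}

func main() {
	// Display name from the To: header above.
	cs, raw, err := splitEncodedWord("=?gb18030?B?V2lsaGVsbSBLdW1tZXI=?=")
	fmt.Println(cs, string(raw), err) // gb18030 Wilhelm Kummer <nil>

	// Display name from the From: header: the payload is gb18030 bytes,
	// not UTF-8, so it still needs a gb18030 decoder to become text.
	cs, raw, err = splitEncodedWord("=?gb18030?B?08bTzg==?=")
	fmt.Printf("%s % x %v\n", cs, raw, err) // gb18030 d3 c6 d3 ce <nil>
}
```

Base64 only recovers the raw bytes; turning them into text is a separate, charset-specific step, and that is the step the mail package refuses for gb18030.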
Edit:
I've tried using the charset package, but it has no effect: I still get the same error on the same messages.
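import (
	"bytes"
	"fmt"
	"log"
	"net/mail"

	"code.google.com/p/go-charset/charset" // assumed: the go-charset package
	"code.google.com/p/go-imap/go1/imap"
)

// The lines below run inside the message-parsing function (not shown).
header := imap.AsBytes(rsp.MessageInfo().Attrs["RFC822.HEADER"])
// Transcodes the raw header bytes from UTF-8. The encoded-words
// (=?gb18030?B?...?=) are plain ASCII, so this leaves them unchanged.
r, err := charset.NewReader("UTF-8", bytes.NewReader(header))
if err != nil {
	log.Fatal(err)
}
fmt.Printf("new char is %v\n", r)
msg, err := mail.ReadMessage(r)
if err != nil {
	log.Fatal(err) // exits the program, so no return is needed here
}
mg.From, err = msg.Header.AddressList("From")
if err != nil {
	log.Printf("NO FROM msg %s, err %v", header, err)
	return
}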
The mail package seems to be able to decode only RFC 2047, and the charset package doesn't support that:
character set "rfc2047" not found
Maybe mahonia can solve the problem?
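RFC 2047 is the encoded-word scheme, not a charset, so a charset package has nothing to look up under the name "rfc2047"; the charset to convert from is the one named inside each encoded-word. On Go 1.5 and later (newer than this question), the standard library's mime.WordDecoder parses encoded-words, and its zero value handles only UTF-8, ISO-8859-1 and US-ASCII. The following is a minimal sketch that reproduces the failure with the From: value above; decodeBuiltin is a hypothetical name.

```go
package main

import (
	"fmt"
	"mime"
)

// decodeBuiltin decodes RFC 2047 encoded-words using only the charsets
// mime.WordDecoder knows natively (UTF-8, ISO-8859-1, US-ASCII).
func decodeBuiltin(v string) (string, error) {
	var dec mime.WordDecoder // zero value: no CharsetReader configured
	return dec.DecodeHeader(v)
}

func main() {
	// ISO-8859-1 is built in, so this decodes to "café".
	s, err := decodeBuiltin("=?iso-8859-1?Q?caf=E9?=")
	fmt.Println(s, err)

	// gb18030 is not built in, so the decoder reports an unhandled
	// charset, the modern counterpart of "charset not supported".
	_, err = decodeBuiltin("=?gb18030?B?08bTzg==?=")
	fmt.Println(err)
}
```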
Answer
Alexey Vasiliev's MIT-licensed http://github.com/le0pard/go-falcon/ includes a parser
package that applies whichever encoding package is needed to decode the headers (the meat is in utils.go).
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"net/textproto"

	"github.com/le0pard/go-falcon/parser"
)

// A folded header: the continuation line must start with whitespace, and
// the header block must end with an empty line, or ReadMIMEHeader fails.
var msg = []byte(`Subject: =?gb18030?B?u9i4tKO6ILvYuLSjulBhbGFjZSBXZXN0bWluc3Rl?=
 =?gb18030?B?cjogMDEtMDctMjAxNCAtIDA0LTA3LTIwMTQ=?=

`)

func main() {
	tpr := textproto.NewReader(bufio.NewReader(bytes.NewReader(msg)))
	mh, err := tpr.ReadMIMEHeader()
	if err != nil {
		panic(err)
	}
	for name, vals := range mh {
		for _, val := range vals {
			// Decode the encoded-words in each value via go-falcon.
			val = parser.MimeHeaderDecode(val)
			fmt.Print(name, ": ", val, "\n")
		}
	}
}
It looks like its parser.FixEncodingAndCharsetOfPart is also used by the package to decode and convert message content, though with a couple of extra allocations caused by converting the []byte body to and from a string. If the API doesn't suit you, you can at least read the code to see how it's done.
Found via godoc.org's "...and is imported by 3 packages" link from encoding/simplifiedchinese -- hooray godoc.org!