How to parse a mailbox file in Ruby?
Question
The Ruby gem rmail has methods to parse a mailbox file on local disk. Unfortunately, this gem is broken under Ruby 2.0.0, and it might not get fixed, because folks are migrating to the gem mail.
The mail gem has the method Mail.read('filename.txt'), but that parses only the first message in a mailbox.
That gem and the built-in Net::IMAP have flooded the net with tutorials on accessing mailboxes through IMAP.
So, is there still a way to parse a plain old file, without IMAP? As the lone Rubyist in my group, I'd rather not embarrass myself by resorting to http://docs.python.org/2/library/mailbox.html.
Or, worse yet, to PHP's imap_open('/var/mail/www-data', ...). If only Net::IMAP.new accepted filenames like that!
Answer
The good news is that the mbox format is dead simple, though its simplicity is why it was eventually replaced: parsing a large mailbox file to extract a single message is not especially efficient.
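The efficiency cost is easy to see in a sketch: mbox has no index, so fetching one message means reading every line that comes before it. The helper below, `nth_mbox_message`, is a hypothetical name (not part of rmail or mail); it assumes each message starts with a line beginning `From ` and, to stay short, does not undo `>From` quoting.

```ruby
# Return the raw text of the n-th message (0-based) in an mbox file, or nil.
# mbox has no index, so every line before the target message must be read.
# `nth_mbox_message` is a hypothetical helper, not part of any gem.
def nth_mbox_message(path, n)
  index = -1          # number of the message currently being read
  message = nil
  File.foreach(path) do |line|
    if line.start_with?("From ")
      return message if index == n   # target finished; stop reading the file
      index += 1
      message = +"" if index == n    # separator line itself is not kept
    elsif index == n
      message << line
    end
  end
  index == n ? message : nil         # target was the last message, or not found
end
```

Asking for the last message of a multi-gigabyte mailbox still reads the whole file; formats such as Maildir avoid this by storing one message per file.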
If you can split the mailbox file into separate strings, you can pass each string to the Mail library for parsing.
An example starting point:
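require 'mail'

# Parse one raw message with the mail gem
def parse_message(message)
  mail = Mail.new(message)
  puts mail.subject # replace with your own processing
end

message = nil
# ARGF reads the files named on the command line, or STDIN if none are given
ARGF.each_line do |line|
  if line.start_with?('From ')
    # A separator line: hand off the previous message and start a new one
    parse_message(message) if message
    message = ''
  elsif message
    # Body lines quoted as ">From" are really "From"
    message << line.sub(/\A>From/, 'From')
  end
end
# The last message has no separator after it, so parse it here
parse_message(message) if message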
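If you would rather not tie the loop to standard input, the same logic can be wrapped in a method that streams a file on disk and yields each raw message, leaving the choice of parser to the caller (for example `Mail.new(raw)`, as in the loop above). `each_mbox_message` is a hypothetical helper name, and the `>From` handling assumes mboxrd-style quoting, where exactly one leading `>` is removed from any line matching `>+From `.

```ruby
# Stream raw messages out of an mbox file without loading it all into memory.
# `each_mbox_message` is a hypothetical helper name, not part of any gem.
def each_mbox_message(path)
  message = nil
  File.foreach(path) do |line|
    if line.start_with?("From ")
      yield message if message       # previous message is complete
      message = +""                  # separator line itself is not kept
    elsif message
      # Assumed mboxrd quoting: drop one ">" from lines matching />+From /
      message << line.sub(/\A>(>*From )/, '\1')
    end
  end
  yield message if message           # last message has no separator after it
end
```

With the mail gem loaded, usage could look like `each_mbox_message('/var/mail/www-data') { |raw| puts Mail.new(raw).subject }`.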
The key is that each message starts with a line beginning with "From " (the space after it is what matters), whereas the header is written "From:" with a colon. Any line inside a message that starts with ">From" is to be treated as actually being "From": the ">" was added so that the line would not be mistaken for a separator. Quirks like this make this encoding method really inadequate, but if Maildir isn't an option, this is what you've got to do.
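The two rules just described, the separator test and the `>From` unquoting, fit in a pair of small methods. Both names are hypothetical, and the unquoting follows the mboxrd convention (remove one `>` from lines matching `>+From `); under the older mboxo variant, lines that already began with ">From" are not distinguished from quoted ones, so the quoting cannot be reversed reliably.

```ruby
# True for an mbox separator line ("From " followed by envelope sender and date).
# A header line such as "From: alice@example.com" does not qualify.
def separator_line?(line)
  line.start_with?("From ")
end

# Undo mboxrd quoting: ">From " -> "From ", ">>From " -> ">From ", and so on.
# Lines that do not match are returned unchanged.
def unquote_from(line)
  line.sub(/\A>(>*From )/, '\1')
end
```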