How to parse a mailbox file in Ruby?

Question

The Ruby gem rmail has methods to parse a mailbox file on local disk. Unfortunately, this gem is broken under Ruby 2.0.0, and it might not get fixed, because folks are migrating to the mail gem.

The mail gem has the method Mail.read('filename.txt'), but that parses only the first message in a mailbox.

That gem, and the built-in Net::IMAP, have flooded the net with tutorials on accessing mailboxes through IMAP.

So, is there still a way to parse a plain old file, without imap? As the lone rubyist in my group I'd rather not embarrass myself by resorting to http://docs.python.org/2/library/mailbox.html.

Or, worse yet, PHP's imap_open('/var/mail/www-data', ...) -- if only Net::IMAP.new accepted filenames like that.

Recommended answer

The good news is that the mbox format is dead simple, though its simplicity is why it was eventually replaced. Parsing a large mailbox file to extract a single message is not especially efficient.

If you can split apart the mailbox file into separate strings, you can pass these strings to the Mail library for parsing.

An example starting point:

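require 'mail'

# Parse one raw message string with the mail gem.
def parse_message(message)
  mail = Mail.new(message)
  # Do whatever you need with the parsed message here.
  mail
end

message = nil

while (line = STDIN.gets)
  if line.match(/\AFrom /)
    # A "From " separator line starts a new message; flush the previous one.
    parse_message(message) if message
    message = String.new
  elsif message
    # Undo the mbox quoting of body lines that begin with "From".
    message << line.sub(/^>From/, 'From')
  end
end

# Flush the last message, which has no separator line after it.
parse_message(message) if message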

The key is that each message starts with a line beginning "From ", where the space after it is essential. The sender header is written as "From:" with a colon, so it does not match, and any body line that starts with ">From" is to be treated as actually being "From". It's things like this that make this encoding method really inadequate, but if Maildir isn't an option, this is what you've got to do.
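
To make those rules concrete, here is a minimal sketch that only does the splitting, using nothing beyond the Ruby standard library. The helper name split_mbox and the sample mailbox are made up for illustration; they are not part of the mail gem. It assumes the simple quoting described above, where a single leading ">" is stripped from ">From " lines.

```ruby
require 'stringio'

# Hypothetical helper: split mbox-formatted input into raw message strings.
# `io` can be anything that responds to each_line (File, StringIO, STDIN).
def split_mbox(io)
  messages = []
  current = nil

  io.each_line do |line|
    if line.start_with?('From ')
      # Separator line: close the previous message and start a new one.
      messages << current if current
      current = String.new
    elsif current
      # Strip one level of ">" quoting from body lines such as ">From ".
      current << line.sub(/\A>From /, 'From ')
    end
  end

  # The last message has no separator after it, so add it explicitly.
  messages << current if current
  messages
end

# Made-up sample mailbox with two messages and one quoted body line.
sample = <<~MBOX
  From alice@example.com Mon Jan  1 00:00:00 2024
  From: alice@example.com
  Subject: first

  >From here on, this line was quoted.
  From bob@example.com Mon Jan  1 00:01:00 2024
  From: bob@example.com
  Subject: second

  Hello.
MBOX

raw_messages = split_mbox(StringIO.new(sample))
puts raw_messages.length
```

Each string in raw_messages could then be handed to Mail.new, as in the answer's parse_message. For a real file, passing File.open('filename.txt') instead of the StringIO should work the same way, since both respond to each_line.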
