Parsing email with Email::MIME and multipart/mixed with subparts


Question

I'm a Perl novice and have been working with Email::MIME to figure out how to parse multipart emails correctly. I've just come across another combination that my current code cannot read properly.

 Content-Type: multipart/mixed; boundary="===============1811908679642194059=="
 MIME-Version: 1.0

 This is an OpenPGP/MIME signed message (RFC 4880 and 3156)
 --===============1811908679642194059==
 Content-Type: multipart/signed; micalg=pgp-sha256;
  protocol="application/pgp-signature";
  boundary="lGJM242FL2E9Wh4auTNwQRWOeFI0Wj9mB"

 This is an OpenPGP/MIME signed message (RFC 4880 and 3156)
 --lGJM242FL2E9Wh4auTNwQRWOeFI0Wj9mB
 Content-Type: multipart/alternative;
  boundary="------------CC2F0C038668F58F6EDEA0D2"

 This is a multi-part message in MIME format.
 --------------CC2F0C038668F58F6EDEA0D2
 Content-Type: text/plain; charset=windows-1252
 Content-Transfer-Encoding: quoted-printable

 =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
 =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
 =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D

The text/plain part is the one I want, but reading the "text" component only gives me the "This is a multi-part..." line and nothing else. Below is the code I developed to read other emails with similar subparts, but it doesn't interpret this one properly.

It looks to be related to the "body" method of Email::MIME, which is documented as follows:

 This decodes and returns the body of the object as a byte string. For
 top-level objects in multi-part messages, this is highly likely to be
 something like "This is a multi-part message in MIME format."

What is the proper method to use within Email::MIME to read this content type?

How do I properly identify the content type in this email? Is it "multipart/mixed", "text/plain", or "multipart/alternative"?

Do I even want to use the subparts method here?

 my @mailData;
 my $msg = Email::MIME->new($buf);
 foreach my $part ( $msg->subparts ) {
     foreach my $sub_part ( $part->subparts ) {
         print $sub_part->content_type;
         if ( $sub_part->content_type =~ m!text! ) {
             @mailData = split( '\n', $sub_part->body );
         }
     }
 }

The code above only stores "This is a multi-part message..." in the @mailData array.

Recommended answer

I've spent the last few days working with Email::MIME, MIME::Parser and MIME::Entity in order to automate the processing of a number of emails. I've found that the same email can be encoded in so many different ways, with so little standardization, that it was much more difficult than I thought.

This is a pretty reliable way to process both the headers and the body of an email. Thanks so much to all who helped along the way.

 #!/usr/bin/perl -w

 use strict;
 use MIME::Parser;
 use MIME::Entity;
 use Email::MIME;

 # Read the email from STDIN
 my $buf;
 while(<STDIN> ){
         $buf .= $_;
 }

 # With output_to_core(0), MIME::Parser writes each decoded body to disk,
 # which is where the msg-NNNN-N.txt and signature-N.asc files come from.
 # output_to_core(1) keeps the decoded bodies in memory instead.
 my $parser = MIME::Parser->new;
 $parser->extract_uuencode(1);
 $parser->extract_nested_messages(1);
 $parser->output_to_core(1);

 # For reading headers
 my $entity = $parser->parse_data($buf);

 # For reading the body (of an mbox)
 my $msg = Email::MIME->new($buf);

 # Use MIME::Entity to read various headers. 
 my $subject = $entity->head->get('Subject');
 my $from = $entity->head->get('From');
 my $AdvDate = $entity->head->get('Date');
 $AdvDate =~ s/\n//g; $subject =~ s/\n//g; $from =~ s/\n//g;

 print "Subject: $subject\n";
 print "From: $from\n";
 print "Date: $AdvDate\n";

 my @mailData;

  # Walk through every part of the message. Stop at the first text/plain part
  # with one of the expected charsets and read its contents into @mailData.
  # The first match typically appeared to be the primary one.
  $msg->walk_parts(sub {
      my ($part) = @_;
      #warn($part->content_type . ": " . $part->subparts);
      return if @mailData;    # already found the first matching part
      if ($part->content_type =~ m{text/plain; charset="?(?:utf-8|us-ascii|windows-1252|iso-8859-1)"?}i) {
         #print $part->body;
         @mailData = split( '\n', $part->body);
      }
  });


 # manipulate the body of the message stored in mailData
 foreach my $line (@mailData) {
        print "$line\n";
 }
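
To make the nesting easier to see, here is a minimal, self-contained sketch of the same walk_parts idea. The sample message, its boundaries ("outer", "inner") and its body lines are made up for illustration and are only a reduced version of the structure in the question. It also uses body_str rather than body, on the assumption that you want the charset decoded as well as the transfer encoding; that is a variation on the code above, not what it does.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Email::MIME;

# Hypothetical sample: multipart/mixed > multipart/alternative > text/plain.
my $raw = join "\r\n",
    'MIME-Version: 1.0',
    'Subject: nested example',
    'Content-Type: multipart/mixed; boundary="outer"',
    '',
    'This is a multi-part message in MIME format.',
    '--outer',
    'Content-Type: multipart/alternative; boundary="inner"',
    '',
    'This is a multi-part message in MIME format.',
    '--inner',
    'Content-Type: text/plain; charset=windows-1252',
    'Content-Transfer-Encoding: quoted-printable',
    '',
    'Line one =3D=3D',
    'Line two',
    '--inner',
    'Content-Type: text/html; charset=windows-1252',
    '',
    '<p>Line one</p>',
    '--inner--',
    '--outer--',
    '';

my $msg = Email::MIME->new($raw);

my @types;    # content type of every part, in the order walk_parts visits them
my $text;     # decoded body of the first text/plain leaf part

$msg->walk_parts(sub {
    my ($part) = @_;
    push @types, $part->content_type;
    return if $part->subparts;    # a container: its body is only the preamble
    return if defined $text;      # keep the first match only
    if ($part->content_type =~ m{^text/plain\b}i) {
        # body_str undoes the transfer encoding and the declared charset
        $text = $part->body_str;
    }
});

my @mailData = split /\r?\n/, defined $text ? $text : '';

print "Part: $_\n" for @types;
print "Body line: $_\n" for @mailData;
```

The list of content types shows why the question has no single answer: the message as a whole is multipart/mixed, it contains a multipart/alternative container, and only the leaf inside that is text/plain. Calling body on either container returns just the "This is a multi-part message..." preamble, so the text has to be read from the leaf part.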
