Parsing e-mail-like headers (similar to RFC822)

Question

There is a database of bot information that I would like to parse. It is said to be similar to RFC822 messages.

Before I re-invent the wheel and write a parser of my own, I figured I would see if something else was already available. I stumbled across imap_rfc822_parse_headers(), which seems to do exactly what I want. Unfortunately, the IMAP extension is not available in my environment.

I have seen many alternatives online and on Stack Overflow. Unfortunately, they are all built for e-mail and do more than I need, often parsing an entire e-mail and handling headers in special ways. I just want to parse those headers into a useful object or array.

Is there a straight PHP version of imap_rfc822_parse_headers() available, or something equivalent that will parse data like this? If not, I will write my own.

robot-id: abcdatos
robot-name: ABCdatos BotLink
robot-from: no
robot-useragent: ABCdatos BotLink/1.0.2 (test links)
robot-language: basic
robot-description: This robot is used to verify availability of the ABCdatos
                   directory entries (http://www.abcdatos.com), checking
                   HTTP HEAD. Robot runs twice a week. Under HTTP 5xx
                   error responses or unable to connect, it repeats
                   verification some hours later, verifiying if that was a
                   temporary situation.
robot-history: This robot was developed by ABCdatos team to help
               working in the directory maintenance.
robot-environment: commercial
modified-date: Thu, 29 May 2003 01:00:00 GMT
modified-by: ABCdatos

robot-id:                       acme-spider
robot-name:                     Acme.Spider
robot-cover-url:                http://www.acme.com/java/software/Acme.Spider.html
robot-exclusion:                yes
robot-exclusion-useragent:      Due to a deficiency in Java it's not currently possible to set the User-Agent.
robot-noindex:                  no
robot-host:                     *
robot-language:                 java
robot-description:              A Java utility class for writing your own robots.
robot-history:                  
robot-environment:              
modified-date:                  Wed, 04 Dec 1996 21:30:11 GMT
modified-by:                    Jef Poskanzer

...
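The records above follow three structural rules. A blank line separates records, and each field is written as `name: value`. A line that begins with whitespace continues the previous field's value, which is the same "folding" convention RFC 822 uses for long header lines. Because of this, a parser can be written without the IMAP extension and without any complex regular expression. The following is a minimal sketch of such a regex-free, line-by-line parser. It assumes the whole database fits in one string and that folded lines should be joined with a single space. `parse_header_records()` is a hypothetical helper name, not an existing PHP function.

```php
<?php

/**
 * Hypothetical helper (not part of PHP or any library): parses
 * blank-line-separated records of "name: value" fields. A line starting
 * with a space or tab continues the previous field (RFC 822 style folding).
 */
function parse_header_records(string $data): array
{
    $records = array();
    $current = array();
    $lastKey = null;

    // Normalize CRLF / CR line endings before splitting into lines.
    $lines = explode("\n", str_replace(array("\r\n", "\r"), "\n", $data));

    foreach ($lines as $line) {
        if (trim($line) === '') {
            // A blank (or whitespace-only) line ends the current record.
            if ($current) {
                $records[] = $current;
            }
            $current = array();
            $lastKey = null;
        } elseif (($line[0] === ' ' || $line[0] === "\t") && $lastKey !== null) {
            // Continuation line: append it to the previous field's value,
            // collapsing the indentation to a single space.
            $current[$lastKey] = trim($current[$lastKey] . ' ' . trim($line));
        } elseif (($pos = strpos($line, ':')) !== false) {
            // New field: split on the first colon only, so values that
            // contain ':' (such as URLs) stay intact.
            $lastKey = trim(substr($line, 0, $pos));
            $current[$lastKey] = trim(substr($line, $pos + 1));
        }
        // Any other line (e.g. a stray "...") is ignored.
    }

    // The last record may not be followed by a blank line.
    if ($current) {
        $records[] = $current;
    }
    return $records;
}

// Usage with a shortened excerpt of the data above.
$sample = "robot-id: abcdatos\n"
        . "robot-description: This robot is used to verify availability of the ABCdatos\n"
        . "                   directory entries (http://www.abcdatos.com), checking\n"
        . "                   HTTP HEAD.\n"
        . "\n"
        . "robot-id:         acme-spider\n"
        . "robot-cover-url:  http://www.acme.com/java/software/Acme.Spider.html\n"
        . "robot-history:    \n";

$records = parse_header_records($sample);
print_r($records);
```

Because a field is split only at its first colon, the URL in `robot-cover-url` stays whole. The continuation check runs before the field check, so a folded line that happens to contain a colon is also treated correctly.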

Answer

Assuming that $data contains the sample data you pasted above, here is the parser:

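<?php

/*
 * $data = <<<'DATA'
 * <put-sample-data-here>
 * DATA;
 */

$parsed = array();

// Normalize line endings, then split the data into records on blank lines
// (including lines that contain only spaces or tabs).
$data   = str_replace(array("\r\n", "\r"), "\n", $data);
$blocks = preg_split('/\n(?:[ \t]*\n)+/', trim($data));

foreach ($blocks as $i => $block) {
    $parsed[$i] = array();

    // Each field starts at the beginning of a line as "name:"; following
    // lines that begin with a space or tab continue its value. Matching only
    // at line starts keeps values such as "http://..." from being read as
    // extra "http" fields.
    preg_match_all('/^([\w.-]+):[ \t]*(.*(?:\n[ \t]+.*)*)/m',
                   $block, $matches, PREG_SET_ORDER);

    foreach ($matches as $match) {
        // Fold continuation lines into one space-separated value.
        $parsed[$i][$match[1]] = preg_replace('/\s*\n\s*/', ' ',
                                              trim($match[2]));
    }
}

print_r($parsed);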
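The parser returns a plain list of records. To look up a robot by its `robot-id`, one call to PHP's built-in `array_column()` is enough: passing `null` as the column key keeps each whole record and re-indexes the list by the chosen field. In the short sketch below, a hand-written `$parsed` with two abbreviated records stands in for the parser's output. It assumes every `robot-id` is unique. If two records shared one, the later record would overwrite the earlier.

```php
<?php

// Stand-in for the parser's output: a list of records, each an
// associative array of field name => value (abbreviated sample data).
$parsed = array(
    array('robot-id' => 'abcdatos', 'robot-name' => 'ABCdatos BotLink'),
    array('robot-id' => 'acme-spider', 'robot-name' => 'Acme.Spider'),
);

// With a null column key, array_column() returns whole rows, keyed by
// the value of each row's 'robot-id' field.
$byId = array_column($parsed, null, 'robot-id');

echo $byId['acme-spider']['robot-name'], "\n"; // prints: Acme.Spider
```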
