Extract text from "content-disposition: attachment" body part

Problem description

I regularly receive a generated email message containing a text part and a text attachment. I want to test whether the attachment is base64 encoded and, if so, decode it like this:

:0B
* ^(Content-Transfer-Encoding: *base64(($)[a-z0-9].*)*($))
{
 msgID=`printf '%s' "$MATCH" | base64 -d`
}

But it always says "invalid input". Does anyone know what's wrong?

procmail: Match on "^()\/[a-z]+[0-9]+[^\+]"
procmail: Assigning "msgID=PGh0b"
procmail: matched "^(Content-Disposition: *attachment.*(($)[a-z0-9].*)*    |Content-Transfer-Encoding: *base64(($)[a-z0-9].*)*($)"

procmail: Executing "printf '%s' "$MATCH" | base64 -d"
base64: invalid input
procmail: Assigning "msgID=<ht"
procmail: Unexpected EOL


procmail: Assigning "msgID=PGh0b"
procmail: Match on "^(Content-Transfer-Encoding: *base64(($)[a-z0-9].*)*($))"
procmail: Executing "printf '%s' "$MATCH" | base64 -d"
base64: invalid input
procmail: Assigning "msgID=<ht"
procmail: Unexpected EOL
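
One possible reading of the log above, offered as an observation rather than a confirmed diagnosis: the captured value is the five-character token PGh0b, while base64 is decoded in groups of four characters. The first four characters decode to "<ht" (the value the log shows being assigned), and the leftover fifth character cannot be decoded. A minimal Python sketch of that behaviour, where try_decode is a hypothetical helper name:

```python
import base64
import binascii


def try_decode(token):
    """Return the decoded bytes, or None if the token is not valid base64."""
    try:
        return base64.b64decode(token, validate=True)
    except binascii.Error:
        # Raised for incomplete groups, bad padding, or foreign characters.
        return None


print(try_decode("PGh0"))   # b'<ht'  (a complete group of four characters)
print(try_decode("PGh0b"))  # None    (one dangling character left over)
```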

Recommended answer

If your requirements are complex, it might be easier to write a dedicated script which extracts the information you want -- a modern scripting language with proper MIME support is going to be a lot more versatile when it comes to all the myriad different possibilities for content encoding and body part structure in modern MIME email.
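
As a rough illustration of the dedicated-script route, here is a minimal sketch using Python's standard email package. It assumes the attachment is a text part marked Content-Disposition: attachment; the function name attachment_text is hypothetical. The library undoes the Content-Transfer-Encoding (base64, quoted-printable, or none), so no explicit base64 test is needed.

```python
import email
from email import policy


def attachment_text(raw_message):
    """Return the decoded text of the first text attachment, or None."""
    msg = email.message_from_string(raw_message, policy=policy.default)
    for part in msg.walk():
        is_attachment = part.get_content_disposition() == "attachment"
        if is_attachment and part.get_content_maintype() == "text":
            # get_content() decodes the transfer encoding and the charset.
            return part.get_content()
    return None
```

The first token of the attachment would then be something like attachment_text(raw).split()[0], with a check for None and for an empty attachment added as needed. A script like this could be called from a Procmail recipe that pipes the message to it, or run on its own.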

The following finds the first occurrence of MIME headers with Content-Disposition: attachment and extracts the first token of the following body. This might do what you want if you are corresponding with a sender who uses a well-defined, static template. There is no real MIME parsing here, so (say) a forwarded message which happens to contain an embedded part which matches the pattern will also trigger the conditions. (This can be a bug, or a feature.)

A useful but not frequently used feature of Procmail is the ability to write a regular expression which spans multiple lines. Within a regex, ($) always matches a literal newline. So with that, we can look for a Content-Disposition: attachment header followed by other headers (zero or more) followed by an empty line, followed by the token you want to extract.

:0B
* ^Content-Disposition: *attachment.*(($)[A-Z].*)*($)($)\/[A-Z]+[0-9]+
{ msgid="$MATCH" }
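
To experiment with this pattern outside Procmail, the condition can be approximated in Python. This is a sketch, not an exact emulation of Procmail's regex engine: ($) is written as \n, the text after \/ becomes a capture group, and matching is case-insensitive because Procmail ignores case unless the D flag is given. The name first_token and the sample token MSG12345 are made up for illustration.

```python
import re

# Approximation of:
#   ^Content-Disposition: *attachment.*(($)[A-Z].*)*($)($)\/[A-Z]+[0-9]+
PATTERN = re.compile(
    r"^Content-Disposition: *attachment.*"  # the attachment header line
    r"(?:\n[A-Z].*)*"                       # zero or more further header lines
    r"\n\n"                                 # the empty line that ends the headers
    r"([A-Z]+[0-9]+)",                      # the token Procmail would put in MATCH
    re.IGNORECASE | re.MULTILINE,
)


def first_token(body):
    """Return the token following the attachment headers, or None."""
    found = PATTERN.search(body)
    return found.group(1) if found else None
```

Like the recipe, this does no real MIME parsing: any text in the body that happens to look like an attachment header block followed by a matching token will be picked up.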

For simplicity, I have not attempted to cope with multi-line MIME headers. If you want to support that, the fix should be reasonably obvious, though not at all elegant.

In the somewhat more general case, you might want to add a condition to check that the group of MIME headers in the condition also contains a Content-type: text/plain; you will need to set up two alternatives for having Content-type: before or after Content-disposition: (or somehow normalize the MIME headers before getting to this recipe; or trust that the sender always generates them in exactly the order in the sample message).
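
The two-alternatives idea can be sketched in Python regex syntax, with \n standing in for Procmail's ($). This is an approximation for experimenting, not a drop-in recipe; the names OTHER_HEADERS, BOTH_ORDERS, and token_from_plain_text_attachment are hypothetical.

```python
import re

# Zero or more further single-line headers (folded headers are not handled).
OTHER_HEADERS = r"(?:\n[A-Z].*)*"

BOTH_ORDERS = re.compile(
    r"^(?:"
    # Alternative 1: Content-Type comes before Content-Disposition.
    r"Content-Type: *text/plain.*" + OTHER_HEADERS
    + r"\nContent-Disposition: *attachment.*"
    r"|"
    # Alternative 2: Content-Disposition comes before Content-Type.
    r"Content-Disposition: *attachment.*" + OTHER_HEADERS
    + r"\nContent-Type: *text/plain.*"
    r")"
    + OTHER_HEADERS
    + r"\n\n([A-Z]+[0-9]+)",  # empty line, then the token to extract
    re.IGNORECASE | re.MULTILINE,
)


def token_from_plain_text_attachment(body):
    """Return the token only if the part is both text/plain and an attachment."""
    found = BOTH_ORDERS.search(body)
    return found.group(1) if found else None
```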
