Coldfusion - How to parse and segment out data from an email file

Problem description


I am trying to parse email files that will arrive periodically, in order to extract the data they contain. We plan to set up cfmail to get the email from the mailbox, run from CF Admin every minute.

The data within the email consists of name, code name, address, description, etc. and will have consistent labels so we are thinking of performing a loop or find function for each field of data. Would that be a good start?

Here is an example of email data:

INCIDENT # 12345

LONG TERM SYS# C12345

REPORTED: 08:39:34 05/20/19 Nature: FD NEED Address: 12345 N TEST LN City: Testville

Responding Units: T12

Cross Streets: Intersection of: N Test LN & W TEST LN

Lat= 39.587453 Lon= -86.485021

Comments: This is a test post. Please disregard

Here's a picture of what the data actually looks like:

So we would like to extract the following:

  1. INCIDENT
  2. LONG TERM SYS#
  3. REPORTED
  4. Nature
  5. Address
  6. City
  7. Responding Units
  8. Cross Streets
  9. Comments

Any feedback or suggestions would be greatly appreciated!

Solution

SQL tends to have limited string functions, so it isn't the best tool for parsing. If the email content is always in that exact format, you could use either plain string functions or regular expressions to parse it. However, the latter is more flexible.

I suspect the content actually does contain new lines, which would make for simpler parsing. However, if you prefer searching for content in between two labels, regular expressions would do the trick.
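If the content does contain new lines, a line-based approach could look like the sketch below. It assumes each field sits on its own line in the form "Label: value" (or with a pound or equal sign as the delimiter); the sample `emailBody` is made up for illustration and is not the real email.

```cfml
// Hypothetical sample: one field per line (an assumption about the real email)
emailBody = "INCIDENT ## 12345" & chr(10) & "City: Testville" & chr(10) & "Responding Units: T12";
results = {};

// Split on CR and LF; empty lines are skipped by default
lines = listToArray(emailBody, chr(13) & chr(10));

for (line in lines) {
    // Position of the first delimiter: pound sign, colon or equal sign
    delimPos = reFind("[##:=]", line);
    if (delimPos > 1) {
        label = trim(left(line, delimPos - 1));
        // Strip the label and its delimiter, keep the rest as the value
        value = trim(reReplace(line, "^[^##:=]*[##:=]", ""));
        results[label] = value;
    }
}

writeDump(results);
```

A line that holds several labels at once (as in the REPORTED line of the sample) would not be split by this sketch, which is where searching between two labels becomes useful.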

Build an array of the label names (only). Loop through the array, grabbing a pair of labels: "current" and "next". Use the two values in a regular expression to extract the text in between them:

label &"\s*[##:=](.*?)"& nextLabel

/* Explanation: */
label        - First label name (example: "Incident")
\s*          - Zero or more whitespace characters
[##:=]       - Any of these characters: pound sign, colon or equal sign (the pound sign is doubled to escape it in a CFML string)
(.*?)        - Group of zero or more characters (non-greedy) 
nextLabel    - Next label (example: "Long Term Sys")

Use reFindNoCase() to get details about the position and length of matched text. Then use those values in conjunction with mid() to extract the text.

Note that newer versions, such as ColdFusion 2016+, automatically return the matched text under a key named MATCH, so the mid() step is not needed there.
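As a rough illustration of the MATCH key, the sketch below extracts a single value between two labels. It assumes ColdFusion 2016 or later, and the sample `emailBody` and the `incident` variable are made up for the example.

```cfml
// Hypothetical sample text; the pound signs are doubled to escape them
emailBody = "INCIDENT ## 12345 LONG TERM SYS## C12345";
incident = "";

matches = reFindNoCase("Incident\s*[##:=](.*?)Long Term Sys", emailBody, 1, true);

// Element 1 is the whole match, element 2 is the first captured group
if (arrayLen(matches.len) >= 2) {
    incident = trim(matches.match[2]);
}

writeDump(incident);
```

On older versions the MATCH key is not returned, so the position and length values have to be passed to mid() instead.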

The newer CF2016+ syntax is slicker, but something along these lines works under CF10:

emailBody = "INCIDENT ## 12345 ... etc.... ";
labelArray = ["Incident", "Long Term Sys", "Reported", ..., "Comments" ];
results = {}; // holds the extracted values, keyed by label

for (pos = 1; pos <= arrayLen(labelArray); pos++) {

    // get current and next label
    hasNext   = pos < arrayLen(labelArray);
    currLabel = labelArray[ pos ];
    nextLabel = (hasNext ? labelArray[ pos+1 ] : "$");

    // extract label and value
    matches   = reFindNoCase( currLabel &"\s*[##:=](.*?)"& nextLabel, emailBody, 1, true);
    if (arrayLen(matches.len) >= 2) {
        results[ currLabel ] = mid( emailBody, matches.pos[2], matches.len[2]);
    }   
}

writeDump( results );

Results:
