Find dynamic words through patterns in LINQ
Problem description
Here is how the HTML starts:
Business Document
<p>Some company</p>
<p>
<p>DEPARTMENT: Legal Process</p>
<p>FUNCTION: Computer Department</p>
<p>PROCESS: Process Server</p>
<p>PROCEDURE: ABC Process Server</p>
<p>OWNER: Some User</p>
<p>REVISION DATE: 06/10/2013</p>
<p>
<p>OBJECTIVE: To ensure that the process server receive their invoices the following day.</p>
<p>
<p>WHEN TO PERFORM: Daily</p>
<p>
<p>WHO WILL PERFORM? Computer Team</p>
<p>
<p>TIME TO COMPLETE: 5 minutes</p>
<p>
<p>TECHNOLOGY REQUIREMENT(S): </p>
<p>
<p>SOURCE DOCUMENT(S): N/A</p>
<p>
<p>CODES AND DEFINITIONS: N/A</p>
<p>
<table border="1">
<tr>
<td>
<p>KPI’s: </p>
</td>
</tr>
</table>
<p>
<table border="1">
<tr>
<td>
<p>RISKS: </p>
</td>
</tr>
</table>
After this comes a whole bunch of other text. From the section above, I need to parse out specific data.
I need to parse out the Department, Function, Process, Procedure, Objective, When to Perform, Who Will Perform, Time to Complete, Technology Requirements, Source Documents, Codes and Definitions, and Risks.
I then need to delete this information from the Html column while leaving everything else intact. Is this possible in LINQ?
Here is the LINQ query I am using:
var result = (from d in IPACS_Documents
join dp in IPACS_ProcedureDocs on d.DocumentID equals dp.DocumentID
join p in IPACS_Procedures on dp.ProcedureID equals p.ProcedureID
where d.DocumentID == 4
&& d.DateDeleted == null
select d.Html);
Console.WriteLine(result);
Recommended answer
This regex worked fine for me on your input data:
(DEPARTMENT|FUNCTION|OBJECTIVE):\s*(?<value>.+)\<
The result is multiple matches with two groups each: the first holds the key and the second the value. I have only handled three of the labels, but you can add the rest easily enough.
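Adding the rest takes two adjustments that the sample HTML calls for: WHO WILL PERFORM is followed by a question mark instead of a colon, and TECHNOLOGY REQUIREMENT(S) and SOURCE DOCUMENT(S) contain parentheses, which are regex metacharacters. The sketch below assumes the labels appear exactly as in the sample and that each sits inside its own <p> element. The label list, the [:?] separator and the [^<]* value pattern are assumptions for this sketch, not part of the original answer. It is written as a .NET 5+ console program with top-level statements.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Labels exactly as they appear in the sample HTML (assumed; adjust to the real documents).
string[] labels =
{
    "DEPARTMENT", "FUNCTION", "PROCESS", "PROCEDURE", "OBJECTIVE",
    "WHEN TO PERFORM", "WHO WILL PERFORM", "TIME TO COMPLETE",
    "TECHNOLOGY REQUIREMENT(S)", "SOURCE DOCUMENT(S)",
    "CODES AND DEFINITIONS", "RISKS"
};

// Regex.Escape makes "(S)" match literally instead of opening a group.
string alternation = string.Join("|", labels.Select(label => Regex.Escape(label)));

// [:?] accepts either separator; [^<]* stops the value at the closing tag.
var fieldRegex = new Regex(
    @"<p>(?<name>" + alternation + @")\s*[:?]\s*(?<value>[^<]*)</p>",
    RegexOptions.ExplicitCapture);

string html = @"<p>DEPARTMENT: Legal Process</p>
<p>WHO WILL PERFORM? Computer Team</p>
<p>TECHNOLOGY REQUIREMENT(S): </p>
<p>SOURCE DOCUMENT(S): N/A</p>";

// One dictionary entry per matched label; empty fields map to "".
var fields = fieldRegex.Matches(html)
    .Cast<Match>()
    .ToDictionary(m => m.Groups["name"].Value, m => m.Groups["value"].Value.Trim());

foreach (var pair in fields)
    Console.WriteLine($"{pair.Key} = {pair.Value}");
```

Because [^<]* stops at the first <, a value containing inline markup such as <b> would be cut short. The answer's .+ avoids that, but it runs to the last </p> on a line if several elements share one line.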
To remove the information parsed this way, you can call Regex.Replace with this regex:
(?<start>\<p\>(?<name>DEPARTMENT|FUNCTION|OBJECTIVE):\s*)(?<value>.+)(?<end>\</p\>)
and the replacement string
${start}${end}
This leaves out the value while keeping the label and the surrounding <p> tags.
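To see what the replacement does, here is a small sketch that runs the start/value/end pattern over three lines taken from the sample HTML. It assumes each field sits on its own line, as in the question; the OWNER line is included to show that labels outside the alternation are left untouched. It is written as a .NET 5+ console program with top-level statements.

```csharp
using System;
using System.Text.RegularExpressions;

// The start/value/end pattern; only three labels are handled.
var parseDocRegex = new Regex(
    @"(?<start>\<p\>(?<name>DEPARTMENT|FUNCTION|OBJECTIVE):\s*)(?<value>.+)(?<end>\</p\>)",
    RegexOptions.ExplicitCapture);

string html = "<p>DEPARTMENT: Legal Process</p>\n" +
              "<p>OWNER: Some User</p>\n" +
              "<p>OBJECTIVE: To ensure that the process server receive their invoices the following day.</p>";

// "${start}${end}" writes back the opening tag, label and closing tag, dropping only the value.
string stripped = parseDocRegex.Replace(html, "${start}${end}");
Console.WriteLine(stripped);
```

If the label lines should disappear entirely, passing string.Empty as the replacement instead of "${start}${end}" removes the whole <p> element and leaves only its line break behind.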
In code, it looks roughly like this (typed quickly in Notepad++, so it may have minor errors):
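using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

private static readonly Regex ParseDocRegex = new Regex(
    @"(?<start>\<p\>(?<name>DEPARTMENT|FUNCTION|OBJECTIVE):\s*)(?<value>.+)(?<end>\</p\>)",
    RegexOptions.ExplicitCapture | RegexOptions.Compiled);
...
// Regex calls can't be translated to SQL, so switch to LINQ to Objects with AsEnumerable() first.
var parsed = from html in result.AsEnumerable()
             let matches = ParseDocRegex.Matches(html)
             where matches.Count > 0
             select new
             {
                 // One name/value pair per matched label.
                 namesAndValues = from m in matches.Cast<Match>()
                                  select new KeyValuePair<string, string>(m.Groups["name"].Value, m.Groups["value"].Value),
                 // The same HTML with each matched value removed; labels and tags stay.
                 strippedHtml = ParseDocRegex.Replace(html, "${start}${end}")
             };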
This ought to give you the desired output.
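Since the database from the question is not available here, the following sketch runs the same query shape (Matches, Cast<Match>() and Count > 0) over a hypothetical in-memory list, htmlColumn, standing in for the Html values the database query returns. The sample strings are made up for illustration, and writing StrippedHtml back to the Html column is a separate update step that is not shown. It is written as a .NET 5+ console program with top-level statements.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

var parseDocRegex = new Regex(
    @"(?<start>\<p\>(?<name>DEPARTMENT|FUNCTION|OBJECTIVE):\s*)(?<value>.+)(?<end>\</p\>)",
    RegexOptions.ExplicitCapture);

// Hypothetical stand-in for the Html column returned by the database query.
List<string> htmlColumn = new List<string>
{
    "<p>DEPARTMENT: Legal Process</p>\n<p>FUNCTION: Computer Department</p>\n<p>Body text</p>",
    "<p>No header fields here</p>"
};

// Documents without any matching label are filtered out by the where clause.
var parsed = (from html in htmlColumn
              let matches = parseDocRegex.Matches(html)
              where matches.Count > 0
              select new
              {
                  NamesAndValues = matches.Cast<Match>()
                      .Select(m => new KeyValuePair<string, string>(m.Groups["name"].Value, m.Groups["value"].Value))
                      .ToList(),
                  StrippedHtml = parseDocRegex.Replace(html, "${start}${end}")
              }).ToList();

foreach (var doc in parsed)
{
    foreach (var pair in doc.NamesAndValues)
        Console.WriteLine($"{pair.Key}: {pair.Value}");
    Console.WriteLine(doc.StrippedHtml);
}
```

Calling ToList() materializes the results once, so the regex runs a single time per document even if the output is enumerated more than once.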