ColdFusion, REGEX - Given TEXT, find all items contained in SPANs


Problem description



I'm looking to learn how to create a REGEX in ColdFusion that will scan through a large block of HTML text and create a list of items.

The items I want are contained between the following

<span class="findme">The Goods</span>

Thanks for any tips to get this going.

Solution

You don't say which version of CF you're on. Since v8 you can use REMatch to get an array of matches:

results = REMatch('(?i)<span[^>]+class="findme"[^>]*>(.+?)</span>', text)

Use ArrayToList to turn that into a list. For older versions, use REFindNoCase and Mid() to extract substrings.
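One thing to watch: REMatch returns each whole match, so the array entries still include the surrounding span tags rather than just the captured text. A minimal sketch of one way to get from the array to a list of inner text, assuming the same pattern as above (the variable names `matches`, `items` and `itemList` are only illustrative):

```cfml
<cfset text = '<span class="findme">The Goods</span><span class="findme">More Goods</span>' />
<cfset pattern = '(?i)<span[^>]+class="findme"[^>]*>(.+?)</span>' />

<!--- REMatch returns every whole match, tags included --->
<cfset matches = REMatch(pattern, text) />

<!--- Replace each whole match with its first subexpression (\1) to drop the tags --->
<cfset items = ArrayNew(1) />
<cfloop index="i" from="1" to="#ArrayLen(matches)#">
    <cfset ArrayAppend(items, REReplace(matches[i], pattern, "\1")) />
</cfloop>

<!--- Join the array into a comma-delimited list --->
<cfset itemList = ArrayToList(items) />
<cfoutput>#itemList#</cfoutput>
```

If the items themselves can contain commas, passing a different delimiter as the second argument to ArrayToList is probably safer.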

EDIT: To answer your follow-up comment, using REFind to return all matches is quite involved because the function only returns the FIRST match. This means you have to call REFind repeatedly, passing a new startpos each time. Ben Forta has written a UDF which does exactly this and will save you some time.

<!---
Returns all the matches of a regular expression within a string.
NOTE: Updated to allow subexpression selection (rather than whole match)

@param regex      Regular expression. (Required)
@param text       String to search. (Required)
@param subexnum   Sub-expression to extract (Optional)
@return Returns a structure.
@author Ben Forta (ben@forta.com)
@version 1, July 15, 2005
--->
<cffunction name="reFindAll" output="true" returnType="struct">
<cfargument name="regex" type="string" required="yes">
<cfargument name="text" type="string" required="yes">
<cfargument name="subexnum" type="numeric" default="1">

<!--- Define local variables --->    
<cfset var results=structNew()>
<cfset var pos=1>
<cfset var subex="">
<cfset var done=false>

<!--- Initialize results structure --->
<cfset results.len=arraynew(1)>
<cfset results.pos=arraynew(1)>

<!--- Loop through text --->
<cfloop condition="not done">

   <!--- Perform search --->
   <cfset subex=reFind(arguments.regex, arguments.text, pos, true)>
   <!--- Anything matched? --->
   <cfif subex.len[1] is 0>
      <!--- Nothing found, outta here --->
      <cfset done=true>
   <cfelse>
      <!--- Got one, add to arrays --->
      <cfset arrayappend(results.len, subex.len[arguments.subexnum])>
      <cfset arrayappend(results.pos, subex.pos[arguments.subexnum])>
      <!--- Reposition start point --->
      <cfset pos=subex.pos[1]+subex.len[1]>
   </cfif>
</cfloop>

<!--- If no matches, add 0 to both arrays --->
<cfif arraylen(results.len) is 0>
   <cfset arrayappend(results.len, 0)>
   <cfset arrayappend(results.pos, 0)>
</cfif>

<!--- and return results --->
<cfreturn results>
</cffunction>

This gives you the start (pos) and length (len) of each match, so to get each substring use another loop:

<cfset text = '<span class="findme">The Goods</span><span class="findme">More Goods</span>' />
<cfset pattern = '(?i)<span[^>]+class="findme"[^>]*>(.+?)</span>' />
<cfset results = reFindAll(pattern, text, 2) />
<cfloop index="i" from="1" to="#ArrayLen(results.pos)#">
    <cfoutput>match #i#: #Mid(text, results.pos[i], results.len[i])#<br></cfoutput>
</cfloop>
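Since the original goal was a list, the same loop can collect the substrings instead of printing them. This is only a sketch: it assumes the reFindAll UDF above is already defined, and `itemList` is an illustrative name. It also skips the 0/0 placeholder that reFindAll adds when nothing matches, because Mid() generally rejects a start position of 0.

```cfml
<!--- Assumes reFindAll() from the UDF above is already defined on the page --->
<cfset text = '<span class="findme">The Goods</span><span class="findme">More Goods</span>' />
<cfset pattern = '(?i)<span[^>]+class="findme"[^>]*>(.+?)</span>' />
<cfset results = reFindAll(pattern, text, 2) />

<cfset itemList = "" />
<cfloop index="i" from="1" to="#ArrayLen(results.pos)#">
    <!--- Skip the 0/0 placeholder returned when there were no matches --->
    <cfif results.pos[i] gt 0>
        <cfset itemList = ListAppend(itemList, Mid(text, results.pos[i], results.len[i])) />
    </cfif>
</cfloop>

<cfoutput>#itemList#</cfoutput>
```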

EDIT: Updated reFindAll with subexnum argument. Setting this to 2 will capture the first subexpression. The default value 1 captures the entire match.
