Get contents of brackets using regex in a list of values


Question

I'm looking for a regex (ColdFusion or Java) that reliably gets me the contents between the brackets for each (param \d+). I've tried dozens of different regexes, and the closest I've got is this one:

\(param \d+\) = \[(type='[^']*', class='[^']*', value='(?:[^']|'')*', sqltype='[^']*')\]

That would be perfect if the string I get back from CF escaped the single quotes inside the value parameter. But it doesn't, so it fails miserably. Going the route of a negative lookahead, like so:

\[(type='[^']*', class='[^']*', value='(?:(?!', sqltype).)*', sqltype='[^']*')\]

That works great, unless for some unnatural reason there's a piece of code that quite literally has , sqltype in the value. I find it hard to believe I can't simply tell regex to scoop out the contents of every pair of opening and closing brackets it finds, but then again, I don't know enough regex to know its limits.

Here's an example of the string I'm trying to parse:

(param 1) = [type='IN', class='java.lang.Integer', value='47', sqltype='cf_sql_integer'] , (param 2) = [type='IN', class='java.lang.String', value='asf , O'Reilly, really?', sqltype='cf_sql_varchar'] , (param 3) = [type='IN', class='java.lang.String', value='Th[is]is Ev'ery'thing That , []can break it ', sqltype= ', sqltype='cf_sql_varchar']
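
To make the two failure modes concrete, here is a minimal Java sketch that counts how many params each of the attempts above finds in this sample string. The class, constant, and method names are made up for illustration; only the two regexes and the sample come from the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; not part of any ColdFusion or Java API.
final class AttemptComparison {
    // The sample string from the question, split only for readability.
    static final String SAMPLE =
        "(param 1) = [type='IN', class='java.lang.Integer', value='47', sqltype='cf_sql_integer']"
      + " , (param 2) = [type='IN', class='java.lang.String', value='asf , O'Reilly, really?', sqltype='cf_sql_varchar']"
      + " , (param 3) = [type='IN', class='java.lang.String', value='Th[is]is Ev'ery'thing That , []can break it ', sqltype= ', sqltype='cf_sql_varchar']";

    // First attempt: tolerates a quote inside the value only if it is doubled ('').
    static final String DOUBLED_QUOTES =
        "\\(param \\d+\\) = \\[(type='[^']*', class='[^']*', value='(?:[^']|'')*', sqltype='[^']*')\\]";

    // Second attempt: the value runs up to the first occurrence of "', sqltype".
    static final String NEGATIVE_LOOKAHEAD =
        "\\[(type='[^']*', class='[^']*', value='(?:(?!', sqltype).)*', sqltype='[^']*')\\]";

    // Counts non-overlapping matches of regex in text.
    static int countMatches(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        System.out.println(countMatches(DOUBLED_QUOTES, SAMPLE));     // prints 1
        System.out.println(countMatches(NEGATIVE_LOOKAHEAD, SAMPLE)); // prints 2
    }
}
```

The first attempt gives up at the unescaped quote in O'Reilly, so it only finds param 1. The second gets through param 2 but loses param 3, whose value contains ', sqltype followed by something other than ='.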


For the curious, this is a sub-question of Copyable Coldfusion SQL Exception.

This is my attempt at implementing @Mena's answer in CF9.1. Sadly, it doesn't finish processing the string. I had to replace the \\ with \ just to get it to run at first, but my implementation might still be at fault.

This is the input string (the pipes just mark its boundaries):

| (param 1) = [type='IN', class='java.lang.Integer', value='47', sqltype='cf_sql_integer'] , (param 2) = [type='IN', class='java.lang.String', value='asf , O'Reilly], really?', sqltype='cf_sql_varchar'] , (param 3) = [type='IN', class='java.lang.String', value='Th[is]is Ev'ery'thing That , []can break it ', sqltype ', sqltype='cf_sql_varchar'] | 

This is my implementation:

    <cfset var outerPat = createObject("java","java.util.regex.Pattern").compile(javaCast("string", "\((.+?)\)\s?\=\s?\[(.+?)\](\s?,|$)"))>
    <cfset var innerPat = createObject("java","java.util.regex.Pattern").compile(javaCast("string", "(.+?)\s?\=\s?'(.+?)'\s?,\s?"))>
    <cfset var outerMatcher = outerPat.matcher(javaCast("string", arguments.params))>

    <cfdump var="Start"><br />
    <cfloop condition="outerMatcher.find()">     
        <cfdump var="#outerMatcher.group(1)#"> (<cfdump var="#outerMatcher.group(2)#">)<br />
        <cfset var innerMatcher = innerPat.matcher(javaCast("string", outerMatcher.group(2)))>
        <cfloop condition="innerMatcher.find()">
            <cfoutput>|__</cfoutput><cfdump var="#innerMatcher.group(1)#"> --> <cfdump var="#innerMatcher.group(2)#"><br />
        </cfloop>
        <br />
    </cfloop>
    <cfabort>

This is what it prints:

Start 
param 1 ( type='IN', class='java.lang.Integer', value='47', sqltype='cf_sql_integer' )
|__ type --> IN 
|__ class --> java.lang.Integer 
|__ value --> 47 

param 2 ( type='IN', class='java.lang.String', value='asf , O'Reilly )
|__ type --> IN 
|__ class --> java.lang.String 

End


Recommended answer

Here's a Java regex pattern that works for your sample input.

(?x)

# lookbehind to check for start of string or previous param
# java lookbehinds must have max length, so limits sqltype
(?<=^|sqltype='cf_sql_[a-z]{1,16}']\ ,\ )

# capture the full string for replacing in the orig sql
# and just the position to verify against the match position
(\(param\ (\d+)\))

\ =\ \[

# type and class won't contain quotes
   type='([^']++)'
,\ class='([^']++)'

# match any non-quote, then lazily keep going
,\ value='([^']++.*?)'

# sqltype is always alphanumeric
,\ sqltype='cf_sql_[a-z]+'

\]

# lookahead to check for end of string or next param
(?=$|\ ,\ \(param\ \d+\)\ =\ \[)

(The (?x) flag enables comment mode, which ignores unescaped whitespace and anything between a hash and the end of the line.)
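
For anyone who wants to try the pattern from plain Java rather than CFML, the following is a rough sketch of how it could be compiled and applied. The class name, the SAMPLE constant, and the extract method are assumptions made for this example. The pattern lines are joined with newlines so that the (?x) comments end where they should, and every backslash is doubled because it sits inside a Java string literal.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical wrapper class, for illustration only.
final class ParamExtractor {
    // The sample string from the question, split only for readability.
    static final String SAMPLE =
        "(param 1) = [type='IN', class='java.lang.Integer', value='47', sqltype='cf_sql_integer']"
      + " , (param 2) = [type='IN', class='java.lang.String', value='asf , O'Reilly, really?', sqltype='cf_sql_varchar']"
      + " , (param 3) = [type='IN', class='java.lang.String', value='Th[is]is Ev'ery'thing That , []can break it ', sqltype= ', sqltype='cf_sql_varchar']";

    // Same pattern as above; each comment ends at the "\n" added by String.join.
    static final Pattern PARAM = Pattern.compile(String.join("\n",
        "(?x)",
        "# lookbehind: start of string, or the end of the previous param",
        "(?<=^|sqltype='cf_sql_[a-z]{1,16}']\\ ,\\ )",
        "# group 1: the whole (param N) token, group 2: just N",
        "(\\(param\\ (\\d+)\\))",
        "\\ =\\ \\[",
        "   type='([^']++)'       # group 3",
        ",\\ class='([^']++)'     # group 4",
        ",\\ value='([^']++.*?)'  # group 5",
        ",\\ sqltype='cf_sql_[a-z]+'",
        "\\]",
        "# lookahead: end of string, or the start of the next param",
        "(?=$|\\ ,\\ \\(param\\ \\d+\\)\\ =\\ \\[)"
    ));

    // Returns one {number, type, class, value} array per param found.
    static List<String[]> extract(String params) {
        List<String[]> found = new ArrayList<>();
        Matcher m = PARAM.matcher(params);
        while (m.find()) {
            found.add(new String[] { m.group(2), m.group(3), m.group(4), m.group(5) });
        }
        return found;
    }

    public static void main(String[] args) {
        for (String[] p : extract(SAMPLE)) {
            System.out.println("param " + p[0] + " -> " + p[3]);
        }
    }
}
```

On the sample input this finds all three params, including the third one, whose value ends in ', sqltype= and would trip a simpler pattern.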

And here's that pattern implemented in CFML (tested on CF9,0,1,274733). It uses cfRegex (a library that makes it easier to work with Java regex in CFML) to get the results of that pattern, and then does a couple of checks to make sure the expected number of params is found.

<cfsavecontent variable="Input">
(param 1) = [type='IN', class='java.lang.Integer', value='47', sqltype='cf_sql_integer']
 , (param 2) = [type='IN', class='java.lang.String', value='asf , O'Reilly, really?', sqltype='cf_sql_varchar']
 , (param 3) = [type='IN', class='java.lang.String', value='Th[is]is Ev'ery'thing That , []can break it ', sqltype= ', sqltype='cf_sql_varchar']
</cfsavecontent>
<cfset Input = trim(Input).replaceall('\n','')>

<cfset cfcatch = 
    { params = input
    , sql = 'SELECT stuff FROM wherever WHERE (param 3) is last param'
    }/>

<cfsavecontent variable="ParamRx">(?x)

    # lookbehind to check for start or previous param
    # java lookbehinds must have max length, so limits sqltype
    (?<=^|sqltype='cf_sql_[a-z]{1,16}']\ ,\ )

    # capture the full string for replacing in the orig sql
    # and just the position to verify against the match position
    (\(param\ (\d+)\))

    \ =\ \[

    # type and class won't contain quotes
       type='([^']++)'
    ,\ class='([^']++)'

    # match any non-quote, then lazily keep going if needed
    ,\ value='([^']++.*?)'

    # sqltype is always alphanumeric
    ,\ sqltype='cf_sql_[a-z]+'

    \]

    # lookahead to check for end or next param
    (?=$|\ ,\ \(param\ \d+\)\ =\ \[)

</cfsavecontent>

<cfset FoundParams = new Regex(ParamRx).match
    ( text = cfcatch.params
    , returntype = 'full'
    )/>

<cfset LastParamPos = cfcatch.sql.lastIndexOf('(param ') + 7 />
<cfset LastParam = ListFirst( Mid(cfcatch.sql,LastParamPos,3) , ')' ) />

<cfif LastParam NEQ ArrayLen(FoundParams) >
    <cfset ProblemsDetected = true />
<cfelse>
    <cfset ProblemsDetected = false />

    <cfloop index="i" from=1 to=#ArrayLen(FoundParams)# >

        <cfif i NEQ FoundParams[i].Groups[2] >
            <cfset ProblemsDetected = true />
        </cfif>

    </cfloop>
</cfif>

<cfif ProblemsDetected>
    <big>Something went wrong!</big>
<cfelse>
    <big>All seems fine</big>
</cfif>

<cfdump var=#FoundParams# />

This will actually work if you embed an entire param inside the value of another param. It fails if you embed two (or more), but at least the checks should detect this failure.
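
The two checks themselves are not tied to CFML. As a rough sketch, assuming the group 2 numbers captured by the pattern have already been collected into a list, the same logic could look like this in Java. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper mirroring the CFML checks above; not a library API.
final class ParamChecks {
    private static final String TOKEN = "(param ";

    // Returns N from the last "(param N)" in the SQL text, or -1 if none can be read.
    static int lastParamNumber(String sql) {
        int start = sql.lastIndexOf(TOKEN);
        if (start < 0) {
            return -1;
        }
        int close = sql.indexOf(')', start);
        if (close < 0) {
            return -1;
        }
        try {
            return Integer.parseInt(sql.substring(start + TOKEN.length(), close).trim());
        } catch (NumberFormatException e) {
            return -1;
        }
    }

    // True if the number of params found differs from the last param number in the SQL,
    // or if the found params are not numbered 1, 2, 3, ... in order.
    static boolean problemsDetected(String sql, List<Integer> foundNumbers) {
        if (lastParamNumber(sql) != foundNumbers.size()) {
            return true;
        }
        for (int i = 0; i < foundNumbers.size(); i++) {
            if (foundNumbers.get(i) != i + 1) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String sql = "SELECT stuff FROM wherever WHERE (param 3) is last param";
        System.out.println(problemsDetected(sql, Arrays.asList(1, 2, 3))); // prints false
        System.out.println(problemsDetected(sql, Arrays.asList(1, 2)));    // prints true
    }
}
```

A param swallowed by a neighbouring value shows up either as a count mismatch or as a gap in the numbering, which is what lets the checks flag the failure.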

Here's what the dump output should look like:

[Screenshot of the cfdump output; image not preserved.]

Hopefully everything here makes sense. Let me know if you have any questions.
