How do I extract all matches with a Tcl regex?


Question

Hi everybody, I need help with this regular expression. I want to extract all the hex numbers of the form H'xxxx from a string. I used the regexp below, but I get only one number instead of all the hex values. How do I get every hex number from this string?

set hex "V5CCH,IA=H'22EF&H'2354&H'4BD4&H'4C4B&H'4D52&H'4DC9"
set res [regexp -all {H'([0-9A-Z]+)&} $hex match hexValues]
puts "$res H$hexValues"

The output I get is 5 H4D52

Recommended answer

On -all -inline

From the documentation:

-all: Causes the regular expression to be matched as many times as possible in the string, returning the total number of matches found. If this is specified with match variables, they will contain information for the last match only.
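A minimal sketch of that behaviour, assuming a made-up sample string (the names sample, count, whole, and digits are illustrative and not from the question). With -all and match variables, regexp returns the match count, while the variables describe only the final match:

```tcl
# Illustrative sample string; not taken from the question.
set sample "H'0001&H'0002&H'0003"

# With -all and match variables, regexp returns the number of matches ...
set count [regexp -all {H'([0-9A-F]{4})} $sample whole digits]
puts $count    ;# 3

# ... but the variables only hold data for the last match.
puts $whole    ;# H'0003
puts $digits   ;# 0003
```

This is the same effect seen in the question: the count is correct, yet only one hex value ends up in the variable.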

-inline: Causes the command to return, as a list, the data that would otherwise be placed in match variables. When using -inline, match variables may not be specified. If used with -all, the list will be concatenated at each iteration, such that a flat list is always returned. For each match iteration, the command will append the overall match data, plus one element for each subexpression in the regular expression.
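To make the difference concrete, here is a small sketch comparing -inline on its own with -all -inline. The sample string and the variable names first and all are assumptions chosen for illustration:

```tcl
# Illustrative sample string and pattern.
set sample "H'0001&H'0002&H'0003"
set pattern {H'([0-9A-F]{4})}

# -inline alone: data for the first match only (whole match, then group 1).
set first [regexp -inline $pattern $sample]
puts $first           ;# H'0001 0001

# -all -inline: the same pair for every match, concatenated into one flat list.
set all [regexp -all -inline $pattern $sample]
puts $all             ;# H'0001 0001 H'0002 0002 H'0003 0003
puts [llength $all]   ;# 6
```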

Thus, to return all matches (including the captures of each group) as a flat list in Tcl, you can write:

set matchTuples [regexp -all -inline $pattern $text]

If the pattern has groups 0…N-1 (group 0 being the whole match), then each match is an N-tuple in the list. Thus the number of actual matches is the length of this list divided by N. You can then use foreach with N variables to iterate over each tuple of the list.

If N = 2, for example, you have:

set numMatches [expr {[llength $matchTuples] / 2}]

foreach {group0 group1} $matchTuples {
   ...
}
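The same idea carries over to larger N. As a sketch, assume a hypothetical input of key=value pairs and a pattern with two capture groups, so that N = 3 (the whole match plus two groups):

```tcl
# Illustrative input: two capture groups, so each match contributes 3 elements.
set text "a=1, b=22, c=333"
set tuples [regexp -all -inline {([a-z]+)=([0-9]+)} $text]

# 9 list elements / 3 elements per match = 3 matches.
set numMatches [expr {[llength $tuples] / 3}]
puts $numMatches   ;# 3

# One loop variable per element of the tuple.
set pairs {}
foreach {whole key value} $tuples {
    lappend pairs "$key -> $value"
}
puts [join $pairs "; "]   ;# a -> 1; b -> 22; c -> 333
```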

References

  • regular-expressions.info/Tcl

Here's a solution for this specific problem, annotated with output as comments (see also on ideone.com):

    set text "V5CCH,IA=H'22EF&H'2354&H'4BD4&H'4C4B&H'4D52&H'4DC9"
    set pattern {H'([0-9A-F]{4})}
     
    set matchTuples [regexp -all -inline $pattern $text]
     
    puts $matchTuples
    # H'22EF 22EF H'2354 2354 H'4BD4 4BD4 H'4C4B 4C4B H'4D52 4D52 H'4DC9 4DC9
    # \_________/ \_________/ \_________/ \_________/ \_________/ \_________/
    #  1st match   2nd match   3rd match   4th match   5th match   6th match
     
    puts [llength $matchTuples]
    # 12
     
    set numMatches [expr {[llength $matchTuples] / 2}]
    puts $numMatches
    # 6
     
    foreach {whole hex} $matchTuples {
       puts $hex
    }
    # 22EF
    # 2354
    # 4BD4
    # 4C4B
    # 4D52
    # 4DC9
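
If only the captured digits are needed as a list of their own, one possible variation is to collect them while iterating. This is a sketch; the variable name hexValues is borrowed from the question, and the loop is not part of the original answer:

```tcl
set text "V5CCH,IA=H'22EF&H'2354&H'4BD4&H'4C4B&H'4D52&H'4DC9"

# Keep only group 1 of every match; the whole-match element is ignored.
set hexValues {}
foreach {whole hex} [regexp -all -inline {H'([0-9A-F]{4})} $text] {
    lappend hexValues $hex
}

puts $hexValues             ;# 22EF 2354 4BD4 4C4B 4D52 4DC9
puts [llength $hexValues]   ;# 6
```

On Tcl 8.6 or later, lmap with the same two loop variables should give the same list in a single expression.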
    


On the pattern

Note that I've changed the pattern slightly:

  • Instead of [0-9A-Z]+, the class [0-9A-F]{4} is more specific: it matches exactly 4 hexadecimal digits
  • If you insist on matching the &, then the last hex string (H'4DC9 in your input) cannot be matched
    • This explains why you get 4D52 in the original script, because that's the last match followed by &
    • Maybe get rid of the &, or use (&|$) instead, i.e. a & or the end of the string $
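
One caveat with the second option: written as (&|$), the alternation is itself a capturing group, so each match would contribute three list elements instead of two. The sketch below assumes you want to keep the two-element tuples and therefore uses the non-capturing form (?:&|$); this is a variation on the suggestion above, not something the original answer spelled out:

```tcl
set text "V5CCH,IA=H'22EF&H'2354&H'4BD4&H'4C4B&H'4D52&H'4DC9"

# (?:...) groups without capturing, so each match still yields 2 elements:
# the whole match (including a trailing & if present) and the 4 hex digits.
set tuples [regexp -all -inline {H'([0-9A-F]{4})(?:&|$)} $text]

set hexValues {}
foreach {whole hex} $tuples {
    lappend hexValues $hex
}
puts $hexValues   ;# 22EF 2354 4BD4 4C4B 4D52 4DC9
```

The last value, 4DC9, is now included because the end of the string satisfies the $ branch.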
