How to capture 'multiple' repeated groups with Regular Expressions


Question

I have the following text file, which I would like to parse to extract the individual fields:

host_group_web = ( )
host_group_lbnorth = ( lba050 lbhou002 lblon003 )

The fields that I would like to extract are the group name after host_group_ and the node names between the ( ):

  • host_group_web = ( )
  • host_group_lbnorth = ( lba050 lbhou002 lblon003 )

host_group_web has no items between the ( ), so that portion would be ignored.

I've named the first group nodegroup and the items between the ( ) nodes.

I am reading the file line by line and storing the results for further processing.

In Go, this is the regex snippet I am using:

hostGroupLine := "host_group_lbnorth = ( lba050 lbhou002 lblon003 )"
hostGroupExp := regexp.MustCompile(`host_group_(?P<nodegroup>[[:alnum:]]+)\s*=\s*\(\s*(?P<nodes>[[:alnum:]]+\s*)`)
hostGroupMatch := hostGroupExp.FindStringSubmatch(hostGroupLine)

for i, name := range hostGroupExp.SubexpNames() {
  if i != 0 {
    fmt.Println("GroupName:", name, "GroupMatch:", hostGroupMatch[i])
  }
}

I get the following output, which is missing the remaining matches for the nodes named group:

GroupName: nodegroup GroupMatch: lbnorth
GroupName: nodes GroupMatch: lba050

Code snippet in the Go Playground

My question is: how do I write a regex in Go that matches the nodegroup and all the nodes that may be in the line, e.g. lba050 lbhou002 lblon003? The number of nodes varies, from zero to many.

Recommended answer

If you want to capture the group name and all possible node names, you should use a different regex pattern. The one below, applied with FindAllStringSubmatch, captures all of them in one go. There is no need for named capture groups, but you can use them if you want to.

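// The first alternative captures the group name; the second captures each
// node (a run of alphanumerics followed by a space).
hostGroupExp := regexp.MustCompile(`host_group_([[:alnum:]]+)|([[:alnum:]]+) `)

hostGroupLine := "host_group_lbnorth = ( lba050 lbhou002 lblon003 )"
hostGroupMatch := hostGroupExp.FindAllStringSubmatch(hostGroupLine, -1)

// Assumes the line starts with host_group_, so match 0 holds the group name
// and every later match holds one node. The guard avoids an index panic on
// lines with no match at all.
if len(hostGroupMatch) > 0 {
    fmt.Printf("GroupName: %s\n", hostGroupMatch[0][1])
    for i := 1; i < len(hostGroupMatch); i++ {
        fmt.Printf("  Node: %s\n", hostGroupMatch[i][2])
    }
}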

Playground
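
For context on why FindAllStringSubmatch is used above: in Go's regexp package, a capture group inside a repetition reports only the text of its last iteration, so a single FindStringSubmatch call cannot return every node. The sketch below illustrates this behaviour. The pattern and the helper name lastNode are assumptions made for illustration and do not come from the original post.

```go
package main

import (
	"fmt"
	"regexp"
)

// repeated wraps the node group in a repetition: (?:(...)\s*)*
// Capture group 2 sits inside that repetition.
var repeated = regexp.MustCompile(`host_group_([[:alnum:]]+)\s*=\s*\(\s*(?:([[:alnum:]]+)\s*)*\)`)

// lastNode (hypothetical helper) returns whatever capture group 2 holds
// after a single match, or "" when the line does not match.
func lastNode(line string) string {
	m := repeated.FindStringSubmatch(line)
	if m == nil {
		return ""
	}
	return m[2]
}

func main() {
	// Only the final iteration of the repeated group survives.
	fmt.Println(lastNode("host_group_lbnorth = ( lba050 lbhou002 lblon003 )")) // prints "lblon003"
}
```

This is why the answer matches each node as a separate match instead of relying on one repeated group.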

You can also parse the way awk would: use a regular expression to split each line into tokens and print the tokens you need. Of course, the line layout must be the same as in your example.

package main

import (
    "fmt"
    "regexp"
)

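// printGroupName expects tokens laid out as:
// host, group, <name>, =, (, <node>..., )
func printGroupName(tokens []string) {
    // index 2 is the group name
    fmt.Printf("GroupName: %s\n", tokens[2])
    // nodes start at index 5, after "=" and "("; the last token is ")"
    for i := 5; i < len(tokens)-1; i++ {
        fmt.Printf("  Node: %s\n", tokens[i])
    }
}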

func main() {

    // regexp line splitter (either _ or space)
    r := regexp.MustCompile(`_| `)

    // lines to parse
    hostGroupLines := []string{
        "host_group_lbnorth = ( lba050 lbhou002 lblon003 )",
        "host_group_web = ( web44 web125 )",
        "host_group_web = ( web44 )",
        "host_group_lbnorth = ( )",
    }

    // split lines on regexp splitter and print result
    for _, line := range hostGroupLines {
        hostGroupMatch := r.Split(line, -1)
        printGroupName(hostGroupMatch)
    }

}

Playground
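
A further option, not part of the original answer, keeps the named groups from the question. It is a two-step parse: match everything between the parentheses once as nodes, then split that text on whitespace with strings.Fields. This is a minimal sketch under stated assumptions: the helper name parseHostGroup and the anchored pattern are hypothetical, and Regexp.SubexpIndex requires Go 1.15 or later.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// hostGroupExp captures the group name and the raw text between the
// parentheses. The pattern is an illustrative assumption.
var hostGroupExp = regexp.MustCompile(`^host_group_(?P<nodegroup>[[:alnum:]]+)\s*=\s*\((?P<nodes>[^)]*)\)`)

// parseHostGroup (hypothetical helper) returns the group name and its nodes,
// or ok == false when the line is not a host_group line.
func parseHostGroup(line string) (group string, nodes []string, ok bool) {
	m := hostGroupExp.FindStringSubmatch(line)
	if m == nil {
		return "", nil, false
	}
	group = m[hostGroupExp.SubexpIndex("nodegroup")]
	// strings.Fields splits on any run of whitespace and yields an empty
	// slice for "( )", so empty groups need no special case.
	nodes = strings.Fields(m[hostGroupExp.SubexpIndex("nodes")])
	return group, nodes, true
}

func main() {
	lines := []string{
		"host_group_web = ( )",
		"host_group_lbnorth = ( lba050 lbhou002 lblon003 )",
	}
	for _, line := range lines {
		if group, nodes, ok := parseHostGroup(line); ok {
			fmt.Printf("GroupName: %s Nodes: %q\n", group, nodes)
		}
	}
}
```

Compared with splitting on `_| `, this approach tolerates extra whitespace between nodes, at the cost of a second pass over the captured text.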
