Ruby - Parse a multi-line tab-delimited string into an array of arrays

Question

My apologies if this has already been asked in a Ruby setting--I checked before posting, but to be perfectly honest it has been a very long day, and if I am missing the obvious, I apologize in advance!

I have the following string, which contains a list of software packages installed on a system, and for some reason I am having the hardest time parsing it. I know there has to be a straightforward way of doing this in Ruby, but I keep coming up short.

I would like to parse the multi-line, tab-delimited string below into an array of arrays, so that I can then loop through each element with each_with_index and spit out the HTML code in my Rails app.
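
As a rough sketch of that each_with_index loop in plain Ruby rather than a Rails view, assuming each parsed row looks like [name, version, installed_on] with installed_on possibly nil: the helper name rows_to_html and the table-row layout are hypothetical. In a Rails view, <%= %> output is escaped automatically, so the explicit CGI.escapeHTML call would not be needed there.

```ruby
require "cgi"

# Hypothetical helper: turn parsed rows into HTML table rows.
# Each row is assumed to be [name, version, installed_on_or_nil].
def rows_to_html(rows)
  rows.each_with_index.map do |(name, version, installed), i|
    # Number the rows from 1 and show "n/a" when there is no install date.
    cells = [i + 1, name, version, installed || "n/a"]
      .map { |c| "<td>#{CGI.escapeHTML(c.to_s)}</td>" }
      .join
    "<tr>#{cells}</tr>"
  end.join("\n")
end

rows = [
  ["Product and/or Software Full Name 5242", "version 6.5.24", "Installed on: 12/31/2015"],
  ["Product and/or Software Full Name 5426", "version 22.4", nil]
]
puts rows_to_html(rows)
```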

str = 'Product and/or Software Full Name 5242     [version 6.5.24]     [Installed on: 12/31/2015]

 Product and/or Software Full Name 5426     [version 22.4]     [Installed on: 06/11/2013]

 Product and/or Software Full Name 2451     [version 1.63]     [Installed on: 12/17/2015]

 Product and/or Software Full Name 5225     [version 43.22.51]     [Installed on: 11/15/2011]

 Product and/or Software Full Name 2420     [version 43.51-r2]     [Installed on: 12/31/2015]'

The end result would be an array of arrays with 5 elements like so:

[["Product and/or Software Full Name 5242", "version 6.5.24", "Installed on: 12/31/2015"], ["Product and/or Software Full Name 5426", "version 22.4", "Installed on: 06/11/2013"], ["Product and/or Software Full Name 2451", "version 1.63", "Installed on: 12/17/2015"]]

Please Note: Only 3 of 5 arrays are shown for brevity

I would prefer to strip the brackets from both 'version' and 'Installed on', but I can do that separately with gsub if that can't easily be baked into an answer.
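
If you go the separate gsub route, a minimal sketch is to delete every square bracket from a field. This assumes brackets only ever appear as the delimiters around each field (the pattern would also strip brackets inside a product name), and the helper name strip_brackets is hypothetical.

```ruby
# Hypothetical helper: remove every '[' and ']' from a field, then trim whitespace.
def strip_brackets(field)
  field.gsub(/[\[\]]/, "").strip
end

puts strip_brackets("[version 6.5.24]")           # prints version 6.5.24
puts strip_brackets("[Installed on: 12/31/2015]") # prints Installed on: 12/31/2015
```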

One last thing: not every line in the multi-line string will have an 'Installed on' entry, so the answer needs to account for that.

Answer

This should do it:

expr = /(.+?)\s+\[([^\]]+)\](?:\s+\[([^\]]+)\])?/
str.scan(expr)
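
To see what scan returns, here is a sketch run against a shortened, two-line copy of the sample, with the second line indented by one space like the sample above. Each match comes back as a three-element array of captures. With this regex, the name captured for an indented line keeps its leading space, so the sketch strips every field; the &. keeps a missing capture as nil. The rows variable is just for illustration.

```ruby
expr = /(.+?)\s+\[([^\]]+)\](?:\s+\[([^\]]+)\])?/

str = "Product and/or Software Full Name 5242     [version 6.5.24]     [Installed on: 12/31/2015]\n" \
      " Product and/or Software Full Name 5426     [version 22.4]     [Installed on: 06/11/2013]"

# scan yields one array of captures per match; strip each field, leaving nils alone.
rows = str.scan(expr).map { |fields| fields.map { |f| f&.strip } }
p rows
```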

The expression is actually a lot less complex than it looks. It looks complex because we're matching square brackets, which have to be escaped, and also using character classes, which are enclosed in square brackets in regular expression syntax. Altogether, that adds a lot of noise.

Here it is broken down:

expr = /
  (.+?)  # Capture #1: Any characters (non-greedy)

  \s+    # Whitespace
  \[     # Literal '['
    (      # Capture #2:
      [^\]]+   # One or more characters that aren't ']'
    )
  \]     # Literal ']'

  (?:    # Non-capturing group
    \s+    # Whitespace
    \[     # Literal '['
      ([^\]]+) # Capture #3 (same as #2)
    \]     # Literal ']'
  )?     # Preceding group is optional
/x

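As you can see, the third part is identical to the second, except that it's inside a non-capturing group followed by ? to make it optional. When that group doesn't match (a line with no 'Installed on' entry), the third capture comes back as nil.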
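
A quick way to check both the optional third group and the equivalence of the compact and commented /x forms is a sketch like the following. The second line, without an "Installed on" part, is hypothetical.

```ruby
compact = /(.+?)\s+\[([^\]]+)\](?:\s+\[([^\]]+)\])?/

# Same pattern in extended mode: whitespace and comments are ignored.
verbose = /
  (.+?)\s+              # name (non-greedy)
  \[([^\]]+)\]          # first bracketed field
  (?:\s+\[([^\]]+)\])?  # optional second bracketed field
/x

with_date    = "Product and/or Software Full Name 5242     [version 6.5.24]     [Installed on: 12/31/2015]"
without_date = "Product and/or Software Full Name 9999     [version 1.0]" # hypothetical line

[with_date, without_date].each do |line|
  a = line.scan(compact)
  b = line.scan(verbose)
  p a
  puts "same result: #{a == b}"
end
```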

It's worth noting that this may fail if, e.g., the product name contains square brackets. If that's a possibility, one potential solution is to include the version and Installed text in the match, e.g.:

expr = /(.+?)\s+\[(version [^\]]+)\](?:\s+\[(Installed [^\]]+)\])?/
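
To illustrate the failure mode and the fix, here is a sketch with a made-up product name that contains its own bracketed tag. The generic pattern splits that one line into two bogus matches, while the pattern anchored on "version" and "Installed" keeps the name intact.

```ruby
generic  = /(.+?)\s+\[([^\]]+)\](?:\s+\[([^\]]+)\])?/
anchored = /(.+?)\s+\[(version [^\]]+)\](?:\s+\[(Installed [^\]]+)\])?/

# Hypothetical line whose product name contains square brackets.
line = "Tool [beta] 1.0     [version 2.1]     [Installed on: 01/02/2020]"

p line.scan(generic)
# [["Tool", "beta", nil], [" 1.0", "version 2.1", "Installed on: 01/02/2020"]]
p line.scan(anchored)
# [["Tool [beta] 1.0", "version 2.1", "Installed on: 01/02/2020"]]
```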

P.S. Here's a solution that uses String#split instead:

expr = /\]?\s+\[|\]$/
res = str.each_line.map {|ln| ln.strip.split(expr) }
        .reject {|arr| arr.empty? }
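
Run against a small sample that includes a blank line (the sample in the question has blank lines between entries), the split version produces one array per non-blank line. The second entry below is hypothetical and has no "Installed on" part. Unlike the scan version, it yields a two-element array rather than a trailing nil, which matters if downstream code reads row[2].

```ruby
expr = /\]?\s+\[|\]$/

str = "Product and/or Software Full Name 5242     [version 6.5.24]     [Installed on: 12/31/2015]\n" \
      "\n" \
      " Product and/or Software Full Name 9999     [version 1.0]"

# strip removes indentation and the newline; split drops the trailing empty field
# left by the closing ']'; reject discards the empty array produced by blank lines.
res = str.each_line.map { |ln| ln.strip.split(expr) }
         .reject { |arr| arr.empty? }
p res
# res[0] => ["Product and/or Software Full Name 5242", "version 6.5.24", "Installed on: 12/31/2015"]
# res[1] => ["Product and/or Software Full Name 9999", "version 1.0"]
```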

If you have brackets in your product names, a possible workaround here is to specify a minimum number of spaces between parts, e.g.:

expr = /\]?\s{3,}\[|\]$/

...which of course depends on product names never containing three or more consecutive spaces before a bracket, and on the fields themselves always being separated by at least three.
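
A sketch comparing the two split patterns on a hypothetical product name with its own bracketed tag shows the trade-off. The \s+ version breaks the name apart, while the \s{3,} version keeps it whole because the field separators in the sample above are runs of five spaces. If the real data is separated by single tabs, as the question's title suggests, \s{3,} would not match those separators; that is an assumption worth checking against the actual input.

```ruby
loose  = /\]?\s+\[|\]$/
strict = /\]?\s{3,}\[|\]$/

# Hypothetical line: the product name itself contains "[beta]".
line = "Tool [beta] 1.0     [version 2.1]     [Installed on: 01/02/2020]"

p line.split(loose)
# ["Tool", "beta] 1.0", "version 2.1", "Installed on: 01/02/2020"]
p line.split(strict)
# ["Tool [beta] 1.0", "version 2.1", "Installed on: 01/02/2020"]
```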
