Regex: How to capture optional group in middle of text?

Question

I'm struggling with using regex to capture some optional text - it's in the middle of some filenames, but not all. The big problem appears to be that my optional group is not anchored (I am using .*? before and after it). I looked extensively through past answers on SO, but most of them were able to capture optional text only if it was anchored on one side or the other (ie. at the end of the line).

Given a list of filenames, there are up to 5 things I'm trying to capture:

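  • NAME: always present, first thing in the filename
  • NUMBER: always present, second thing in the filename (may be in parentheses)
  • SHAPE: always present
  • COLOR: sometimes present, but can appear before or after SHAPE
  • VERSION: sometimes present, always last (but usually followed by junk text)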

Source text:

name 1111 color shape
name 2222 shape color
name 3333 shape
name (4444) color shape version
name.5555.JUNK.color.JUNK.shape.JUNK.version.JUNK

Expected results:

name (1111) color shape
name (2222) color shape
name (3333) shape
name (4444) color shape version
name (5555) color shape version

But when I use this regex:

FIND: (.*?).\(?(\d{4}).*?(color)?.*?(shape).*?(color)?.*?(version)?.*
REPLACE: $1 ($2) $3$5 $4 $6

I get this:

name (1111)  shape
name (2222)  shape
name (3333)  shape
name (4444)  shape
name (5555)  shape

As you can see, by making the (color) and (version) capture groups optional, it's not picking them up at all. (Also, if there's any way to remove the extra whitespace, that would be great too.)

By the way, I'm using .*? in between each capture group because I learned it's the "lazy" version of .* (not "greedy") - basically, it tries to match as little as possible instead of as much as possible. More info on that here if you're a regex newbie like me: http://www.rexegg.com/regex-quantifiers.html#greedytrap
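To make the lazy/greedy difference concrete, here is a minimal sketch in Python (chosen only for illustration; the question itself does not use Python, and the sample string is borrowed from the source text above). Both patterns capture "everything up to a dot", but they stop at different dots.

```python
import re

text = "name.5555.JUNK.color.JUNK.shape"

# Greedy: .* grabs as much as possible, then backtracks to the LAST dot.
greedy = re.match(r"(.*)\.", text).group(1)

# Lazy: .*? grabs as little as possible, so it stops at the FIRST dot.
lazy = re.match(r"(.*?)\.", text).group(1)

print(greedy)  # name.5555.JUNK.color.JUNK
print(lazy)    # name
```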

Anyways, is there something really obvious I'm missing here? Or is there no way to capture some optional text via regex?

PS. Here's my data pre-loaded on an online tool to play with: http://regexr.com/3cs84 - I understand that regex can differ a little by language/platform, so if it makes any difference, I ultimately want to use this regex in an AppleScript for renaming files and folders (likely by invoking a terminal command since I don't think AppleScript natively supports regex).

Recommended answer

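Try putting the .*?(foo) parts in parentheses like (.*?(foo))? so that the ? operator will take the .*? parts into consideration. Otherwise the lazy .*? first matches nothing, the optional (foo)? finds no "foo" at that exact position and also matches nothing, and the regex moves on without ever capturing it.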
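The difference can be seen on a single line from the question. This is a small sketch in Python (an assumption for illustration; the question targets AppleScript and a terminal command), using shortened patterns that keep only the number, color, and shape parts.

```python
import re

line = "name 1111 color shape"

# Bare optional group: the lazy filler before it matches nothing, (color)?
# then fails to find "color" at that exact position and is skipped.
bare = re.match(r".*?\d{4}.*?(color)?.*?(shape)", line)

# Wrapped: the optional unit is now "filler + color", so the engine
# tries to find "color" further along before giving up on it.
wrapped = re.match(r".*?\d{4}(.*?(color))?.*?(shape)", line)

print(bare.group(1))     # None
print(wrapped.group(2))  # color
```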

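Corrected syntax: (.*?).\(?(\d{4})(.*?(color))?.*?(shape)(.*?(color))?(.*?(version))?.*

Note that the extra wrapper groups shift the numbering, so the replacement string becomes: $1 ($2) $4$7 $5 $9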
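As a rough check, the corrected pattern can be run against the five sample filenames from the question. The sketch below uses Python, and the `rename` helper is a hypothetical name introduced here, not part of the original answer. Building the result from the non-empty parts, instead of using a fixed replacement string, is one way to avoid the extra whitespace the question mentions.

```python
import re

# The corrected pattern from the answer, unchanged.
PATTERN = re.compile(
    r"(.*?).\(?(\d{4})(.*?(color))?.*?(shape)(.*?(color))?(.*?(version))?.*"
)

def rename(line: str) -> str:
    """Hypothetical helper: rebuild a filename as 'name (number) color shape version'."""
    m = PATTERN.match(line)
    if m is None:
        return line  # leave non-matching names untouched
    name, number = m.group(1), m.group(2)
    color = m.group(4) or m.group(7)  # color may sit before or after shape
    shape = m.group(5)
    version = m.group(9)
    parts = [name, f"({number})", color, shape, version]
    # Joining only the parts that matched avoids doubled spaces.
    return " ".join(p for p in parts if p)

samples = [
    "name 1111 color shape",
    "name 2222 shape color",
    "name 3333 shape",
    "name (4444) color shape version",
    "name.5555.JUNK.color.JUNK.shape.JUNK.version.JUNK",
]

for s in samples:
    print(rename(s))
```

On these samples the output matches the expected results listed in the question. Real filenames may contain the words "color" or "shape" in positions this pattern does not anticipate, so it is worth testing on a copy of the files first.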
