BASH regexp matching - including brackets in a bracketed list of characters to match against?

Question

I'm trying to do a tiny bash script that'll clean up the file and folder names of downloaded episodes of some tv shows I like. They often look like "[ www.Speed.Cd ] - Some.Show.S07E14.720p.HDTV.X264-SOMEONE", and I basically just want to strip out that speedcd advertising bit.

It's easy enough to remove www.Speed.Cd, spaces, and dashes using regexp matching in BASH, but for the life of me, I cannot figure out how to include the brackets in a list of characters to be matched against. [- [] doesn't work, neither does [- \[], [- \\[], [- \\\[], or any number of escape characters preceding the bracket I want to remove.

Here's what I've got so far:

```
[[ "$newfile" =~ ^(.*)([- \[]*(www\.torrenting\.com|spastikustv|www\.speed\.cd|moviesp2p\.com)[- \]]*)(.*)$ ]] &&
    newfile="${BASH_REMATCH[1]}${BASH_REMATCH[4]}"
```

But it breaks on the brackets.

Any ideas?

TIA, Daniel :)

I should probably note that I'm using "shopt -s nocasematch" to ensure case insensitive matching, just in case you're wondering :)
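
A quick way to see that `nocasematch` also governs `=~` is to run the same regex with the option off and then on. This is a minimal sketch; the sample name is the one from the question and the pattern only looks for the site name.

```shell
#!/usr/bin/env bash
# Sample name from the question; the regex only looks for the site name.
name='[ www.Speed.Cd ] - Some.Show.S07E14.720p.HDTV.X264-SOMEONE'
re='www\.speed\.cd'

shopt -u nocasematch   # default: case-sensitive matching
if [[ $name =~ $re ]]; then echo "case-sensitive: match"; else echo "case-sensitive: no match"; fi

shopt -s nocasematch   # as in the question: ignore case in [[ ]] matches
if [[ $name =~ $re ]]; then echo "nocasematch: match"; else echo "nocasematch: no match"; fi
```

This prints "case-sensitive: no match" followed by "nocasematch: match", so with the option enabled the brandings can be written in lowercase in the regex.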

EDIT 2: Thanks to all who contributed. I'm not 100% sure which answer should count as the "correct" one, as I had several problems with my statement. Actually, the most accurate answer was just a comment on my question posted by jw013, but I didn't get it at the time because I hadn't yet understood that spaces should be escaped. I've opted for aefxx's, as it basically says the same thing but with explanations :) I would have liked to mark ormaaj's answer as correct too, as he spotted more serious issues with my expression.
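
The point about spaces can be shown in isolation. Inside `[[ ]]` a bare space would split the pattern, so a literal space in an inline regex has to be escaped; alternatively the regex can be kept in a variable and referenced unquoted. The strings below are placeholders chosen for illustration.

```shell
#!/usr/bin/env bash
s='Some Show'

# Inline pattern: the space is escaped so [[ ]] reads the regex as one word.
if [[ $s =~ ^Some\ Show$ ]]; then echo "inline, escaped space: match"; fi

# Pattern stored in a variable: written plainly, then used unquoted after =~.
re='^Some Show$'
if [[ $s =~ $re ]]; then echo "variable: match"; fi

# Quoting the variable turns the whole pattern into a literal string instead.
if [[ $s =~ "$re" ]]; then echo "quoted: match"; else echo "quoted: no match"; fi
```

The first two tests match; the last prints "quoted: no match" because `s` does not literally contain `^` and `$`.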

Anyway, the approach I was using above (matching and extracting the parts to keep and leaving behind the unwanted ones) is really not very elegant and won't catch all cases, not even something as simple as "Some.Show.S07E14.720p.HDTV.X264-SOMEONE - [ www.Speed.Cd ]". I've instead rewritten it to match and extract just the unwanted parts and then remove them from the original string with string replacement, like so (the loop handles names with multiple brandings):

```
# Remove common torrent site brandings, including surrounding spaces, brackets, etc.:
while [[ "$newfile" =~ ([[\ {\(-]*(www\.)?(torrentday\.com|torrenting\.com|spastikustv|speed\.cd|moviesp2p\.com|publichd\.org|publichd|scenetime\.com|kingdom-release)[]\ }\)-]*) ]]; do
    newfile=${newfile//"${BASH_REMATCH[1]}"/}
done
```
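
For experimenting, the loop can be wrapped in a function and run against both sample names from the question. This is a sketch under a few assumptions: the function name `strip_branding` is hypothetical, the site list is shortened, and the surrounding-character classes use the `[:space:]` and bracket-ordering technique from the answer below instead of backslash escapes, with the regex kept in a variable so it reaches the regex engine unchanged.

```shell
#!/usr/bin/env bash
shopt -s nocasematch   # match brandings regardless of case, as in the question

# Hypothetical helper: removes every known branding, plus surrounding junk, from $1.
strip_branding() {
  local name=$1
  # Before the site: '[', whitespace, '{', '(' or '-' ('-' last, so it is literal).
  # After the site:  ']' (first, so it is literal), whitespace, '}', ')' or '-'.
  local re='([[[:space:]{(-]*(www\.)?(torrenting\.com|speed\.cd|moviesp2p\.com)[][:space:]})-]*)'
  while [[ $name =~ $re ]]; do
    # Quoting the expansion makes the matched text a literal pattern for ${//}.
    name=${name//"${BASH_REMATCH[1]}"/}
  done
  printf '%s\n' "$name"
}

strip_branding '[ www.Speed.Cd ] - Some.Show.S07E14.720p.HDTV.X264-SOMEONE'
strip_branding 'Some.Show.S07E14.720p.HDTV.X264-SOMEONE - [ www.Speed.Cd ]'
```

Both calls print `Some.Show.S07E14.720p.HDTV.X264-SOMEONE`. Because only the unwanted part is searched for, the result does not depend on how an anchored `^(.*)...(.*)$` pattern would split the rest of the name.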

Recommended answer

Ok, this is the first time I've heard of the =~ operator but nevertheless here's what I found by trial and error:

```
if [[ $newfile =~ ^(.*)([-[:space:][]*(what|ever)[][:space:]-]*)(.*)$ ]]
                          ^^^^^^^^^^              ^^^^^^^^^^
```

Looks strange but actually does work (just tested it).
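
The two bracket expressions can be checked on their own. In this sketch the sample string and the variable names `lead` and `trail` are illustrative: `[-[:space:][]` puts `-` first and places `[` where it cannot start a class, while `[][:space:]-]` puts `]` first and `-` last, so all three punctuation characters are taken literally.

```shell
#!/usr/bin/env bash
s='[ www.Speed.Cd ] - '

# '-' first (literal), then the [:space:] class, then a literal '['.
lead='^[-[:space:][]+'
# ']' first (literal), then the [:space:] class, then '-' last (literal).
trail='[][:space:]-]+$'

[[ $s =~ $lead ]] && printf 'lead:  "%s"\n' "${BASH_REMATCH[0]}"
[[ $s =~ $trail ]] && printf 'trail: "%s"\n' "${BASH_REMATCH[0]}"
```

Here `lead` matches `"[ "` and `trail` matches `" ] - "`.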

EDIT
Quote from the Linux man pages regex(7):

To include a literal ']' in the list, make it the first character (following a possible '^'). To include a literal '-', make it the first or last character, or the second endpoint of a range. To use a literal '-' as the first endpoint of a range, enclose it in "[." and ".]" to make it a collating element (see below). With the exception of these and some combinations using '[' (see next paragraphs), all other special characters, including '\', lose their special significance within a bracket expression.
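
The last rule helps explain why backslash escapes don't work here: inside a bracket expression a backslash is just another member of the list. A small sketch (the helper name `try` is hypothetical) tests each rule against short strings, with every regex passed through a variable so bash hands it to the regex engine unchanged.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints "yes" if string $1 matches regex $2, else "no".
try() {
  if [[ $1 =~ $2 ]]; then echo yes; else echo no; fi
}

try ']'  '^[]a]$'    # ']' first in the list is literal              -> yes
try '-'  '^[a-]$'    # '-' last in the list is literal               -> yes
try '\'  '^[\]$'     # '\' is an ordinary list member                -> yes
try ']'  '^[\]]$'    # list is only '\'; a literal ']' must follow   -> no
try '\]' '^[\]]$'    # so this regex matches the two characters '\]' -> yes
```

In other words, `[\]]` is not "an escaped `]` inside a list" but a one-character list containing `\`, followed by a literal `]`.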
