GNU sed: remove portion of line after pattern match with special characters

Question

The goal is to use sed to return only the URL from each line of the Firefox (FF) extension Mining Blocker, which uses this format for its regex lines:

{"baseurl":"*://002.0x1f4b0.com/*", "suburl":"*://*/002.0x1f4b0.com/*"},
{"baseurl":"*://003.0x1f4b0.com/*", "suburl":"*://*/003.0x1f4b0.com/*"},

The result should be:

002.0x1f4b0.com
003.0x1f4b0.com

One way would be to keep everything after suburl":"*://*/ then remove each occurrence of /*"},

I found https://unix.stackexchange.com/questions/24140/return-only-the-portion-of-a-line-after-a-matching-pattern but the special characters are a problem.

This doesn't work:

sed -n -e s@^.*suburl":"*://*/@@g hosts

Would someone please show me how to mark the 2 asterisks in the string so they are seen by regex as literal characters, not wildcards?
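A minimal sketch of the two-step plan described above, assuming GNU sed and that the whole sed script is wrapped in single quotes: inside single quotes the shell passes the double quotes through untouched, and writing each asterisk as \* makes it a literal character. The function name extract_suburl_host is hypothetical, and the T command (jump to the end of the script if the previous substitution did not succeed) is a GNU extension, used here so that lines without a suburl entry print nothing.

```shell
#!/bin/sh
# extract_suburl_host is a hypothetical helper name; it needs GNU sed.
extract_suburl_host() {
  # s #1: delete everything up to and including suburl":"*://*/
  #       (\* is a literal asterisk; the " characters are literal inside '...').
  # T:    GNU extension; if s #1 did not succeed, jump to the end (nothing printed).
  # s #2: delete the first / and everything after it, then print the result (p).
  sed -n 's@^.*suburl":"\*://\*/@@;T;s@/.*@@p' "$@"
}

# The two sample lines from the question, fed on standard input.
printf '%s\n' \
  '{"baseurl":"*://002.0x1f4b0.com/*", "suburl":"*://*/002.0x1f4b0.com/*"},' \
  '{"baseurl":"*://003.0x1f4b0.com/*", "suburl":"*://*/003.0x1f4b0.com/*"},' |
  extract_suburl_host
```

On the file from the question this would be called as extract_suburl_host hosts.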

sed -n 's#.*://\*/\([^/]\+\)/.*#\1#p' hosts

Unfortunately, this doesn't work.

Regarding character substitution, thanks for directing me to the references.

I reduced the searched-for string to //*/ and used ASCII character codes like this:

sed -n -e s@^.*\d047\d047\d042\d047@@g hosts

Unfortunately, that didn't output any changes to the lines.

My assumptions are:

^.*something specifies everything up to and including the last occurrence of "something" in a line

sed -n -e s@search@@g deletes (replaces with nothing) "search" within a line
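Both assumptions can be checked on a throwaway string (a/b/c is a made-up input). The first holds: a greedy ^.* runs up to the last occurrence. The second holds only for the edit itself, because with -n sed prints nothing unless the p flag asks for it:

```shell
#!/bin/sh
# Greedy ^.*/ runs up to the LAST slash, so only "c" survives.
printf '%s\n' 'a/b/c' | sed 's@^.*/@@'       # prints: c
# With -n the edited line is printed only when the p flag asks for it.
printf '%s\n' 'a/b/c' | sed -n 's@^.*/@@g'   # prints nothing
printf '%s\n' 'a/b/c' | sed -n 's@^.*/@@gp'  # prints: c
```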

So, this line:

sed -n -e s@^.*\d047\d047\d042\d047@@g hosts

Should output everything after //*/ in each line...except it doesn't.

What is incorrect with that line?
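At least part of the problem is shell quoting rather than the regex: the sed script is not quoted, so the shell rewrites it before sed sees it, and with -n and no p flag nothing is printed either way. This sketch uses printf '%s\n' to show each argument exactly as the shell passes it on; set -f only disables filename globbing to keep the demo deterministic (with no matching files the output is the same).

```shell
#!/bin/sh
set -f  # turn off filename globbing so the * characters are printed as typed
# Unquoted: the backslashes are removed, so sed would see d047 rather than \d047.
printf '%s\n' s@^.*\d047\d047\d042\d047@@g    # prints: s@^.*d047d047d042d047@@g
# Unquoted: the double quotes are removed, so the regex no longer contains them.
printf '%s\n' s@^.*suburl":"*://*/@@g        # prints: s@^.*suburl:*://*/@@g
# Single-quoted: the script reaches sed unchanged.
printf '%s\n' 's@^.*\d047\d047\d042\d047@@g' # prints: s@^.*\d047\d047\d042\d047@@g
```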

Regarding deleting the first / and everything after it once that first operation is done: yes, that's wanted too.

Recommended answer

This might work for you (GNU sed):

sed -n 's#.*://\*/\([^/]\+\)/.*#\1#p' file

Greedily match (take the longest matching string) all characters up to ://*/, followed by a group of one or more characters other than / (referred to as \1), followed by the rest of the line, and replace the whole match with the group \1.
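A quick check of the command against the two sample lines from the question, plus an equivalent written with GNU sed's -E (extended regular expressions), where grouping and "one or more" are written ( ) and + instead of \( \) and \+; the literal asterisk still needs \*:

```shell
#!/bin/sh
sample='{"baseurl":"*://002.0x1f4b0.com/*", "suburl":"*://*/002.0x1f4b0.com/*"},
{"baseurl":"*://003.0x1f4b0.com/*", "suburl":"*://*/003.0x1f4b0.com/*"},'

# The answer's command (basic regex; \( \) groups, \+ is a GNU extension).
printf '%s\n' "$sample" | sed -n 's#.*://\*/\([^/]\+\)/.*#\1#p'

# Same match with -E (extended regex): ( ) and + need no backslash, \* still does.
printf '%s\n' "$sample" | sed -En 's#.*://\*/([^/]+)/.*#\1#p'
```

Each command prints 002.0x1f4b0.com and 003.0x1f4b0.com on separate lines.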

N.B. The sed substitution delimiters are arbitrary; here # is chosen so that / can be matched without escaping. Also, on the left-hand side of the substitution command the character * would be interpreted as a metacharacter meaning zero or more of the previous character/group, so it is quoted as \* to stop it from taking on that meaning. Finally, the -n option turns off the usual printing of the pattern space after all the sed commands have been executed. The p flag on the substitution command prints the pattern space after a successful substitution, so the output contains only URLs (or nothing).
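A small demonstration of the quoting point on made-up strings (a//b and a/*/b): unescaped, a/*/b means "a, zero or more slashes, then /b", so it matches a//b but not the literal text a/*/b. Writing \* or the bracket expression [*] makes the asterisk literal:

```shell
#!/bin/sh
# Unescaped, a/*/b means "a, zero or more /, then /b".
printf '%s\n' 'a//b'  | sed -n 's#a/*/b#matched#p'    # prints: matched
printf '%s\n' 'a/*/b' | sed -n 's#a/*/b#matched#p'    # prints nothing
# Escaped as \* or written as the bracket expression [*], the asterisk is literal.
printf '%s\n' 'a//b'  | sed -n 's#a/\*/b#matched#p'   # prints nothing
printf '%s\n' 'a/*/b' | sed -n 's#a/\*/b#matched#p'   # prints: matched
printf '%s\n' 'a/*/b' | sed -n 's#a/[*]/b#matched#p'  # prints: matched
```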
