GNU sed: remove portion of line after pattern match with special characters
Problem description
The goal is to use sed to return only the URL from each line of the FF extension Mining Blocker, which uses this format for its regex lines:
{"baseurl":"*://002.0x1f4b0.com/*", "suburl":"*://*/002.0x1f4b0.com/*"},
{"baseurl":"*://003.0x1f4b0.com/*", "suburl":"*://*/003.0x1f4b0.com/*"},
The result should be:
002.0x1f4b0.com
003.0x1f4b0.com
One way would be to keep everything after suburl":"*://*/
and then remove each occurrence of /*"},
I found https://unix.stackexchange.com/questions/24140/return-only-the-portion-of-a-line-after-a-matching-pattern but the special characters are a problem.
This doesn't work:
sed -n -e s@^.*suburl":"*://*/@@g hosts
Would someone please show me how to mark the two asterisks in the string so that the regex treats them as literal characters, not wildcards?
sed -n 's#.*://\*/\([^/]\+\)/.*#\1#p' hosts
Unfortunately, that doesn't work.
Regarding character substitution, thanks for directing me to the references.
I reduced the searched-for string to //*/ and used ASCII character codes like this:
sed -n -e s@^.*\d047\d047\d042\d047@@g hosts
Unfortunately, that didn't output any changes to the lines.
My assumptions are:
^.*something
specifies everything up to and including the last occurrence of "something" in a line
sed -n -e s@search@@g
deletes (replaces with nothing) "search" within a line
So, this line:
sed -n -e s@^.*\d047\d047\d042\d047@@g hosts
should output everything after //*/ in each line... except it doesn't.
What is incorrect with that line?
Regarding deleting everything from the first / onward after that first operation: yes, that's wanted too.
Recommended answer
This might work for you (GNU sed):
sed -n 's#.*://\*/\([^/]\+\)/.*#\1#p' file
Match greedily (the longest string that matches) all characters up to ://*/, followed by a group of characters (referred to as \1) that do not match a /, followed by the rest of the line, and replace the whole match with the group \1.
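To see the substitution at work, here is a minimal sketch that feeds the two sample lines from the question through the command above. The variable names sample and urls are only for this illustration, and GNU sed is assumed because \+ is a GNU extension in basic regular expressions.

```shell
# Two sample lines copied from the question.
sample=$(cat <<'EOF'
{"baseurl":"*://002.0x1f4b0.com/*", "suburl":"*://*/002.0x1f4b0.com/*"},
{"baseurl":"*://003.0x1f4b0.com/*", "suburl":"*://*/003.0x1f4b0.com/*"},
EOF
)

# .*            greedy: everything up to the last place the rest can match
# ://\*/        the literal text ://*/ (the asterisk is escaped)
# \([^/]\+\)    group 1: one or more characters that are not a slash
# /.*           the next slash and the rest of the line
urls=$(printf '%s\n' "$sample" | sed -n 's#.*://\*/\([^/]\+\)/.*#\1#p')

printf '%s\n' "$urls"
```

Only the suburl part of each line contains ://*/, so the greedy .* stops there and the group captures the host name that follows.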
N.B. the sed substitution delimiters are arbitrary; in this case # was chosen to make matching / in the pattern easier. Also, the character * on the left-hand side of the substitution command may be interpreted as a metacharacter meaning zero or more of the previous character/group, so it is escaped as \* so that it is treated as a literal asterisk. Finally, the option -n turns off the usual printing of the pattern space after all the sed commands have been executed. The p flag on the substitution command prints the pattern space following a successful substitution, so only URLs appear in the output, or nothing at all.
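The same points apply to the two-step approach described in the question. The sketch below is one possible way to write it, not the asker's exact command: the script is single-quoted so the shell leaves the backslashes, double quotes and asterisks alone, each literal asterisk is escaped as \*, and the p flag sits on the last substitution. The third input line and the variable names are made up for this illustration.

```shell
# Two lines from the question plus one made-up line that matches nothing.
sample=$(cat <<'EOF'
{"baseurl":"*://002.0x1f4b0.com/*", "suburl":"*://*/002.0x1f4b0.com/*"},
{"baseurl":"*://003.0x1f4b0.com/*", "suburl":"*://*/003.0x1f4b0.com/*"},
this line has no rule
EOF
)

# Step 1 deletes everything up to and including suburl":"*://*/
# Step 2 deletes the trailing /*"}, and prints the line only if that succeeded.
two_step=$(printf '%s\n' "$sample" |
  sed -n -e 's@^.*suburl":"\*://\*/@@' -e 's@/\*"},$@@p')

# Without -n and without the p flag, every line is printed,
# including the one where no substitution took place.
without_n=$(printf '%s\n' "$sample" | sed 's#.*://\*/\([^/]\+\)/.*#\1#')

printf 'two-step:\n%s\n\nwithout -n:\n%s\n' "$two_step" "$without_n"
```

If the script is left unquoted, as in the question, the shell removes the double quotes and the backslashes before sed ever sees them, so the pattern no longer matches the text in the file. Using -n without a p flag anywhere also means nothing is printed, whether or not the substitution succeeds.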