Regex to get title from magnet link: "unterminated address regex"

Problem description

I am trying to write a simple shell script that gets the title from a magnet link and writes it to a .out file.

If I try the regex below on regex101.com, it matches. See the screenshot.

&dn=(.*?)&

(https://imge.to/i/Fw26r)

The problem is that I keep getting the following error: "unterminated address regex".

I tried different options, but got the same result:

u@d:~/Documents/tmp $ sed -e '\&dn=(.*?)\&$' magnet.txt >> magnet.out
sed: -e expression #1, char 13: unterminated address regex
u@d:~/Documents/tmp $ sed -E '\&dn=(.*?)\&' magnet.txt >> magnet.out
sed: -e expression #1, char 12: unterminated address regex
u@d:~/Documents/tmp $ cat magnet.txt | sed -e '\&dn=(.*?)\&i'
sed: -e expression #1, char 13: unterminated address regex
u@d:~/Documents/tmp $ sed -e '&dn=(.*?)&' magnet.txt >> magnet.out
sed: -e expression #1, char 1: unknown command: `&'

Can you please point me in the right direction?

Recommended answer

The backslash before the closing delimiter is wrong. The first backslash is necessary to say "I want to use a different delimiter than the default slash", but the second backslash says "this is a literal ampersand, not the closing delimiter" (so sed expects the regex to continue, and complains when it never sees the closing delimiter).
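As a small sketch of the delimiter rule, the snippet below runs both forms against a made-up magnet link (the link and the variable names are assumptions for illustration, not taken from the question). An explicit p command is added so that the address has something to do.

```shell
# Hypothetical sample input; this URL is invented for illustration.
link='magnet:?xt=urn:btih:abc123&dn=Example+Title&tr=udp://tracker.example.org:1337'

# \& opens the address with & as the delimiter; the bare & closes it.
# -n suppresses auto-printing, p prints the lines that match "dn=".
matched=$(printf '%s\n' "$link" | sed -n '\&dn=&p')

# Escaping the closing & turns it into a literal character inside the
# regex, so sed never finds the end of the address and reports an error.
if printf '%s\n' "$link" | sed -n '\&dn=\&p' 2>/dev/null; then
  status=ok
else
  status=error
fi

printf 'matched: %s\nescaped closing delimiter: %s\n' "$matched" "$status"
```

The first command prints the whole matching line, which is still not the title on its own; it only shows that the address is now terminated correctly.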

An address expression selects lines but does not extract anything. Paired with the p command, it prints the matching lines in their entirety (a second time, without -n, as the default behavior is to print all lines); with no command at all, GNU sed rejects the script with "missing command". It also seems that you want the ampersand to be part of the regex, not the delimiter around the regex. If the intent is to extract a string between ampersands, you want something like

sed -n 's/.*&dn=\([^&]*\)&.*/\1/p' magnet.txt

That is, replace the entire line with just the captured parenthesized expression, then print that line.
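The command above requires an ampersand after the title, so a link whose dn= parameter comes last would not match. A possible variation, offered as a sketch and not as part of the original answer, drops the trailing ampersand and also accepts dn= directly after the question mark. The function name extract_title and the sample links are hypothetical.

```shell
# extract_title is a hypothetical helper name used only for this sketch.
# It reads the files given as arguments, or standard input if there are none.
extract_title() {
  # [?&]dn=  : dn= as the first or a later query parameter
  # [^&]*    : the title, up to the next & or the end of the line
  sed -n 's/.*[?&]dn=\([^&]*\).*/\1/p' "$@"
}

# Made-up sample links; the second has dn= as its last parameter.
printf '%s\n' \
  'magnet:?xt=urn:btih:abc123&dn=Example+Title&tr=udp://tracker.example.org:1337' \
  'magnet:?xt=urn:btih:def456&dn=Last+Parameter' |
  extract_title
```

With a file, the call would look like `extract_title magnet.txt >> magnet.out`. The title is printed as it appears in the link, so any URL encoding (such as + for spaces) is left untouched.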

sed is a scripting language. Apart from a few punctuation commands (such as : and =), its commands are single letters; the s command, which is the only one many people ever encounter, performs substitutions in text.

Just to reiterate, your original script looks like

sed '/dn=.*?/'

with a custom & delimiter instead of /. In sed's default basic regular expressions, this looks for lines containing dn= followed by anything, followed by a literal question mark. Paired with a p command, such an address makes sed print the matching lines twice (and all other lines only once), because without -n it already prints every line.
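The double printing can be seen in a short sketch with an explicit p command added (GNU sed rejects an address that has no command at all). The two input lines and the sample helper are made up for illustration.

```shell
# Made-up input: one line that matches the address, one that does not.
sample() { printf '%s\n' 'x&dn=Title&y' 'no match here'; }

# Without -n, every line is auto-printed and p prints the matches again.
twice=$(sample | sed '/dn=/p')

# With -n, only the explicit p command produces output.
once=$(sample | sed -n '/dn=/p')

printf 'without -n:\n%s\n\nwith -n:\n%s\n' "$twice" "$once"
```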

The non-greedy quantifier .*? is a Perl extension that is not supported in any sed dialect I am familiar with; but expressing exactly what you want is actually better (even when you do have access to non-greedy quantifiers).
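To illustrate why spelling out "anything except an ampersand" helps, this sketch compares a greedy capture with the negated bracket expression on a made-up link that has several parameters after the title (the link and variable names are assumptions).

```shell
# Hypothetical link with two tracker parameters after the title.
link='magnet:?xt=urn:btih:abc123&dn=Example+Title&tr=udp://a.example&tr=udp://b.example'

# Greedy .* runs up to the LAST ampersand on the line.
greedy=$(printf '%s\n' "$link" | sed -n 's/.*&dn=\(.*\)&.*/\1/p')

# [^&]* cannot cross an ampersand, so it stops at the first one.
bounded=$(printf '%s\n' "$link" | sed -n 's/.*&dn=\([^&]*\)&.*/\1/p')

printf 'greedy:  %s\nbounded: %s\n' "$greedy" "$bounded"
# greedy:  Example+Title&tr=udp://a.example
# bounded: Example+Title
```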
