Extract a pattern from the output of curl
I would like to use curl, on the command line, to grab a URL, pipe it through a pattern, and return a list of URLs that match that pattern.

I am running into problems with the greedy aspects of the pattern and can't seem to get past it. Any help on this would be appreciated.
curl http://www.reddit.com/r/pics/ | grep -ioE "http://imgur\.com/.+(jpg|jpeg|gif|png)"
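To see the greedy-match problem in isolation, here is a minimal sketch using a made-up one-line HTML snippet in place of the live curl output (the markup and image ids are assumptions, not actual reddit data):

```shell
# Made-up sample: one line of HTML containing two imgur links.
html='<a href="http://imgur.com/abc12.jpg">pic</a> <a href="http://imgur.com/xyz89.png">pic</a>'

# Greedy .+ matches as much as possible, so the single "match" runs from
# the first http:// all the way to the last extension, markup included.
greedy=$(printf '%s' "$html" | grep -ioE 'http://imgur\.com/.+(jpg|jpeg|gif|png)')

# Bounding the quantifier stops each match right after the image id.
bounded=$(printf '%s' "$html" | grep -ioE 'http://imgur\.com/.{1,10}\.(jpg|jpeg|gif|png)')

echo "$greedy"
echo "$bounded"
```

Because `-o` prints each match on its own line, the bounded version yields the two clean URLs, while the greedy version yields one long string spanning both links and the markup between them.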
So, grab the data from the URL, which returns a mess of HTML that may need some linebreaks inserted somehow, unless the regex can return more than one match per line. The pattern is pretty simple: any string that matches...
- starts with http://imgur.com/
- has A-Z a-z 0-9 (maybe some others); so far the ids are 5 chars long, and 8 should cover it forever if I wanted to limit that aspect of the pattern, which I don't
- ends in a graphic file format extension (jpg, jpeg, gif, png)
That's about it; at that URL, with default settings, I should generally get back a good set of images. I would not object to using the RSS feed URL for the same page; it may actually be easier to parse.
Thanks everyone!
Edit: Thanks for the quick answer, my final command is now:
$ curl -s http://www.reddit.com/r/pics/ | grep -ioE "http:\/\/imgur\.com\/.{1,10}\.(jpg|jpeg|gif|png)"
Solution

Try:
http:\/\/imgur\.com\/.{5,8}\.(jpg|jpeg|gif|png)
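Plugged into the full command, the suggested pattern behaves like this; the sample line below is an assumption standing in for the curl output, chosen to show that the `{5,8}` floor skips ids shorter than 5 characters:

```shell
# Made-up stand-in for the page source: one 5-char id and one 3-char id.
line='see http://imgur.com/a1B2c.gif and http://imgur.com/zzz.jpg here'

# Same flags as the question; {5,8} bounds the id length, so the 3-char
# id on the right is filtered out.
matches=$(printf '%s' "$line" | grep -ioE 'http:\/\/imgur\.com\/.{5,8}\.(jpg|jpeg|gif|png)')

echo "$matches"
```

GNU grep treats the escaped `\/` as a literal `/` (the escape is unnecessary in ERE, since `/` has no special meaning to grep). Note the asker's final command widens the bound to `{1,10}` precisely so that shorter ids are not dropped.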