Extract a pattern from the output of curl

Problem description

I would like to use curl, on the command line, to grab a url, pipe it to a pattern, and return a list of urls that match that pattern.

I am running into problems with the greedy aspects of the pattern, and cannot seem to get past them. Any help on this would be appreciated.

curl http://www.reddit.com/r/pics/ | grep -ioE "http://imgur\.com/.+(jpg|jpeg|gif|png)"

So, grab the data from the URL, which returns a mess of HTML that may need some line breaks inserted somehow, unless the regex can return more than one match from a single line. The pattern is pretty simple, any string that...

  • starts with http://imgur.com/
  • has A-Z a-z 0-9 (maybe some others) and is, so far, 5 chars long; 8 should cover it forever if I wanted to limit that aspect of the pattern, which I don't
  • ends in a .graphic_file_format_extension (jpg, jpeg, gif, png)

That's about it. At that URL, with default settings, I should generally get back a good set of images. I would not object to using the RSS feed URL for the same page; it may actually be easier to parse.

Thanks everyone!

Edit: Thanks for the quick answer, my final command is now:

$ curl -s http://www.reddit.com/r/pics/ | grep -ioE "http:\/\/imgur\.com\/.{1,10}\.(jpg|jpeg|gif|png)"

Solution

Try:

http:\/\/imgur\.com\/.{5,8}\.(jpg|jpeg|gif|png)
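To see why bounding the repetition helps, here is a minimal sketch that runs both patterns against a made-up line of HTML instead of the live page. The sample string and the variable names are assumptions for illustration only. The slashes are left unescaped, since `/` is not special in grep's extended regex syntax.

```shell
# Hypothetical sample: two imgur links on one line of HTML (not real page data).
sample='<a href="http://imgur.com/abc12.jpg">one</a> <a href="http://imgur.com/XyZ789.png">two</a>'

# Original pattern: ".+" is greedy, so a single match stretches from the
# first link to the last image extension on the line.
greedy=$(printf '%s\n' "$sample" | grep -ioE 'http://imgur\.com/.+(jpg|jpeg|gif|png)')

# Bounded pattern from the answer: ".{5,8}" cannot reach past one URL,
# and -o prints each match on its own line.
bounded=$(printf '%s\n' "$sample" | grep -ioE 'http://imgur\.com/.{5,8}\.(jpg|jpeg|gif|png)')

printf 'greedy:\n%s\n\nbounded:\n%s\n' "$greedy" "$bounded"
```

With this sample, the greedy pattern should return one long match that swallows the markup between the two links, while the bounded pattern should return the two URLs separately. Because `-o` emits one match per line, no line breaks need to be inserted into the HTML first.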
