Getting all links of a webpage using Ruby

Views: 49 Published: 2021/6/8 18:44:45 ruby regex string nokogiri


Question

I'm trying to retrieve every external link of a webpage using Ruby. I'm using String#scan with this regex:

/href="https?:[^"]*|href='https?:[^']*/i

Then, I can use gsub to remove the href part:

str.gsub(/href=['"]/i, '')
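
Put together, the two steps above might look like the following minimal sketch. The sample HTML and the variable names (html, raw, links) are assumptions for illustration, and the prefix is stripped with sub on each match rather than with gsub on a single string.

```ruby
# Hypothetical sample input; a real page would be fetched separately.
html = <<~HTML
  <a href="https://example.com/a">A</a>
  <a href='http://example.org/b'>B</a>
  <a href="/local/page">Local</a>
HTML

# Step 1: collect every href attribute whose value starts with http: or https:.
# Each match still carries the leading href=" or href=' prefix.
raw = html.scan(/href="https?:[^"]*|href='https?:[^']*/i)

# Step 2: strip that prefix from the start of each match.
links = raw.map { |match| match.sub(/\Ahref=['"]/i, '') }

puts links
```

The relative link (/local/page) is skipped because the pattern only accepts values that begin with http: or https:.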

This works fine, but I'm not sure whether it's efficient in terms of performance. Is this OK to use, or should I work with a more specific parser (Nokogiri, for example)? Which way is better?

Thanks!

Recommended answer

Why don't you use groups in your pattern? For example:

/http[s]?:\/\/(.+)/i

That way, the first group will already be the link you are looking for, and no separate gsub step is needed.
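
As a rough illustration of the capture-group idea, the sketch below applies groups to the href pattern from the question rather than using the exact regex shown in this answer. That adaptation is an assumption, as are the sample HTML and the variable names. When a pattern contains groups, String#scan returns the captured groups for each match instead of the whole match.

```ruby
# Hypothetical sample input; a real page would be fetched separately.
html = <<~HTML
  <a href="https://example.com/a">A</a>
  <a href='http://example.org/b'>B</a>
  <a href="/local/page">Local</a>
HTML

# One capture group per quoting style. For every match, scan yields a
# two-element array in which the group that did not participate is nil.
pattern = /href="(https?:[^"]*)"|href='(https?:[^']*)'/i

# flatten merges the per-match arrays; compact drops the nil placeholders.
links = html.scan(pattern).flatten.compact

puts links
```

Because the URL is captured directly, there is nothing left to strip afterwards. A regex still only sees text, so markup such as unquoted attribute values would need a more permissive pattern or an HTML parser.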
