Extracting URLs from a page with regex

Question

I have this bit of PHP that extracts all URLs from a page:

$regex = '/https?\:\/\/[^\" ]+/i';
preg_match_all($regex, $page, $matches);

$links = ($matches[0]);

foreach($links as $link)
{
  echo $link.'<br />';
}

How would I modify it to extract not all links, but only the ones that match a certain partial URL, in this case `http://www.site.com/artist/`? The result I am looking for is a list like:

http://www.site.com/artist/Nirvana/

http://www.site.com/artist/Jayz/

and so on.

Recommended answer

By changing the regex delimiters to exclamation points, the forward slashes in the URL no longer need to be escaped. The \s character class matches whitespace characters such as tabs, spaces, and newlines. The negated character class also excludes both single and double quotes, in case the page uses either kind around its links.

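// '!' delimiters avoid escaping '/'; the dots are escaped so they match only a literal '.'
$regex = '!https?://www\.site\.com/artist/[^\'"\s]+!i';
preg_match_all($regex, $page, $matches);

// $matches[0] holds the full-pattern matches
$links = $matches[0];

foreach ($links as $link)
{
  echo $link.'<br />';
}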
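
If the partial URL needs to vary, the same idea can be wrapped in a small function. The following is only a sketch under a few assumptions: the function name extractLinksByPartialUrl, the sample $page markup, and the de-duplication step are illustrative additions, not part of the original answer. It uses preg_quote() so that dots and other regex metacharacters in the partial URL are matched literally.

```php
<?php
// Hypothetical helper (name chosen for illustration): return every http/https
// URL in $page whose host and path start with $partialUrl.
function extractLinksByPartialUrl(string $page, string $partialUrl): array
{
    // preg_quote() escapes regex metacharacters such as '.'; passing '!'
    // as the second argument also escapes the delimiter used below.
    $pattern = '!https?://' . preg_quote($partialUrl, '!') . '[^\'"\s]+!i';
    preg_match_all($pattern, $page, $matches);

    // Assumption: repeated links are not wanted, so drop duplicates and reindex.
    return array_values(array_unique($matches[0]));
}

// Sample markup made up for this example.
$page = '<a href="http://www.site.com/artist/Nirvana/">Nirvana</a> '
      . "<a href='http://www.site.com/artist/Jayz/'>Jay-Z</a> "
      . '<a href="http://www.site.com/album/Nevermind/">Nevermind</a> '
      . '<a href="http://www.site.com/artist/Nirvana/">Nirvana again</a>';

$links = extractLinksByPartialUrl($page, 'www.site.com/artist/');

foreach ($links as $link) {
    echo $link . "\n";
}
```

As in the answer, the pattern stops at the first quote or whitespace character, so it relies on the links being quoted or whitespace-delimited in the page source.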
