Extracting URLs from a page with regex
Question
I have this bit of PHP that extracts all URLs from a page:
$regex = '/https?\:\/\/[^\" ]+/i';
preg_match_all($regex, $page, $matches);
$links = ($matches[0]);
foreach($links as $link)
{
echo $link.'<br />';
}
How would I modify it to extract only the links that match a certain partial URL, in this case `http://www.site.com/artist/`, rather than all links? The result I am looking for is a list like:
http://www.site.com/artist/Nirvana/
http://www.site.com/artist/Jayz/
and so on.
Recommended answer
Changing the regex delimiters from slashes to exclamation points removes the need to escape the slashes in the URL. The \s character class matches whitespace characters such as tabs, spaces, and newlines. The pattern also excludes both single and double quotes, in case the page uses either kind around its URLs.
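// "!" delimiters avoid escaping the slashes; the dots are escaped so they match literally.
$regex = '!https?://www\.site\.com/artist/[^\'"\s]+!i';
preg_match_all($regex, $page, $matches);
$links = $matches[0]; // index 0 holds the full-pattern matches
foreach ($links as $link)
{
    echo $link.'<br />';
}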
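If the prefix needs to change from page to page, the same idea can be wrapped in a small function. The following is only a sketch under a few assumptions: the helper name extractLinksByPrefix(), the sample $page string, and the extra exclusion of < and > in the character class are all illustrative and not part of the original answer. It uses preg_quote() so that dots and other metacharacters in the prefix are matched literally.

```php
<?php
// Hypothetical helper (not from the original answer): returns the unique
// http/https URLs in $page that start with the given host-and-path prefix.
function extractLinksByPrefix(string $page, string $hostAndPath): array
{
    // preg_quote() escapes regex metacharacters such as "."; passing "!"
    // as the second argument also escapes the delimiter if it appears.
    $regex = '!https?://' . preg_quote($hostAndPath, '!') . '[^\'"\s<>]+!i';
    preg_match_all($regex, $page, $matches);

    // $matches[0] holds the full matches; drop duplicates and reindex.
    return array_values(array_unique($matches[0]));
}

// Sample input, assumed for illustration only.
$page = '<a href="http://www.site.com/artist/Nirvana/">Nirvana</a> '
      . "<a href='http://www.site.com/artist/Jayz/'>Jay-Z</a> "
      . '<a href="http://www.site.com/album/Nevermind/">Nevermind</a>';

$links = extractLinksByPrefix($page, 'www.site.com/artist/');
foreach ($links as $link) {
    // Escape the URL before printing it into HTML.
    echo htmlspecialchars($link), "<br />\n";
}
```

With the sample input, only the two /artist/ links are returned and the /album/ link is skipped. For anything beyond a quick scrape, parsing the page with DOMDocument and filtering the href attributes is usually more robust than a regex.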