Parse Website for URLs
Problem description
I'm wondering if someone can help me with the following. I want to parse the URLs on this website: http://www.directorycritic.com/free-directory-list.html?pg=1&sort=pr
I have the following code:
<?PHP
$url = "http://www.directorycritic.com/free-directory-list.html?pg=1&sort=pr";
$input = @file_get_contents($url) or die("Could not access file: $url");
$regexp = "<a\s[^>]*href=(\"??)([^\" >]*?)\\1[^>]*>(.*)<\/a>";
if(preg_match_all("/$regexp/siU", $input, $matches)) {
// $matches[2] = array of link addresses
// $matches[3] = array of link text - including HTML code
}
?>
This does nothing at present. What I need it to do is scrape all the URLs in the table across all 16 pages. I would really appreciate some help with how to amend the above to do that and write the URLs to a text file.
Use an HTML DOM parser, for example PHP Simple HTML DOM Parser, which provides file_get_html:
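// Assumes the Simple HTML DOM library file sits next to this script.
include 'simple_html_dom.php';

$html = file_get_html('http://www.example.com/');
// Find all links
$links = array();
foreach ($html->find('a') as $element) {
    $links[] = $element->href;
}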
Now the $links array contains all the URLs on the given page, and you can use these URLs for further parsing.
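To cover the rest of the question (all 16 pages, output to a text file), something along these lines might work. This is only a rough sketch under a few assumptions: it uses PHP's built-in DOMDocument instead of Simple HTML DOM so that it has no external dependency, it assumes the pages are numbered 1 to 16 through the pg query parameter seen in the question's URL, and the function names pageUrl, extractLinks and scrapeAllPages are made up for illustration. It collects every link on each page; narrowing it down to the table would need the table's actual markup, which is not shown here.

```php
<?php
// Sketch only: built-in DOM extension, hypothetical helper names.

/** Build the URL of one listing page (pattern taken from the question). */
function pageUrl(int $page): string
{
    return "http://www.directorycritic.com/free-directory-list.html?pg={$page}&sort=pr";
}

/** Return the non-empty href of every <a> element in an HTML string. */
function extractLinks(string $html): array
{
    if (trim($html) === '') {
        return []; // loadHTML() rejects empty input
    }

    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; @ silences the parser warnings.
    @$doc->loadHTML($html);

    $links = [];
    foreach ($doc->getElementsByTagName('a') as $anchor) {
        $href = trim($anchor->getAttribute('href'));
        if ($href !== '') {
            $links[] = $href;
        }
    }
    return $links;
}

/** Fetch pages 1..$pageCount, keep unique links, write one per line. */
function scrapeAllPages(int $pageCount, string $outputFile): int
{
    $all = [];
    for ($page = 1; $page <= $pageCount; $page++) {
        $html = @file_get_contents(pageUrl($page));
        if ($html === false) {
            continue; // skip pages that could not be fetched
        }
        $all = array_merge($all, extractLinks($html));
    }

    $unique = array_values(array_unique($all));
    file_put_contents($outputFile, implode(PHP_EOL, $unique) . PHP_EOL);
    return count($unique);
}
```

Calling scrapeAllPages(16, 'urls.txt') would then write the collected links to urls.txt (a file name chosen here as an example) and return how many unique links were found.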
Parsing HTML with regular expressions is not a good idea. Here are some related posts:
- Using regular expressions to parse HTML: why not?
- RegEx match open tags except XHTML self-contained tags
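As a small illustration of why the regex approach is fragile, the sketch below runs the pattern from the question and a DOM parser on the same snippet. The sample HTML is invented for this example. The pattern only anticipates double-quoted or unquoted attribute values, so with a single-quoted href it is likely to return the value with the quotes still attached, whereas the DOM parser returns the attribute value itself.

```php
<?php
// Sketch: the question's regex versus a DOM parser on a single-quoted href.
$html = "<p><a href='page.html'>Page</a></p>"; // made-up sample markup

// Pattern copied from the question.
$regexp = "<a\s[^>]*href=(\"??)([^\" >]*?)\\1[^>]*>(.*)<\/a>";
preg_match_all("/$regexp/siU", $html, $matches);
$regexLinks = $matches[2];

// Same extraction with PHP's built-in DOM extension.
$doc = new DOMDocument();
@$doc->loadHTML($html);
$domLinks = [];
foreach ($doc->getElementsByTagName('a') as $a) {
    $domLinks[] = $a->getAttribute('href');
}

// Print both results side by side for comparison.
var_dump($regexLinks, $domLinks);
```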
EDIT:
Some other HTML parsing tools, as described by Gordon in the comments below: