Parse Website for URLs

Views: 92 | Published: 2018/6/13 17:24:31 | Tags: php, html, parsing, html-parsing


Question


Just wondering if someone can help me with the following. I want to parse the URLs on this website: http://www.directorycritic.com/free-directory-list.html?pg=1&sort=pr

I have the following code:
    <?php
    $url = "http://www.directorycritic.com/free-directory-list.html?pg=1&sort=pr";
    $input = @file_get_contents($url) or die("Could not access file: $url");
    $regexp = "<a\s[^>]*href=(\"??)([^\" >]*?)\\1[^>]*>(.*)<\/a>";
    if (preg_match_all("/$regexp/siU", $input, $matches)) {
        // $matches[2] = array of link addresses
        // $matches[3] = array of link text - including HTML code
    }
    ?>
This does nothing at present. What I need it to do is scrape all the URLs in the table across all 16 pages. I would really appreciate some help with how to amend the above to do that and output the URLs to a text file.

Solution

Use HTML Dom Parser (the file_get_html function below comes from the PHP Simple HTML DOM Parser library):
    $html = file_get_html('http://www.example.com/');

    // Find all links
    $links = array();
    foreach ($html->find('a') as $element) {
        $links[] = $element->href;
    }
Now the $links array contains all URLs on the given page, and you can use these URLs to parse further.
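
To tie this back to the question (all 16 pages, output to a text file), here is a minimal sketch. It swaps in PHP's bundled DOMDocument for Simple HTML DOM so that it runs without extra libraries; that is a substitution, not what the answer above uses. The function names extractLinks and scrapePages, the output file name, and the assumption that the pg query parameter selects the page are all illustrative. It collects every link on each page rather than only those in the table, so some filtering would likely still be needed.

```php
<?php
// Hypothetical helper: return the href of every <a> element in an HTML string.
function extractLinks(string $html): array
{
    if (trim($html) === '') {
        return []; // DOMDocument::loadHTML rejects empty input
    }

    $dom = new DOMDocument();
    // Real-world HTML is rarely valid; buffer parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $links = [];
    foreach ($dom->getElementsByTagName('a') as $anchor) {
        $href = trim($anchor->getAttribute('href'));
        if ($href !== '') {
            $links[] = $href; // skip anchors without an href
        }
    }
    return $links;
}

// Hypothetical helper: fetch pages 1..$pageCount and write the unique links,
// one per line, to $outFile. Returns the number of links written.
// Assumption: pages are addressed as ?pg=N&sort=pr, as in the question's URL.
function scrapePages(string $baseUrl, int $pageCount, string $outFile): int
{
    $all = [];
    for ($page = 1; $page <= $pageCount; $page++) {
        $html = @file_get_contents($baseUrl . '?pg=' . $page . '&sort=pr');
        if ($html === false) {
            continue; // skip pages that could not be fetched
        }
        $all = array_merge($all, extractLinks($html));
    }
    $all = array_values(array_unique($all));
    file_put_contents($outFile, implode(PHP_EOL, $all) . PHP_EOL);
    return count($all);
}
```

Under those assumptions, a call such as scrapePages('http://www.directorycritic.com/free-directory-list.html', 16, 'urls.txt') would cover the 16 pages mentioned in the question.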

Parsing HTML with regular expressions is not a good idea. Here are some related posts:

Using regular expressions to parse HTML: why not?

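RegEx match open tags except XHTML self-contained tags (stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454)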
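
As a small illustration of the point, the sketch below runs the pattern from the question and PHP's bundled DOMDocument over the same snippet. The input string is made up for the example: it contains one single-quoted href and one link that sits inside an HTML comment.

```php
<?php
// Illustrative input: a single-quoted href, plus a link inside an HTML comment.
$html = "<p><a href='/real'>Real</a><!-- <a href=\"/old\">Old</a> --></p>";

// The pattern from the question, unchanged.
$regexp = "<a\s[^>]*href=(\"??)([^\" >]*?)\\1[^>]*>(.*)<\/a>";
preg_match_all("/$regexp/siU", $html, $matches);
$regexLinks = $matches[2];

// The same input through the DOM parser.
$dom = new DOMDocument();
$previous = libxml_use_internal_errors(true); // buffer parser warnings
$dom->loadHTML($html);
libxml_clear_errors();
libxml_use_internal_errors($previous);

$domLinks = [];
foreach ($dom->getElementsByTagName('a') as $anchor) {
    $domLinks[] = $anchor->getAttribute('href');
}

// Regex: the single quotes stay attached to the first link,
// and the commented-out link is reported as well.
var_dump($regexLinks);
// DOM parser: only /real, because comments are not elements.
var_dump($domLinks);
```

The pattern only anticipates double-quoted attributes and has no notion of comments, which is the kind of edge case a parser handles for you.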

EDIT:

Some other HTML parsing tools, as suggested by Gordon in the comments below:

phpQuery

Zend_Dom

QueryPath

FluentDom
