(PHP5) Extracting a title tag and RSS feed address from HTML using PHP DOM or Regex
Problem description
I'd like to get the title tag and RSS feed address (if there is one) from a given URL, but the method(s) I've used so far just aren't working at all. I've managed to get the title tag by using preg_match and a regular expression, but I can't seem to get anywhere with getting the RSS feed address.
($webContent holds the HTML of the website)
I've copied my code below for reference...
`// Get the title tag
preg_match('@(.*)@i',$webContent,$titleTagArray);
// If the title tag has been found, assign it to a variable
if($titleTagArray && $titleTagArray[3])
    $webTitle = $titleTagArray[3];
// Get the RSS or Atom feed address
preg_match('@<link(.*)rel="alternate"(.*)href="(.*)"(.*)type="application/rss+xml"\s/>@i',$webContent,$feedAddrArray);
// If the feed address has been found, assign it to a variable
if($feedAddrArray && $feedAddrArray[2])
    $webFeedAddr = $feedAddrArray[2];`
I've been reading on here that using a regular expression isn't the best way to do this? Hopefully someone can give me a hand with this :-)
Thanks.
One approach is to parse the HTML with DOMDocument and query it with XPath:
$dom = new DOMDocument; // init new DOMDocument
$dom->loadHTML($html); // load HTML into it
$xpath = new DOMXPath($dom); // create a new XPath
$nodes = $xpath->query('//title'); // Find all title elements in document
foreach($nodes as $node) { // Iterate over found elements
    echo $node->nodeValue; // output title text
}
To get the href attribute of all link tags with a type of "application/rss+xml" you would use this XPath:
$xpath->query('//link[@type="application/rss+xml"]/@href');
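Putting the two queries together, here is a minimal sketch of how both values might be pulled out in one pass. The function name `extractTitleAndFeeds` is hypothetical, not something from the question or from PHP itself. It also makes a few assumptions beyond the XPath above: it additionally requires `rel="alternate"` and accepts `application/atom+xml` (since the question mentions Atom feeds), it compares attribute values case-sensitively, and it silences libxml warnings because real-world HTML is often malformed.

```php
<?php
// Hypothetical helper (illustration only): returns the page title and any
// RSS/Atom feed URLs advertised in <link> elements of the given HTML.
function extractTitleAndFeeds($webContent)
{
    $result = array('title' => '', 'feeds' => array());

    // loadHTML() rejects an empty string, so bail out early.
    if ($webContent === '') {
        return $result;
    }

    // Collect libxml parse warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument;
    $dom->loadHTML($webContent);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);

    // string() yields the text of the first <title>, or '' if there is none.
    $result['title'] = trim($xpath->evaluate('string(//title)'));

    // Assumption: exact, case-sensitive matches on rel and type.
    $query = '//link[@rel="alternate"]'
           . '[@type="application/rss+xml" or @type="application/atom+xml"]/@href';
    foreach ($xpath->query($query) as $href) {
        $result['feeds'][] = $href->nodeValue;
    }

    return $result;
}

// Usage with a small inline document standing in for $webContent.
$webContent = '<html><head><title>Example Blog</title>'
    . '<link rel="alternate" type="application/rss+xml" href="http://example.com/feed.xml">'
    . '</head><body><p>Hello</p></body></html>';

$result = extractTitleAndFeeds($webContent);
echo $result['title'], "\n";
foreach ($result['feeds'] as $feedUrl) {
    echo $feedUrl, "\n";
}
```

Returning an array of feeds rather than a single address is a deliberate choice here, since a page can advertise more than one feed. If you only want the first one, take `$result['feeds'][0]` after checking that the array is not empty.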