Getting the href attribute and text of certain kinds of links

Views: 28 · Published: 2021/9/23 20:37:13 · Tags: php html regex

Question

Given these four links:

<img border="0" src="imagenes/flech.gif" width="6" height="8">

<a href="escuchar-baladas-de-Albano_Y_Romina_Power.html">Albano Y Romina Power</a><br>
<img border="0" src="imagenes/flech.gif" width="6" height="8">

<a href="escuchar-baladas-de-Armando_Manzanero.html">Armando Manzanero</a><br>

<a name="inicio21" href="musica-Merengue-de-Banda_Cuisillos.html">
<img border="0" src="imagenes/flech.gif" width="6" height="8">Banda Cuisillos</a><br>

<a href="Musica-Baladas-Alternativas.html">Baladas Alternativas</a><br>

I'm trying to capture the href value and the link text of the first three links while leaving out the fourth. In other words, I'm trying to get this:

escuchar-baladas-de-Albano_Y_Romina_Power.html    Albano Y Romina Power
escuchar-baladas-de-Armando_Manzanero.html    Armando Manzanero
musica-Merengue-de-Banda_Cuisillos.html    Banda Cuisillos

I was trying to take advantage of the fact that the first three links each come with imagenes/flech.gif, which would let me leave out the fourth. The catch is that imagenes/flech.gif isn't always in the same position: for the first two links it sits before the <a> tag, while for the third it sits inside it. My attempt so far gets as far as the href values but also includes the fourth link.

Thanks for your help.

Recommended answer

You should use an HTML parser rather than a regex. Try this:

<?php

$html = <<< EOF
<img border="0" src="imagenes/flech.gif" width="6" height="8">

<a href="escuchar-baladas-de-Albano_Y_Romina_Power.html">Albano Y Romina Power</a><br>
<img border="0" src="imagenes/flech.gif" width="6" height="8">

<a href="escuchar-baladas-de-Armando_Manzanero.html">Armando Manzanero</a><br>

<a name="inicio21" href="musica-Merengue-de-Banda_Cuisillos.html">
<img border="0" src="imagenes/flech.gif" width="6" height="8">Banda Cuisillos</a><br>

<a href="Musica-Baladas-Alternativas.html">Baladas Alternativas</a><br>
EOF;


$dom = new DOMDocument();
@$dom->loadHTML($html);

# Iterate over all the <a> tags
foreach ($dom->getElementsByTagName('a') as $link) {

    $url  = $link->getAttribute('href');
    $text = preg_replace('/[\r\n]/', '', $link->nodeValue); // remove line breaks

    // Print the link only if its text doesn't contain one of the excluded phrases
    if (!preg_match('/(Baladas Alternativas|another text to filter)/', $text)) {
        echo $url . " " . $text . "\n";
    }

}
?>

Demo
http://ideone.com/5QX83x

Resources
http://htmlparsing.com/php.html
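
The answer's code excludes the fourth link by matching its text. The question, however, wanted to select links by the presence of imagenes/flech.gif. A possible way to do that with the same DOM parser is an XPath query; the following is only a sketch, not part of the original answer. It assumes the image is either inside the link or is the nearest preceding element sibling of the link, which holds for the sample markup but may not hold for the full page. The names $marker, $query and $rows are illustrative.

```php
<?php
// Sample markup copied from the question.
$html = <<<'EOF'
<img border="0" src="imagenes/flech.gif" width="6" height="8">

<a href="escuchar-baladas-de-Albano_Y_Romina_Power.html">Albano Y Romina Power</a><br>
<img border="0" src="imagenes/flech.gif" width="6" height="8">

<a href="escuchar-baladas-de-Armando_Manzanero.html">Armando Manzanero</a><br>

<a name="inicio21" href="musica-Merengue-de-Banda_Cuisillos.html">
<img border="0" src="imagenes/flech.gif" width="6" height="8">Banda Cuisillos</a><br>

<a href="Musica-Baladas-Alternativas.html">Baladas Alternativas</a><br>
EOF;

// Collect parser warnings internally instead of silencing them with @.
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

$xpath  = new DOMXPath($dom);
$marker = 'imagenes/flech.gif'; // assumption: this exact src value marks the wanted links

// Match <a href> elements that either contain the marker image or whose
// nearest preceding element sibling is the marker image.
$query = sprintf(
    "//a[@href][.//img[@src='%s'] or preceding-sibling::*[1][self::img[@src='%s']]]",
    $marker,
    $marker
);

$rows = [];
foreach ($xpath->query($query) as $link) {
    // trim() drops the line break that precedes the text of the third link.
    $rows[] = [$link->getAttribute('href'), trim($link->textContent)];
}

foreach ($rows as [$url, $text]) {
    echo $url, "    ", $text, "\n";
}
```

Selecting by structure rather than by text means new links need no changes to a list of excluded phrases, at the cost of depending on where the image sits relative to each link.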
