Getting the href attribute and text of a certain kind of link

Question

Given these four links:

<img border="0" src="imagenes/flech.gif" width="6" height="8">

<a href="escuchar-baladas-de-Albano_Y_Romina_Power.html">Albano Y Romina Power</a><br>
<img border="0" src="imagenes/flech.gif" width="6" height="8">

<a href="escuchar-baladas-de-Armando_Manzanero.html">Armando Manzanero</a><br>

<a name="inicio21" href="musica-Merengue-de-Banda_Cuisillos.html">
<img border="0" src="imagenes/flech.gif" width="6" height="8">Banda Cuisillos</a><br>

<a href="Musica-Baladas-Alternativas.html">Baladas Alternativas</a><br>

I'm trying to capture the href value and the link text of the first three links while leaving out the fourth. In other words, I'm trying to get this:

escuchar-baladas-de-Albano_Y_Romina_Power.html    Albano Y Romina Power
escuchar-baladas-de-Armando_Manzanero.html    Armando Manzanero
musica-Merengue-de-Banda_Cuisillos.html    Banda Cuisillos

I was trying to take advantage of the fact that the first three links come with imagenes/flech.gif, which would let me leave out the fourth. The catch is that imagenes/flech.gif does not always appear in the same position relative to the link. My attempt gets as far as the href but still includes the fourth link.

Thanks for your help.

Recommended answer

You should use an HTML parser rather than a regex. Try this:

<?php

$html = <<< EOF
<img border="0" src="imagenes/flech.gif" width="6" height="8">

<a href="escuchar-baladas-de-Albano_Y_Romina_Power.html">Albano Y Romina Power</a><br>
<img border="0" src="imagenes/flech.gif" width="6" height="8">

<a href="escuchar-baladas-de-Armando_Manzanero.html">Armando Manzanero</a><br>

<a name="inicio21" href="musica-Merengue-de-Banda_Cuisillos.html">
<img border="0" src="imagenes/flech.gif" width="6" height="8">Banda Cuisillos</a><br>

<a href="Musica-Baladas-Alternativas.html">Baladas Alternativas</a><br>
EOF;


$dom = new DOMDocument();
@$dom->loadHTML($html);

// Iterate over all the <a> tags
foreach ($dom->getElementsByTagName('a') as $link) {

    $url  = $link->getAttribute('href');
    // Remove line breaks and surrounding whitespace from the link text
    $text = trim(preg_replace('/[\r\n]/', '', $link->nodeValue));

    // Skip links whose text contains one of the excluded phrases
    if (!preg_match('/(Baladas Alternativas|another text to filter)/', $text)) {
        echo $url . " " . $text . "\n";
    }

}
?>

Demo
http://ideone.com/5QX83x

Resources
http://htmlparsing.com/php.html
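
The answer above filters by link text. The question's original idea, keeping only the links that come with imagenes/flech.gif, can also be sketched with DOMXPath. This is only a sketch under assumptions: the helper name extractArrowLinks() is hypothetical and not part of the original answer, and the query assumes the arrow image is either a child of the <a> or the nearest preceding sibling element of it, as in the sample markup.

```php
<?php
// Hypothetical helper (not from the original answer): returns href/text pairs
// for links marked by the arrow image.
// Assumption: $imgSrc contains no double quotes, since it is placed in the XPath as-is.
function extractArrowLinks(string $html, string $imgSrc = 'imagenes/flech.gif'): array
{
    $dom = new DOMDocument();
    // Collect parser warnings about the old-style markup instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    // Match <a href> elements that contain the image as a child, or whose
    // nearest preceding sibling element is the image.
    $query = sprintf(
        '//a[@href][img[@src="%1$s"] or preceding-sibling::*[1][self::img[@src="%1$s"]]]',
        $imgSrc
    );

    $links = [];
    foreach ($xpath->query($query) as $a) {
        $links[] = [
            'href' => $a->getAttribute('href'),
            'text' => trim($a->textContent), // drops the line break inside the third link
        ];
    }
    return $links;
}

$html = <<<EOF
<img border="0" src="imagenes/flech.gif" width="6" height="8">

<a href="escuchar-baladas-de-Albano_Y_Romina_Power.html">Albano Y Romina Power</a><br>
<img border="0" src="imagenes/flech.gif" width="6" height="8">

<a href="escuchar-baladas-de-Armando_Manzanero.html">Armando Manzanero</a><br>

<a name="inicio21" href="musica-Merengue-de-Banda_Cuisillos.html">
<img border="0" src="imagenes/flech.gif" width="6" height="8">Banda Cuisillos</a><br>

<a href="Musica-Baladas-Alternativas.html">Baladas Alternativas</a><br>
EOF;

foreach (extractArrowLinks($html) as $link) {
    echo $link['href'] . "    " . $link['text'] . "\n";
}
```

Compared with a list of excluded phrases, this approach does not need to know the text of the links to skip, but it depends on the page keeping the image next to or inside each wanted link.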
