Getting the href attribute and text of certain kinds of links
Question
Given these four links:
<img border="0" src="imagenes/flech.gif" width="6" height="8">
<a href="escuchar-baladas-de-Albano_Y_Romina_Power.html">Albano Y Romina Power</a><br>
<img border="0" src="imagenes/flech.gif" width="6" height="8">
<a href="escuchar-baladas-de-Armando_Manzanero.html">Armando Manzanero</a><br>
<a name="inicio21" href="musica-Merengue-de-Banda_Cuisillos.html">
<img border="0" src="imagenes/flech.gif" width="6" height="8">Banda Cuisillos</a><br>
<a href="Musica-Baladas-Alternativas.html">Baladas Alternativas</a><br>
I'm trying to capture the href value and the link text of the first three, leaving out the fourth. In other words, I'm trying to get this:
escuchar-baladas-de-Albano_Y_Romina_Power.html Albano Y Romina Power
escuchar-baladas-de-Armando_Manzanero.html Armando Manzanero
musica-Merengue-de-Banda_Cuisillos.html Banda Cuisillos
I was trying to take advantage of the fact that the first three each come with imagenes/flech.gif, which would leave out the fourth. The catch is that imagenes/flech.gif isn't always in the same position: it sits before the first two links but inside the third. My attempt gets as far as the href values, but it still includes the fourth link.
Thanks for your help.
Recommended answer
You should use an HTML parser rather than a regex. Try this:
<?php
$html = <<< EOF
<img border="0" src="imagenes/flech.gif" width="6" height="8">
<a href="escuchar-baladas-de-Albano_Y_Romina_Power.html">Albano Y Romina Power</a><br>
<img border="0" src="imagenes/flech.gif" width="6" height="8">
<a href="escuchar-baladas-de-Armando_Manzanero.html">Armando Manzanero</a><br>
<a name="inicio21" href="musica-Merengue-de-Banda_Cuisillos.html">
<img border="0" src="imagenes/flech.gif" width="6" height="8">Banda Cuisillos</a><br>
<a href="Musica-Baladas-Alternativas.html">Baladas Alternativas</a><br>
EOF;
$dom = new DOMDocument();
@$dom->loadHTML($html);
// Iterate over all the <a> tags
foreach ($dom->getElementsByTagName('a') as $link) {
    $url = $link->getAttribute('href');
    // Remove line breaks from the link text
    $text = preg_replace('/[\r\n]/sm', '', $link->nodeValue);
    // Print the link only if its text does not match an excluded phrase
    if (!preg_match('/(Baladas Alternativas|another text to filter)/sm', $text)) {
        echo $url . " " . $text . "\n";
    }
}
?>
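The answer above filters by link text. As an alternative, here is a minimal sketch that selects links by the imagenes/flech.gif marker, which is closer to what the question asks for. It is not part of the original answer. It assumes the marker image is either inside the link or is the element immediately before it, as in the sample markup. The names $marker, $query and $results are chosen here for illustration.

```php
<?php
// Sample markup copied from the question.
$html = <<<EOF
<img border="0" src="imagenes/flech.gif" width="6" height="8">
<a href="escuchar-baladas-de-Albano_Y_Romina_Power.html">Albano Y Romina Power</a><br>
<img border="0" src="imagenes/flech.gif" width="6" height="8">
<a href="escuchar-baladas-de-Armando_Manzanero.html">Armando Manzanero</a><br>
<a name="inicio21" href="musica-Merengue-de-Banda_Cuisillos.html">
<img border="0" src="imagenes/flech.gif" width="6" height="8">Banda Cuisillos</a><br>
<a href="Musica-Baladas-Alternativas.html">Baladas Alternativas</a><br>
EOF;

// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);

// Assumption: this src value marks the links we want.
$marker = 'imagenes/flech.gif';

// Match <a href> elements that either contain the marker image
// or whose nearest preceding sibling element is the marker image.
$query = '//a[@href]['
    . './/img[@src="' . $marker . '"]'
    . ' or preceding-sibling::*[1][self::img][@src="' . $marker . '"]'
    . ']';

// Collect [href, text] pairs; trim() drops the line break inside the third link.
$results = [];
foreach ($xpath->query($query) as $link) {
    $results[] = [$link->getAttribute('href'), trim($link->textContent)];
}

foreach ($results as [$href, $text]) {
    echo $href . ' ' . $text . "\n";
}
```

With the sample markup, the fourth link is skipped because its nearest preceding sibling element is a br and it contains no image. If the real page places the image differently (for example, with other elements between the image and the link), the XPath expression would need adjusting.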
Resources
http://htmlparsing.com/php.html