PHP cURL and preg_match: extracting text from XHTML

Question

I'm trying to extract the price from the HTML page linked below using PHP cURL and preg_match. I expect this code to output 4,550, but instead I get:

 Notice: Undefined offset: 1 in C:\wamp\www\test.php on line 22

I think the pattern is correct, because it works if I put the HTML itself in a variable and escape the double quotes. Also, if I output the result (echo $result;), it displays the HTML grabbed from the Foxtons website properly, so I can't figure out why the whole thing doesn't work. I need to make this work, and I would appreciate it if you could tell me why that notice is generated and why my current script fails.


$url = "http://www.foxtons.co.uk/search?bedrooms_from=0&property_id=727717";
$ch = curl_init($url);

curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$result = curl_exec($ch);
curl_exec($ch);
curl_close($ch);
$result2 = str_replace('"', '\"', $result);

$tagname1 = ");</script> ";
$tagname2 = "</noscript> per month</a>";

Recommended answer

I rewrote the script a bit to account for more than one <noscript> element on the page. You need preg_match_all, which finds every match instead of stopping at the first one.



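$url = "http://www.foxtons.co.uk/search?bedrooms_from=0&property_id=727717";
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_HEADER, 0);          // leave response headers out of the output
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);  // return the body instead of printing it
$result = curl_exec($ch);                     // one request is enough; the second call was redundant
curl_close($ch);

// Collect every <noscript>...</noscript> pair, not just the first one.
preg_match_all("/<noscript>(.*)<\/noscript>/", $result, $matches);
print_r($matches);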

Output



Array
(
    [0] => Array
        (
            [0] => £1,050
            [1] => 4,550
        )

    [1] => Array
        (
            [0] => £1,050
            [1] => 4,550
        )

)
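
To make the difference between the two functions concrete, here is a minimal sketch that runs both against a small inline string instead of the live page. The sample markup is an assumption made up for illustration, not the real Foxtons HTML, and the non-greedy `(.*?)` with the `s` modifier is a variation on the pattern above rather than the answer's exact regex. With preg_match_all in its default mode, index 0 holds the full matches (tags included) and index 1 holds the first capture group, which is presumably why both sub-arrays look identical when the output above is viewed in a browser.

```php
<?php
// Hypothetical sample markup standing in for the downloaded page.
$html = "<a><noscript>£1,050</noscript> per week</a>\n"
      . "<a><noscript>4,550</noscript> per month</a>";

// Non-greedy (.*?) keeps each match inside a single <noscript> pair,
// even if several pairs end up on the same line.
$pattern = '/<noscript>(.*?)<\/noscript>/s';

// preg_match stops after the first match.
preg_match($pattern, $html, $first);

// preg_match_all keeps scanning and collects every match.
preg_match_all($pattern, $html, $all);

echo $first[1], "\n";                 // text of the first <noscript> only
echo implode(' | ', $all[1]), "\n";   // text of every <noscript>
```

In this sketch the monthly figure is the second entry of `$all[1]`; on the real page the position may differ, so it is safer to check the surrounding text than to rely on a fixed index.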

I tried this on my machine and it worked. Let me know if it works for you.
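
As for the notice itself: "Undefined offset: 1" appears when the script reads `$matches[1]` although the pattern did not match, so that index was never filled in (PHP 8 reports the same situation as "Warning: Undefined array key 1"). A hedged sketch of one way to guard against it is shown below. The function name `extractMonthlyPrice` is hypothetical, and the pattern assumes the price sits in a `<noscript>` element directly followed by "per month", as the `$tagname2` string in the question suggests.

```php
<?php
// Hypothetical helper: returns the monthly price text, or null when the
// expected markup is not present in $html.
function extractMonthlyPrice(string $html): ?string
{
    // Assumed layout: <noscript>PRICE</noscript> per month
    $pattern = '/<noscript>([^<]*)<\/noscript>\s*per month/';

    // preg_match returns 1 on a match, 0 on no match, false on error.
    // Reading $m[1] only after a successful match avoids the notice.
    if (preg_match($pattern, $html, $m) === 1) {
        return trim($m[1]);
    }
    return null;
}

// Inline sample strings (assumptions, not the real page).
var_dump(extractMonthlyPrice('<a><noscript>4,550</noscript> per month</a>'));
var_dump(extractMonthlyPrice('<p>no price here</p>'));
```

The same idea applies to the download step: curl_exec returns false on failure, so checking its result before running the regex helps separate a failed request from a pattern that simply did not match.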
