I can't grab specific URLs from a search page

Problem description

I went to the real-estate website and searched by city name. Next, I want to grab the building URLs for Osaka City. On this page, http://brillia.com/search/?area=27999, there are four of them.

I'm using the following code to grab the URLs from that page:

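// $parser is a DOMDocument already loaded with the search page's HTML
$linkler = [];  // collected href values
$allDivs = $parser->getElementsByTagName('div');
foreach ($allDivs as $div) {
    // Only look inside the result containers
    if ($div->getAttribute('class') == 'boxInfomation') {
        $allLinks = $div->getElementsByTagName('a');
        foreach ($allLinks as $a) {
            $linkler[] = $a->getAttribute('href');
        }
    }
}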
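
For comparison, the loop above can be narrowed with DOMXPath so that only anchors inside a boxInfomation container are collected, even when the div carries extra classes. This is a minimal sketch: the extractBoxLinks name and the sample markup are hypothetical (not the real brillia.com page), and it still parses with libxml, so on its own it does not address the mis-parsing described in the answer.

```php
<?php
// Hypothetical helper: collect href values only from <a> elements inside
// <div class="boxInfomation"> containers, using DOMXPath instead of nested loops.
function extractBoxLinks(string $html): array
{
    $doc = new DOMDocument();
    // Buffer libxml warnings (e.g. about HTML5 tags) instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Match "boxInfomation" as a whole class token, so
    // class="boxInfomation clearfix" also qualifies.
    $query = "//div[contains(concat(' ', normalize-space(@class), ' '), ' boxInfomation ')]//a[@href]";

    $links = [];
    foreach ($xpath->query($query) as $a) {
        $links[] = $a->getAttribute('href');
    }
    // Keep each URL once, re-indexed from 0.
    return array_values(array_unique($links));
}

// Made-up sample markup standing in for the search results page.
$sample = <<<HTML
<!DOCTYPE html>
<html><body>
  <div class="boxInfomation clearfix">
    <a href="/osaka/a/">Building A</a>
    <a href="/osaka/b/">Building B</a>
  </div>
  <div class="sideMenu"><a href="/tokyo/">Tokyo</a></div>
</body></html>
HTML;

print_r(extractBoxLinks($sample));
```

The sideMenu link is skipped because the XPath only descends from matching containers.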

But I can't grab only those. I end up grabbing not just the Osaka City page URLs but all of them. When I view the source of the Osaka page, it shows http://brillia.com/search/, which is why I'm grabbing all the other links...

How can I grab only the URLs on this page -> http://brillia.com/search/?area=27999

Any idea? Thank you.

Recommended answer

The parser relies on libxml to extract elements, but that page uses HTML5 heavily: it omits certain closing tags and so on, which isn't strict XML. libxml struggles to "correct the mistakes" by guessing where to close the missing tags, and ends up returning wrong results.
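
To check whether libxml is in fact rewriting the markup, you can collect the errors it reports while loading the page instead of suppressing them. This is a minimal sketch: the libxmlHtmlErrors name and the sample markup are hypothetical, and the exact messages (if any) depend on the installed libxml2 version.

```php
<?php
// Hypothetical helper: load HTML with DOMDocument and return the parse
// errors libxml reported, instead of letting them surface as PHP warnings.
function libxmlHtmlErrors(string $html): array
{
    // Buffer libxml errors internally and start from an empty buffer.
    $previous = libxml_use_internal_errors(true);
    libxml_clear_errors();

    $doc = new DOMDocument();
    $doc->loadHTML($html);

    $messages = [];
    foreach (libxml_get_errors() as $error) {
        // Each LibXMLError carries the input line number and a message.
        $messages[] = sprintf('line %d: %s', $error->line, trim($error->message));
    }

    // Restore the caller's error-handling mode.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    return $messages;
}

// Made-up HTML5-style markup: sectioning elements and an unclosed <p>.
$html = "<!DOCTYPE html>\n<html><body>\n<section><article><p>One<p>Two</article></section>\n</body></html>";

foreach (libxmlHtmlErrors($html) as $message) {
    echo $message, PHP_EOL;
}
```

Running this against the real page's HTML shows where libxml had to guess, which helps confirm the diagnosis before switching parsers.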

You need a parser with HTML5 support, such as HTML5DOMDocument, which extends DOMDocument and should offer mostly the same interface.
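
A sketch of the same extraction with an HTML5-aware parser. Note that it does not use the HTML5DOMDocument library named above: it assumes PHP 8.4 or later, whose built-in Dom\HTMLDocument parses with HTML5 rules and supports CSS selectors, and it falls back to the libxml-based DOMDocument on older versions (where HTML5DOMDocument would be the alternative). The boxInfomationLinks name and the sample markup are hypothetical.

```php
<?php
// Hypothetical helper: prefer an HTML5-compliant parser when available.
function boxInfomationLinks(string $html): array
{
    $links = [];
    if (class_exists('Dom\HTMLDocument')) {
        // PHP 8.4+: HTML5 parsing; LIBXML_NOERROR silences parse diagnostics.
        $doc = \Dom\HTMLDocument::createFromString($html, LIBXML_NOERROR);
        foreach ($doc->querySelectorAll('div.boxInfomation a[href]') as $a) {
            $links[] = $a->getAttribute('href');
        }
    } else {
        // Older PHP: libxml's HTML parser, which may mis-nest HTML5 markup.
        $previous = libxml_use_internal_errors(true);
        $doc = new DOMDocument();
        $doc->loadHTML($html);
        libxml_clear_errors();
        libxml_use_internal_errors($previous);

        $xpath = new DOMXPath($doc);
        $query = "//div[contains(concat(' ', normalize-space(@class), ' '), ' boxInfomation ')]//a[@href]";
        foreach ($xpath->query($query) as $a) {
            $links[] = $a->getAttribute('href');
        }
    }
    // Keep each URL once, re-indexed from 0.
    return array_values(array_unique($links));
}

// Made-up sample markup standing in for the search results page.
$sample = <<<HTML
<!DOCTYPE html>
<html><body>
  <div class="boxInfomation clearfix">
    <a href="/osaka/a/">Building A</a>
    <a href="/osaka/b/">Building B</a>
  </div>
  <div class="sideMenu"><a href="/tokyo/">Tokyo</a></div>
</body></html>
HTML;

print_r(boxInfomationLinks($sample));
```

On well-formed sample markup both branches agree; the difference only shows up on pages whose omitted closing tags libxml resolves differently from an HTML5 parser.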
