I can't grab specific URLs from a search page
Question
I went to a real-estate website and searched by city name. Now I want to grab the building URLs for Osaka City, which are listed at http://brillia.com/search/?area=27999
There are four of them.
I'm using that link to grab the URLs, with this code:
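// $parser is assumed to be a DOMDocument already loaded with the page.
$linkler = [];
$allDivs = $parser->getElementsByTagName('div');
foreach ($allDivs as $div) {
    // Only look inside the listing boxes.
    if ($div->getAttribute('class') == 'boxInfomation') {
        $allLinks = $div->getElementsByTagName('a');
        foreach ($allLinks as $a) {
            $linkler[] = $a->getAttribute('href');
        }
    }
}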
But I can't grab just those. Instead of only the Osaka City page URLs, I end up grabbing all of them. When I view the source of the Osaka page, it shows http://brillia.com/search/
That's why I'm grabbing all the other links as well.
How can I grab only the URLs listed at http://brillia.com/search/?area=27999
Any ideas? Thank you.
Recommended answer
The parser relies on libxml to extract elements, but that page uses HTML5 heavily, omitting certain closing tags and so on. That isn't strict XML, so libxml struggles to "correct mistakes" by guessing where the missing tags should close, and it returns wrong results.
You need a parser with HTML5 support, such as HTML5DOMDocument, which extends DOMDocument and should have mostly the same interface.
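As a rough illustration of the link extraction itself, here is a minimal sketch that uses only PHP's built-in DOMDocument and DOMXPath. The inline markup, the `collectLinks` helper, and the sample hrefs are all hypothetical stand-ins, not taken from the real page. Because an HTML5-aware parser that extends DOMDocument should accept the same calls, swapping the class name where the document is created may be all that is needed, but that is an assumption to verify against the library you choose.

```php
<?php
// Hypothetical sample markup standing in for the real search page.
$html = <<<HTML
<!DOCTYPE html>
<html><body>
<div class="boxInfomation"><a href="/osaka/one/">One</a></div>
<div class="boxInfomation wide"><a href="/osaka/two/">Two</a></div>
<div class="other"><a href="/tokyo/three/">Three</a></div>
</body></html>
HTML;

/**
 * Collect the href of every link inside a div carrying the given class.
 * Hypothetical helper; the class test also matches multi-class attributes.
 */
function collectLinks(DOMDocument $dom, string $class): array
{
    $xpath = new DOMXPath($dom);
    $query = sprintf(
        '//div[contains(concat(" ", normalize-space(@class), " "), " %s ")]//a[@href]',
        $class
    );
    $links = [];
    foreach ($xpath->query($query) as $a) {
        $links[] = $a->getAttribute('href');
    }
    return $links;
}

// Collect libxml parse warnings internally instead of printing them.
libxml_use_internal_errors(true);
$dom = new DOMDocument(); // assumption: an HTML5-aware subclass could be used here
$dom->loadHTML($html);
libxml_clear_errors();

$linkler = collectLinks($dom, 'boxInfomation');
print_r($linkler);
```

With the sample markup above, only the two links inside the `boxInfomation` divs are collected. The XPath class test is a design choice: unlike a plain `==` comparison on the attribute, it still matches when the div carries additional classes.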