Simple-html-dom skips attributes

Question

I am trying to parse a Google Play HTML page to get some information about apps. Simple-html-dom works perfectly in general, but if the page contains attributes that are not separated by spaces, it ignores the attributes that follow. For instance, I have this HTML:

<div class="doc-banner-icon"><img itemprop="image"src="https://lh5.ggpht.com/iRd4LyD13y5hdAkpGRSb0PWwFrfU8qfswGNY2wWYw9z9hcyYfhU9uVbmhJ1uqU7vbfw=w124"/></div>

As you can see, there is no space between "image" and src, so simple-html-dom ignores the src attribute and returns only <img itemprop="image">. If I add a space, it works perfectly. To get this attribute I use the following code:

foreach ($html->find('div.doc-banner-icon') as $e) {
    foreach ($e->find('img') as $i) {
        $bannerIcon = $i->src;
    }
}

My question is: how can I change this otherwise nice library so that it returns the full inner content of this div?

Recommended answer

I just wrote a function that adds the necessary spaces to the content before it is parsed. It inserts a space after every closing double quote that is not already followed by one:

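function placeNeccessarySpaces($contents){
    $quotes = 0;       // number of double quotes seen so far
    $flag = false;     // true while a quoted value has been opened and not yet handled
    $newContents = '';
    $length = strlen($contents);
    for ($i = 0; $i < $length; $i++) {
        $newContents .= $contents[$i];
        if ($contents[$i] == '"') $quotes++;
        if ($quotes % 2 == 0) {
            if ($flag) {
                // A quoted value has just been closed: make sure a space follows it.
                // The bounds check avoids reading past the end of the string.
                if ($i + 1 < $length && $contents[$i + 1] !== ' ') {
                    $newContents .= ' ';
                }
                $flag = false;
            }
        }
        else $flag = true; // odd quote count: we are inside a quoted value
    }
    return $newContents;
}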

Then call it on the result of file_get_contents, before handing the content to the parser:

$contents = file_get_contents($url, $use_include_path, $context, $offset);
$contents = placeNeccessarySpaces($contents);
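
The function above also adds a space after quotes that appear in ordinary page text. As a possible alternative, the same repair can be sketched with a regular expression that only looks inside tags. This is a minimal sketch and not part of the original answer: the name addMissingAttributeSpaces is hypothetical, the sample URL is a placeholder, and it assumes attribute values are double-quoted and contain no '<' or '>' characters.

```php
<?php
// Hypothetical helper (not part of simple-html-dom): inserts a space after a
// double-quoted attribute value when another attribute follows it directly.
// Assumption: values are double-quoted and contain no '<' or '>' characters.
function addMissingAttributeSpaces(string $html): string
{
    // Visit each tag separately so that quotes in ordinary text are left alone.
    return preg_replace_callback(
        '/<[^<>]+>/',
        function (array $m): string {
            // A ="value" that is directly followed by anything other than
            // whitespace, '/' or '>' gets a trailing space.
            return preg_replace('/(=\s*"[^"]*")(?=[^\s\/>])/', '$1 ', $m[0]) ?? $m[0];
        },
        $html
    ) ?? $html; // fall back to the input if the regex engine reports an error
}

// Placeholder markup modelled on the snippet from the question.
$sample = '<div class="doc-banner-icon"><img itemprop="image"src="https://example.com/icon.png"/></div>';
echo addMissingAttributeSpaces($sample), PHP_EOL;
// prints: <div class="doc-banner-icon"><img itemprop="image" src="https://example.com/icon.png"/></div>
```

The result can then be passed to the parser in place of the raw content. Because only the text between '<' and '>' is rewritten, quoted phrases in the visible page text keep their original spacing.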

Hope this helps others.
