Exclude unwanted HTML from Simple HTML DOM - PHP

Published: 2021/5/15 18:40:01 · Tags: php, parsing, web-scraping, html-parsing, simple-html-dom

Question

I am using Simple HTML DOM Parser with PHP to get the title, description and images from a website. The issue I am facing is that I am also getting HTML I don't want, and I don't know how to exclude those tags. The details are below.

Here is a sample of the HTML structure being parsed:

<div id="product_description">
<p> Some text</p>
<ul>
<li>value 1</li>
<li>value 2</li>
<li>value 3</li>
</ul>

<!-- the div I don't want -->
<div id="comments">
<h1> Some Text </h1>
</div>

</div>

I am using the PHP script below to parse it:

foreach($html->find('div#product_description') as $description)
{
    echo $description->outertext ;
    echo "<br>";
}

The above code returns everything inside the div with id "product_description". I want to exclude the div with id "comments". I tried converting the result to a string and using substr to cut off the last characters, but that didn't work and I don't know why. Any idea how I can do this? Any approach that lets me exclude that div from the parsed HTML will work. Thanks.

Recommended answer

You can remove the unwanted div by setting its outertext to an empty string:

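// Simple HTML DOM must be loaded first; this path assumes
// simple_html_dom.php sits next to this script.
require_once 'simple_html_dom.php';

$src = <<<src
<div id="product_description">
    <p> Some text</p>
    <ul>
        <li>value 1</li>
        <li>value 2</li>
        <li>value 3</li>
    </ul>

    <!-- the div I don't want -->
    <div id="comments">
        <h1> Some Text </h1>
    </div>

</div>
src;

$html = str_get_html($src);

foreach ($html->find('#product_description') as $description)
{
    // find('#comments', 0) returns the first match, or null if there is none
    $comments = $description->find('#comments', 0);
    if ($comments !== null) {
        // An empty outertext drops the element (tag and contents) from the output
        $comments->outertext = '';
    }
    print $description->outertext;
}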
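If you would rather not depend on Simple HTML DOM, the same idea (find the container, detach the unwanted child, serialise what remains) can be sketched with PHP's built-in DOM extension. This is an alternative technique, not the one used in the answer above. It assumes the dom extension is enabled (it is in most PHP builds) and that the ids contain no quote characters; extractWithout is a hypothetical helper name.

```php
<?php
// Alternative to the Simple HTML DOM answer, using PHP's built-in DOM extension.
// Returns the outer HTML of the element with id $containerId, with every
// descendant whose id is $excludeId removed. Assumes ids contain no quotes.
function extractWithout(string $html, string $containerId, string $excludeId): string
{
    $doc = new DOMDocument();
    // libxml warns about markup it does not recognise; collect the warnings quietly.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $container = $xpath->query("//*[@id='$containerId']")->item(0);
    if ($container === null) {
        return ''; // container not found
    }

    // Gather the unwanted elements first, then detach them from the tree.
    $unwanted = [];
    foreach ($xpath->query(".//*[@id='$excludeId']", $container) as $node) {
        $unwanted[] = $node;
    }
    foreach ($unwanted as $node) {
        $node->parentNode->removeChild($node);
    }

    // Passing a node to saveHTML() serialises only that node (its outer HTML).
    return $doc->saveHTML($container);
}

$src = <<<HTML
<div id="product_description">
    <p> Some text</p>
    <ul>
        <li>value 1</li>
        <li>value 2</li>
        <li>value 3</li>
    </ul>
    <!-- the div I don't want -->
    <div id="comments">
        <h1> Some Text </h1>
    </div>
</div>
HTML;

echo extractWithout($src, 'product_description', 'comments');
```

One caveat with this approach: loadHTML() assumes ISO-8859-1 unless the markup declares an encoding, so UTF-8 pages may need a charset meta tag before parsing.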
