Exclude unwanted HTML from Simple HTML DOM - PHP
Question
I am using the Simple HTML DOM parser with PHP to get the title, description, and images from a website. The problem is that the parsed result includes HTML I don't want, and I don't know how to exclude those tags. The details are below.
Here is a sample of the HTML structure being parsed.
<div id="product_description">
<p> Some text</p>
<ul>
<li>value 1</li>
<li>value 2</li>
<li>value 3</li>
</ul>
<!-- the div I don't want -->
<div id="comments">
<h1> Some Text </h1>
</div>
</div>
I am using the PHP script below to parse it:
foreach($html->find('div#product_description') as $description)
{
echo $description->outertext ;
echo "<br>";
}
The code above outputs everything inside the div with id "product_description". I want to exclude the div with id "comments". I tried converting the result to a string and using substr to exclude the last character, but that did not work, and I don't know why. Any ideas on how to do this? Any approach that lets me exclude that div from the parsed HTML will do. Thanks.
Recommended answer
You can remove the element you don't want by setting its outertext to an empty string:
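// Assumes simple_html_dom.php sits next to this script; adjust the path as needed.
require_once 'simple_html_dom.php';
$src =<<<src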
<div id="product_description">
<p> Some text</p>
<ul>
<li>value 1</li>
<li>value 2</li>
<li>value 3</li>
</ul>
<!-- the div I don't want -->
<div id="comments">
<h1> Some Text </h1>
</div>
</div>
src;
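$html = str_get_html($src);
foreach ($html->find('#product_description') as $description)
{
    // Fetch the first nested element with id "comments", if there is one.
    $comments = $description->find('#comments', 0);
    if ($comments) {
        // Blanking outertext drops the element from the rendered output.
        $comments->outertext = '';
    }
    print $description->outertext;
}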
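Since the question allows any approach, here is a minimal sketch of the same idea using PHP's built-in DOM extension (DOMDocument and DOMXPath) rather than Simple HTML DOM. This is a different technique from the answer above, not a description of how Simple HTML DOM works internally. The helper name removeById and its parameters are hypothetical, and the sketch assumes the ids are plain strings without quote characters.

```php
<?php
// Hypothetical helper: returns the markup of the element with id $containerId,
// with any descendant whose id is $unwantedId removed.
function removeById(string $html, string $containerId, string $unwantedId): string
{
    libxml_use_internal_errors(true);   // collect parser warnings instead of printing them
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $containers = $xpath->query("//*[@id='$containerId']");
    if ($containers->length === 0) {
        return '';                      // container not found
    }
    $container = $containers->item(0);

    // Copy the matches into an array first, then detach each one from its parent.
    $unwanted = iterator_to_array($xpath->query(".//*[@id='$unwantedId']", $container));
    foreach ($unwanted as $node) {
        $node->parentNode->removeChild($node);
    }

    // Serialize only the container element, not the whole document.
    return $doc->saveHTML($container);
}

$src = <<<'HTML'
<div id="product_description">
<p> Some text</p>
<ul>
<li>value 1</li>
<li>value 2</li>
<li>value 3</li>
</ul>
<div id="comments">
<h1> Some Text </h1>
</div>
</div>
HTML;

$out = removeById($src, 'product_description', 'comments');
echo $out;
```

Unlike blanking outertext, this detaches the node from the tree, so later queries on the same document will not find it either.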