Removing DOMDocument warnings while parsing page content


Problem description


I am trying to parse the content of an arbitrary URL, and the output should not contain any HTML code. This works fine, but it produces a bunch of warnings while reading the content at the given URL. How can I remove these warnings?

<?php
$url= 'http://stackoverflow.com/questions/12097352/how-can-i-parse-dynamic-content-from-a-web-page';
$doc = new DOMDocument();
$doc->loadHTMLFile($url);
$xpath = new DOMXPath($doc);
foreach($xpath->query("//script") as $script) {
    $script->parentNode->removeChild($script);
}
$textContent = $doc->textContent; //inherited from DOMNode
echo $textContent;
?>

Warnings:

content-from-a-web-page, line: 255 in /opt/lampp/htdocs/FB/ec2/test.php on line 13

Warning: DOMDocument::loadHTMLFile(): htmlParseEntityRef: expecting ';' in http://stackoverflow.com/questions/12097352/how-can-i-parse-dynamic-content-from-a-web-page, line: 255 in /opt/lampp/htdocs/FB/ec2/test.php on line 13

Warning: DOMDocument::loadHTMLFile(): htmlParseEntityRef: expecting ';' in http://stackoverflow.com/questions/12097352/how-can-i-parse-dynamic-content-from-a-web-page, line: 273 in /opt/lampp/htdocs/FB/ec2/test.php on line 13

Warning: DOMDocument::loadHTMLFile(): htmlParseEntityRef: expecting ';' in http://stackoverflow.com/questions/12097352/how-can-i-parse-dynamic-content-from-a-web-page, line: 273 in /opt/lampp/htdocs/FB/ec2/test.php on line 13

Warning: DOMDocument::loadHTMLFile(): htmlParseEntityRef: expecting ';' in http://stackoverflow.com/questions/12097352/how-can-i-parse-dynamic-content-from-a-web-page, line: 412 in /opt/lampp/htdocs/FB/ec2/test.php on line 13

Warning: DOMDocument::loadHTMLFile(): htmlParseEntityRef: expecting ';' in http://stackoverflow.com/questions/12097352/how-can-i-parse-dynamic-content-from-a-web-page, line: 412 in /opt/lampp/htdocs/FB/ec2/test.php on line 13

Warning: DOMDocument::loadHTMLFile(): htmlParseEntityRef: expecting ';' in http://stackoverflow.com/questions/12097352/how-can-i-parse-dynamic-content-from-a-web-page, line: 551 in /opt/lampp/htdocs/FB/ec2/test.php on line 13

Warning: DOMDocument::loadHTMLFile(): htmlParseEntityRef: expecting ';' in http://stackoverflow.com/questions/12097352/how-can-i-parse-dynamic-content-from-a-web-page, line: 551 in /opt/lampp/htdocs/FB/ec2/test.php on line 13

Warning: DOMDocument::loadHTMLFile(): ID display-name already defined in http://stackoverflow.com/questions/12097352/how-can-i-parse-dynamic-content-from-a-web-page, line: 731 in /opt/lampp/htdocs/FB/ec2/test.php on line 13

Solution

You can use libxml_use_internal_errors() and do the following:

libxml_use_internal_errors(true);
$doc->loadHTMLFile($url);
libxml_clear_errors();

As Peehaa noted in the comments below, it's a good idea to restore the previous error-handling state afterwards. You can do that as follows:

$errors = libxml_use_internal_errors(true); //store
$doc->loadHTMLFile($url);
libxml_clear_errors();
libxml_use_internal_errors($errors); //reset back to previous state

Here's how it works:

  • libxml_use_internal_errors(true) tells libxml to handle errors and warnings internally instead of outputting them to the browser. It returns the previous setting, which is stored in a variable.
  • loadHTMLFile() then loads the HTML file.
  • libxml_clear_errors() clears the error buffer.
  • libxml_use_internal_errors($errors) restores the previous error-handling setting.
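
The steps above can be wrapped in a small helper. The following is only a sketch: the function name loadHtmlQuietly() and the sample markup are made up for illustration, and it uses loadHTML() on a string rather than loadHTMLFile() on a URL so that it runs without network access. It also reads the buffered errors with libxml_get_errors() before clearing them, in case you want to log them instead of discarding them.

```php
<?php
// Hypothetical helper (not part of the original answer): parse an HTML
// string without emitting warnings, and return the document together with
// whatever libxml buffered while parsing.
function loadHtmlQuietly(string $html): array
{
    $previous = libxml_use_internal_errors(true); // buffer errors, remember old setting
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    $errors = libxml_get_errors();                // array of LibXMLError objects
    libxml_clear_errors();                        // empty the error buffer
    libxml_use_internal_errors($previous);        // restore the old setting
    return [$doc, $errors];
}

// Deliberately sloppy sample markup (a bare "&" and a duplicated id).
$html = '<html><body><p id="a">Tom & Jerry</p><p id="a">again</p></body></html>';
[$doc, $errors] = loadHtmlQuietly($html);

foreach ($errors as $error) {
    // LibXMLError exposes level, code, line, column and message.
    printf("line %d: %s\n", $error->line, trim($error->message));
}
echo $doc->textContent, "\n";
```

Which messages are reported, if any, depends on the libxml2 version PHP was built against, so treat the loop as a way to inspect the errors rather than as a fixed output.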

