Removing DOMDocument warning while parsing page content
Problem description
I am trying to parse the content of an arbitrary URL; the output should not contain any HTML code. The code below works, but it emits a bunch of warnings while reading the content at the given URL. How can I get rid of these warnings?
<?php
$url = 'http://stackoverflow.com/questions/12097352/how-can-i-parse-dynamic-content-from-a-web-page';
$doc = new DOMDocument();
$doc->loadHTMLFile($url);
$xpath = new DOMXPath($doc);
foreach ($xpath->query("//script") as $script) {
    $script->parentNode->removeChild($script);
}
$textContent = $doc->textContent; // inherited from DOMNode
echo $textContent;
?>
Warnings:
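Warning: DOMDocument::loadHTMLFile(): htmlParseEntityRef: expecting ';' in http://stackoverflow.com/questions/12097352/how-can-i-parse-dynamic-content-from-a-web-page, line: 255 in /opt/lampp/htdocs/FB/ec2/test.php on line 13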
Warning: DOMDocument::loadHTMLFile(): htmlParseEntityRef: expecting ';' in http://stackoverflow.com/questions/12097352/how-can-i-parse-dynamic-content-from-a-web-page, line: 255 in /opt/lampp/htdocs/FB/ec2/test.php on line 13
Warning: DOMDocument::loadHTMLFile(): htmlParseEntityRef: expecting ';' in http://stackoverflow.com/questions/12097352/how-can-i-parse-dynamic-content-from-a-web-page, line: 273 in /opt/lampp/htdocs/FB/ec2/test.php on line 13
Warning: DOMDocument::loadHTMLFile(): htmlParseEntityRef: expecting ';' in http://stackoverflow.com/questions/12097352/how-can-i-parse-dynamic-content-from-a-web-page, line: 273 in /opt/lampp/htdocs/FB/ec2/test.php on line 13
Warning: DOMDocument::loadHTMLFile(): htmlParseEntityRef: expecting ';' in http://stackoverflow.com/questions/12097352/how-can-i-parse-dynamic-content-from-a-web-page, line: 412 in /opt/lampp/htdocs/FB/ec2/test.php on line 13
Warning: DOMDocument::loadHTMLFile(): htmlParseEntityRef: expecting ';' in http://stackoverflow.com/questions/12097352/how-can-i-parse-dynamic-content-from-a-web-page, line: 412 in /opt/lampp/htdocs/FB/ec2/test.php on line 13
Warning: DOMDocument::loadHTMLFile(): htmlParseEntityRef: expecting ';' in http://stackoverflow.com/questions/12097352/how-can-i-parse-dynamic-content-from-a-web-page, line: 551 in /opt/lampp/htdocs/FB/ec2/test.php on line 13
Warning: DOMDocument::loadHTMLFile(): htmlParseEntityRef: expecting ';' in http://stackoverflow.com/questions/12097352/how-can-i-parse-dynamic-content-from-a-web-page, line: 551 in /opt/lampp/htdocs/FB/ec2/test.php on line 13
Warning: DOMDocument::loadHTMLFile(): ID display-name already defined in http://stackoverflow.com/questions/12097352/how-can-i-parse-dynamic-content-from-a-web-page, line: 731 in /opt/lampp/htdocs/FB/ec2/test.php on line 13
Solution
You can use libxml_use_internal_errors() and do the following:
libxml_use_internal_errors(true);
$doc->loadHTMLFile($url);
libxml_clear_errors();
As Peehaa noted in the comments, it's a good idea to restore the previous error-handling state afterwards. You can do it like this:
$errors = libxml_use_internal_errors(true); //store
$doc->loadHTMLFile($url);
libxml_clear_errors();
libxml_use_internal_errors($errors); //reset back to previous state
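The save-and-restore pattern can also be wrapped in a small helper so that the previous setting is restored even if parsing throws. This is only a sketch: the function name loadHtmlQuietly and the sample markup are assumptions made for illustration, and it parses an inline string with loadHTML() instead of fetching a URL so that it runs without network access.

```php
<?php
// Hypothetical helper (not part of the original answer): parse HTML while
// libxml keeps its warnings internal, then restore the caller's setting.
function loadHtmlQuietly(string $html): DOMDocument
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // remember the old setting
    try {
        $doc->loadHTML($html);
    } finally {
        libxml_clear_errors();                    // discard the buffered errors
        libxml_use_internal_errors($previous);    // put the old setting back
    }
    return $doc;
}

// Sample markup with an unescaped "&", the kind of input that commonly
// produces "htmlParseEntityRef" warnings.
$doc = loadHtmlQuietly('<html><body><p>Fish & chips</p></body></html>');
echo trim($doc->textContent), PHP_EOL;
```

For a remote page, the same helper could call loadHTMLFile($url) in place of loadHTML($html); the finally block guarantees that code running later sees the error-handling mode it expects.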
Here's how it works:
- libxml_use_internal_errors(true) tells libxml to handle errors and warnings internally instead of outputting them to the browser. Its return value, the previous setting, is stored in a variable.
- The HTML file is then loaded with the loadHTMLFile() method.
- libxml_clear_errors() clears the error buffer.
- Passing the stored value back to libxml_use_internal_errors() restores the previous error-handling state.
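If the warnings should be inspected rather than thrown away, the buffered entries can be read with libxml_get_errors() before the buffer is cleared. The sketch below follows the same steps; the inline sample string (used so the example runs offline) and the variable names are assumptions for illustration, and whether a given input produces any buffered errors depends on the libxml2 version in use.

```php
<?php
// Illustrative markup containing an unterminated entity reference.
$html = '<html><body><p>a &b c</p><script>var x = 1;</script></body></html>';

$doc = new DOMDocument();
$previous = libxml_use_internal_errors(true);  // buffer errors instead of emitting warnings
$doc->loadHTML($html);

// Each buffered entry is a LibXMLError object (level, code, message, line, ...).
foreach (libxml_get_errors() as $error) {
    printf("line %d: %s\n", $error->line, trim($error->message));
}

libxml_clear_errors();                         // empty the buffer
libxml_use_internal_errors($previous);         // restore the earlier setting

// Same clean-up as in the question: drop <script> elements, then print the text.
$xpath = new DOMXPath($doc);
foreach ($xpath->query('//script') as $script) {
    $script->parentNode->removeChild($script);
}
echo trim($doc->textContent), PHP_EOL;
```

Reading the buffer before clearing it keeps the page output clean while still leaving a way to log malformed markup when debugging.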