Remove HTML Entity if Incomplete
Problem description
I display up to 400 characters of a string pulled from the database. However, this string has to contain HTML markup.
By chance, the client has entered a string whose 400th character falls right in the middle of a closing </p> tag. The cut breaks the tag and causes errors in the markup that follows it.
I would prefer to remove this closing </p> tag entirely. I append a "...read more" link at the end, and it would look cleaner inside the existing paragraph.
What would be the best approach to cover all such broken-HTML cases? Is there a PHP function that automatically closes or removes erroneous HTML tags? I don't need a coded answer; just a direction would help greatly.
Thanks.
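One possible direction for the truncation step is sketched below. It assumes three things: the text is valid UTF-8, the `mbstring` extension is available, and literal `&` and `<` characters in the text are already escaped as `&amp;` and `&lt;`. The sketch first cuts to the character limit, then strips whatever tag or entity the cut left half-written at the end. `truncate_html_safely` is a hypothetical helper name. Tags that are still open afterwards, such as the `<p>` whose closing tag was cut off, still need to be closed by a parser-based repair like DOMDocument.

```php
<?php
// Hypothetical helper: cut $html to at most $limit characters, then drop any
// tag or entity that the cut left unfinished at the very end.
// Assumes valid UTF-8 input and that literal '&' and '<' are already escaped.
function truncate_html_safely(string $html, int $limit = 400): string
{
    // mb_substr counts characters, not bytes, so a multi-byte UTF-8
    // character is never split in half.
    $cut = mb_substr($html, 0, $limit, 'UTF-8');

    // A '<' with no '>' after it means the cut landed inside a tag,
    // e.g. "</p" or "<a href=". Remove that partial tag.
    $cut = preg_replace('/<[^>]*$/u', '', $cut);

    // An '&' followed only by letters/digits up to the end means a truncated
    // entity such as "&amp" or "&#82". Remove it.
    $cut = preg_replace('/&#?[a-zA-Z0-9]*$/u', '', $cut);

    return $cut;
}
```

For example, cutting `<p>Hello</p>` at 11 characters yields `<p>Hello`: the dangling `</p` is dropped instead of being left to break the markup that follows. The limit counts markup characters as well as visible text. An unescaped `&` near the cut (e.g. text ending in "AT&T") would also be trimmed, which is why the escaping assumption matters.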
Recommended answer
Here's a simple way to do it with DOMDocument. It's not perfect, but it may be of interest:
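<?php
// Repair a fragment of broken HTML by round-tripping it through libxml's
// forgiving HTML parser, then strip the document scaffolding it adds.
function html_tidy($src){
    libxml_use_internal_errors(true); // collect warnings about the malformed input instead of printing them
    $x = new DOMDocument;
    // The meta tag makes libxml read the input as UTF-8 instead of ISO-8859-1.
    $x->loadHTML('<meta http-equiv="Content-Type" content="text/html;charset=utf-8" />'.$src);
    libxml_clear_errors();
    $x->formatOutput = true;
    // Remove the DOCTYPE and the <html>, <head> and <body> tags that saveHTML() adds.
    $ret = preg_replace('~<(?:!DOCTYPE|/?(?:html|body|head))[^>]*>\s*~i', '', $x->saveHTML());
    // Remove the helper meta tag again (saveHTML() writes it without the closing slash).
    return trim(str_replace('<meta http-equiv="Content-Type" content="text/html;charset=utf-8">','',$ret));
}

$brokenHTML = [];
$brokenHTML[] = "<p><span>This is some broken html</spa";
$brokenHTML[] = "<poken html</spa";
$brokenHTML[] = "<p><span>This is some broken html</spa</p>";

/* Output:
<p><span>This is some broken html</span></p>
<poken html></poken>
<p><span>This is some broken html</span></p>
*/
foreach($brokenHTML as $test){
    echo html_tidy($test), PHP_EOL; // one repaired fragment per line
}
?>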
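The question also wants the "...read more" link inside the existing paragraph, and the same DOMDocument round trip can handle that step. The sketch below assumes the (possibly already truncated) fragment is UTF-8. The function name `append_read_more`, its `$href` parameter and the temporary wrapping `<div>` are hypothetical and not part of the answer's code. The sketch only repairs structure; it does not sanitize untrusted markup.

```php
<?php
// Hypothetical helper: repair a possibly truncated HTML fragment with
// DOMDocument and append a "...read more" link inside its last <p>.
function append_read_more(string $fragment, string $href): string
{
    libxml_use_internal_errors(true); // collect parser warnings instead of printing them
    $doc = new DOMDocument();
    // The meta tag makes libxml read the input as UTF-8; the <div> is a temporary
    // wrapper so the repaired fragment can be found and serialized on its own.
    $doc->loadHTML('<meta http-equiv="Content-Type" content="text/html; charset=utf-8">'
        . '<div>' . $fragment . '</div>');
    libxml_clear_errors();

    // The wrapper is the first <div> in document order, even if the fragment has its own.
    $wrap = $doc->getElementsByTagName('div')->item(0);

    // Target the last paragraph; fall back to the wrapper if there is none.
    $paragraphs = $wrap->getElementsByTagName('p');
    $target = $paragraphs->length > 0
        ? $paragraphs->item($paragraphs->length - 1)
        : $wrap;

    // Build <a href="...">...read more</a>; createTextNode escapes the text safely.
    $link = $doc->createElement('a');
    $link->appendChild($doc->createTextNode('...read more'));
    $link->setAttribute('href', $href);
    $target->appendChild($doc->createTextNode(' '));
    $target->appendChild($link);

    // Serialize only the wrapper's children, skipping the <html>/<body>/<div> scaffolding.
    $out = '';
    foreach ($wrap->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

echo append_read_more('<p>First paragraph.</p><p>Second paragraph, cut off', '/article/42'), PHP_EOL;
```

Here the fragment's dangling tag has already been removed, leaving only an unclosed `<p>`. Passing that in may be more predictable than relying on how libxml guesses its way through a half-written tag such as `</p`. The parser closes the open paragraph when it reaches the wrapper's `</div>`, so the link ends up inside it.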
Do take note of Mike 'Pomax' Kamermans's comment, though.