How to saveHTML of DOMDocument without HTML wrapper?

Problem description

I'm using the function below, and I'm struggling to output the DOMDocument without it adding the XML, html, body and p tag wrappers around the content. The suggested fix:

$postarray['post_content'] = $d->saveXML($d->getElementsByTagName('p')->item(0));

This only works when the content has no block-level elements inside it. When it does, as in the example below with the h1 element, the output of saveXML is truncated to...

<p>If you like</p>

I've been pointed to this post as a possible workaround, but I can't understand how to implement it into this solution (see commented out attempts below).

Any suggestions?

function rseo_decorate_keyword($postarray) {
    global $post;
    $keyword = "Jasmine Tea";
    $content = "If you like <h1>jasmine tea</h1> you will really like it with Jasmine Tea flavors. This is the last occurrence of the phrase jasmine tea within the content. If there are other instances of the keyword jasmine tea within the text what happens to jasmine tea.";
    // The XPath tests lower-case the text with translate(), so compare against a lower-cased keyword
    $needle = strtolower($keyword);
    $d = new DOMDocument();
    @$d->loadHTML($content);
    $x = new DOMXpath($d);
    $count = $x->evaluate("count(//text()[contains(translate(., 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz'), '$needle') and (ancestor::b or ancestor::strong)])");
    if ($count > 0) return $postarray;
    $nodes = $x->query("//text()[contains(translate(., 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz'), '$needle') and not(ancestor::h1) and not(ancestor::h2) and not(ancestor::h3) and not(ancestor::h4) and not(ancestor::h5) and not(ancestor::h6) and not(ancestor::b) and not(ancestor::strong)]");
    if ($nodes && $nodes->length) {
        $node = $nodes->item(0);
        // Split just before the keyword (case-insensitive, to match the XPath test)
        $keynode = $node->splitText(stripos($node->textContent, $keyword));
        // Split after the keyword
        $node->nextSibling->splitText(strlen($keyword));
        // Replace keyword with <strong>keyword</strong>
        $replacement = $d->createElement('strong', $keynode->textContent);
        $keynode->parentNode->replaceChild($replacement, $keynode);
    }
    $postarray['post_content'] = $d->saveXML($d->getElementsByTagName('p')->item(0));
    //  $postarray['post_content'] = $d->saveXML($d->getElementsByTagName('body')->item(1));
    //  $postarray['post_content'] = $d->saveXML($d->getElementsByTagName('body')->childNodes);
    return $postarray;
}
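The commented-out attempts above cannot work as written. getElementsByTagName() returns a DOMNodeList, which has no childNodes property, and item(1) asks for a second <body> that does not exist (the list is zero-indexed). A minimal sketch of the body-children workaround takes item(0) to get the <body> element and serializes each of its children separately. It assumes PHP 7 or later (for the return type declaration); the node argument to saveHTML() itself has been available since PHP 5.3.6. The helper name innerBodyHtml is hypothetical.

```php
<?php
// Hypothetical helper: serialize only the children of <body>, so the
// implied <html>/<body> wrappers never reach the output.
function innerBodyHtml(DOMDocument $doc): string
{
    // getElementsByTagName() returns a DOMNodeList; item(0) is the <body> element.
    $body = $doc->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return '';
    }

    $html = '';
    foreach ($body->childNodes as $child) {
        // saveHTML() accepts a node argument (PHP 5.3.6+) and serializes just that node.
        $html .= $doc->saveHTML($child);
    }
    return $html;
}

$d = new DOMDocument();
$d->loadHTML('<p>One</p><h1>Two</h1>');
echo innerBodyHtml($d), "\n";
```

Because each child is serialized on its own, block-level siblings such as the h1 are kept instead of being cut off after the first p.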

Recommended answer

All of these answers are now wrong, because as of PHP 5.4 and Libxml 2.6, loadHTML has an $options parameter that tells Libxml how it should parse the content.

Therefore, if we load the HTML with these options

$html->loadHTML($content, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

when doing saveHTML() there will be no doctype, no <html>, and no <body>.

LIBXML_HTML_NOIMPLIED turns off the automatic adding of implied html/body elements. LIBXML_HTML_NODEFDTD prevents a default doctype from being added when one is not found.
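A minimal sketch of the two flags in action follows. It assumes PHP 5.4 or later built against libxml 2.7.8 or newer, so that both constants exist. To keep the example simple, the fragment has a single top-level element.

```php
<?php
// Assumes PHP >= 5.4 built against libxml >= 2.7.8, so both constants are defined.
$fragment = '<p>Hello <b>world</b></p>';

$doc = new DOMDocument();
// NOIMPLIED: don't add implied <html>/<body>; NODEFDTD: don't add a default doctype.
$doc->loadHTML($fragment, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

// saveHTML() now returns the fragment without the usual wrappers.
$out = $doc->saveHTML();
echo $out;
```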

Full documentation about Libxml parameters is here

(Note that loadHTML docs say that Libxml 2.6 is needed, but LIBXML_HTML_NODEFDTD is only available in Libxml 2.7.8 and LIBXML_HTML_NOIMPLIED is available in Libxml 2.7.7)
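The flags are only defined on builds linked against a new enough libxml. One way to cope with older builds is to check for the constants before using them and pass 0 (plain loadHTML) otherwise. The sketch below does that using only standard PHP functions. The helper name noWrapperFlagsAvailable is hypothetical, and it assumes PHP 7 or later for the return type declaration.

```php
<?php
// Hypothetical helper: true when this PHP build defines both loadHTML() flags.
function noWrapperFlagsAvailable(): bool
{
    return defined('LIBXML_HTML_NOIMPLIED') && defined('LIBXML_HTML_NODEFDTD');
}

// LIBXML_DOTTED_VERSION is the libxml version this PHP build was compiled against.
printf("libxml %s\n", LIBXML_DOTTED_VERSION);
printf("NOIMPLIED and NODEFDTD available: %s\n", noWrapperFlagsAvailable() ? 'yes' : 'no');

// constant() looks the flags up by name, so an older build never
// references an undefined constant directly.
$options = noWrapperFlagsAvailable()
    ? constant('LIBXML_HTML_NOIMPLIED') | constant('LIBXML_HTML_NODEFDTD')
    : 0;

$doc = new DOMDocument();
$doc->loadHTML('<p>Hello</p>', $options);
```

When the check returns false, the wrappers are still added and have to be stripped some other way, for example by serializing the <body> element's children one at a time.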
