How to saveHTML of DOMDocument without HTML wrapper?


Question

I'm using the function below, and I'm struggling to output the DOMDocument without it prepending the XML, HTML, body and p tag wrappers to the content. The suggested fix:

$postarray['post_content'] = $d->saveXML($d->getElementsByTagName('p')->item(0));

This only works when the content has no block-level elements inside it. When it does, as in the example below with the h1 element, the output from saveXML is truncated to...

<p>If you like</p>

I've been pointed to this post as a possible workaround, but I can't work out how to apply it to this solution (see the commented-out attempts below).

Any suggestions?

function rseo_decorate_keyword($postarray) {
    global $post;
    $keyword = "Jasmine Tea";
    $content = "If you like <h1>jasmine tea</h1> you will really like it with Jasmine Tea flavors. This is the last ocurrence of the phrase jasmine tea within the content. If there are other instances of the keyword jasmine tea within the text what happens to jasmine tea.";
    $d = new DOMDocument();
    @$d->loadHTML($content);
    $x = new DOMXpath($d);
    $count = $x->evaluate("count(//text()[contains(translate(., 'ABCDEFGHJIKLMNOPQRSTUVWXYZ', 'abcdefghjiklmnopqrstuvwxyz'), '$keyword') and (ancestor::b or ancestor::strong)])");
    if ($count > 0) return $postarray;
    $nodes = $x->query("//text()[contains(translate(., 'ABCDEFGHJIKLMNOPQRSTUVWXYZ', 'abcdefghjiklmnopqrstuvwxyz'), '$keyword') and not(ancestor::h1) and not(ancestor::h2) and not(ancestor::h3) and not(ancestor::h4) and not(ancestor::h5) and not(ancestor::h6) and not(ancestor::b) and not(ancestor::strong)]");
    if ($nodes && $nodes->length) {
        $node = $nodes->item(0);
        // Split just before the keyword
        $keynode = $node->splitText(strpos($node->textContent, $keyword));
        // Split after the keyword
        $node->nextSibling->splitText(strlen($keyword));
        // Replace keyword with <b>keyword</b>
        $replacement = $d->createElement('strong', $keynode->textContent);
        $keynode->parentNode->replaceChild($replacement, $keynode);
    }
    $postarray['post_content'] = $d->saveXML($d->getElementsByTagName('p')->item(0));
    //  $postarray['post_content'] = $d->saveXML($d->getElementsByTagName('body')->item(1));
    //  $postarray['post_content'] = $d->saveXML($d->getElementsByTagName('body')->childNodes);
    return $postarray;
}

Recommended answer

All of these answers are now wrong, because as of PHP 5.4 and Libxml 2.6, loadHTML has an $options parameter that tells Libxml how it should parse the content.

Therefore, if we load the HTML with these options:

$html->loadHTML($content, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

then saveHTML() outputs no doctype, no <html>, and no <body>.

LIBXML_HTML_NOIMPLIED turns off the automatic adding of implied html/body elements.

LIBXML_HTML_NODEFDTD prevents a default doctype from being added when one is not found.
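
A minimal sketch of the call above, assuming the input has a single root element. The <div> wrapper and the $output variable are illustrative and not part of the question. With LIBXML_HTML_NOIMPLIED, libxml treats the first element as the document root, so a fragment with several top-level nodes can come out re-nested; wrapping it in one container first is one way to avoid that.

```php
<?php
// Illustrative fragment with a single root element (an assumption of this sketch).
$content = '<div><p>If you like <strong>jasmine tea</strong></p></div>';

$html = new DOMDocument();
// NOIMPLIED: do not add implied <html>/<body> elements.
// NODEFDTD: do not add a default doctype when none is present.
// Both flags need a libxml build that supports them.
$html->loadHTML($content, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

// Serialize the whole document; no wrapper elements were added on load.
$output = $html->saveHTML();
echo $output;
```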

Full documentation about Libxml parameters is here

(Note that the loadHTML documentation says Libxml 2.6 is required, but LIBXML_HTML_NOIMPLIED is only available as of Libxml 2.7.7 and LIBXML_HTML_NODEFDTD as of Libxml 2.7.8.)
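
Given the version caveat, one option is to check which libxml version PHP was built against and, on older builds, fall back to serializing the children of <body>. This is a sketch rather than the original answer's code: the function name html_without_wrapper is hypothetical, the <div> container is an assumption added to give the fragment a single root, and the loop relies on saveHTML() accepting a node argument (PHP 5.3.6 and later).

```php
<?php
// Hypothetical helper: returns the markup of $content without doctype/html/body.
function html_without_wrapper($content) {
    $d = new DOMDocument();
    $flagsAvailable = defined('LIBXML_HTML_NOIMPLIED')
        && defined('LIBXML_HTML_NODEFDTD')
        && version_compare(LIBXML_DOTTED_VERSION, '2.7.8', '>=');

    if ($flagsAvailable) {
        // Wrap in one container so the fragment has a single root element.
        @$d->loadHTML('<div>' . $content . '</div>',
            LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
        $parent = $d->documentElement;
    } else {
        // Older libxml: let the parser add html/body, then skip them on output.
        @$d->loadHTML($content);
        $parent = $d->getElementsByTagName('body')->item(0);
    }

    $out = '';
    if ($parent) {
        // Serialize every child node, so block-level siblings such as <h1> are kept.
        foreach ($parent->childNodes as $child) {
            $out .= $d->saveHTML($child);
        }
    }
    return $out;
}

echo html_without_wrapper('If you like <h1>jasmine tea</h1> you will really like it.');
```

Looping over the child nodes, instead of saving only the first p element, is what keeps the h1 and the text after it in the output.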
