Differentiating between XHTML and HTML with PHP DOMDocument


Question

I want to manipulate HTML and XHTML documents with the PHP DOM implementation. I use the DOMDocument->loadHTML() method to load the content.

I want to know whether the loaded content is XHTML or HTML. DOMDocument has a doctype object which contains the DOCTYPE declaration from the document itself. So far I have thought about comparing $dom->doctype->publicId, which contains strings like "-//W3C//DTD HTML 4.01//EN".

Is there any better way anyone can think of?

Edit:

Sorry if my question was a bit unclear. I updated the question since it might have been confusing. But to make it clear now: This question is not about handling HTML with PHP DOM in general or whether XHTML is good or bad.

Solution

If you're loading from an external source, you can check the file's MIME type and see if it's application/xhtml+xml; if it is, it's most definitely XHTML (of course, a server can lie and serve that type with horribly malformed markup). Otherwise, if it's text/html, it will be parsed as HTML tag soup. Validity of the actual markup aside, the doctype declaration is your next best way of telling whether the content is (or claims to be) HTML or XHTML.
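A minimal sketch of the MIME-type check, assuming the page is fetched through PHP's HTTP stream wrapper so the raw response headers are available as an array of `Name: value` strings (the format PHP uses for `$http_response_header`). `contentTypeFromHeaders` and `isXhtmlMediaType` are hypothetical helper names, not part of PHP.

```php
<?php
// Sketch: decide from the Content-Type header whether the server claims XHTML.
// Assumption: $headers uses the $http_response_header format, i.e. one raw
// header line per element, with status lines such as "HTTP/1.1 200 OK" mixed in.

function contentTypeFromHeaders(array $headers): ?string
{
    $type = null;
    foreach ($headers as $line) {
        // Header names are case-insensitive. Keep the last match so the
        // final response in a redirect chain wins.
        if (stripos($line, 'content-type:') === 0) {
            $type = trim(substr($line, strlen('content-type:')));
        }
    }
    return $type;
}

function isXhtmlMediaType(?string $contentType): bool
{
    if ($contentType === null) {
        return false;
    }
    // Drop parameters such as "; charset=UTF-8" and normalise case.
    $mediaType = strtolower(trim(explode(';', $contentType, 2)[0]));
    return $mediaType === 'application/xhtml+xml';
}
```

For example, right after `$body = file_get_contents($url);` you could call `isXhtmlMediaType(contentTypeFromHeaders($http_response_header))`. A `true` result only reflects what the server claims; the markup itself may still be malformed.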

As you say, you can check the public identifier and/or the URI and determine the type from there.
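A sketch of the doctype check, under these assumptions: identifiers containing `//DTD XHTML` (or a system URI containing `/xhtml`) are treated as XHTML, identifiers containing `//DTD HTML` as HTML, and everything else, including the bare HTML5 `<!DOCTYPE html>`, as undecidable. `classifyDoctypeIds` and `classifyLoadedDocument` are hypothetical names. `LIBXML_HTML_NODEFDTD` is passed because libxml2's HTML parser may otherwise insert a default HTML 4.0 Transitional doctype into input that has none, which would make every doctype-less document look like HTML.

```php
<?php
// Sketch: guess HTML vs XHTML from the DOCTYPE public/system identifiers.
// The substring patterns are a heuristic assumption, not an exhaustive list.

function classifyDoctypeIds(string $publicId, string $systemId): string
{
    // W3C XHTML DTDs, e.g. "-//W3C//DTD XHTML 1.0 Strict//EN" with
    // "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd".
    if (stripos($publicId, '//DTD XHTML') !== false
        || stripos($systemId, '/xhtml') !== false) {
        return 'xhtml';
    }
    // HTML 2.0 to 4.01 DTDs, e.g. "-//W3C//DTD HTML 4.01//EN".
    if (stripos($publicId, '//DTD HTML') !== false) {
        return 'html';
    }
    // "<!DOCTYPE html>" (no identifiers) or anything unrecognised.
    return 'unknown';
}

function classifyLoadedDocument(string $markup): string
{
    $dom = new DOMDocument();
    // Collect parser warnings instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    // Prevent libxml2 from adding a default doctype when none is present.
    $dom->loadHTML($markup, LIBXML_HTML_NODEFDTD);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $doctype = $dom->doctype;
    if ($doctype === null) {
        return 'unknown';
    }
    return classifyDoctypeIds($doctype->publicId, $doctype->systemId);
}
```

Because `<!DOCTYPE html>` is shared by HTML5 and its XML serialization, an `'unknown'` result is where the MIME type or other context has to make the call.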

