Valid content-type for XML, HTML and XHTML documents


Question

What are the correct content-types for XML, HTML and XHTML documents?

I need to write a simple crawler that only fetches these kinds of files.

Nowadays a URL such as http://example.net/index.html can serve, for example, a JPEG file because of mod_rewrite, so I need to check the Content-Type in the response header and compare it with a list of allowed content types.

Where can I get such a list from?

Answer

HTML: text/html, full-stop.

XHTML: application/xhtml+xml, or, only if following the HTML compatibility guidelines, text/html. See the W3C XHTML Media Types Note.

XML: text/xml, application/xml (RFC 2376, since superseded by RFC 3023 and then RFC 7303).

There are also many other media types based around XML, for example application/rss+xml or image/svg+xml. It's a safe bet that any unrecognised but registered type ending in +xml is XML-based. See the IANA list for registered media types ending in +xml.

(For unregistered x- types, all bets are off, but you'd hope +xml would be respected.)
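
For the crawler in the question, the check could be sketched roughly as below. This is only a minimal illustration, not a definitive implementation: the names ALLOWED_TYPES, media_type, is_allowed and the accept_xml_suffix flag are hypothetical, and the allowed set contains just the types listed in this answer. It assumes the raw Content-Type header value has already been read from the response.

```python
# Media types named in the answer above (an assumed starting list; extend as needed).
ALLOWED_TYPES = {
    "text/html",
    "application/xhtml+xml",
    "text/xml",
    "application/xml",
}


def media_type(header_value):
    """Return the bare media type: parameters such as charset dropped, lowercased."""
    return header_value.split(";", 1)[0].strip().lower()


def is_allowed(header_value, accept_xml_suffix=True):
    """Decide whether a Content-Type header value names an HTML, XHTML or XML type.

    accept_xml_suffix is a hypothetical switch: when True, any type ending
    in +xml (for example image/svg+xml) is treated as XML-based.
    """
    if not header_value:
        # A missing or empty header gives no evidence of an allowed type.
        return False
    mt = media_type(header_value)
    if mt in ALLOWED_TYPES:
        return True
    return accept_xml_suffix and "/" in mt and mt.endswith("+xml")
```

With the standard library's urllib.request, the header value is available as response.headers.get("Content-Type") on the object returned by urlopen, so a crawler could call is_allowed on that value before reading the body. Media type names are case-insensitive, which is why the value is lowercased before comparison.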
