Converting HTML to XML
Problem description
I have hundreds of HTML files that need to be converted to XML. We currently use these HTML files to serve content for applications, but now we have to serve the same content as XML.
The HTML files contain tables, divs, images, p, b or strong tags, etc.
I searched and found some applications, but I haven't managed to get it working yet.
Could you suggest a way to convert the contents of these files to XML?
Recommended answer
I was successful using the tidy command-line utility. On Linux I installed it quickly with apt-get install tidy. Then the command:
tidy -q -asxml --numeric-entities yes source.html >file.xml
gave an XML file, which I was able to process with an XSLT processor. However, I needed to set up the XHTML 1 DTDs correctly.
This is their homepage: html-tidy.org (and the legacy one: HTML Tidy)
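Since the question is about hundreds of files, the single-file tidy command above can be wrapped in a loop. This is only a minimal sketch, assuming a POSIX shell and tidy on the PATH; the function name convert_dir and its two directory arguments are made up for illustration and are not part of tidy. It relies on tidy's documented exit status (0 = clean, 1 = warnings, 2 = errors) to flag files that need a manual look.

```shell
#!/bin/sh
# Sketch: convert every .html file in one directory to XHTML with tidy.
# convert_dir, src_dir and out_dir are illustrative names, not tidy features.
convert_dir() {
    src_dir=$1
    out_dir=$2
    mkdir -p "$out_dir" || return 1
    failed=0
    for src in "$src_dir"/*.html; do
        # Skip the unexpanded pattern when the directory has no .html files.
        [ -e "$src" ] || continue
        out="$out_dir/$(basename "$src" .html).xml"
        status=0
        # Same options as the single-file command in the answer.
        tidy -q -asxml --numeric-entities yes "$src" > "$out" || status=$?
        # Exit status 1 means warnings only; 2 or higher means tidy
        # (or the attempt to run it) failed for this file.
        if [ "$status" -ge 2 ]; then
            echo "tidy failed on $src (exit status $status)" >&2
            failed=1
        fi
    done
    # Non-zero when at least one file could not be converted.
    return "$failed"
}
```

For example, convert_dir ./html ./xml would write ./xml/page.xml for each ./html/page.html. Files that tidy could not repair are reported on stderr so they can be fixed by hand and run again.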